Extract a substring:
>s <- "EndMemo.com R Language Tutorial"
>substr(s,1,7)
[1] "EndMemo"
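substr() counts positions from 1 and works element-wise, so one call can cut several strings at once. A minimal sketch; the vector `words` is illustrative data, not from the tutorial:

```r
s <- "EndMemo.com R Language Tutorial"

# Characters 15 through 22 of s spell the word "Language"
lang <- substr(s, 15, 22)

# substr() is vectorized over its first argument: first 3 characters of each word
words <- c("EndMemo", "Tutorial")
prefixes <- substr(words, 1, 3)

print(lang)      # [1] "Language"
print(prefixes)  # [1] "End" "Tut"
```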
Get string length:
>nchar(s)
[1] 31
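nchar() is vectorized too, so it can report the length of every word at once. A small sketch, assuming the words are separated by single spaces:

```r
s <- "EndMemo.com R Language Tutorial"

# Split on single spaces, then count the characters of each word
words <- strsplit(s, " ")[[1]]
lengths_per_word <- nchar(words)
print(lengths_per_word)  # [1] 11  1  8  8
```

The word lengths plus the three spaces add up to the 31 reported by nchar(s).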
To uppercase:
>x <- toupper(s)
>x
[1] "ENDMEMO.COM R LANGUAGE TUTORIAL"
To lowercase:
>x <- tolower(s)
>x
[1] "endmemo.com r language tutorial"
Split the string at the letter "o":
>x <- strsplit(s,"o")
>x
[[1]]
[1] "EndMem" ".c" "m R Language Tut" "rial"
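strsplit() always returns a list with one element per input string, so the pieces of a single string are usually pulled out with [[1]] or unlist(). A short sketch:

```r
s <- "EndMemo.com R Language Tutorial"

pieces <- strsplit(s, "o")   # a list of length 1
parts <- pieces[[1]]         # the character vector inside it
same_parts <- unlist(pieces) # equivalent when there is only one input string
print(parts)
```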
Concatenate two strings:
>x <- paste(tolower(s)," -- String Functions",sep="")
>x
[1] "endmemo.com r language tutorial -- String Functions"
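paste0() is base R shorthand for paste(..., sep = ""), and paste() recycles shorter arguments against longer ones. A minimal sketch; the suffixes in `tags` are illustrative:

```r
s <- "EndMemo.com R Language Tutorial"

a <- paste(tolower(s), " -- String Functions", sep = "")
b <- paste0(tolower(s), " -- String Functions")  # same result, shorter to write

# paste() is vectorized: the single "R" is recycled against both suffixes
tags <- paste("R", c("strings", "regex"), sep = "-")
print(tags)  # [1] "R-strings" "R-regex"
```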
Substring replacement:
>x <- sub("Tutorial","Examples",s)
>x
[1] "EndMemo.com R Language Examples"
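sub() replaces only the first match, while gsub() replaces every match. A quick comparison on the same string:

```r
s <- "EndMemo.com R Language Tutorial"

first_only <- sub("o", "0", s)   # only the first "o" is replaced
every_one  <- gsub("o", "0", s)  # every "o" is replaced
print(first_only)  # [1] "EndMem0.com R Language Tutorial"
print(every_one)   # [1] "EndMem0.c0m R Language Tut0rial"
```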
Use a regular expression:
>x <- sub("n.+e","XXX",s)
>x
[1] "EXXX Tutorial"
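The pattern "n.+e" is greedy: .+ runs on to the last "e" in the string, so everything from the first "n" to the end of "Language" is replaced. A lazy quantifier (used here with perl = TRUE) stops at the first "e" that completes the match instead. A sketch:

```r
s <- "EndMemo.com R Language Tutorial"

greedy <- sub("n.+e", "XXX", s)                # .+ grabs as much as possible
lazy   <- sub("n.+?e", "XXX", s, perl = TRUE)  # .+? grabs as little as possible
print(greedy)  # [1] "EXXX Tutorial"
print(lazy)    # [1] "EXXXmo.com R Language Tutorial"
```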
#############
R help page for strsplit
Split the Elements of a Character Vector
Description
Split the elements of a character vector x into substrings according to the matches to substring split within them.
Usage
strsplit(x, split, fixed = FALSE, perl = FALSE, useBytes = FALSE)
Arguments
x: character vector, each element of which is to be split. Other inputs, including a factor, will give an error.
split: character vector (or object which can be coerced to such) containing regular expression(s) (unless fixed = TRUE) to use for splitting. If empty matches occur, in particular if split has length 0, x is split into single characters. If split has length greater than 1, it is recycled along x.
fixed: logical. If TRUE, match split exactly; otherwise use regular expressions. Has priority over perl.
perl: logical. Should Perl-compatible regexps be used?
useBytes: logical. If TRUE, the matching is done byte-by-byte rather than character-by-character, and inputs with marked encodings are not converted. This is forced (with a warning) if any input is found which is marked as "bytes".
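The effect of fixed is easiest to see with a regex metacharacter such as ".". As a regular expression it matches any character, so every piece comes back empty; with fixed = TRUE it is a literal dot. The last call shows split being recycled along x. A sketch with made-up inputs:

```r
# "." as a regular expression matches any character, so every piece is ""
as_regex <- strsplit("a.b.c", ".")[[1]]
# fixed = TRUE treats "." as a literal dot
as_fixed <- strsplit("a.b.c", ".", fixed = TRUE)[[1]]
# split of length 2 is recycled along x: "-" for the first element, "_" for the second
recycled <- strsplit(c("a-b", "c_d"), c("-", "_"))

print(as_regex)  # [1] "" "" "" "" ""
print(as_fixed)  # [1] "a" "b" "c"
```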
Details
Argument split will be coerced to character, so you will see uses with split = NULL to mean split = character(0).
Note that splitting into single characters can be done via split = character(0) or split = ""; the two are equivalent. The definition of ‘character’ here depends on the locale: in a single-byte locale it is a byte, and in a multi-byte locale it is the unit represented by a ‘wide character’ (almost always a Unicode code point).
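A quick check that split = "", split = character(0), and split = NULL (coerced to character(0)) all split an ASCII string into single characters:

```r
w <- "abc"
by_empty <- strsplit(w, "")[[1]]            # empty string as the split
by_char0 <- strsplit(w, character(0))[[1]]  # zero-length split
by_null  <- strsplit(w, NULL)[[1]]          # NULL is coerced to character(0)
print(by_empty)  # [1] "a" "b" "c"
```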
A missing value of split does not split the corresponding element(s) of x at all.
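A sketch of the missing-value rule combined with recycling of split: the second element of x is paired with NA and comes back unsplit. The inputs are illustrative:

```r
# split = c(" ", NA) is recycled along x; the NA leaves "c d" untouched
res <- strsplit(c("a b", "c d"), c(" ", NA))
str(res)
```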
The algorithm applied to each input string is
    repeat {
        if the string is empty
            break.
        if there is a match
            add the string to the left of the match to the output.
            remove the match and all to the left of it.
        else
            add the string to the output.
            break.
    }
Note that this means that if there is a match at the beginning of a (non-empty) string, the first element of the output is "", but if there is a match at the end of the string, the output is the same as with the match removed.
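The loop above can be sketched directly in R with regexpr() to find the leftmost match. split_one() is a hypothetical helper written for illustration, not part of base R, and it assumes the pattern can never match the empty string (strsplit() itself handles empty matches by splitting into single characters; this sketch does not):

```r
# Hypothetical helper, not part of base R: mirrors the documented loop for one string.
# Assumes `pattern` cannot match the empty string, otherwise the loop would not advance.
split_one <- function(string, pattern) {
  out <- character(0)
  repeat {
    if (nchar(string) == 0) break                # the string is empty: stop
    m <- regexpr(pattern, string)                # position of leftmost match, or -1
    if (m > 0) {
      pos <- as.integer(m)
      len <- attr(m, "match.length")
      out <- c(out, substr(string, 1, pos - 1))  # text to the left of the match
      string <- substr(string, pos + len, nchar(string))  # drop the match and all before it
    } else {
      out <- c(out, string)                      # no match: keep what is left
      break
    }
  }
  out
}

# Leading match gives an empty first piece; trailing match adds nothing: c("", "a", "b")
split_one(",a,b,", ",")
```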
Value
A list of the same length asx
, the i
-th element of which
contains the vector of splits of x[i]
.
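Because the result has one list element per element of x, it pairs naturally with lengths() to count the pieces of each input. A sketch with a made-up input vector:

```r
x <- c("a,b,c", "d", "e,f")
pieces <- strsplit(x, ",")
n_pieces <- lengths(pieces)  # number of substrings produced for each element of x
print(n_pieces)  # [1] 3 1 2
```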
If any element of x or split is declared to be in UTF-8 (see Encoding), all non-ASCII character strings in the result will be in UTF-8 and have their encoding declared as UTF-8. As from R 2.10.0, for perl = TRUE, useBytes = FALSE, all non-ASCII strings in a multibyte locale are translated to UTF-8.
Note
Prior to R 2.11.0 there was an argument extended which could be used to select ‘basic’ regular expressions: this was often used when fixed = TRUE would be preferable. In the actual implementation (as distinct from the POSIX standard) the only difference was that ?, +, {, |, (, and ) were not interpreted as metacharacters.
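Since characters such as + and ? are metacharacters in the default regular expressions, splitting on them literally needs either fixed = TRUE or an escaped pattern. A minimal sketch with an illustrative input:

```r
expr <- "1+2+3"
literal <- strsplit(expr, "+", fixed = TRUE)[[1]]  # "+" treated as plain text
escaped <- strsplit(expr, "\\+")[[1]]              # "+" escaped inside the regex
print(literal)  # [1] "1" "2" "3"
```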
See Also
paste for the reverse, grep and sub for string search and manipulation; also nchar, substr.
‘regular expression’ for the details of the pattern specification.
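As a sketch of paste() as the reverse operation, splitting on a fixed separator and joining with collapse restores the original string:

```r
s <- "EndMemo.com R Language Tutorial"
words <- strsplit(s, " ", fixed = TRUE)[[1]]  # split on literal spaces
rebuilt <- paste(words, collapse = " ")       # join the pieces back with the same separator
```

The round trip is exact here, but it would not be if s ended with a space, because a match at the end of the string produces no final empty piece.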