String Functions in R

>s <- "EndMemo.com R Language Tutorial"

Get a substring (positions are 1-based):
>substr(s,1,7)
[1] "EndMemo"

Get string length:
>nchar(s)
[1] 31

To uppercase:
>x <- toupper(s)
>x
[1] "ENDMEMO.COM R LANGUAGE TUTORIAL"

To lowercase:
>x <- tolower(s)
>x
[1] "endmemo.com r language tutorial"

Split the string at letter "o":
>x <- strsplit(s,"o")
>x
[[1]]
[1] "EndMem"           ".c"               "m R Language Tut" "rial"
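
The result above is a list, not a plain character vector, which is easy to miss. A minimal sketch of how the pieces might be pulled out, assuming the same string s; the variable names parts, pieces and words are only illustrative:

```r
s <- "EndMemo.com R Language Tutorial"

# strsplit() returns a list with one element per input string
parts <- strsplit(s, "o")

# [[1]] extracts the character vector for the first (and only) input string
pieces <- parts[[1]]
length(pieces)   # 4
pieces[1]        # "EndMem"

# unlist() flattens the list when only a plain vector is needed
words <- unlist(strsplit(s, " "))
words            # "EndMemo.com" "R" "Language" "Tutorial"
```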

Concatenate two strings:
>x <- paste(s," -- String Functions",sep="")
>x
[1] "EndMemo.com R Language Tutorial -- String Functions"

Substring replacement:
>x <- sub("Tutorial","Examples",s)
>x
[1] "EndMemo.com R Language Examples"

Use regular expression:
>x <- sub("n.+e","XXX",s)
>x
[1] "EXXX Tutorial"
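
Two details of the replacement examples are worth spelling out: sub() changes only the first match, and ".+" is greedy, which is why the match above runs to the last "e" in the string. A short sketch, assuming the same string s; perl = TRUE is used here only so the lazy quantifier is handled by the Perl-compatible engine:

```r
s <- "EndMemo.com R Language Tutorial"

# sub() replaces only the first match; gsub() replaces every match
first_only  <- sub("o", "0", s)    # "EndMem0.com R Language Tutorial"
all_matches <- gsub("o", "0", s)   # "EndMem0.c0m R Language Tut0rial"

# ".+?" is the lazy form: it stops at the first "e" that completes a match
lazy <- sub("n.+?e", "XXX", s, perl = TRUE)
lazy                               # "EXXXmo.com R Language Tutorial"
```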
 
#############
R Help for strsplit
 

Split the Elements of a Character Vector

Description

Split the elements of a character vector x into substrings according to the matches to substring split within them.

Usage

strsplit(x, split, fixed = FALSE, perl = FALSE, useBytes = FALSE)

Arguments

x: character vector, each element of which is to be split. Other inputs, including a factor, will give an error.
split: character vector (or object which can be coerced to such) containing regular expression(s) (unless fixed = TRUE) to use for splitting. If empty matches occur, in particular if split has length 0, x is split into single characters. If split has length greater than 1, it is recycled along x.
fixed: logical. If TRUE match split exactly, otherwise use regular expressions. Has priority over perl.
perl: logical. Should perl-compatible regexps be used?
useBytes: logical. If TRUE the matching is done byte-by-byte rather than character-by-character, and inputs with marked encodings are not converted. This is forced (with a warning) if any input is found which is marked as "bytes".
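
The arguments above can be illustrated with a small sketch. The input strings and variable names are made up for the example and are not part of the help page:

```r
x <- "a.b|c"

# Default: split is a regular expression, so "." matches any character
regex_split <- strsplit(x, ".")[[1]]                 # five empty strings

# fixed = TRUE treats split as a literal string
literal_dot <- strsplit(x, ".", fixed = TRUE)[[1]]   # "a" "b|c"
literal_bar <- strsplit(x, "|", fixed = TRUE)[[1]]   # "a.b" "c"

# perl = TRUE switches to the Perl-compatible regular expression engine
words <- strsplit("one  two\tthree", "\\s+", perl = TRUE)[[1]]

# A split vector of length greater than 1 is recycled along x
recycled <- strsplit(c("a-b", "c_d"), c("-", "_"))
```

Here the first element of recycled is split on "-" and the second on "_".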

Details

Argument split will be coerced to character, so you will see uses with split = NULL to mean split = character(0), including in the examples below.
Note that splitting into single characters can be done via split = character(0) or split = ""; the two are equivalent. The definition of ‘character’ here depends on the locale: in a single-byte locale it is a byte, and in a multi-byte locale it is the unit represented by a ‘wide character’ (almost always a Unicode point).
A missing value of split does not split the corresponding element(s) of x at all.
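
A brief sketch of the three statements above, using a made-up input "abc"; the variable names are illustrative only:

```r
x <- "abc"

# NULL, "" and character(0) all split into single characters
by_null  <- strsplit(x, NULL)[[1]]
by_empty <- strsplit(x, "")[[1]]
by_char0 <- strsplit(x, character(0))[[1]]
by_null                      # "a" "b" "c"

# A missing (NA) split leaves the element unsplit
unsplit <- strsplit(x, NA_character_)[[1]]
```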
The algorithm applied to each input string is
repeat {
    if the string is empty
        break.
    if there is a match
        add the string to the left of the match to the output.
        remove the match and all to the left of it.
    else
        add the string to the output.
        break.
}

Note that this means that if there is a match at the beginning of a (non-empty) string, the first element of the output is "", but if there is a match at the end of the string, the output is the same as with the match removed.
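
The loop described above can be transcribed almost literally into R. This is only a sketch: split_one is a hypothetical helper, not part of base R and not how strsplit is implemented internally, and it assumes the pattern never matches the empty string.

```r
# Hypothetical helper: apply the documented loop to one string and one
# regular expression. Assumes pattern cannot match the empty string.
split_one <- function(string, pattern) {
  out <- character(0)
  repeat {
    if (nchar(string) == 0) break
    m <- regexpr(pattern, string)
    start <- as.integer(m)
    if (start > 0) {
      # Add the text to the left of the match to the output
      out <- c(out, substr(string, 1, start - 1))
      # Remove the match and everything to the left of it
      string <- substr(string, start + attr(m, "match.length"), nchar(string))
    } else {
      # No match: the rest of the string is the last piece
      out <- c(out, string)
      break
    }
  }
  out
}

split_one(",a,b,", ",")   # "" "a" "b"
```

The leading comma produces an empty first element, while the trailing comma adds nothing, which is the behaviour the note describes.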

Value

A list of the same length as x, the i-th element of which contains the vector of splits of x[i].
If any element of x or split is declared to be in UTF-8 (see Encoding), all non-ASCII character strings in the result will be in UTF-8 and have their encoding declared as UTF-8. As from R 2.10.0, for perl = TRUE, useBytes = FALSE all non-ASCII strings in a multibyte locale are translated to UTF-8.
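
A small sketch of the shape of the return value, using a made-up input vector; res is an illustrative name:

```r
x <- c(as = "asfef", qu = "qwerty", "b")
res <- strsplit(x, "e")

# One list element per element of x
length(res) == length(x)   # TRUE

# The i-th element holds the pieces of x[i]
res[[1]]                   # "asf" "f"
res[[2]]                   # "qw"  "rty"

# Number of pieces per input string
lengths(res)
```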

Note

Prior to R 2.11.0 there was an argument extended which could be used to select ‘basic’ regular expressions: this was often used when fixed = TRUE would be preferable. In the actual implementation (as distinct from the POSIX standard) the only difference was that ?, +, {, |, (, and ) were not interpreted as metacharacters.

See Also

paste for the reverse, grep and sub for string search and manipulation; also nchar, substr.
‘regular expression’ for the details of the pattern specification.
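
As a rough illustration of paste as the reverse of strsplit, the sketch below splits a made-up date string and joins it again with collapse. The round trip is not always exact, because a separator at the end of the string is dropped by the split:

```r
original <- "2024-01-15"

# Split on a literal "-" and join the pieces back together
parts    <- strsplit(original, "-", fixed = TRUE)[[1]]
rejoined <- paste(parts, collapse = "-")
identical(rejoined, original)   # TRUE

# A trailing separator is not recovered
lossy <- paste(strsplit("a,b,", ",")[[1]], collapse = ",")
lossy                           # "a,b"
```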

Examples

noquote(strsplit("A text I want to display with spaces", NULL)[[1]])

x <- c(as = "asfef", qu = "qwerty", "yuiop[", "b", "stuff.blah.yech")
# split x on the letter e
strsplit(x, "e")

unlist(strsplit("a.b.c", "."))
## [1] "" "" "" "" ""
## Note that 'split' is a regexp!
## If you really want to split on '.', use
unlist(strsplit("a.b.c", "[.]"))
## [1] "a" "b" "c"
## or
unlist(strsplit("a.b.c", ".", fixed = TRUE))

## a useful function: rev() for strings
strReverse <- function(x)
        sapply(lapply(strsplit(x, NULL), rev), paste, collapse = "")
strReverse(c("abc", "Statistics"))

## get the first names of the members of R-core
a <- readLines(file.path(R.home("doc"),"AUTHORS"))[-(1:8)]
a <- a[(0:2)-length(a)]
(a <- sub(" .*","", a))
# and reverse them
strReverse(a)
