RHadoop: Reading CSV using rhdfs
rhdfs uses rJava, so the buffer size is limited by the JVM heap size. By default, rhdfs sets the buffer size to 5 MB. The source code for rhdfs can be found here.
The HADOOP_CMD environment variable should point to the hadoop executable.
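Because the buffer lives in the JVM heap, a large buffersize may need a larger heap. A minimal sketch, assuming the usual rJava convention that the heap is configured through the java.parameters option before the JVM starts; the 2 GB value is only an illustration, not a recommendation:

```r
# Run this in a fresh R session, before rJava (and therefore rhdfs) is loaded;
# once the JVM has started, changing the option has no effect.
options(java.parameters = "-Xmx2g")  # assumed heap size, adjust to your machine

# Load rhdfs only after the option is set:
# library(rhdfs)
```

If the option is set after rhdfs has been loaded, restart R and set it first.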
Sys.setenv(HADOOP_CMD="/bin/hadoop")
library(rhdfs)
hdfs.init()
f = hdfs.file("fulldata.csv","r",buffersize=104857600)
m = hdfs.read(f)
c = rawToChar(m)
data = read.table(textConnection(c), sep = ",")
## Alternatively You can use hdfs.line.reader()
reader = hdfs.line.reader("fulldata.csv")
x = reader$read()
typeof(x)
## [1] "character"
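The parsing step can be tried without a Hadoop cluster. The sketch below uses a hand-made raw vector as a stand-in for what hdfs.read(f) returns (the sample data and the column names are hypothetical) and parses it with read.csv(text = ...), which tends to be simpler than going through textConnection():

```r
# Stand-in for the raw vector returned by hdfs.read(f); the contents are
# made-up sample data, not taken from fulldata.csv.
m <- charToRaw("id,value\n1,3.5\n2,4.25\n")

# Convert the raw bytes to a single character string.
txt <- rawToChar(m)

# Parse the string as CSV; header = TRUE assumes the first line holds column names.
data <- read.csv(text = txt, header = TRUE, stringsAsFactors = FALSE)
str(data)
```

Note that this parses only the bytes that a single read returned. If the file is larger than the buffer, one read will probably not cover the whole file, and the last line of the chunk may be cut off mid-record.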
Could you please give me some hint? Following is my code snippet:
==========================================================
library(rmr2);
library(rhdfs);
library(lubridate);
hdfs.init();
f = hdfs.file("/bigdata/rawdata/201312.csv","r",buffersize=104857600);
m = hdfs.read(f);
c = rawToChar(m);
data = read.table(textConnection(c), sep = ",");
==========================================================
Thanks in advance.