This error occurs because the dataset you are using contains NA (null) values.
Matrix multiplication, like most other operations, propagates NA values, and the resulting NA output leads to this error.
To overcome this problem, choose one of the following methods:
i) Remove the NA values when processing the input, for example with na.rm = TRUE in functions that accept that argument.
ii) Use a dataset that does not contain NA values, i.e. a clean dataset.
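The two options above can be illustrated with a minimal plain-R sketch. It does not use rmr2 or HDFS, and the data frame `train`, its columns, and the coefficient vector `beta` are hypothetical values made up for illustration, not the data from the question. It only shows how a single NA propagates through a matrix product and how dropping incomplete rows avoids that.

```r
# Hypothetical toy data: one predictor value is missing (NA).
train <- data.frame(
  x1 = c(1.0, 2.0, NA, 4.0),
  x2 = c(0.5, 1.5, 2.5, 3.5),
  y  = c(0, 0, 1, 1)
)

X    <- as.matrix(train[, c("x1", "x2")])
beta <- c(0.1, 0.2)  # assumed coefficients, for illustration only

# A single NA in the input propagates through the matrix product.
scores_with_na <- X %*% beta
has_na_before  <- anyNA(scores_with_na)

# Drop every row that contains an NA, then repeat the computation.
clean        <- na.omit(train)
X_clean      <- as.matrix(clean[, c("x1", "x2")])
scores_clean <- X_clean %*% beta
has_na_after <- anyNA(scores_clean)

# Summary functions can instead skip missing values with na.rm = TRUE.
mean_x1 <- mean(train$x1, na.rm = TRUE)
```

In a MapReduce job the same filtering would have to happen inside the map function, before any arithmetic is done on the chunk; how best to do that depends on the input format, so treat the above as a local check only.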
There is no need to go through the material below: it is the reference I followed, searched for, and pasted at the time for future reference.
This page gets a lot of views, so I am sharing my experience with you.
I am trying to use the R script below to build a logistic regression model with RHadoop (the rmr2 and rhdfs packages) on an HDFS data file located at "hdfs://:/somnath/merged_train/part-m-00000", and then to test the model on a test HDFS data file at "hdfs://:/somnath/merged_test/part-m-00000".
We are using the CDH4 distribution with YARN/MR2 running in parallel with MR1, which is supported by Hadoop-0.20. We use the hadoop-0.20 MapReduce and HDFS versions to run the RHadoop script below, as the Sys.setenv commands in it show.
However, whenever I run the script I get the error below, and I have had very little luck getting past it. I would appreciate it if somebody could point me to the possible cause of this error, which seems to come from an lapply call in R that does not handle NA arguments.
Below is my R script:
NOTE: I have set the following Hadoop environment variables in root's ~/.bash_profile:
SAMPLE TRAIN DATASET
SAMPLE TEST DATASET