Rscript - hadoop streaming failed with error code 1

Hi,
I have a 4-node Hadoop (v1.2.1) cluster on EC2, with R 3.1.2 and RStudio running. I have installed all the RHadoop packages, following many examples on the net.
I can run Hadoop MapReduce jobs from the Linux shell. For example,
hadoop jar hadoop-examples-1.2.1.jar pi 10 100000
runs successfully.
I'm having an issue running RHadoop. It is not a new one on the net, but I have tried a lot of things and it still doesn't work. To be more specific, this is what I wrote:
#Renviron.site file has the following environment variables:
export HADOOP_PREFIX="/home/ubuntu/hadoop"
export HADOOP_CMD="/home/ubuntu/hadoop/bin/hadoop"
HADOOP_STREAMING="/home/ubuntu/hadoop/contrib/streaming/hadoop-streaming-1.2.1.jar"
In RStudio (or R):
#Loading all the RHadoop packages - please note that I have correctly installed all dependencies and packages
require(rhdfs)
require(ravro)
require(rmr2)
require(plyrmr)
#what works fine
hdfs.init()
hdfs.ls("/tmp")
bind.cols(mtcars, carb.per.cyl = carb/cyl)
small.ints <- to.dfs(keyval(1, 1:100))
#what doesn't work fine
bind.cols(input("/tmp/mtcars"), carb.per.cyl = carb/cyl)
hadoop streaming failed with error code 5
out <- mapreduce(
input = small.ints,
map = function(k, v) cbind(v, v^2))
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, :
hadoop streaming failed with error code 1
I researched this issue quite a lot online and ended up looking at the log files from the Hadoop JobTracker:
java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 17 more
Caused by: java.lang.RuntimeException: configuration exception
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:230)
at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
... 22 more
Caused by: java.io.IOException: Cannot run program "Rscript": error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:214)
... 23 more
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:186)
at java.lang.ProcessImpl.start(ProcessImpl.java:130)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
... 24 more
Again, as for many others on the internet, it seems to be a problem with Rscript:
"Cannot run program "Rscript": error=2, No such file or directory"
I looked at other examples where this happened and checked that Rscript was under /usr/bin, as suggested in some topics, but still no luck. I don't have R installed on the secondary node or the slaves. Could this be the problem?
Could you help by suggesting alternative routes?
Thanks in advance,
Eliano Marques
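The innermost cause in the trace above ("Cannot run program "Rscript": error=2") is raised while the streaming task is starting its subprocess, so the lookup of Rscript has to succeed on the node that runs the task, not only on the node where RStudio runs. A minimal sketch of a check that could be run on every node; check_program is a hypothetical helper name, not part of Hadoop or rmr2:

```shell
#!/bin/sh
# Report whether a program resolves on the PATH of the current shell.
# check_program is a hypothetical helper written for this sketch.
check_program() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found on PATH"
        return 1
    fi
}

# Run this on each node of the cluster (master, secondary and slaves).
check_program Rscript || true
```

Note that this only tests the PATH of the shell it runs in. The user that runs the Hadoop task processes may have a different environment, so it is worth running the check as that user as well.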
Antonio Piccolboni
Owner
ElianoMarques
Hi, thanks for your reply. I have installed R on all nodes and the error has indeed changed.
Currently when I run the example:
out <- mapreduce(
input = small.ints,
map = function(k, v) cbind(v, v^2))
I get the following error:
15/01/02 17:13:57 ERROR streaming.StreamJob: Job not successful. Error: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201412221800_0007_m_000000
15/01/02 17:13:57 INFO streaming.StreamJob: killJob...
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, :
hadoop streaming failed with error code 1
Looking at the logs I get this:
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:135)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Can you suggest any help?
Thanks in advance,
Eliano
Antonio Piccolboni
Owner
ElianoMarques
Hi, thanks for your reply. I believe I have followed the instructions in full.
I read all the instructions again, checked that I had proceeded accordingly, and couldn't spot any issue.
Your help is appreciated,
Eliano
ElianoMarques
Hi again,
I have explored the error a little bit further and realized the issue may be related to the fact that R can't load the packages that rmr2 requires.
Error in library(functional) : there is no package called ‘functional’
No traceback available
Error during wrapup
As suggested in some other posts, the library with the dependency packages has to be in a system directory and not in a standard R library. However, I have all the packages under /home/rlib (because I had read that post before).
In R, .libPaths() returns:
"/home/rlib" "/usr/lib/R/site-library" "/usr/lib/R/library"
What may be happening is that when rmr2 tries to load the dependency packages, it searches the other libraries first (which don't have the dependencies).
I'll check whether this is the case and update here for future reference.
Eliano
Antonio Piccolboni
Owner
ElianoMarques
Hi,
Sorry for not replying earlier. I solved this issue a couple of days ago. The second error was purely related to the location of the libraries. After reading the documentation for this package, I thought we had to create a library in a new folder (not the default one) and give permission to all users. However (you probably already know this), the library that we need to use is the one under /usr/lib/R/library. This is the library that Rscript uses, so if rmr2 and its dependencies are not in this location, it returns an error. It is the same principle as when you run R scripts from crontab on Linux.
Thanks for your help,
Eliano
ElianoMarques closed this
Antonio Piccolboni
Owner
Great, thanks for reporting back
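The resolution reported in this thread comes down to whether rmr2 and its dependencies sit in a library that the Rscript process started by Hadoop can see. A minimal sketch of a per-node check, assuming the system library path named in the thread (/usr/lib/R/library) and using only the two package names that appear in the thread; pkg_in_lib is a hypothetical helper, and it relies on an installed R package being a directory that contains a DESCRIPTION file:

```shell
#!/bin/sh
# Return success if package $2 is installed in library directory $1.
# An installed R package is a directory named after the package,
# holding a DESCRIPTION file. pkg_in_lib is a hypothetical helper.
pkg_in_lib() {
    [ -f "$1/$2/DESCRIPTION" ]
}

# Assumption: the system library path reported in the thread.
# Adjust it if R is installed elsewhere on your nodes.
SYSTEM_LIB="/usr/lib/R/library"

# Only the package names mentioned in the thread are listed here;
# extend the list with the other rmr2 dependencies you installed.
for pkg in rmr2 functional; do
    if pkg_in_lib "$SYSTEM_LIB" "$pkg"; then
        echo "$pkg: present in $SYSTEM_LIB"
    else
        echo "$pkg: missing from $SYSTEM_LIB"
    fi
done
```

Any package reported as missing would need to be installed into that library on the node in question before the streaming job can load it.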
