Hello!
I am trying to get RHadoop working on a Hadoop cluster.
This is a test cluster for development and proof-of-concept work.
It has 5 nodes, virtualized in VMware.
The OS on all nodes is CentOS 6.4.
To install RHadoop on the cluster, I followed these instructions:
https://github.com/RevolutionAnalytics/RHadoop/wiki
I did not encounter any errors during installation.
However, the following example analysis consistently fails with a Java heap space error:
# local sanity check: count how often each value occurs
groups = rbinom(100, n = 500, prob = 0.5)
tapply(groups, groups, length)

# the same count as a MapReduce job via rmr2
require('rmr2')
groups = rbinom(100, n = 500, prob = 0.5)
groups = to.dfs(groups)   # write the vector to HDFS
result = mapreduce(
  input = groups,
  map = function(k, v) keyval(v, 1),
  reduce = function(k, vv) keyval(k, length(vv)))
print(result())   # HDFS location of the result
print(from.dfs(result, to.data.frame = TRUE))
The code above is from this repo:
https://github.com/hortonworks/HDP-Public-Utilities/tree/master/Installation/r
Please find more information here:
https://raw.githubusercontent.com/manuel-at-coursera/mixedStuff/master/RHadoop_bugReport.md
Any help to get this solved would be very much appreciated!
Best,
Manuel
Reply from Manuel, October 1, 2014:
Yes, the issue seems to be resolved (or at least a work-around is available):
https://groups.google.com/forum/#!topic/rhadoop/E1-riwegvD4
Basically, it boils down to the fact that rmr2 uses its own memory settings, independent of what is set in Hadoop itself.
The feedback from Antonio (link above) was very helpful.
The following information might be helpful as well.
Within Hadoop I used these settings:
Number of containers: 2
RAM per container: 2048 MB
Configuration Setting                   Value Calculation               Value (MB)
yarn.nodemanager.resource.memory-mb     containers * RAM-per-container        4096
yarn.scheduler.minimum-allocation-mb    RAM-per-container                     2048
yarn.scheduler.maximum-allocation-mb    containers * RAM-per-container        4096
mapreduce.map.memory.mb                 RAM-per-container                     2048
mapreduce.reduce.memory.mb              2 * RAM-per-container                 4096
mapreduce.map.java.opts                 0.8 * RAM-per-container               1638
mapreduce.reduce.java.opts              0.8 * 2 * RAM-per-container           3277
yarn.app.mapreduce.am.resource.mb       2 * RAM-per-container                 4096
yarn.app.mapreduce.am.command-opts      0.8 * 2 * RAM-per-container           3277
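For reference, a small R snippet (my addition, not part of the original setup) that reproduces the table's arithmetic from the two inputs above; the java.opts values are 80% of the container size so the JVM heap fits inside the YARN container with some headroom:

containers = 2
ram = 2048  # RAM per container, in MB

round(c(
  "yarn.nodemanager.resource.memory-mb"  = containers * ram,   # 4096
  "yarn.scheduler.minimum-allocation-mb" = ram,                # 2048
  "yarn.scheduler.maximum-allocation-mb" = containers * ram,   # 4096
  "mapreduce.map.memory.mb"              = ram,                # 2048
  "mapreduce.reduce.memory.mb"           = 2 * ram,            # 4096
  "mapreduce.map.java.opts"              = 0.8 * ram,          # 1638 (1638.4 rounded)
  "mapreduce.reduce.java.opts"           = 0.8 * 2 * ram,      # 3277 (3276.8 rounded)
  "yarn.app.mapreduce.am.resource.mb"    = 2 * ram,            # 4096
  "yarn.app.mapreduce.am.command-opts"   = 0.8 * 2 * ram       # 3277
))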
For rmr2, I used the following to change all the settings in one go (launch R from the CentOS 6.4 shell with the command R, then run):

library(rmr2)
bp = rmr.options("backend.parameters")
bp$hadoop[1] = "mapreduce.map.java.opts=-Xmx1024M"
bp$hadoop[2] = "mapreduce.reduce.java.opts=-Xmx2048M"
bp$hadoop[3] = "mapreduce.map.memory.mb=1280"
bp$hadoop[4] = "mapreduce.reduce.memory.mb=2560"
rmr.options(backend.parameters = bp)
rmr.options("backend.parameters")   # print the options back to verify
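As a usage note (my sketch, based on my reading of the rmr2 documentation rather than on the thread above): backend.parameters can also be passed directly to mapreduce(), which lets you raise the memory for a single heavy job without changing the global options:

# hypothetical per-job override; the option names mirror the global settings above
result = mapreduce(
  input = groups,
  map = function(k, v) keyval(v, 1),
  reduce = function(k, vv) keyval(k, length(vv)),
  backend.parameters = list(
    hadoop = list(D = "mapreduce.map.java.opts=-Xmx1024M",
                  D = "mapreduce.reduce.java.opts=-Xmx2048M")))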