Thursday 30 October 2014

Error in FUN(X[[2L]], …) : Sorry, parameter type `NA' is ambiguous or not supported

This is error because the dataset which you are considering consist of na value. i.e null value
while performing matrix multipication and other any operation with null value create error.
 To overcome this problem you need to select one of the follow method:
i) Either use rm=na while taking input. i.e mean remove null values.
ii) Take the dataset which doesn't consist of null value. i.e clean dataset.

No need to go through the document below: It is the reference i followed and search and
paste for the future reference at that time.

I saw high view in this page so i share my experience with you ppl.




I am trying the below R script to built logistic regression model using RHadoop (rmr2, rhdfs packages) on an HDFS data file located at "hdfs://:/somnath/merged_train/part-m-00000" and then testing the model using a test HDFS data file at "hdfs://:/somnath/merged_test/part-m-00000".
We are using CDH4 distribution with Yarn/MR2 running parallel to MR1 supported by Hadoop-0.20. And using the hadoop-0.20 mapreduce and hdfs versions to run the below RHadoop script as Sys.setenv commands shown below.
However, whenever I am running the script, I am facing the below error with very little luck to bypass it. I would appreciate if somebody point me to the possible cause of this error which seems to be due to wrong way of lapply call in R without handling NA arguments.
[root@kkws029 logreg_template]# Rscript logreg_test.R
Loading required package: methods
Loading required package: rJava

HADOOP_CMD=/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/bin/hadoop

Be sure to run hdfs.init()
14/08/11 11:59:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
NULL
NULL
[1] "Starting to build logistic regression model..."
Error in FUN(X[[2L]], ...) :
  Sorry, parameter type `NA' is ambiguous or not supported.
Calls: logistic.regression ... .jrcall -> ._java_valid_objects_list -> lapply -> FUN
Execution halted
Below is my R-script :
#!/usr/bin/env Rscript


Sys.setenv(HADOOP_HOME="/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce")
Sys.setenv(HADOOP_CMD="/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/bin/hadoop")


Sys.setenv(HADOOP_BIN="/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/bin");
Sys.setenv(HADOOP_CONF_DIR="/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/conf");
Sys.setenv(HADOOP_STREAMING="/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.3.0.jar")
Sys.setenv(LD_LIBRARY_PATH="/usr/lib64/R/library/rJava/jri")


library(rmr2)
library(rhdfs)

.jinit()
.jaddClassPath("/opt/cloudera/parcels/CDH/lib/hadoop/hadoop-auth-2.0.0-cdh4.3.0.jar")
.jaddClassPath("/opt/cloudera/parcels/CDH/lib/hadoop-hdfs/hadoop-hdfs-2.0.0-cdh4.3.0.jar")
.jaddClassPath("/opt/cloudera/parcels/CDH/lib/hadoop/hadoop-common-2.0.0-cdh4.3.0.jar")


hdfs.init()
rmr.options( backend = "hadoop", hdfs.tempdir = "/tmp" )

logistic.regression =
        function(hdfsFilePath, iterations, dims, alpha) {
                r.file <- hdfs.file(hdfsFilePath,"r")

                #hdfsFilePath <- to.dfs(hdfsFilePath)

                lr.map =
                  function(.,M) {
                    Y = M[,1]
                    X = M[,-1]
                    keyval(
                        1,
                        Y * X *
                          g(-Y * as.numeric(X %*% t(plane))))}

                lr.reduce =
                  function(k, Z)
                    keyval(k, t(as.matrix(apply(Z,2,sum))))

                plane = t(rep(0, dims))
                g = function(z) 1/(1 + exp(-z))
                for (i in 1:iterations) {
                        gradient =
                                values(
                                        from.dfs(
                                          mapreduce(
                                            input = as.matrix(hdfs.read.text.file(r.file)),
                                            #input = from.dfs(hdfsFilePath),
                                            map = function(.,M) {
                                                Y = M[,1]
                                                X = M[,-1]
                                                keyval(
                                                        1,
                                                        Y * X *
                                                                g(-Y * as.numeric(X %*% t(plane))))},
                                            reduce = lr.reduce,
                                            combine = T)))
                        plane = plane + alpha * gradient

                        #trace(print(plane),quote(browser()))
                 }
                return(plane) }

#validate logistic regression
logistic.regression.test =
        function(hdfsFilePath, weight) {
                r.file <- hdfs.file(hdfsFilePath,"r")
                lr.test.map =
                        function(.,M) {
                          keyval(
                             1,
                             lapply(as.numeric(M[,-1] %*% t(weight)),function(z) 1/(1 + exp(-z))))}

                probabilities =
                     values(
                             from.dfs(
                                mapreduce(
                                  input = as.matrix(hdfs.read.text.file(r.file)),
                                  map = function(.,M) {
                                        keyval(
                                                1,
                                                lapply(as.numeric(M[,-1] %*% t(weight)), function(z) 1/(1 + exp(-z))))}
                        )))
        return(probabilities) }

out = list()
prob = list()
rmr.options( backend = "hadoop", hdfs.tempdir = "/tmp" )

print("Starting to build logistic regression model...")

  out[['hadoop']] =
## @knitr logistic.regression-run
    logistic.regression(
       "hdfs://XX.XX.XX.XX:NNNN/somnath/merged_train/part-m-00000", 5, 5, 0.05)
  write.csv(as.vector(out[['hadoop']]), "/root/somnath/logreg_data/weights.csv")


print("Building logistic regression model completed.")


  prob[['hadoop']] =
    logistic.regression.test(
       "hdfs://XX.XX.XX.XX:NNNN/somnath/merged_test/part-m-00000", out[['hadoop']])
  write.csv(as.vector(prob[['hadoop']]), "/root/somnath/logreg_data/probabilities.csv")


stopifnot(
  isTRUE(all.equal(out[['local']], out[['hadoop']], tolerance = 1E-7)))
NOTE: I have set following environment variables for HADOOP as follows in root ~/.bash_profile
# Hadoop-specific environment and commands

export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce
export HADOOP2_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
#export HADOOP_CMD=${HADOOP_HOME}/bin/hadoop
#export HADOOP_STREAMING=/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-streaming.jar
#export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop

export LD_LIBRARY_PATH=${R_HOME}/library/rJava/jri #:${HADOOP_HOME}/../hadoop-0.20-mapreduce/lib/native/Linux-amd64-64


# Add hadoop-common jar to classpath for PlatformName and FsShell classes; Add hadoop-auth and hadoop-hdfs jars

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:${HADOOP2_HOME}/client-0.20/* #:${HADOOP_HOME}/*.jar:${HADOOP_HOME}/lib/*.jar:${HADOOP2_HOME}/hadoop-common-2.0.0-cdh4.3.0.jar:${HADOOP_HOME}/../hadoop-hdfs/hadoop-hdfs-2.0.0-cdh4.3.0.jar:${HADOOP_HOME}/hadoop-auth-2.0.0-cdh4.3.0.jar:$HADOOP_STREAMING

PATH=$PATH:$R_HOME/bin:$JAVA_HOME/bin:$LD_LIBRARY_PATH:/opt/cloudera/parcels/CDH/lib/mahout:/opt/cloudera/parcels/CDH/lib/hadoop:/opt/cloudera/parcels/CDH/lib/hadoop-hdfs:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce:/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce:/var/lib/storm-0.9.0-rc2/lib #:$HADOOP_CMD:$HADOOP_STREAMING:$HADOOP_CONF_DIR

export PATH

SAMPLE TRAIN DATASET

0,-4.418,-2.0658,1.2193,-0.68097,0.90894
0,-2.7466,-2.9374,-0.87562,-0.65177,0.53182
0,-0.98846,0.66962,-0.20736,-0.2895,0.002313
0,-2.277,2.492,0.47936,0.4673,-1.5075
0,-5.4391,1.8447,-1.6843,1.465,-0.71099
0,-0.12843,0.066968,0.02678,-0.040851,0.0075902
0,-2.0796,2.4739,0.23472,0.86423,0.45094
0,-3.1796,-0.15429,1.4814,-0.94316,-0.52754
0,-1.9429,1.3111,0.31921,-1.202,0.8552
0,-2.3768,1.9301,0.096005,-0.51971,-0.17544
0,-2.0336,1.991,0.82029,0.018232,-0.33222
0,-3.6388,-3.2903,-2.1076,0.73341,0.75986
0,-2.9146,0.53163,0.49182,-0.38562,-0.76436
0,-3.3816,1.0954,0.25552,-0.11564,-0.01912
0,-1.7374,-0.63031,-0.6122,0.022664,0.23399
0,-1.312,-0.54935,-0.68508,-0.072985,0.036481
0,-3.991,0.55278,0.38666,-0.56128,-0.6748
....

SAMPLE TEST DATASET

0,-0.66666,0.21439,0.041861,-0.12996,-0.36305
0,-1.3412,-1.1629,-0.029398,-0.13513,0.49758
0,-2.6776,-0.40194,-0.97336,-1.3355,0.73202
0,-6.0203,-0.61477,1.5248,1.9967,2.697
0,-4.5663,-1.6632,-1.2893,-1.7972,1.4367
0,-7.2339,2.4589,0.61349,0.39094,2.19
0,-4.5683,-1.3066,1.1006,-2.8084,0.3172
0,-4.1223,-1.5059,1.3063,-0.18935,1.177
0,-3.7135,-0.26283,1.6961,-1.3499,-0.18553
0,-2.7993,1.2308,-0.42244,-0.50713,-0.3522
0,-3.0541,1.8173,0.96789,-0.25138,-0.36246
0,-1.1798,1.0478,-0.29168,-0.26261,-0.21527
0,-2.6459,2.9387,0.14833,0.24159,-2.4811
0,-3.1672,2.479,-1.2103,-0.48726,0.30974
1,-0.90706,1.0157,0.32953,-0.11648,-0.47386
...
share|improve this question

    
This is quite convoluted code. It will take a highly motivated person to go through all this. Please break it down to the offending line. –  Roman Luštrik Aug 11 at 8:35
    
@RomanLuštrik: Since the error is specific to .jrcall in rJava package which I am not calling directly in my code, I am assuming that the problem is in mapreduce(input = as.matrix(hdfs.read.text.file(r.file)),map = function(.,M) { keyval(1,lapply(as.numeric(M[,-1] %*% t(weight)), function(z) 1/(1 + exp(-z))))} ))) where the input to map is a matrix M read from a file stored in HDFS. Most probably the call to lapply may not be getting the expected input from the matrix. I have added sample train and test data inputs from HDFS files to explain better –  somnathchakrabarti Aug 11 at 8:57
    
@RomanLuštrik: I tried running the below input command from R console and received the same error. So the way I am reading HDFS file as matrix is giving the error: > input = as.matrix(hdfs.read.text.file(r.file)) Error in FUN(X[[2L]], ...) : Sorry, parameter type `NA' is ambiguous or not supported. –  somnathchakrabarti Aug 11 at 10:17
    
In R, traceback(), debug(), debugonce() and browser() can yield insightful. –  Roman Luštrik Aug 11 at 10:25
    
yes I replicated the code line-by-line in R console to understand which specific line was giving the error and found out that the input line is giving the error message. If you can provide some insights on how to read a HDFS file as input matrix that would be helpful. –  somnathchakrabarti Aug 11 at 10:27

Linear Regression in R Mapreduce(RHadoop)

I m new to RHadoop and also to RMR... I had an requirement to write a Mapreduce Job in R Mapreduce. I have Tried writing but While executing this it gives an Error. Tring to read the file from hdfs

Error:

Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce,  : 
   hadoop streaming failed with error code 1

Code :

Sys.setenv(HADOOP_HOME="/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hadoop")
Sys.setenv(HADOOP_CMD="/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/bin/hadoop")

Sys.setenv(HADOOP_STREAMING="/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.7.0.jar")
library(rmr2)
library(rhdfs)
hdfs.init()
day_file = hdfs.file("/hdfs/bikes_LR/day.csv","r")
day_read = hdfs.read(day_file)
c = rawToChar(day_read)

XtX =
  values(from.dfs(
    mapreduce(
      input = "/hdfs/bikes_LR/day.csv",
      map=
        function(.,Xi){
         yi =c[Xi[,1],]
         Xi = Xi[,-1]
         keyval(1,list(t(Xi)%*%Xi))
       },
  reduce = function(k,v )
  {
    vals =as.numeric(v)
    keyval(k,sum(vals))
  } ,
  combine = TRUE)))[[1]]

XtY =
 values(from.dfs(
    mapreduce(
     input = "/hdfs/bikes_LR/day.csv",
     map=
       function(.,Xi){
         yi =c[Xi[,1],]
         Xi = Xi[,-1]
        keyval(1,list(t(Xi)%*%yi))
       },
     reduce = TRUE ,
     combine = TRUE)))[[1]]
solve(XtX,XtY)



Input:
------------

instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
1,2011-01-01,1,0,1,0,6,0,2,0.344167,0.363625,0.805833,0.160446,331,654,985
2,2011-01-02,1,0,1,0,0,0,2,0.363478,0.353739,0.696087,0.248539,131,670,801
3,2011-01-03,1,0,1,0,1,1,1,0.196364,0.189405,0.437273,0.248309,120,1229,1349
4,2011-01-04,1,0,1,0,2,1,1,0.2,0.212122,0.590435,0.160296,108,1454,1562
5,2011-01-05,1,0,1,0,3,1,1,0.226957,0.22927,0.436957,0.1869,82,1518,1600
6,2011-01-06,1,0,1,0,4,1,1,0.204348,0.233209,0.518261,0.0895652,88,1518,1606
7,2011-01-07,1,0,1,0,5,1,2,0.196522,0.208839,0.498696,0.168726,148,1362,1510
8,2011-01-08,1,0,1,0,6,0,2,0.165,0.162254,0.535833,0.266804,68,891,959
9,2011-01-09,1,0,1,0,0,0,1,0.138333,0.116175,0.434167,0.36195,54,768,822
10,2011-01-10,1,0,1,0,1,1,1,0.150833,0.150888,0.482917,0.223267,41,1280,1321



 Please Suggest me any mistakes.
shareimprove this question

    
You need to find the log output from your R script, which would indicate the error. "hadoop streaming failed with error code 1" just means "the script failed for some reason" –  Sean Owen Jul 3 at 11:19
    
Sometimes the folder where you write must be deleted before writing (if it exists). Check that out. –  adesantos Jul 3 at 11:41
    
thank u for ur answer ... but i have check all the possibilites what u have mentioned... i doubt that there is some problem with code itself...can someone please rectify... –  user3782364 Jul 3 at 15:19

Thursday 9 October 2014

select

> example(select)

select> #data.frame
select> where(mtcars, cyl>4 & mpg > 15)
                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C         17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE        16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL        17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC       15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Dodge Challenger  15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin       15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Pontiac Firebird  19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Ford Pantera L    15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino      19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6

select> #pipe
select> as.data.frame(where(input(mtcars), cyl > 4 & mpg > 15))
                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C         17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE        16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL        17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC       15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Dodge Challenger  15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin       15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Pontiac Firebird  19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Ford Pantera L    15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino      19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6

select> # select two columns
select> as.data.frame(transmute(input(mtcars), cyl, mpg))
   cyl  mpg
1    6 21.0
2    6 21.0
3    4 22.8
4    6 21.4
5    8 18.7
6    6 18.1
7    8 14.3
8    4 24.4
9    4 22.8
10   6 19.2
11   6 17.8
12   8 16.4
13   8 17.3
14   8 15.2
15   8 10.4
16   8 10.4
17   8 14.7
18   4 32.4
19   4 30.4
20   4 33.9
21   4 21.5
22   8 15.5
23   8 15.2
24   8 13.3
25   8 19.2
26   4 27.3
27   4 26.0
28   4 30.4
29   8 15.8
30   6 19.7
31   8 15.0
32   4 21.4

select> # create additional column
select> as.data.frame(transmute(input(mtcars), ratio = cyl/mpg, .cbind = TRUE))
                     mpg cyl  disp  hp drat    wt  qsec vs am gear
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4
Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4
                    carb     ratio
Mazda RX4              4 0.2857143
Mazda RX4 Wag          4 0.2857143
Datsun 710             1 0.1754386
Hornet 4 Drive         1 0.2803738
Hornet Sportabout      2 0.4278075
Valiant                1 0.3314917
Duster 360             4 0.5594406
Merc 240D              2 0.1639344
Merc 230               2 0.1754386
Merc 280               4 0.3125000
Merc 280C              4 0.3370787
Merc 450SE             3 0.4878049
Merc 450SL             3 0.4624277
Merc 450SLC            3 0.5263158
Cadillac Fleetwood     4 0.7692308
Lincoln Continental    4 0.7692308
Chrysler Imperial      4 0.5442177
Fiat 128               1 0.1234568
Honda Civic            2 0.1315789
Toyota Corolla         1 0.1179941
Toyota Corona          1 0.1860465
Dodge Challenger       2 0.5161290
AMC Javelin            2 0.5263158
Camaro Z28             4 0.6015038
Pontiac Firebird       2 0.4166667
Fiat X1-9              1 0.1465201
Porsche 914-2          2 0.1538462
Lotus Europa           2 0.1315789
Ford Pantera L         4 0.5063291
Ferrari Dino           6 0.3045685
Maserati Bora          8 0.5333333
Volvo 142E             2 0.1869159

select> # summaries
select> as.data.frame(transmute(input(mtcars), mean(cyl), mean(mpg)))
  mean.cyl. mean.mpg.
1    6.1875  20.09062

select> # summaries by groups
select> as.data.frame(transmute(group(input(mtcars), cyl), mean(mpg)))
    cyl mean.mpg.
1     6  19.74286
1.1   4  26.66364
1.2   8  15.10000