Error in reading in data set in R

When reading in my data set in R as follows:
Dataset.df <- read.table("C:\\dataset.txt", header=T)
I get the following error message:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
   line 1 did not have 145 elements
What does this mean and can somebody tell me how to fix it?
share|improve this question

2 Answers

up vote 8 down vote accepted
This error is pretty self-explanatory. There seem to be data missing in the first line of your data file (or second line, as the case may be since you're using header = TRUE).
Here's a mini example:
## Create a small dataset to play with
cat("V1 V2\nFirst 1 2\nSecond 2\nThird 3 8\n", file="test.txt")
R automatically detects that it should expect rownames plus two columns (3 elements), but it doesn't find 3 elements on line 2, so you get an error:
read.table("test.txt", header = TRUE)
# Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
#   line 2 did not have 3 elements
Look at the data file and see if there is indeed a problem:
cat(readLines("test.txt"), sep = "\n")
# V1 V2
# First 1 2
# Second 2
# Third 3 8
Manual correction might be needed, or we can assume that the value first value in the "Second" row line should be in the first column, and other values should be NA. If this is the case, fill = TRUE is enough to solve your problem.
read.table("test.txt", header = TRUE, fill = TRUE)
#        V1 V2
# First   1  2
# Second  2 NA
# Third   3  8

R is also smart enough to figure it out how many elements it needs even if rownames are missing:
cat("V1 V2\n1\n2 5\n3 8\n", file="test2.txt")
cat(readLines("test2.txt"), sep = "\n")
# V1 V2
# 1
# 2 5
# 3 8
read.table("test2.txt", header = TRUE)
# Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
#   line 1 did not have 2 elements
read.table("test2.txt", header = TRUE, fill = TRUE)
#   V1 V2
# 1  1 NA
# 2  2  5
# 3  3  8
share|improve this answer

    
Thanks for that man! :) –  REnthusiast Aug 10 '13 at 10:48
When running into this error and reviewing my dataset which appeared to have no missing data, I discovered that a few of my entries had the special character "#" which derailed importing the data. Once I removed the "#" from the offending cells, the data imported without issue.

No comments:

Post a Comment