How does Hadoop's io.file.buffer.size work?
Hi all,
I want to know how the parameter io.file.buffer.size works in Hadoop.
My understanding is that it's related to file reads and writes.
What I have found so far is this:
In LineReader.java, it's used as the default buffer size for reading each line.
In BlockReader.newBlockReader(), it's used as the internal buffer size of the
BufferedInputStream. And in the compression-related classes, it's used as the
default buffer size.
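For example, my rough mental model is something like the sketch below (the wrap() helper and the file path are just my own illustration, not code from Hadoop itself):

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;

public class BufferSizeSketch {

    // Illustration only: wrap a raw stream in a BufferedInputStream whose
    // internal buffer size is taken from io.file.buffer.size.
    static InputStream wrap(InputStream raw, Configuration conf) {
        int bufferSize = conf.getInt("io.file.buffer.size", 4096); // 4096 bytes is the usual default
        return new BufferedInputStream(raw, bufferSize);
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // "/tmp/local-file.txt" is just a placeholder path.
        try (InputStream in = wrap(new FileInputStream("/tmp/local-file.txt"), conf)) {
            // Each read() is served from a buffer of io.file.buffer.size bytes,
            // so the underlying stream sees fewer, larger reads.
            while (in.read() != -1) {
                // consume the data
            }
        }
    }
}
```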
How does this buffer size affect the Hadoop file system and its performance?
And how does it work internally?
Any help is appreciated.
Thanks
Comments
Yes, it has a significant effect on performance; setting it too low or too high can both cause performance problems.
Values of 32 KB, 64 KB, or 128 KB usually work well.
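For example, one way to try a 64 KB buffer is to set the property on the Configuration before opening files (a minimal sketch; the path below is only a placeholder, and the same property can also be set in core-site.xml):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SetBufferSize {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // 65536 bytes = 64 KB; benchmark different values for your workload.
        conf.setInt("io.file.buffer.size", 65536);

        FileSystem fs = FileSystem.get(conf);
        // FileSystem.open(Path) falls back to io.file.buffer.size for its buffer size.
        try (FSDataInputStream in = fs.open(new Path("/tmp/example.txt"))) { // placeholder path
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) > 0) {
                // process n bytes
            }
        }
    }
}
```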