How does Hadoop's io.file.buffer.size work?

PG Student at Jadavpur University
Hi all,
I want to know how the parameter io.file.buffer.size works in Hadoop.
My understanding is that it is related to file reads and writes.

What I have found so far is this:

In LineReader.java, it is used as the default buffer size for each line.
In BlockReader.newBlockReader(), it is used as the internal buffer size of the
BufferedInputStream. It is also used as the default buffer size in the
compression-related classes.
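
For illustration, here is a minimal sketch (not the actual Hadoop source; the fallback value, class name, and file path are only examples) of how such code picks up the setting and uses it as a stream's internal buffer size:

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;

public class BufferSizeSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Fall back to 4096 bytes, which matches the value shipped in core-default.xml.
        int bufferSize = conf.getInt("io.file.buffer.size", 4096);

        try (BufferedInputStream in = new BufferedInputStream(
                new FileInputStream("/tmp/sample.txt"), bufferSize)) {
            byte[] buf = new byte[bufferSize];
            while (in.read(buf) != -1) {
                // Each read() is served from the internal buffer; the underlying
                // file is read in chunks of roughly bufferSize bytes.
            }
        }
    }
}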

I want to know how this buffer size affects the Hadoop file system and its performance, and how it works internally.
Any help is appreciated.
Thanks

Comments

  • sudhakara S.
    Member Of Technical staff/Data Architect/Hadoop Engineer at Oracle India
    io.file.buffer.size is the buffer size for I/O (read/write) operations on sequence files stored on disk, i.e. it determines how much data is buffered in the I/O pipes before it is passed on to the next operation during reads and writes. It should be a multiple of the OS file system block size (normally 4 KB).
    Yes, it has a significant effect on performance; setting it too low or too high causes performance problems.
    Values of 32 KB, 64 KB, or 128 KB normally work well.
  • Janardhanan P.
    Technical Consultant at SunTec Business Solutions
    We need to understand one thing. Native OS file systems like ext4 have a fixed buffer size. HDFS is a user-space implementation of a file system, so it gives us the flexibility to specify a buffer size of our own choice. Another interesting aspect is that the buffer size can be changed without modifying or reinstalling HDFS (see the sketch below). As usual, if you have a large number of small files, specify a low value for the buffer size; if you have a small number of large files, specify a larger value.
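
For reference, here is a minimal sketch of overriding io.file.buffer.size for a single client or job without touching the cluster installation (the same property can also be set in core-site.xml). The 64 KB value follows the sizes suggested in the comments above; the class name and HDFS path are only examples.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OverrideBufferSize {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // 64 KB: a multiple of the typical 4 KB OS file system block size.
        conf.setInt("io.file.buffer.size", 64 * 1024);

        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/user/example/input.txt");

        // FileSystem.open(Path, int) also accepts an explicit per-stream buffer size.
        try (FSDataInputStream in = fs.open(path, conf.getInt("io.file.buffer.size", 4096))) {
            byte[] buf = new byte[8 * 1024];
            while (in.read(buf) != -1) {
                // Reads are served from the stream's internal buffer.
            }
        }
    }
}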
