In TextInputFormat in Hadoop MapReduce, what is the byte offset? And how is the key the byte offset and the value the contents of the line?


While going through topics on custom InputFormats, I came to know that Hadoop has default input formats such as TextInputFormat, KeyValueTextInputFormat, SequenceFileInputFormat, and NLineInputFormat.

For TextInputFormat, each line is read as a record, the byte offset of the line is used as the key, and the contents of the line are used as the value. What is the byte offset, and how are the contents of the line considered the value? Please suggest.

TextInputFormat is the default InputFormat. Each record is a line of input. The key, a LongWritable, is the byte offset within the file of the beginning of the line. The value is the contents of the line, excluding any line terminators (e.g., newline or carriage return), packaged as a Text object. So a file containing the following text:

On the top of the Crumpetty Tree
The Quangle Wangle sat,
But his face you could not see,
On account of his Beaver Hat.

is divided into one split of 4 records. The records are interpreted as the following key-value pairs:

(0, On the top of the Crumpetty Tree)
(33, The Quangle Wangle sat,)
(57, But his face you could not see,)
(89, On account of his Beaver Hat.)
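As a minimal sketch (not part of the original question or answer), the mapper below shows how these pairs arrive when a job uses TextInputFormat: the key parameter is the byte offset as a LongWritable and the value parameter is the line as a Text object. The class name OffsetMapper is just an illustrative choice.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class OffsetMapper extends Mapper<LongWritable, Text, LongWritable, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // key   = byte offset of the start of this line within the file
        // value = the line's contents, without the trailing line terminator
        context.write(key, value);
    }
}

Running this over the four-line file above would simply emit the (0, ...), (33, ...), (57, ...), (89, ...) pairs unchanged.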

Clearly, the keys are not line numbers. This would be impossible to implement in general, since a file is broken into splits at byte, not line, boundaries, and splits are processed independently. Line numbers are a sequential notion: you have to keep a count of lines as you consume them, so knowing the line number within a split is possible, but not within the file.

However, the offset within the file of each line is known by each split independently of the other splits, since each split knows the size of the preceding splits and simply adds this onto the offsets within the split to produce a global file offset. The offset is usually sufficient for applications that need a unique identifier for each line; combined with the file's name, it is unique within the filesystem. Of course, if all lines are a fixed width, calculating the line number is simply a matter of dividing the offset by the width, as in the sketch below.
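A hypothetical helper illustrating that last point, assuming every record in the file is exactly recordWidth bytes long including its line terminator (the method name and parameters are mine, not from the original answer):

public static long lineNumber(long byteOffset, int recordWidth) {
    // Integer division: offset 0 -> line 0, offset recordWidth -> line 1, and so on.
    return byteOffset / recordWidth;
}

For variable-width lines such as the example above, no such formula exists, which is exactly why TextInputFormat hands out byte offsets rather than line numbers.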

