hdfs - Hadoop 2.0 data write operation acknowledgement


I have a small query regarding Hadoop data writes.

From the Apache documentation:

For the common case, when the replication factor is three, HDFS's placement policy is to put one replica on one node in the local rack, another on a node in a different (remote) rack, and the last on a different node in the same remote rack. This policy cuts the inter-rack write traffic, which generally improves write performance. The chance of rack failure is far less than that of node failure.
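For context, the replication factor mentioned above is the standard dfs.replication setting. Below is a minimal sketch of how it can be set or changed from a client, assuming the Hadoop 2.x client libraries are on the classpath; the class name and the file path are placeholders, not part of the original question.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationFactorExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setInt("dfs.replication", 3);   // default replication factor for files created by this client

        FileSystem fs = FileSystem.get(conf);
        // The replication factor can also be changed per file after it has been written.
        fs.setReplication(new Path("/user/demo/data.txt"), (short) 3);
        fs.close();
    }
}
```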

In the image below, at which point is the write acknowledged as successful?

1) After writing the data to the first DataNode?

2) After writing the data to the first DataNode + the 2 other DataNodes?

[Diagram: Hadoop data write pipeline]

I am asking this question because I have heard two conflicting statements in YouTube videos. One video stated that a write is successful once the data is written to one DataNode, while the other stated that the acknowledgement is sent only after the data has been written to all three nodes.


Step 1: The client creates the file by calling the create() method on DistributedFileSystem.

Step 2: DistributedFileSystem makes an RPC call to the NameNode to create a new file in the filesystem's namespace, with no blocks associated with it.

The NameNode performs various checks to make sure the file doesn't already exist and that the client has the right permissions to create the file. If these checks pass, the NameNode makes a record of the new file; otherwise, file creation fails and the client is thrown an IOException. The DistributedFileSystem returns an FSDataOutputStream for the client to start writing data to.
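As a minimal sketch of steps 1-2 from the client side (the cluster URI, path, and class name here are placeholders, and the Hadoop 2.x client libraries are assumed to be on the classpath):

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CreateFileExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // For an hdfs:// URI, FileSystem.get() returns a DistributedFileSystem instance.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        // create() triggers the RPC to the NameNode described in step 2;
        // it throws an IOException if the NameNode-side checks fail.
        FSDataOutputStream out = fs.create(new Path("/user/demo/output.txt"));
        // ... the client can now start writing data to the returned stream ...
        out.close();
        fs.close();
    }
}
```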

Step 3: As the client writes data, DFSOutputStream splits it into packets, which it writes to an internal queue called the data queue. The data queue is consumed by the DataStreamer, which is responsible for asking the NameNode to allocate new blocks by picking a list of suitable DataNodes to store the replicas. The list of DataNodes forms a pipeline, and here we'll assume the replication level is three, so there are three nodes in the pipeline. The DataStreamer streams the packets to the first DataNode in the pipeline, which stores each packet and forwards it to the second DataNode in the pipeline.
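From the client's point of view, all of this machinery is hidden behind ordinary writes to the stream. A minimal sketch (local and HDFS file names are placeholders): every write() below hands bytes to DFSOutputStream, which packages them into packets on the data queue for the DataStreamer to push down the pipeline.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class WritePipelineExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        try (InputStream in = new BufferedInputStream(new FileInputStream("local-input.txt"));
             FSDataOutputStream out = fs.create(new Path("/user/demo/output.txt"))) {
            // Copy the local file into HDFS in 4 KB chunks; the packet/pipeline
            // handling happens inside the output stream.
            IOUtils.copyBytes(in, out, 4096, false);
        }
        fs.close();
    }
}
```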

Step 4: Similarly, the second DataNode stores the packet and forwards it to the third (and last) DataNode in the pipeline.

Step 5: DFSOutputStream also maintains an internal queue of packets that are waiting to be acknowledged by DataNodes, called the ack queue. A packet is removed from the ack queue only when it has been acknowledged by all the DataNodes in the pipeline.
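A client can wait on these pipeline acknowledgements explicitly via hflush()/hsync() on the stream. A minimal sketch, assuming Hadoop 2.x APIs (the path and records are placeholders): hflush() blocks until the packets written so far have been acknowledged by every DataNode in the current pipeline, and hsync() additionally asks the DataNodes to persist the data to disk.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AckQueueFlushExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        try (FSDataOutputStream out = fs.create(new Path("/user/demo/events.log"))) {
            out.writeBytes("first record\n");
            // Block here until all DataNodes in the pipeline have acknowledged the data,
            // i.e. the corresponding packets have left the ack queue.
            out.hflush();

            out.writeBytes("second record\n");
            out.hsync();   // same guarantee, plus a flush to disk on each DataNode
        }
        fs.close();
    }
}
```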

Step 6: When the client has finished writing data, it calls close() on the stream.

Step 7: This action flushes all the remaining packets to the DataNode pipeline and waits for acknowledgments before contacting the NameNode to signal that the file is complete. The NameNode already knows which blocks the file is made up of, so it only has to wait for the blocks to be minimally replicated before returning successfully.
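As a rough sketch of steps 6-7: the "minimally replicated" threshold mentioned in step 7 corresponds to the NameNode-side setting dfs.namenode.replication.min (default 1 in Hadoop 2.x), which is normally configured in hdfs-site.xml; setting it on the client Configuration below is shown only for illustration, and the class name and path are placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CloseAndCompleteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setInt("dfs.namenode.replication.min", 1);   // NameNode-side setting in practice

        FileSystem fs = FileSystem.get(conf);
        FSDataOutputStream out = fs.create(new Path("/user/demo/final.txt"));
        out.writeBytes("payload\n");

        // close() flushes the remaining packets, waits for the pipeline acknowledgments,
        // and then asks the NameNode to complete the file (step 7).
        out.close();
        fs.close();
    }
}
```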

