hadoop - In which part/class of MapReduce is the logic of stopping reduce tasks implemented?


In Hadoop MapReduce, no reducer starts before all mappers have finished. Can someone please explain to me in which part/class/code line this logic is implemented? I am talking about Hadoop MapReduce version 1 (not YARN). I have searched through the MapReduce framework, but there are so many classes that I do not understand the method calls and their ordering.

In other words, I need (at first just for test purposes) to let the reducers start reducing even while there are still working mappers. I know that this way I am getting false results for the job, but for now this is just the start of some work on changing parts of the framework. So where should I start to make these changes?

This is done in the shuffle phase. For Hadoop 1.x, take a look at org.apache.hadoop.mapred.ReduceTask.ReduceCopier, which implements ShuffleConsumerPlugin. You may also want to read the "Breaking the MapReduce Stage Barrier" research paper by Verma et al.
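Note that the copy part of the shuffle can already overlap with the map phase: reduce tasks can be launched early and start fetching finished map outputs, controlled by the mapred.reduce.slowstart.completed.maps property. What ReduceCopier enforces is that the sort and the actual reduce() calls wait until every map output has arrived. As a minimal sketch of tuning that knob (the property name is from Hadoop 1.x; the driver class here is hypothetical):

    import org.apache.hadoop.mapred.JobConf;

    public class SlowstartDriver {
        public static void main(String[] args) {
            JobConf conf = new JobConf(SlowstartDriver.class);
            // Launch reduce tasks once 5% of the map tasks have completed,
            // so the copy phase overlaps with the map phase. This does NOT
            // break the stage barrier: the sort and the reduce() invocations
            // in ReduceTask still wait for all map outputs.
            conf.setFloat("mapred.reduce.slowstart.completed.maps", 0.05f);
            // ... set mapper, reducer, input/output paths, then submit the job ...
        }
    }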

Edit:

After reading @chris-white's answer, I realized that my answer needed some extra explanation. In the MapReduce model, you need to wait for all mappers to finish, since the keys need to be grouped and sorted; plus, you may have speculative mappers still running, and you do not know yet which of the duplicate mappers will finish first. However, as the "Breaking the MapReduce Stage Barrier" paper indicates, for some applications it may make sense not to wait for all of the output of the mappers. If you want to implement this sort of behavior (most likely for research purposes), then you should take a look at the classes mentioned above.
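To make the grouping constraint concrete, here is a standard old-API (org.apache.hadoop.mapred) word-count reducer; it is a plain illustrative sketch, not code from the framework itself:

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    // The emitted sum is only correct if the values iterator contains the
    // counts from *every* mapper. If reduce() were invoked while mappers
    // were still producing output for this key, the total would be a
    // partial result: exactly the "false results" the question is willing
    // to accept for test purposes.
    public class SumReducer extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }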

