hadoop - In which part/class of MapReduce is the logic of stopping reduce tasks implemented?


In Hadoop MapReduce, no reducer starts before all mappers have finished. Can someone please explain to me in which part/class/code line this logic is implemented? I am talking about Hadoop MapReduce version 1 (not YARN). I have searched through the MapReduce framework, but there are so many classes that I do not understand the method calls and their ordering.

In other words, I need (at first just for test purposes) to let the reducers start reducing even while there are still working mappers. I know that this way I am getting false results for the job, but for now this is just the start of some work on changing parts of the framework. So where should I start to make these changes?

This is done in the shuffle phase. For Hadoop 1.x, take a look at org.apache.hadoop.mapred.ReduceTask.ReduceCopier, which implements ShuffleConsumerPlugin. You may also want to read the "Breaking the MapReduce Stage Barrier" research paper by Verma et al.
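Note that the copy part of the shuffle can already overlap with the map phase: reduce tasks can be launched early and start fetching finished map outputs, controlled by the mapred.reduce.slowstart.completed.maps property. What ReduceCopier enforces is that the sort and the actual reduce() calls wait until every map output has arrived. As a minimal sketch of tuning that knob (the property name is from Hadoop 1.x; the driver class here is hypothetical):

    import org.apache.hadoop.mapred.JobConf;

    public class SlowstartDriver {
        public static void main(String[] args) {
            JobConf conf = new JobConf(SlowstartDriver.class);
            // Launch reduce tasks once 5% of the map tasks have completed,
            // so the copy phase overlaps with the map phase. This does NOT
            // break the stage barrier: the sort and the reduce() invocations
            // in ReduceTask still wait for all map outputs.
            conf.setFloat("mapred.reduce.slowstart.completed.maps", 0.05f);
            // ... set mapper, reducer, input/output paths, then submit the job ...
        }
    }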

Edit:

After reading @chris-white's answer, I realized that my answer needed some extra explanation. In the MapReduce model, you need to wait for all mappers to finish, since the keys need to be grouped and sorted; plus, you may have speculative mappers still running, and you do not know yet which of the duplicate mappers will finish first. However, as the "Breaking the MapReduce Stage Barrier" paper indicates, for some applications it may make sense not to wait for all of the output of the mappers. If you want to implement this sort of behavior (most likely for research purposes), then you should take a look at the classes mentioned above.
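To make the grouping constraint concrete, here is a standard old-API (org.apache.hadoop.mapred) word-count reducer; it is a plain illustrative sketch, not code from the framework itself:

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    // The emitted sum is only correct if the values iterator contains the
    // counts from *every* mapper. If reduce() were invoked while mappers
    // were still producing output for this key, the total would be a
    // partial result: exactly the "false results" the question is willing
    // to accept for test purposes.
    public class SumReducer extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }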

