hadoop - Duplicate value in part-r-00000


While processing an XML file (https://github.com/studhadoop/xml/blob/master/rpt), I am getting duplicate values in the output.

    bin/hadoop fs -text /user/root/t-output1/part-r-00000
    st17925 1.02
    st17925 1.02
    st17926 3.00
    st17926 3.00
    st17927 3.00
    st17927 3.00

My MapReduce code: https://github.com/studhadoop/xml/blob/master/xmlparser11.java

Why is this? Does it depend on the size of the XML file? With a large XML file I get duplicated values; with a small XML file the output is fine.

Update 1

One more doubt: instead of the listing above, I would like output with a header row, like this:

    studentid grade
    st17925 1.02
    st17926 3.00
    st17927 3.00

What change should I make in the program?

Update 2

How can I write the output in CSV format?

Because in your reducer implementation, you write the key together with every single value to the output collector:

    for (Text value : values) {
        context.write(key, value);
    }

What you want is this:

    StringBuilder sb = new StringBuilder();
    for (Text value : values) {
        sb.append(value.toString());
        sb.append(" ");
    }
    context.write(key, new Text(sb.toString()));

which generates a space-separated list of all the values for each key.
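To make the difference between the two reducer styles concrete, here is a minimal plain-Java sketch that simulates them without any Hadoop dependencies. The class name `ReducerSketch` and the methods `writePerValue` and `writeJoined` are hypothetical names chosen for illustration, and `String` stands in for Hadoop's `Text`. Unlike the snippet above, the joined variant uses `String.join`, so it leaves no trailing separator. Passing a comma as both separators gives a rough idea of CSV-style lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReducerSketch {

    // Simulates the original reducer: one output line per value, so a key
    // that arrives with two identical values produces two identical lines.
    public static List<String> writePerValue(String key, Iterable<String> values) {
        List<String> out = new ArrayList<>();
        for (String value : values) {
            out.add(key + "\t" + value); // tab mimics the default key/value separator
        }
        return out;
    }

    // Simulates the suggested reducer: all values of a key joined on one line.
    // keySep separates the key from the values, valueSep separates the values.
    public static String writeJoined(String key, Iterable<String> values,
                                     String keySep, String valueSep) {
        return key + keySep + String.join(valueSep, values);
    }

    public static void main(String[] args) {
        // Assumed input: the mapper emitted the same grade twice for one student.
        List<String> values = Arrays.asList("1.02", "1.02");

        for (String line : writePerValue("st17925", values)) {
            System.out.println(line);                                  // two lines
        }
        System.out.println(writeJoined("st17925", values, "\t", " ")); // one line
        System.out.println(writeJoined("st17925", values, ",", ","));  // CSV-style
    }
}
```

Note that joining the values does not remove duplicates; it only collects them on one line per key. In a real job, the separator between key and value is written by the output format rather than the reducer; as far as I know, with `TextOutputFormat` on Hadoop 2 or later it can be changed from the default tab by setting the `mapreduce.output.textoutputformat.separator` property, but check this against the Hadoop version you run.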

