hadoop - Generating unique IDs in Hive -


I have been trying to generate unique IDs for each row of a table (30 million+ rows).

  • Using sequential numbers does not work due to the parallel nature of Hadoop.
  • The built-in UDFs rand() and hash(rand(), unixtime()) seem to generate collisions.
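
The collisions are less surprising with a quick birthday-style estimate. The sketch below assumes the hash output is a uniformly distributed 32-bit integer (my understanding of Hive's hash(), and an optimistic assumption about its distribution) and compares that with the 122 random bits of a version 4 UUID. The class and method names are made up for illustration.

```java
// Sketch: expected number of colliding pairs when each of n rows gets a
// uniformly random ID drawn from a space of 2^bits values.
class CollisionEstimate {

    // Birthday estimate: (number of row pairs) / (number of possible IDs).
    static double expectedCollidingPairs(long rows, int bits) {
        double pairs = (double) rows * (rows - 1) / 2.0;
        return pairs / Math.pow(2.0, bits);
    }

    public static void main(String[] args) {
        long rows = 30_000_000L; // row count mentioned in the question

        // Assumption: the hash value is a uniformly distributed 32-bit int.
        System.out.printf("32-bit hash:  %.3e expected colliding pairs%n",
                expectedCollidingPairs(rows, 32));

        // A version 4 (random) UUID carries 122 random bits.
        System.out.printf("122-bit UUID: %.3e expected colliding pairs%n",
                expectedCollidingPairs(rows, 122));
    }
}
```

With 30 million rows the 32-bit case comes out at roughly 10^5 expected colliding pairs, so a 32-bit hash cannot be unique at this scale no matter what is fed into it.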

There has to be a simple way to generate row IDs, and I was wondering if anyone has a solution.

  • My next step is creating a Java MapReduce job to generate a real hash string from a secure random + host IP + current time as the seed. I figured I'd ask here before doing that ;)
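
For reference, the ID generation inside such a job might look roughly like the sketch below. It only covers the hashing step, not the MapReduce wiring. SeededIdGenerator and its methods are hypothetical names, and the choice of SHA-256 and a 16-byte random value is an assumption, not something fixed by the question.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

// Hypothetical helper, not part of Hive or Hadoop: hashes a secure random
// value together with the host IP and the current time into a hex string.
class SeededIdGenerator {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Host IP of the current machine, or a placeholder if it cannot be resolved.
    static String hostAddress() {
        try {
            return InetAddress.getLocalHost().getHostAddress();
        } catch (UnknownHostException e) {
            return "unknown-host";
        }
    }

    static String nextId() {
        byte[] nonce = new byte[16]; // assumption: 128 random bits per ID
        RANDOM.nextBytes(nonce);
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            digest.update(nonce);
            digest.update(hostAddress().getBytes(StandardCharsets.UTF_8));
            digest.update(Long.toString(System.currentTimeMillis())
                    .getBytes(StandardCharsets.UTF_8));
            // Render the 32-byte digest as 64 lowercase hex characters.
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(nextId());
        System.out.println(nextId());
    }
}
```

In this sketch the uniqueness really comes from the secure random bytes. The host IP and timestamp add little on top of them.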

Use the reflect UDF to generate UUIDs.

reflect("java.util.UUID", "randomUUID")
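
As I understand it, reflect looks up a Java class and method by name and calls the method through reflection, so the names have to match Java's case-sensitive spelling: java.util.UUID and randomUUID. The plain-Java sketch below shows roughly what that lookup amounts to. It is only an illustration, not Hive's actual implementation, and ReflectUuidDemo and its methods are made-up names.

```java
import java.lang.reflect.Method;

// Illustration only: call a static no-argument method by class and method name.
class ReflectUuidDemo {

    static String invokeStatic(String className, String methodName) {
        try {
            Class<?> cls = Class.forName(className);    // case-sensitive lookup
            Method method = cls.getMethod(methodName);  // public no-arg method
            return String.valueOf(method.invoke(null)); // null target: static call
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                    "cannot call " + className + "." + methodName, e);
        }
    }

    // True if the class name resolves on the current classpath.
    static boolean resolves(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Prints a random (version 4) UUID such as Hive would return per row.
        System.out.println(invokeStatic("java.util.UUID", "randomUUID"));
        // The all-lowercase spelling does not name an existing class.
        System.out.println(resolves("java.util.uuid"));
    }
}
```

Each call returns a 36-character string, so the ID column should be a string type rather than a numeric one.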
