hadoop - Generating unique IDs in Hive


I have been trying to generate unique IDs for each row of a table (30 million+ rows).

  • Using sequential numbers does not work, due to the parallel nature of Hadoop.
  • The built-in UDFs rand() and hash(rand(),unixtime()) seem to generate collisions.

There has to be a simple way to generate row IDs, and I was wondering if anyone has a solution.

  • My next step is creating a Java MapReduce job to generate a real hash string, with a secure random + host IP + current time as the seed. I figured I'd ask here before doing that ;)

Use the reflect UDF to generate UUIDs:

reflect("java.util.UUID", "randomUUID")
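
Java class and method names are case-sensitive, so the call needs the exact spelling java.util.UUID and randomUUID. As a rough illustration of what such a call amounts to on the Java side, here is a minimal sketch that looks up a static, no-argument method by name and invokes it. The class ReflectUuidSketch and the helpers callStatic and countDistinct are hypothetical names made up for this example; they are not Hive's actual implementation of reflect.

```java
import java.lang.reflect.Method;
import java.util.HashSet;
import java.util.Set;

public class ReflectUuidSketch {

    // Hypothetical helper: look up a public static no-arg method by name
    // and return its result as a string.
    static String callStatic(String className, String methodName) {
        try {
            Class<?> cls = Class.forName(className);   // lookup is case-sensitive
            Method method = cls.getMethod(methodName); // no parameters
            Object result = method.invoke(null);       // null receiver: static call
            return String.valueOf(result);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                "cannot call " + className + "." + methodName, e);
        }
    }

    // Hypothetical helper: generate n UUID strings and count the distinct ones.
    static int countDistinct(int n) {
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < n; i++) {
            seen.add(callStatic("java.util.UUID", "randomUUID"));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Prints one random (version 4) UUID, then the distinct count for a sample.
        System.out.println(callStatic("java.util.UUID", "randomUUID"));
        System.out.println(countDistinct(100000));
    }
}
```

UUID.randomUUID() returns a version 4 UUID with 122 random bits, so a collision among tens of millions of rows is extremely unlikely, though not strictly impossible. Each ID is generated independently, so no coordination between parallel tasks is needed.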
