hadoop - Generating unique IDs in Hive
I have been trying to generate unique IDs for each row of a table (30 million+ rows).
- Using sequential numbers does not work due to the parallel nature of Hadoop.
- The built-in UDFs rand() and hash(rand(), unixtime()) seem to generate collisions.
There has to be a simple way to generate row IDs, and I was wondering if anyone has a solution.
- My next step is creating a Java MapReduce job to generate a real hash string with a secure random + host IP + current time as a seed. I figure I'd ask here before doing it ;)
Use the reflect UDF to generate UUIDs.
reflect("java.util.UUID", "randomUUID")
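For anyone wondering what that call does under the hood: reflect looks up a Java class and method by name and invokes it. The sketch below is plain Java, not Hive's actual implementation, and the class name `ReflectUuidSketch` and helper `callStatic` are made up for illustration. It shows roughly what the two string arguments resolve to.

```java
import java.lang.reflect.Method;
import java.util.HashSet;
import java.util.Set;

public class ReflectUuidSketch {

    // Hypothetical helper: looks up a public static no-argument method by name
    // and returns its result as a string. This only approximates what a
    // reflection-based UDF does; it is not Hive's code.
    public static String callStatic(String className, String methodName) throws Exception {
        Method method = Class.forName(className).getMethod(methodName);
        return String.valueOf(method.invoke(null)); // null target: static method
    }

    public static void main(String[] args) throws Exception {
        // Generate a batch of IDs and count how many distinct values came back.
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < 100_000; i++) {
            seen.add(callStatic("java.util.UUID", "randomUUID"));
        }
        System.out.println(seen.size() + " distinct ids generated");
    }
}
```

Java class and method names are case-sensitive, so the arguments need to be spelled exactly `java.util.UUID` and `randomUUID`. `UUID.randomUUID()` returns a version 4 UUID built from 122 random bits, so a collision across 30 million rows is extremely unlikely, though not strictly impossible.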