parsing - Text file of all titles / topic titles in Freebase
I need a text file that contains the title of every topic (item) in Freebase, each on its own line.
How can I find or make one if I have downloaded the Freebase RDF dump?
If possible, I also need a separate text file with each topic's description, each description on its own single line.
How can I do that?
I would appreciate it if someone could help me make either of these files from the Freebase RDF dump.
Thanks in advance!
Filter the RDF dump on the predicate/property ns:type.object.name. If you want a particular language, also filter on the language tag, e.g. @en.
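To make the filtering step concrete, here is a minimal Python sketch rather than a definitive implementation. It assumes each dump line is tab-separated as subject, predicate, object, followed by a terminating period, and that names are quoted literals with a language tag. The function name iter_names and the sample MID m.0abc are made up for illustration, and backslash escapes inside the literal are left untouched.

```python
def iter_names(lines, lang="en", predicate="ns:type.object.name"):
    """Yield (subject, text) for each literal of `predicate` tagged @lang."""
    suffix = '"@' + lang
    for line in lines:
        # Drop the newline and the period that terminates the statement.
        line = line.rstrip()
        if line.endswith("."):
            line = line[:-1].rstrip()
        fields = line.split("\t", 2)
        if len(fields) != 3:
            continue  # not a subject/predicate/object line
        subject, pred, obj = fields
        if pred == predicate and obj.startswith('"') and obj.endswith(suffix):
            # Strip the opening quote and the closing quote plus language tag.
            yield subject, obj[1:-len(suffix)]


# Hypothetical sample lines in the assumed layout.
sample = [
    'ns:m.0abc\tns:type.object.name\t"Example Topic"@en\t.\n',
    'ns:m.0abc\tns:type.object.name\t"Beispiel"@de\t.\n',
    'ns:m.0abc\tns:type.object.type\tns:common.topic\t.\n',
]

for subject, name in iter_names(sample):
    print(name)
```

Only the first sample line passes both the predicate test and the @en test, so only its name is printed.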
Edit: I missed the second part about descriptions being desired as well. Here's a three-part regex that will get you lines with:
- English names
- English descriptions
- a type of /common/topic
Combining the three is left as an exercise for the reader.
zegrep $'\tns:(((type\\.object\\.name|common\\.topic\\.description)\t.*@en)|type\\.object\\.type\tns:common\\.topic)\\.$' freebase-rdf-2013-06-30-00-00.gz | gzip > freebase-rdf-2013-06-30-00-00-names-descriptions.gz
It seems to have a performance issue that I'll have to look at. A simple grep of the entire file takes ~11 min on my laptop, but this has been running several times that. I'll have to look at it later though...
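If the end goal is the two plain .txt files from the question, one alternative to the regex is a single pass that compares the predicate field directly and writes names and descriptions to separate files. The Python sketch below is untested against the real dump and I have not measured whether it is any faster than zegrep. It assumes tab-separated lines ending in a period, UTF-8 text, and N-Triples style backslash escapes in literals. The names split_dump and unescape and the output file names are made up for illustration. It does not check for the /common/topic type.

```python
import gzip
import re

NAME = "ns:type.object.name"
DESC = "ns:common.topic.description"
_SIMPLE = {"n": "\n", "t": "\t", "r": "\r"}


def unescape(text):
    """Decode backslash escapes such as \\n, \\" and \\uXXXX in a literal."""
    def repl(match):
        token = match.group(1)
        if len(token) > 1:  # \uXXXX or \UXXXXXXXX
            return chr(int(token[1:], 16))
        # Known control escapes map to the character; others (\" and \\)
        # simply lose the backslash.
        return _SIMPLE.get(token, token)
    return re.sub(r"\\(u[0-9A-Fa-f]{4}|U[0-9A-Fa-f]{8}|.)", repl, text)


def split_dump(dump_path, names_path, descriptions_path, lang="en"):
    """Write @lang names and descriptions to two files, one entry per line."""
    suffix = '"@' + lang
    with gzip.open(dump_path, "rt", encoding="utf-8") as dump, \
         open(names_path, "w", encoding="utf-8") as names, \
         open(descriptions_path, "w", encoding="utf-8") as descriptions:
        for line in dump:
            line = line.rstrip()
            if line.endswith("."):
                line = line[:-1].rstrip()
            fields = line.split("\t", 2)
            if len(fields) != 3:
                continue
            _, pred, obj = fields
            if pred == NAME:
                out = names
            elif pred == DESC:
                out = descriptions
            else:
                continue
            if obj.startswith('"') and obj.endswith(suffix):
                text = unescape(obj[1:-len(suffix)])
                # Collapse newlines and runs of whitespace so every entry
                # stays on a single line.
                out.write(" ".join(text.split()) + "\n")


# Example call (output file names are placeholders):
# split_dump("freebase-rdf-2013-06-30-00-00.gz", "names.txt", "descriptions.txt")
```

Names and descriptions end up in separate files in dump order, so the two files are not aligned line by line; keeping the subject ID alongside each entry would be needed to pair them up.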