apache - Solr Tika not storing any data


I have run into a peculiar problem. I configured my data config and schema as per the Solr wiki page on the Tika DIH (TikaEntityProcessor).

Data config:

<dataConfig>
    <dataSource type="BinURLDataSource" name="bin"/>
    <document>
        <entity name="tika-test" processor="TikaEntityProcessor"
                url="http://adobe.com/content/dam/adobe/en/devnet/acrobat/pdfs/pdf_open_parameters.pdf"
                dataSource="bin" format="text">
            <field column="author" name="author" meta="true"/>
            <field column="title" name="title" meta="true"/>
            <field column="text" name="text"/>
        </entity>
    </document>
</dataConfig>

Schema:

<fields>
    <field name="title" type="string" indexed="true" stored="true"/>
    <field name="author" type="string" indexed="true" stored="true"/>
    <field name="text" type="text" indexed="true" stored="true"/>
</fields>
<uniqueKey>text</uniqueKey>

I have the executable Tika jar as well, and the above document is processed perfectly when I run the jar from the command line. However, the Solr data import imports an empty set of fields: the import succeeds, but the document is created with empty fields. What is going wrong?

I also tried using the ExtractingRequestHandler. This is how the request handler is set up:

<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
    <lst name="defaults">
        <str name="fmap.last-modified">last_modified</str>
        <str name="uprefix">ignored_</str>
    </lst>
</requestHandler>

I am attempting the following request:

curl "http://localhost:3533/solr/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=/home/superq/downloads/tutorial.html"

I get an empty response like:

<response><lst name="responseHeader"><int name="status">0</int><int name="QTime">13</int></lst></response>

Even the log files don't contain anything that might help, and the document is still not indexed. It seems nothing is being processed at all: changing the target file name to a file that does not exist does not throw an error, as it should.

My questions:

1) For Solr-Tika integration, do I need to copy the respective Tika files (build artifacts) to the Solr library path, or do I need to install Tika as a service as well?

2) For converting files, do I need to create a binary version of the .doc/.pdf file and feed that to Solr? I saw some literature on this, but it was rather confusing. Shouldn't Tika be taking care of this?

My article on setting up Tika and the ExtractingRequestHandler may be of use to you:

http://amac4.blogspot.co.uk/2013/07/setting-up-tika-extracting-request.html
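
One thing worth checking in the curl request from the question, offered as a possible explanation rather than a confirmed diagnosis: curl's -F option only uploads a file's contents when the value is prefixed with "@". Without the prefix, the path is sent as a plain string, which would also explain why a nonexistent file name produces no error. Below is a minimal sketch that fails early on a missing file; the variable SOLR_EXTRACT_URL and the function names form_file_field and post_to_extract are hypothetical, and the URL is copied from the question.

```shell
#!/bin/sh
# Sketch only: SOLR_EXTRACT_URL and both function names are hypothetical;
# the URL is the one used in the question.
SOLR_EXTRACT_URL="http://localhost:3533/solr/solr/update/extract"

# Build the value for curl's -F option. The "@" prefix tells curl to upload
# the file's contents; without it the path is sent as an ordinary string.
form_file_field() {
    field_name=$1
    file_path=$2
    if [ ! -f "$file_path" ]; then
        echo "no such file: $file_path" >&2
        return 1
    fi
    printf '%s=@%s' "$field_name" "$file_path"
}

# Post one file to the extract handler under the given document id.
# Returns non-zero without contacting Solr if the file is missing.
post_to_extract() {
    doc_id=$1
    file_path=$2
    field=$(form_file_field myfile "$file_path") || return 1
    curl "${SOLR_EXTRACT_URL}?literal.id=${doc_id}&commit=true" -F "$field"
}
```

With these functions loaded, the request from the question would become: post_to_extract doc1 /home/superq/downloads/tutorial.html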

