apache - Solr Tika not storing any data

July 15, 2015

I have run into a peculiar problem. I configured the data config and schema as per the Solr wiki page on Tika with the DataImportHandler (DIH).

Data config:

<dataConfig>
    <dataSource type="BinURLDataSource" name="bin" />
    <document>
        <entity name="tika-test" processor="TikaEntityProcessor"
                url="http://adobe.com/content/dam/adobe/en/devnet/acrobat/pdfs/pdf_open_parameters.pdf"
                dataSource="bin" format="text">
            <field column="author" name="author" meta="true"/>
            <field column="title" meta="true" name="title"/>
            <field column="text" name="text"/>
        </entity>
    </document>
</dataConfig>

Schema:

<fields>
    <field name="title" type="string" indexed="true" stored="true"/>
    <field name="author" type="string" indexed="true" stored="true"/>
    <field name="text" type="text" indexed="true" stored="true"/>
</fields>
<uniqueKey>text</uniqueKey>

I also have the executable Tika jar, and the document above is processed perfectly when I run the jar from the command line. However, the Solr data import imports an empty set of fields: the import succeeds, but the document is created with empty fields. What is going wrong?

I also tried using the ExtractingRequestHandler. This is how the request handler is set up:

<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
    <lst name="defaults">
        <str name="fmap.last-modified">last_modified</str>
        <str name="uprefix">ignored_</str>
    </lst>
</requestHandler>

I am attempting the following request:

curl "http://localhost:3533/solr/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=/home/superq/downloads/tutorial.html"

I get an empty response like:

<response><lst name="responseHeader"><int name="status">0</int><int name="QTime">13</int></lst></response>

Even the log files don't contain anything that might help, and the document is still not indexed. It seems that nothing is being processed: changing the target file name to a file that does not exist does not throw an error, as it should.

My questions:

1) For Solr-Tika integration, do I need to copy the respective Tika files (build artifacts) to the Solr library path, or do I need to install Tika as a service as well?

2) For converting files, do I need to create a binary version of the .doc/.pdf file and feed it to Solr? I saw some literature on this, but it is rather confusing. Shouldn't Tika be taking care of this?

My article on setting up Tika and the ExtractingRequestHandler may be of use to you:

http://amac4.blogspot.co.uk/2013/07/setting-up-tika-extracting-request.html
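
One thing that may be worth checking in the curl request from the question: with curl's form option, a value of the form myfile=/path is sent as a literal string, and the file contents are only uploaded when the path is prefixed with @. That would be consistent with a nonexistent file name producing no error. Below is a minimal sketch, assuming the core URL from the question; the helper name post_to_extract and the SOLR_URL variable are hypothetical and not part of Solr or curl.

```shell
#!/bin/sh
# Hypothetical helper: post one file to the /update/extract handler.
# SOLR_URL defaults to the core URL used in the question (an assumption;
# adjust it to your own host, port, and core name).
SOLR_URL="${SOLR_URL:-http://localhost:3533/solr/solr}"

post_to_extract() {
    file="$1"
    doc_id="$2"

    # Fail early if the file is missing, so a wrong path is reported
    # before any request is sent.
    if [ ! -f "$file" ]; then
        echo "no such file: $file" >&2
        return 1
    fi

    # The '@' prefix makes curl upload the file's contents as a form part
    # instead of sending the path as a plain string value.
    curl "$SOLR_URL/update/extract?literal.id=$doc_id&commit=true" \
        -F "myfile=@$file"
}
```

It could be called as post_to_extract /home/superq/downloads/tutorial.html doc1. Whether this resolves the empty index also depends on the extraction libraries being on Solr's classpath, which this sketch does not check.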
