Search in Solr with special characters
I have a problem searching for special characters in Solr. My document has a field "title", and it can contain something like "titanic - 1999" (it has the character "-"). When I try to search in Solr for "-", I receive a 400 error. I've tried to escape the character, so I tried "-" and "\-". With those changes Solr doesn't respond with an error, but it returns 0 results.
How can I search in the Solr admin for a special character (something like "-" or "'")?
Regards
Update: here you can see my current Solr schema: https://gist.github.com/cpalomaresbazuca/6269375
My search field is "title".
Excerpt from schema.xml:
...
<!-- A general text field that has reasonable, generic cross-language
     defaults: it tokenizes with StandardTokenizer, removes stop words
     from case-insensitive "stopwords.txt" (empty by default), and
     down cases. At query time only, it also applies synonyms. -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
...
<field name="title" type="text_general" indexed="true" stored="true"/>
You are using the standard text_general field type for your title attribute. This might not be a good choice. text_general is meant for huge chunks of text (or at least sentences), not for exact matching of names or titles.
The problem here is that text_general uses StandardTokenizerFactory.
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
StandardTokenizerFactory does the following:
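"A general purpose tokenizer that strips many extraneous characters and sets token types to meaningful values. Token types are only useful for subsequent token filters that are type-aware of the same token types."
This means the '-' character is not kept as a token of its own. It is only used as a place to split the string.
"kong-fu" would be represented as "kong" and "fu". The '-' disappears.
This also explains why select?q=title:\- won't work here: the index contains no "-" token for the query to match.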
Choose a better fitting field type:
Instead of StandardTokenizerFactory you could use solr.WhitespaceTokenizerFactory, which only splits on whitespace, for exact matching of words. So making your own field type for the title attribute would be a solution.
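A custom field type along those lines could look roughly like the sketch below. The name "text_title" and the choice to keep the lowercase filter are assumptions for illustration, not part of the original schema. With whitespace tokenization, "titanic - 1999" should be indexed as the tokens "titanic", "-" and "1999", so an escaped query such as title:\- has something to match.

```xml
<!-- Hypothetical field type: the name "text_title" is an assumption, not taken from the posted schema. -->
<fieldType name="text_title" class="solr.TextField" positionIncrementGap="100">
  <!-- No type attribute on the analyzer, so the same chain is used at index and query time. -->
  <analyzer>
    <!-- Splits on whitespace only, so "-" survives as its own token. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Optional: keeps matching case-insensitive, as in text_general. -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- The title field switched over to the new type. -->
<field name="title" type="text_title" indexed="true" stored="true"/>
```

After changing the field type, the existing documents need to be reindexed, because the tokens already stored in the index were produced by the old analyzer.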
Solr also has a minimal field type called text_ws. Depending on your requirements it might be enough.
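For reference, in the example schema shipped with Solr, text_ws is usually defined roughly as below. This is a sketch from memory of that example schema, so check your own schema.xml, which may differ. Note that it has no lowercase filter, so matching is case-sensitive.

```xml
<!-- Approximate definition of text_ws as found in the Solr example schema; verify against your own schema.xml. -->
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- The title field pointed at text_ws instead of text_general. -->
<field name="title" type="text_ws" indexed="true" stored="true"/>
```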