search suggestion - Solr (Open Solr) suggester results contain punctuation marks


I'm working on a suggester, and the results I'm getting contain punctuation. For example, when I type "volcan" I get:

"volcanoes", "volcanic", "volcano", "volcano," (trailing comma), "volcanoes." (trailing period/full stop)

Here is the code in my solrconfig.xml file:

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">text</str>
    <float name="threshold">0.005</float>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <lst name="invariants">
    <!-- always run the suggester for queries to this handler -->
    <str name="spellcheck">true</str>
    <!-- collate is not needed: the query is tokenized as a keyword, so we only need suggestions for that term -->
    <str name="spellcheck.collate">false</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

In my schema.xml file I have this:

<fieldType name="spell" class="solr.TextField" positionIncrementGap="100" indexed="true" stored="false" multiValued="true" termVectors="true" termPositions="true" termOffsets="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.ShingleFilterFactory"
            minShingleSize="2"
            maxShingleSize="4"
            outputUnigrams="true"
            outputUnigramsIfNoShingles="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

And the result is:

{
    "responseHeader": {
        "status": 0,
        "QTime": 0,
        "params": {
            "wt": "json",
            "q": "volcan"
        }
    },
    "spellcheck": {
        "suggestions": [
            "volcan",
            {
                "numFound": 5,
                "startOffset": 0,
                "endOffset": 6,
                "suggestion": [
                    "volcanoes",
                    "volcanic",
                    "volcano",
                    "volcano,",
                    "volcanoes."
                ]
            }
        ]
    }
}

The problem is not in the request handler. Rather, it seems to reside in the way you're indexing the content that goes into the spell field, and maybe in the spell field itself. I think you should enable a tokenizer that strips punctuation out of the fields.

Here's a spell field definition that works for me in schema.xml:

<fieldType name="spell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
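
One thing worth checking: the suggester in the question is built from the field named text, whose type is not shown, so the punctuation may come from that field's analysis rather than from the spell type. As a rough sketch only, assuming that field keeps punctuation attached to tokens (for example because it uses a whitespace-based tokenizer), a field type along these lines would strip it at index time. The name text_suggest is hypothetical and not part of the original configuration.

```xml
<!-- Hypothetical field type; the name "text_suggest" is an assumption for illustration. -->
<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Splits on word boundaries, which already drops most punctuation. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Safety net: remove punctuation still attached to the start or end of a token. -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^\p{Punct}+|\p{Punct}+$"
            replacement=""
            replace="all"/>
  </analyzer>
</fieldType>
```

The field named in the suggester's field setting would need to use a type like this, and the documents would have to be reindexed so the suggester is rebuilt from the cleaned terms.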
