What's up with multi-word synonyms in Solr?

Datetime: 2016-08-23 02:01:42          Topic: Solr

Several questions about multi-word synonyms have been floating around the Solr mailing lists, and a few notable answers are reproduced below. The short version: it’s complicated, and every use case has different considerations. Doh!

As an aside, I’ve been giving hon-lucene-synonyms some love since December. I got it working on Solr 5.3.1 and Solr 6.0.0 but neglected the documentation. The latest release of hon-lucene-synonyms included a number of namespace changes that weren’t fully reflected in the README.md, so there has been some confusion about how to get the plugin running. The hon-lucene-synonyms README.md is now up to date and explains how to get the plugin working in Solr 6.0.0.

Doug Turnbull said, in Re: Solutions for Multi-word Synonyms:

Honestly half the time I run into this problem, I end up creating a
QParserPlugin because I need to do something specific. With a QParserPlugin
I can run whatever analysis, slicing and dicing of the query string to
manually construct whatever I need to.

http://www.supermind.org/blog/1134/custom-solr-queryparsers-for-fun-and-profit

One thing I often do is repeat the functionality of Elasticsearch's match
query. Elasticsearch's match query does the following:

- Analyze the query string using the field's query-time analyzer
- Create an OR query with the tokens that come out of the analysis

You can look at the field query parser as something of a starting point for
this.

I usually do this in the context of a boost query, not as the main edismax
query.
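
The two match-query steps Doug lists can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `analyze` is a toy stand-in for a field's query-time analyzer (lowercasing and splitting on non-alphanumeric characters), and `match_query` is a hypothetical helper that emits Lucene-style query syntax as a string. Neither is part of Solr or Elasticsearch; a real QParserPlugin would run the field's actual analyzer and build query objects in Java.

```python
import re


def analyze(text):
    """Toy stand-in for a query-time analyzer (an assumption, not Solr's):
    lowercase the text and split on runs of non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def match_query(field, query_string):
    """Step 1: analyze the query string. Step 2: OR the resulting tokens."""
    tokens = analyze(query_string)
    return " OR ".join(f"{field}:{tok}" for tok in tokens)


print(match_query("title", "Food and Drug Administration"))
# title:food OR title:and OR title:drug OR title:administration
```

As Doug notes, a query built this way is usually more useful as a boost query than as the main query, since OR-ing every token matches very broadly.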

Bernd Fehling added, in Re: Solutions for Multi-word Synonyms:

you should really try to build your own solution for Multi-word Synonyms
because every need is different and you can customize it for your special
use case, like adding a Thesaurus.

http://www.ub.uni-bielefeld.de/~befehl/base/solr/InsideBase_eurovocThesaurus.html

From myself, in Re: Solutions for Multi-word Synonyms (where APT refers to Lucidworks’ auto-phrasing token filter):

The auto-phrasing token filter (APT) is a two-pronged solution that
requires both index-time and query-time processing, versus
hon-lucene-synonyms (HLS), which is strictly a query-time implementation.
The primary takeaway is that APT requires reindexing your data when you
update the autophrases and synonyms, while HLS does not.

APT is more precise while HLS is more flexible.

Note that hon-lucene-synonyms is also very useful when your documents contain a single term but you want several multi-word synonyms to find it. For example, if your documents contain FDA, a rule such as Food and Drug Administration,Food Drug Administration=>FDA lets those multi-term synonyms be searched for and mapped to FDA at query time, without reindexing the entire system.
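
To make the FDA example concrete, here is a minimal Python sketch of query-time multi-word synonym mapping. It is a simplified illustration, not hon-lucene-synonyms' actual implementation: `parse_rule` and `rewrite_query` are hypothetical helpers, the rule uses the explicit `a, b => c` mapping style of a Solr synonyms file, and matching is a plain longest-phrase-first scan over whitespace-separated, lowercased tokens.

```python
def parse_rule(line):
    """Parse one explicit mapping such as 'a b, c d => x' into
    {('a', 'b'): 'x', ('c', 'd'): 'x'} with lowercased phrase keys."""
    left, right = line.split("=>")
    phrases = [tuple(p.lower().split()) for p in left.split(",") if p.strip()]
    return {phrase: right.strip() for phrase in phrases}


def rewrite_query(query, mapping):
    """Replace the longest matching phrase at each position with its target."""
    tokens = query.split()
    longest = max((len(p) for p in mapping), default=0)
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(longest, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in mapping:
                out.append(mapping[key])
                i += n
                break
        else:
            # No phrase starts here; keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return " ".join(out)


RULE = "Food and Drug Administration,Food Drug Administration=>FDA"
mapping = parse_rule(RULE)
print(rewrite_query("food and drug administration recalls", mapping))
# FDA recalls
```

Because the rewrite happens entirely on the query string, changing the rule takes effect on the next search with nothing to reindex, which is the property that distinguishes the query-time approach from an index-time one.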

If you have any questions, please reach out and contact us!




