Django + Elastic Search + haystack = Powering your website with search functionality – part 2

Datetime: 2016-08-23 02:07:59          Topic: Django, Elastic Search

Search is one of the most important and powerful features of web applications these days. It helps users easily find content, products, etc. on your website. Without a search option, users have to navigate their own way to the desired content, which no one likes to do. Just imagine Amazon without a search bar: you would have to browse through various categories to find a product, and it could take forever.

Having said that, implementing search in your web app is not difficult, thanks to the many libraries built for this purpose. I have broken this tutorial down into two parts. The first article explains how to set up Elasticsearch and Haystack with Django, create search indexes, and query data. In this article, we will go one step further and discuss autocomplete, spelling suggestions, and custom backends. If you haven’t read part 1, I suggest you do so first and then come back to this article.

Remember the SearchIndex we created in part 1 of this tutorial? We are going to keep modifying it in this article.

import datetime
from haystack import indexes
from myapp.models import Blog

class BlogIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    author = indexes.CharField(model_attr='user')
    pub_date = indexes.DateTimeField(model_attr='pub_date')

    def get_model(self):
        return Blog

    def index_queryset(self, using=None):
        """Used when the entire index for model is updated."""
        return self.get_model().objects.filter(pub_date__lte=datetime.datetime.now())

Spelling Suggestions

Users often misspell a word and press search. At this point, it is wise to show them the corrected spelling and display results for it. You can also give users the option to search for exactly what they typed. Google search works like this, as shown below.

Here the input from the user is “book of strangrer”, shown in red, but Google suggests and shows results for “book of stranger”, highlighted in green. It also gives the user the option to search for the entered value anyway (shown in blue). Let’s talk about how to achieve this in your Django app.

To enable spelling suggestions in Haystack, you first need to create a special field in your SearchIndex that mirrors the content of the “text” field. The modified SearchIndex is shown below.

class BlogIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    author = indexes.CharField(model_attr='user')
    pub_date = indexes.DateTimeField(model_attr='pub_date')
    suggestions = indexes.FacetCharField()

    def prepare(self, obj):
        # Copy the rendered document text into the field used for suggestions.
        prepared_data = super(BlogIndex, self).prepare(obj)
        prepared_data['suggestions'] = prepared_data['text']
        return prepared_data

    def get_model(self):
        return Blog

    def index_queryset(self, using=None):
        """Used when the entire index for model is updated."""
        return self.get_model().objects.filter(pub_date__lte=datetime.datetime.now())

You also need to enable spelling support in the Haystack settings. Modify your Django settings file to set INCLUDE_SPELLING to True, as shown below.

HAYSTACK_CONNECTIONS = {
  'default': {
    'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
    'URL': 'http://127.0.0.1:9200/',
    'INDEX_NAME': 'haystack',
    'INCLUDE_SPELLING': True,
  },
}

You are all set to show spelling suggestions now. You can query for a suggested search term like this:

suggestedSearchTerm = SearchQuerySet().spelling_suggestion(searchTerm)

Short and simple, right?
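One way to wire this into a view is sketched below. It is only an illustration: search_with_suggestion is a hypothetical helper, and its sqs argument is assumed to behave like Haystack's SearchQuerySet (only auto_query() and spelling_suggestion() are used). It searches for the suggested spelling when one is offered and keeps the original term, so the page can still offer a link to search for exactly what was typed.

```python
def search_with_suggestion(sqs, term):
    """Search for `term`, preferring the backend's spelling suggestion.

    `sqs` is assumed to be a haystack.query.SearchQuerySet (or anything
    with the same auto_query() / spelling_suggestion() methods).
    Returns a dict the template can use to render the three pieces the
    article describes: what was searched, what was typed, and the results.
    """
    suggestion = sqs.spelling_suggestion(term)

    # Only switch terms when the backend proposes something different.
    if suggestion and suggestion.lower() != term.lower():
        return {
            "searched_for": suggestion,
            "original": term,  # lets the page offer "search instead for ..."
            "results": sqs.auto_query(suggestion),
        }

    # No usable suggestion: search for the term exactly as entered.
    return {
        "searched_for": term,
        "original": None,
        "results": sqs.auto_query(term),
    }
```

In a Django view, this might be called as search_with_suggestion(SearchQuerySet(), request.GET.get('q', '')), with the returned dict passed on to the template context.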

Autocomplete Functionality

Autocomplete has become an expected feature wherever search is present. Implementing it with django-haystack is not difficult at all. First you prepare the data, and then you implement the search that fetches the suggestions.

You have to add a field to your SearchIndex that contains the text you want to autocomplete on. This can be an NgramField or an EdgeNgramField; the two have different uses.

  • EdgeNgramField tokenizes on whitespace, which prevents incorrect matches across two words. Most of the time this is the one you should use.
  • However, if you need to autocomplete across word boundaries, you should use NgramField.
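The difference between the two field types can be pictured with a small pure-Python sketch. It is a simplified model of the idea and not the backend's actual analyzer: the function names and the gram sizes are assumptions chosen for illustration, and how a query is finally matched against these tokens depends on your Haystack version and analyzer configuration.

```python
def edge_ngrams(text, min_gram=2, max_gram=15):
    """Split on whitespace, then emit the leading prefixes of every word."""
    grams = []
    for word in text.lower().split():
        for n in range(min_gram, min(len(word), max_gram) + 1):
            grams.append(word[:n])
    return grams


def ngrams(text, min_gram=3, max_gram=15):
    """Emit every substring of the whole text, so a gram may span a space."""
    text = text.lower()
    grams = []
    for n in range(min_gram, min(len(text), max_gram) + 1):
        for start in range(len(text) - n + 1):
            grams.append(text[start:start + n])
    return grams


title = "Elastic Search"

# Prefixes only, computed word by word.
print(edge_ngrams(title))
# ['el', 'ela', 'elas', 'elast', 'elasti', 'elastic',
#  'se', 'sea', 'sear', 'searc', 'search']

# A fragment that crosses the word boundary exists only among the n-grams.
print("ic se" in ngrams(title))       # True
print("ic se" in edge_ngrams(title))  # False
```

Edge n-grams stay small and never mix two words, which is why they suit type-ahead on word beginnings; plain n-grams produce many more tokens in exchange for matching in the middle of the text.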

Your SearchIndex needs to be modified as shown below.

class BlogIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    author = indexes.CharField(model_attr='user')
    pub_date = indexes.DateTimeField(model_attr='pub_date')
    suggestions = indexes.FacetCharField()
    text_auto = indexes.EdgeNgramField(model_attr='getAutocompleteText')

Here, getAutocompleteText is a method on the Blog model that supplies the data for autocompletion. You can query these results using the SearchQuerySet.autocomplete method.

SearchQuerySet().autocomplete(text_auto='old')
# Results match things like 'goldfish', 'cuckold' and 'older'.

The above query can also be written as

SearchQuerySet().filter(text_auto='old')  # Results match things like 'goldfish', 'cuckold' and 'older'.

That’s it, your site has the autocomplete functionality now.

Creating Custom Backend

Often you may need to extend the existing backend to add certain functionality. For example, the default min_gram value for the edgeNGram tokenizer is 2, but what if you want to change it to 3? This can be achieved by extending the Elasticsearch backend and overriding these values. I am only changing min_gram for demonstration purposes; you are free to modify other settings as well.

First, let’s define the custom Elasticsearch settings in the Django settings file. You can see the DEFAULT_SETTINGS of the Elasticsearch backend in the official Haystack source code.

ELASTICSEARCH_INDEX_SETTINGS = {
    'settings': {
        "analysis": {
            "analyzer": {
                "edgengram_analyzer": {
                    "type": "custom",
                    "tokenizer": "lowercase",
                    "filter": ["haystack_edgengram"]
                }
            },
            "tokenizer": {
                "haystack_edgengram_tokenizer": {
                    "type": "edgeNGram",
                    "min_gram": 3,
                    "max_gram": 15,
                    "side": "front"
                }
            },
            "filter": {
                "haystack_edgengram": {
                    "type": "edgeNGram",
                    "min_gram": 3,
                    "max_gram": 15
                }
            }
        }
    }
}

Next, I am going to create a new custom backend file. Let’s call it my_elasticbackend.py and put it in the home app, so that it matches the ENGINE path used in the connection settings below.

from django.conf import settings

from haystack.backends.elasticsearch_backend import ElasticsearchSearchBackend, ElasticsearchSearchEngine


class ConfigurableSearchBackend(ElasticsearchSearchBackend):

    def __init__(self, connection_alias, **connection_options):
        super(ConfigurableSearchBackend, self).__init__(connection_alias, **connection_options)
        # Override the backend's default index settings with the ones from settings.py.
        self.DEFAULT_SETTINGS = settings.ELASTICSEARCH_INDEX_SETTINGS


class ConfigurableSearchEngine(ElasticsearchSearchEngine):
    # Tell Haystack to use the configurable backend defined above.
    backend = ConfigurableSearchBackend

This extended backend just replaces the ‘DEFAULT_SETTINGS’ with your own custom settings.
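Because the whole dictionary is swapped out, any analyzer, tokenizer or filter that is missing from ELASTICSEARCH_INDEX_SETTINGS is no longer configured. If you would rather override only a few keys, one possible alternative is to merge your values into a copy of the defaults. The sketch below is an assumption-laden illustration: merge_settings is a hypothetical helper, and the defaults dict is an abbreviated stand-in and not the backend's real DEFAULT_SETTINGS.

```python
import copy


def merge_settings(defaults, overrides):
    """Return a new dict: `defaults` with `overrides` merged in recursively."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: descend instead of replacing the subtree.
            merged[key] = merge_settings(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Abbreviated, hypothetical stand-in for the backend defaults.
defaults = {"settings": {"analysis": {"filter": {
    "haystack_ngram": {"type": "nGram", "min_gram": 3, "max_gram": 15},
    "haystack_edgengram": {"type": "edgeNGram", "min_gram": 2, "max_gram": 15},
}}}}

# Only the value we actually want to change.
overrides = {"settings": {"analysis": {"filter": {
    "haystack_edgengram": {"min_gram": 3},
}}}}

merged = merge_settings(defaults, overrides)
print(merged["settings"]["analysis"]["filter"]["haystack_edgengram"])
# {'type': 'edgeNGram', 'min_gram': 3, 'max_gram': 15}
print("haystack_ngram" in merged["settings"]["analysis"]["filter"])
# True
```

Inside the custom backend, the same helper could be given the parent class's DEFAULT_SETTINGS as the first argument, so that untouched defaults survive the override.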

Now you just need to reference this new backend in the Haystack connection settings.

HAYSTACK_CONNECTIONS = {
  'default': {
    'ENGINE': 'home.my_elasticbackend.ConfigurableSearchEngine',
    'URL': 'http://127.0.0.1:9200/',
    'INDEX_NAME': 'haystack',
    'INCLUDE_SPELLING': True,
  },
}

There are other features not discussed in this article, such as boosting and faceting, which you may need for certain use cases. You can read about them in the official django-haystack documentation.

Make sure to rebuild/update your index after making these changes.

When updating the index, make sure to add the --remove argument; otherwise Haystack will keep data in the index even after the corresponding model instance is deleted from the database.

./manage.py update_index --remove

I hope you found this article helpful. Let me know in the comments section below if you have any suggestions or feedback.

Fun Fact: Game of Thrones season 6 is back, and its episode 4 is also titled “Book of the Stranger”.




