Django + Elastic Search + haystack = Powering your website with search functionality – part 1

Datetime: 2016-08-23 02:08:56    Topic: Elastic Search, Django

Search is one of the most important and powerful features of web applications these days. It helps users easily find content, products, and so on. Without a search option, users have to navigate their own way to the content they want, which no one likes to do. Just imagine Amazon without a search bar: you would have to browse through various categories to find a product, and it could take forever.

That said, implementing search in your web app is not difficult, thanks to the many libraries built for this purpose. In this article, we will discuss how to power up your Django website with search functionality using Haystack and Elasticsearch. Assuming you already have a fair knowledge of the Django web framework, let’s jump into Haystack and Elasticsearch.

Elasticsearch

Elasticsearch is a highly scalable, Lucene-based search engine. It provides distributed, multitenant-capable full-text search with support for schemaless JSON documents. Elasticsearch achieves fast search responses because it searches a prebuilt index rather than scanning the text directly. Almost any action can be performed through its RESTful API, using JSON over HTTP. More details can be found on the official Elasticsearch page.
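To make the "searches an index" idea concrete, here is a toy inverted index in plain Python. It is only a sketch of the general technique, not how Elasticsearch or Lucene is actually implemented; the function names and the sample documents are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample data
documents = {
    1: "Django makes web development fast",
    2: "Haystack adds search to Django",
    3: "Elasticsearch is a search engine",
}
index = build_inverted_index(documents)
print(sorted(search(index, "django search")))  # [2]
```

The lookup only touches the entries for the query terms, which is why a query does not need to scan every document.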

Haystack

Haystack is a Django app that provides modular search and supports several backends, such as Elasticsearch, Whoosh, and Solr. It provides a unified API, so the underlying backend can be changed if required without modifying your code.

Setting up Haystack and Elasticsearch

Installing Haystack

Haystack can be installed via pip. After installation, add it to INSTALLED_APPS in your Django settings.

pip install django-haystack

INSTALLED_APPS = [
    # ... your other apps ...
    'haystack',
]

Installing Elasticsearch

Download Elasticsearch from the official website. After downloading the file, unzip it and navigate to the bin directory. Run the elasticsearch executable to start the server with the default config. Then open 127.0.0.1:9200 in your browser to check whether the server is up.
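The same check can be scripted. The sketch below uses only the standard library and assumes the server answers at the root URL with a JSON body that contains a version.number field; the function names are made up for this example.

```python
import json
from urllib.error import URLError
from urllib.request import urlopen


def parse_root_response(body):
    """Extract the version number from the JSON served at the root URL."""
    info = json.loads(body)
    return info.get("version", {}).get("number")


def elasticsearch_version(url="http://127.0.0.1:9200/", timeout=2):
    """Return the server's version string, or None if it cannot be reached."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return parse_root_response(response.read().decode("utf-8"))
    except (URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a body that is not JSON
        return None


# Example usage (requires a running server):
# print(elasticsearch_version() or "Elasticsearch is not reachable")
```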

You can also specify your own config file when starting the Elasticsearch server, using the following command. The exact option depends on your Elasticsearch version, so check the documentation for the release you downloaded.

elasticsearch --config=<PATH_TO_YOUR_CONFIG_FILE>/elasticsearch.yml

You will also need to install the Elasticsearch Python bindings for Haystack to work with it.

pip install elasticsearch

Modifying the Django configuration to specify the Haystack backend

Once this is done, modify your Django settings file to specify the search backend.

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'haystack',
    },
}
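If the Elasticsearch URL differs between machines, one option is to build the same dictionary from environment variables. This is only a sketch: the helper name and the ELASTICSEARCH_URL and HAYSTACK_INDEX_NAME variable names are assumptions made for this example, not something Haystack defines.

```python
import os


def haystack_connections(environ=os.environ):
    """Build a HAYSTACK_CONNECTIONS dict, falling back to the local defaults."""
    return {
        "default": {
            "ENGINE": "haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine",
            # Hypothetical variable names; pick whatever suits your deployment
            "URL": environ.get("ELASTICSEARCH_URL", "http://127.0.0.1:9200/"),
            "INDEX_NAME": environ.get("HAYSTACK_INDEX_NAME", "haystack"),
        },
    }


# In settings.py:
HAYSTACK_CONNECTIONS = haystack_connections()
```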

That’s it: your Django website is now set up with Haystack and Elasticsearch. Next, let’s see how to use them.

Working with Search Indexes

First, you need to create an index (SearchIndex) so that Haystack knows what to search on. SearchIndex objects are how Haystack determines what data should be placed in the search index. They are field-based and manipulate and store data much like Django models.

Let’s assume we have a Blog model with the following attributes.

from django.db import models
from django.contrib.auth.models import User

class Blog(models.Model):
    # on_delete is a required argument from Django 2.0 onwards
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    pub_date = models.DateTimeField()
    title = models.CharField(max_length=200)
    body = models.TextField()

    # On Python 2, define __unicode__ instead
    def __str__(self):
        return self.title

Creating Search Indexes

Now we want to build search for this Blog model, with the ability to search by a blog’s title, body, and author name. The first step is to create the SearchIndex, as outlined below. Haystack looks for it in a file named search_indexes.py inside your app.

import datetime
from haystack import indexes
from myapp.models import Blog

class BlogIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    author = indexes.CharField(model_attr='user')
    pub_date = indexes.DateTimeField(model_attr='pub_date')

    def get_model(self):
        return Blog

    def index_queryset(self, using=None):
        """Used when the entire index for model is updated."""
        return self.get_model().objects.filter(pub_date__lte=datetime.datetime.now())

Understanding Search Index

Every SearchIndex requires exactly one field with document=True. This tells both Haystack and the search engine which field is the primary one to search in. Additionally, we provide use_template=True on the text field. This lets us use a data template (rather than error-prone concatenation) to build the document that the search engine indexes. The template is a simple text file, and everything you want to be searchable should go in it. Create a new file at search/indexes/myapp/blog_text.txt inside your template directory with the following content.

# templates/search/indexes/myapp/blog_text.txt

{{ object.title }}
{{ object.user.get_full_name }}
{{ object.body }}

Here we have included the blog title, author name, and blog body in the searchable document.
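To picture what ends up in the document field for one post, here is a plain-Python approximation of that template. Author and Post are hypothetical stand-ins for the Django User and Blog models, and the output only roughly matches what the real template engine would render.

```python
from dataclasses import dataclass


@dataclass
class Author:
    """Stand-in for django.contrib.auth.models.User (illustration only)."""
    first_name: str
    last_name: str

    def get_full_name(self):
        return ("%s %s" % (self.first_name, self.last_name)).strip()


@dataclass
class Post:
    """Stand-in for the Blog model (illustration only)."""
    title: str
    user: Author
    body: str


def render_document(post):
    """Join the same three values the template outputs, one per line."""
    return "\n".join([post.title, post.user.get_full_name(), post.body])


post = Post(
    title="Search with Haystack",
    user=Author("Jane", "Doe"),
    body="Haystack talks to Elasticsearch for us.",
)
print(render_document(post))
```

A query for any word in the title, the author's full name, or the body can then match this post, because all three live in the single document field.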

Note that we have also added author and pub_date fields to BlogIndex. These are useful if you want to do additional filtering on your search results.

We have also specified a custom index_queryset so that only blogs whose publication date is on or before the present date are indexed. This prevents indexing blogs that are not yet published. You can put any condition in this method to control what gets indexed.

That’s it. Now run the following command to build the index.

./manage.py rebuild_index

This clears any existing index and builds it again from scratch. There are other commands as well, such as clear_index and update_index, which you will need later; the full reference is in the official Haystack documentation.

Querying Data

Now that search is set up and the index is built, it’s time to query the data you need. Haystack has a very good API for querying data, and it is quite similar to the Django ORM in usage and in the functions it provides.

Haystack provides the SearchQuerySet class to make performing a search and iterating over its results easy and consistent. Let’s search for the content “haystack with elastic search” using the index built previously.

from haystack.query import SearchQuerySet
results = SearchQuerySet().filter(content='haystack with elastic search')

The results can also be iterated over to access individual items, as shown below.

for item in results:
    author = item.author
    # ...

If you have multiple SearchIndex classes, it is often better to specify which models to search in, which speeds up the search, as shown below.

from haystack.query import SearchQuerySet
from myapp.models import Blog

results = SearchQuerySet().models(Blog).filter(content='haystack with elastic search')

You can also filter on other fields of the SearchIndex class and use order_by, values, values_list, and other options. Have a look at the official documentation for more details on the SearchQuerySet API.
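As a rough sketch of combining these options, the helper below filters on the pub_date field from BlogIndex, orders the hits newest first, and slices the result. The function name and its days and limit parameters are made up for this example; it accepts any SearchQuerySet-like object so it can be tried without a running search server.

```python
import datetime


def recent_matches(sqs, query, days=30, limit=10):
    """Return up to `limit` hits for `query` from the last `days` days, newest first."""
    since = datetime.datetime.now() - datetime.timedelta(days=days)
    return sqs.filter(content=query, pub_date__gte=since).order_by("-pub_date")[:limit]


# Usage inside a Django project with Haystack configured:
# from haystack.query import SearchQuerySet
# from myapp.models import Blog
# results = recent_matches(SearchQuerySet().models(Blog), "haystack with elastic search")
```

Passing the queryset in, rather than creating it inside the function, keeps the model restriction shown earlier under the caller's control.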

That’s it for this tutorial. In the second part, I will talk about autocomplete, spelling suggestions, custom backends, and other features of Haystack and Elasticsearch.

I hope you find this article helpful. Let me know in the comments section below if you have any suggestions or feedback.

Fun Fact: Game of Thrones season 6 is back, and its episode 4 is also titled as the book of stranger




