Building Full-Text Search with PostgreSQL and Phoenix

Datetime:2016-08-23 02:17:02          Topic: PostgreSQL           Share

I’ve been building an Elixir / Phoenix job board over the last two weekends–I’m a big fan of the ecosystem and I wanted to prototype a little project that benefited the community at large. One useful aspect of a job board is searching – being able to query for “Los Angeles” and returning jobs that are actually in Los Angeles, instead of scrolling through a list looking for each location.

Phoenix uses PostgreSQL by default, which is great for building a search solution. While Postgres has a full documentation section about full-text search , I wanted to quickly give an overview of how I specifically added full-text search to my Phoenix application.

(I want to note here that in my instance, I’m searching across a single model Job – if you need to search across multiple models, some of the code below will need additional configuration)

We need to determine what fields are going to be searchable. For a Job, those fields are “title”, “company”, and “location”. We’ll create a Phoenix migration to index those fields, using Postgres’ GIN indexes – a specific type of index designed to speed up full-text search. Read more in the Postgres documentation , if you’re interested.

defmoduleWorkwithelixir.Repo.Migrations.AddIndexesToJobdo
  use Ecto.Migration
 
  defchangedo
    execute("CREATE INDEX index_jobs_on_title ON jobs USING gin(to_tsvector('english', title))")
    execute("CREATE INDEX index_jobs_on_company ON jobs USING gin(to_tsvector('english', company))")
    execute("CREATE INDEX index_jobs_on_location ON jobs USING gin(to_tsvector('english', location))")
  end
end

With the fields indexed, we’ll create a Postgres “view”, which will create a singular access point to searching all the fields on our model:

defmoduleWorkwithelixir.Repo.Migrations.AddSearchesViewdo
  use Ecto.Migration
 
  defchangedo
    execute("
CREATE VIEW searches AS
 
  SELECT
    jobs.id AS searchable_id,
    'Job' AS searchable_type,
    jobs.title AS term
  FROM jobs
 
  UNION
 
  SELECT
    jobs.id AS searchable_id,
    'Job' AS searchable_type,
    jobs.company AS term
  FROM jobs
 
  UNION
 
  SELECT
    jobs.id AS searchable_id,
    'Job' AS searchable_type,
    jobs.location AS term
  FROM jobs
 
    ")
  end
end

(Note here that if you are using multiple models, you should also UNION here with a SELECT from each model… the pattern for each field should be pretty straightforward)

With this view created, we can test the query in psql :

workwithelixir_dev=# select * from searches;
-[ RECORD 1 ]---+------------------
searchable_id  | 2
searchable_type | Job
term            | Backend Engineer
-[ RECORD 2 ]---+------------------
searchable_id  | 2
searchable_type | Job
term            | Acme Co
-[ RECORD 3 ]---+------------------
searchable_id  | 2
searchable_type | Job
term            | Los Angeles, CA

As we can see, the searches view formats each field on a model as an individual record to be searched on. Each of these fields returns a searchable_id and searchable_type , which we can use to look up the model itself.

Let’s add a search query. To do this, we’ll make use of the to_ts* (text search) utilities Postgres provides, for transforming an input string and matching it against the text that we’re searching. Note that at any point in the code, you can test out the Postgres to_tsvector / to_tsquery functions in the psql interface – we’ll breeze through them a bit as the Postgres documentation covers them pretty well.

workwithelixir_dev=# select * from searches where to_tsvector('english', term) @@ to_tsquery('english', 'acme');
-[ RECORD 1 ]---+--------
searchable_id  | 2
searchable_type | Job
term            | Acme Co

In this instance, we can see that passing in the query acme has matched against the Job record with the term “Acme Co”. One thing I realized here is this view doesn’t include the column that is matched. In this case, this record has matched our query with a term on the company field, but we have no way of knowing that. I’d encourage you, if you need that information, to revise the “searches” view we implemented above.

One more thing – if you test the above SQL with the query “acme co”, you’ll notice it fails. This is because Postgres text search queries have interesting behavior around spaces – instead of “acme co”, we need to provide either “acme&co” (match a term with “acme” AND “co”) or “acme|co” (match a term with “acme” OR “co”). You can determine what kind of search you’d like – in my application’s case, I decided to use | , so I could do searches like “acme|los|angeles|engineer” and match multiple terms at once.

At this point, we can begin adding code in our Phoenix project to allow searching directly in the web application.

We need to do two things in our application: first, add code to execute and load models from SQL, and using that code, execute something like the above SQL to receive a set of models based on a query.

I elected to use a solution like the one in this StackOverflow post , in lib/<yourapp>/repo.ex :

defmoduleWorkwithelixir.Repodo
  # ...
 
  defexecute_and_load(sql, params, model) do
    Ecto.Adapters.SQL.query!(__MODULE__, sql, params)
    |> load_into(model)
  end
 
  defpload_into(response, model) do
    Enum.map(response.rows, fnrow ->
      fields = Enum.reduce(Enum.zip(response.columns, row), %{}, fn({key, value}, map) ->
        Map.put(map, key, value)
      end)
      Ecto.Schema.__load__(model, nil, nil, nil, fields,
                          &Ecto.Type.adapter_load(__adapter__, &1, &2))
    end)
  end
end

With that implemented, we can add a search/1 function, which accepts a query and returns a set of Job s matching that term:

defmoduleWorkwithelixir.Jobdo
  # ...
 
  @doc """
  Search for jobs based on a provided term, using Postgres full-text search.
 
  Returns [Job]
  """
  defsearch(term) do
    formatted = term |> String.replace(" ", "|")
    Repo.execute_and_load("select * from jobs where id in (select searchable_id from searches where to_tsvector('english', term) @@ to_tsquery($1));", [ formatted ], Job)
  end
end

The above code is pretty straightforward – first, we take the input term and transform it to use | instead of spaces (remember “acme|co” versus “acme&co”. We then use pretty similar SQL to what we used in the console – we search using the “searches” view for a term, select the searchable_id for all of the returned results, and then, using those ids, look up the actual Job rows using their ids. The execute_and_load function accepts a SQL statement, any variables, and finally, a model to load the data in. This is an Ecto thing that I can best explain as making a strongly-formatted Map of your model fields (like %Job{title: nil, company: nil, location: nil} ), and then taking the SQL output to fill out that map.

With that, we can boot up a iex console and make sure that we get the data we expect:

$ iex -S mix
 
iex(2)> Job.search("backend")
[debug] QUERYOKdb=32.4ms
select * fromjobswhereidin (selectsearchable_idfromsearcheswhereto_tsvector('english', term) @@ to_tsquery($1)); ["backend"]
[%Workwithelixir.Job{__meta__: #Ecto.Schema.Metadata<:loaded, "jobs">,
  id: 2, title: "Backend Engineer" ...>}]

What a neat little way to build powerful text search! I’ll leave the remainder of setting this up, UI-wise, as an exercise to the reader. It could be as simple (as it is in my case) as a controller function that accepts query as a param and passes it to this function.

This guide was built with Elixir 1.2 and Phoenix 1.2.1 – if you’re on a previous (or later) versions of Elixir, Phoenix, or Ecto, this may not work as expected – sorry about that.





About List