Dev diary - Rewriting Autocomplete Day 6

Datetime: 2016-08-23 02:11:50          Topic: Elastic Search, Lucene, Java

Day 6

This day was mostly about deciding on a technical stack, and on the approach we would take to make sure we picked the best of several options.

May I present... the Stack!

Following the meeting mentioned by Jason on Day 5, we came to the conclusion that several technical approaches were possible. Since this is a greenfield project, the sky's the limit!

We went through the technical requirements at a very high level and came up with the following 4 components:

  • Clients - they need to be able to get some JSON containing the autocomplete data to display
  • Service API - the messenger in our stack, that will parse data and expose it to our clients
  • Data store - the data needs to live somewhere, and be ready to be parsed at any time. It comes with a way of processing the raw data before inserting it in the store.
  • Monitoring/reporting/analytics tools - external tools that will allow us to learn from our experiments, capture KPIs and drive the next few steps of the product.
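
As a rough illustration of how the first three components could fit together, here is a minimal sketch. It is written in Python purely for brevity (it is not one of the stacks being spiked), and everything in it is hypothetical: the `HOTELS` list stands in for the data store, `suggest` stands in for the service API, and the JSON shape is only one guess at what a client might consume.

```python
import json

# Hypothetical stand-in for the data store: already-processed hotel names.
HOTELS = ["Grand Hotel", "Grand Plaza", "Garden Inn", "Harbour View"]


def suggest(prefix, limit=5):
    """Service API stand-in: return a JSON string of names matching the prefix."""
    needle = prefix.strip().lower()
    if not needle:
        # An empty query should not return the whole data set.
        matches = []
    else:
        matches = [name for name in HOTELS if name.lower().startswith(needle)]
    return json.dumps({"query": prefix, "suggestions": matches[:limit]})


# Client stand-in: parse the JSON and display the suggestions.
if __name__ == "__main__":
    print(json.loads(suggest("gra"))["suggestions"])
```

The linear scan is only there to keep the sketch self-contained; the whole point of the spikes is to find out what should replace it.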

All that was left was to spike it!

Spikes, you said?

Yes, spikes (not a group of blonde vampires, no. Nice try.). We have several options available for the API and the data store. Ultimately, we wanted to choose one we're all comfortable with, but also one that properly meets our acceptance criteria. The main one is, obviously, performance, since any kind of autocomplete has to be very, very responsive. We've set a maximum of 150ms for any response under load.

Since most of us are .NET developers, we obviously thought of a Nancy Web API first. This is the part I'm currently trying out.

We also considered Node.js, Java and Python. Java was particularly interesting since it works best with Lucene (which is itself written in Java). Node.js is being tested by a colleague, and we dismissed Python since we have absolutely no knowledge of it and we still need to deliver quality software in a decent period of time.

Our architect had already made a Java attempt by the time of the meeting.

For the Data store, we have worked with Elastic Search for our new Search Engine. We simply find it amazing, and it certainly has a huge potential when it comes to autocomplete.

So obviously that was one of the choices. Another one was the system Elastic Search is built on, Lucene.

Lucene isn't exactly a Data store but it comes with search/indexing features that also meet our needs perfectly well.

And, lastly, we have to give good old SQL a chance. We're not really fond of it, by the looks of it, but SQL is still a very decent option, even though it doesn't come with a dedicated indexing/search engine the way Lucene and Elastic Search do.
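
To make the SQL option a bit more concrete, here is a minimal sketch of a prefix lookup. The choice of SQLite, the `hotels` table and the `suggest_sql` helper are all assumptions made for illustration; this post does not say which database or query shape would actually be spiked.

```python
import sqlite3


def build_store(names):
    """Create an in-memory SQLite table of hotel names (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hotels (name TEXT NOT NULL)")
    conn.executemany("INSERT INTO hotels (name) VALUES (?)", [(n,) for n in names])
    conn.commit()
    return conn


def suggest_sql(conn, prefix, limit=5):
    """Prefix lookup with LIKE; SQLite's LIKE is case-insensitive for ASCII."""
    if not prefix:
        # Without this guard, the pattern '%' would match every row.
        return []
    # Escape LIKE wildcards so user input is matched literally.
    escaped = prefix.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT name FROM hotels WHERE name LIKE ? ESCAPE '\\' ORDER BY name LIMIT ?",
        (escaped + "%", limit),
    )
    return [row[0] for row in rows]
```

A query like this only matches the start of the stored string. There is no tokenization, fuzzy matching or relevance ranking, which is the kind of thing a search engine would bring to the table.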

Early conclusions of the experiments

On my end, so far I've tried a Nancy Web API using our Hotel data stored in Elastic Search, suggesting names of hotels.

It looks promising. With a very basic setup (a local dev machine, albeit a good one, plus a single-node Elastic Search cluster), it averages 50ms in response time while querying over 150k documents (hotels), all under a load of 10k requests made in a few seconds.
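
For readers wondering what such a measurement can look like, here is a minimal sketch of a load harness. It is not the tool used for the numbers above: `fake_endpoint` is a hypothetical stand-in for an HTTP call to the API, the worker count is arbitrary, and only the 150ms budget comes from this post.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

BUDGET_MS = 150  # maximum response time under load, as stated in this post


def timed_call(fn, arg):
    """Return how long a single call to fn(arg) took, in milliseconds."""
    start = time.perf_counter()
    fn(arg)
    return (time.perf_counter() - start) * 1000.0


def run_load(fn, queries, workers=50):
    """Fire all queries through a thread pool and summarise the latencies."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = list(pool.map(lambda q: timed_call(fn, q), queries))
    return {
        "count": len(latencies),
        "avg_ms": mean(latencies),
        "max_ms": max(latencies),
        "within_budget": max(latencies) <= BUDGET_MS,
    }


def fake_endpoint(query):
    """Hypothetical stand-in for an HTTP call to the autocomplete API."""
    time.sleep(0.001)
    return [query]


if __name__ == "__main__":
    print(run_load(fake_endpoint, ["gra"] * 1000))
```

Checking the maximum as well as the average matters here, since the budget applies to any response and an average can hide a slow tail.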

The Node.js version (still with Elastic Search, but with another kind of data) also looks promising, since its average response time tends to be the same as the Nancy API's.




