My Journey Building Elasticsearch for Retail

If, like me, you’ve taken the journey that is building an Elasticsearch retail project, you’ve inevitably run into many challenges: how do I index data, use the query API to build facets, page through the results, sort them, and so on? One aspect of optimization that frequently receives too little attention is the correct configuration of search analyzers. Search analyzers determine how the text of a search query is broken into terms before it is matched against the index. Admittedly, it isn’t straightforward!

The Elasticsearch documentation provides good examples for every kind of query. It explains which query is best for a given scenario. For example, “Phrase Match” queries find matches where the search terms appear next to each other, in the same order as in the input. Or “Multi Match” with the “most_fields” type is “useful when querying multiple fields that contain the same text analyzed in different ways”.
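To make these two query types a little more concrete, here is a minimal sketch of the request bodies, written as plain Python dictionaries. The field names `title` and `title.stemmed` are assumptions for illustration only; your mapping will likely differ.

```python
import json


def phrase_query(field, text):
    # match_phrase: the analyzed terms must occur in the given order,
    # directly next to each other, within one field.
    return {"query": {"match_phrase": {field: {"query": text}}}}


def most_fields_query(text, fields):
    # multi_match with type "most_fields": scores from all matching
    # fields are combined, which suits one text indexed several ways.
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "most_fields",
                "fields": fields,
            }
        }
    }


if __name__ == "__main__":
    # "title" and "title.stemmed" are hypothetical field names.
    print(json.dumps(phrase_query("title", "macbook air"), indent=2))
    print(json.dumps(most_fields_query("macbook air", ["title", "title.stemmed"]), indent=2))
```

Either dictionary could be sent as the body of a search request; which of the two fits depends on the search input, which is exactly the question the rest of this article is about.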

That all sounds good to me. But how do I know which one to use, based on the search input?

Elasticsearch works like cogs within a Rolex

Where to Begin? Search query examples for Retail

Let’s pretend we have a data feed for an electronics store. I will demonstrate a few different kinds of search inputs. Afterward, I will briefly describe how search should work in each case.

Case #1: A product name

For example: “MacBook Air”

Here we want to have a query that matches both terms in the same field, most likely the title field.
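One way this could look, as a rough sketch: a `match` query with the `and` operator, so that every term of the input has to appear in the same field. The field name `title` is an assumption taken from the sentence above, not a fixed convention.

```python
def product_name_query(text, field="title"):
    # operator "and": every analyzed term of the input must be present
    # in the one field, so "MacBook Air" does not match a plain "MacBook".
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    "operator": "and",
                }
            }
        }
    }


if __name__ == "__main__":
    print(product_name_query("MacBook Air"))
```

Without the `and` operator, a `match` query treats the terms as alternatives, and products containing only one of the two words would also be returned.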

Case #2: A brand name and a product type

For example: “Samsung Smartphone”

In this case, we want each term to match a different field: brand and product type. Additionally, we want to find both terms as a pair. Requiring both terms prevents other smartphones or other Samsung products from appearing in the results.
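A possible sketch for this case is a `multi_match` query of type `cross_fields` with the `and` operator: each term may be found in any of the listed fields, but all terms have to be found somewhere. The field names `brand` and `product_type` are assumptions, and `cross_fields` tends to behave most predictably when the fields share the same analyzer.

```python
def brand_and_type_query(text, fields=("brand", "product_type")):
    # cross_fields treats the listed fields like one combined field;
    # with operator "and", each term must match in at least one of them.
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "cross_fields",
                "fields": list(fields),
                "operator": "and",
            }
        }
    }


if __name__ == "__main__":
    print(brand_and_type_query("Samsung Smartphone"))
```

With this shape, a document with brand “Samsung” and product type “Smartphone” matches, while a Samsung television or a smartphone of another brand is missing one of the two terms and drops out.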

Case #3: A specific query that includes attributes or other details

For example: “notebook 16 GB memory”

This one is tricky because we want “notebook” to match the product type, or perhaps a category with that name. On the other hand, we want “16 GB” to match the memory attribute field as a unit. The number “16” on its own shouldn’t match some model number or other attribute.

For example: “MacBook Pro 16 inch” is also in the “notebook” category and has some “GB” of “memory”. To further complicate matters, search texts might not contain the term “memory”, because it’s the attribute name.
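To illustrate the idea of matching “16 GB” as a unit, here is a deliberately simplified sketch. It is not a complete solution: the regular expression, the list of attribute names, and the field names `memory`, `product_type` and `category` are all assumptions made up for this example.

```python
import re

# Hypothetical: attribute names that describe a field rather than a value.
ATTRIBUTE_NAMES = {"memory"}
# Hypothetical: a number followed by a storage unit, e.g. "16 GB" or "16gb".
VALUE_WITH_UNIT = re.compile(r"\b(\d+)\s*(GB|TB)\b", re.IGNORECASE)


def attribute_query(text):
    clauses = []
    found = VALUE_WITH_UNIT.search(text)
    if found:
        # Keep number and unit together so "16" alone cannot match elsewhere.
        value = f"{found.group(1)} {found.group(2).upper()}"
        clauses.append({"match_phrase": {"memory": value}})
        text = text[:found.start()] + text[found.end():]
    # Drop attribute names such as "memory"; they may not occur in the data.
    words = [w for w in text.split() if w.lower() not in ATTRIBUTE_NAMES]
    if words:
        clauses.append({
            "multi_match": {
                "query": " ".join(words),
                "fields": ["product_type", "category"],
                "operator": "and",
            }
        })
    return {"query": {"bool": {"must": clauses}}}


if __name__ == "__main__":
    print(attribute_query("notebook 16 GB memory"))
```

For the input “notebook 16 GB memory”, this produces one phrase clause for “16 GB” on the memory field and one clause for “notebook” on product type and category. An input like “MacBook Pro 16 inch” contains no number-plus-unit pair that the pattern recognizes, so no memory clause is created for it.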

As you might guess, there are many more cases. And we haven’t even considered word composition, synonyms, or typos yet. So how do we build one query that handles all cases?

Know where you come from to know where you’re headed

Preparation

Before striving for a solution, take two steps back and prepare yourself.

Analyze your data

First, take a closer look at the data in question.

  • How do people search on your site?
  • What are the most common query types?
  • Which data fields hold the required content?
  • Which data fields are most relevant?
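The first two questions can often be answered with a simple tally. The following sketch assumes the raw query strings are already available as a list (for example, exported from a log); the sample queries are the ones used in this article, not real data.

```python
from collections import Counter


def summarize_queries(queries):
    # Normalize case and whitespace so "MacBook Air" and "macbook  air"
    # are counted as the same query.
    normalized = [" ".join(q.lower().split()) for q in queries if q.strip()]
    top_queries = Counter(normalized)
    # Number of words per query: a rough hint at the query type
    # (short product names versus longer attribute searches).
    length_distribution = Counter(len(q.split()) for q in normalized)
    return top_queries, length_distribution


if __name__ == "__main__":
    sample = [
        "MacBook Air",
        "macbook  air",
        "Samsung Smartphone",
        "notebook 16 GB memory",
    ]
    top, lengths = summarize_queries(sample)
    print(top.most_common(3))
    print(sorted(lengths.items()))
```

Even a crude summary like this shows which inputs dominate and how long typical queries are, which helps decide which of the cases above deserve attention first.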

Of course, it’s best if you already have a site search running and can, at least, collect query data there. If you don’t have a site search ana