Quick-Start with OCSS – Creating a Silver Bullet

Last week, I shared my experience building Elasticsearch product search queries. I explained that there is no silver bullet: if you want excellence, you have to build it yourself, and that’s tough. Today, I want to show how our OCSS Quick-Start endeavors to do just that. So, here you have it: a quick-start framework to help Elasticsearch product search perform at the exceptional level it ought to.

How-To Quick-Start with OCSS

Do you have some data you can get your hands on? Let’s begin by indexing it and working with it. To quickly start with OCSS, you need docker-compose. Check out at least the “operations” folder of the project and run “docker-compose up” inside the “docker-compose” folder. It might also be necessary to run “docker-compose restart indexer” afterward, since the indexer fails to set up properly if the Elasticsearch container is not yet ready when it starts.

You’ll find a script to index CSV data into OCSS in the “operations” folder. Run it without parameters to view all options, then use it to push your data into Elasticsearch. With the “preset” profile active by default in the docker-compose setup, data fields like “EAN”, “title”, “brand”, “description”, and “price” are each indexed appropriately for search and facet usage. Have a look at the “preset” configuration if more fields need to be indexed for search or faceting.

Configure Query Relaxation

True to the OCSS Quick-Start philosophy, the “preset” configuration already comes with various query stages. Let’s take a look at it; afterward, you should be able to configure your own query logic.

How to configure “EAN-search” and “art-nr-search”

The first two query configurations “EAN-search” and “art-nr-search” are very similar:
				
ocs:
  default-tenant-config:
    query-configuration:
      ean-search:
        strategy: "ConfigurableQuery"          # 1️⃣
        condition:                             # 2️⃣
          matchingRegex: "\\s*\\d{13}\\s*(\\s+\\d{13})*"
          maxTermCount: 42
        settings:                              # 3️⃣
          operator: "OR"
          tieBreaker: 0
          analyzer: "whitespace"
          allowParallelSpellcheck: false
          acceptNoResult: true
        weightedFields:                        # 4️⃣
          "[ean]": 1
      art-nr-search:
        strategy: "ConfigurableQuery"          # 1️⃣
        condition:                             # 2️⃣
          matchingRegex: "\\s*(\\d+\\w?\\d+\\s*)+"
          maxTermCount: 42
        settings:                              # 3️⃣
          operator: "OR"
          tieBreaker: 0
          analyzer: "whitespace"
          allowParallelSpellcheck: false
          acceptNoResult: true
        weightedFields:                        # 4️⃣
          "[artNr]": 2
          "[masterNr]": 1.5

1️⃣ OCSS distinguishes between several query strategies. The “ConfigurableQuery” is the most flexible and exposes several Elasticsearch query options (more to come). See further query strategies below.

2️⃣ The “condition” clause configures when a query is used. The two conditions here, “matchingRegex” and “maxTermCount”, specify that the user input must match the given regular expression and may contain at most 42 terms. (To verify the term count, the user query is split on whitespace into separate “terms”.)

3️⃣ The “settings” govern how the query is built and how it should be used. These settings are documented in the QueryBuildingSettings. Not all settings are supported by all strategies, and some are still missing; this is subject to change. The “acceptNoResult” setting is essential here: with it enabled, if a numeric string does not match the relevant fields, no other query is sent to Elasticsearch, and an empty result is returned to the client.

4️⃣ Use the “weightedFields” property to specify which fields should be searched with a given query. Non-existent fields will be ignored with a minor warning in the logs.
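
To make the interplay of “condition” and “acceptNoResult” more tangible, here is a minimal Python sketch of how such a chain of query stages could be evaluated. It is an illustration only, not OCSS code: the names Stage, condition_applies, and relax are hypothetical, the search function is a stand-in for the real Elasticsearch request, and it assumes that the regular expression has to match the entire user input and that stages are tried in their configured order.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Stage:
    """Hypothetical stand-in for one entry under query-configuration."""
    name: str
    matching_regex: Optional[str] = None
    min_term_count: Optional[int] = None
    max_term_count: Optional[int] = None
    accept_no_result: bool = False


def condition_applies(stage: Stage, user_query: str) -> bool:
    # Terms are obtained by splitting the user query on whitespace.
    term_count = len(user_query.split())
    if stage.min_term_count is not None and term_count < stage.min_term_count:
        return False
    if stage.max_term_count is not None and term_count > stage.max_term_count:
        return False
    if stage.matching_regex is not None:
        # Assumption: the pattern must cover the whole input, not just a part.
        return re.fullmatch(stage.matching_regex, user_query) is not None
    return True


def relax(stages: List[Stage], user_query: str,
          search: Callable[[str, str], List[str]]) -> Tuple[Optional[str], List[str]]:
    """Try each applicable stage in order; stop at the first one that yields
    hits, or at the first one that accepts an empty result."""
    for stage in stages:
        if not condition_applies(stage, user_query):
            continue
        hits = search(stage.name, user_query)
        if hits or stage.accept_no_result:
            return stage.name, hits
    return None, []


# The YAML double backslashes become single backslashes in the actual pattern.
STAGES = [
    Stage("ean-search", matching_regex=r"\s*\d{13}\s*(\s+\d{13})*",
          max_term_count=42, accept_no_result=True),
    Stage("art-nr-search", matching_regex=r"\s*(\d+\w?\d+\s*)+",
          max_term_count=42, accept_no_result=True),
    Stage("default-query", min_term_count=1, max_term_count=10),
]
```

With a search function that finds nothing, relax(STAGES, "4006381333931", search) stops at “ean-search” and returns an empty list, whereas “living room lamp” skips both numeric stages and goes straight to “default-query”.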

How to configure “default-query” the OCSS Quick-Start way

Next, the “default-query” is available to catch most queries:

				
ocs:
  default-tenant-config:
    query-configuration:
      default-query:
        strategy: "ConfigurableQuery"
        condition:                            # 1️⃣
          minTermCount: 1
          maxTermCount: 10
        settings:
          operator: "AND"
          tieBreaker: 0.7
          multimatch_type: "CROSS_FIELDS"
          analyzer: "standard"                # 2️⃣
          isQueryWithShingles: true           # 3️⃣
          allowParallelSpellcheck: false      # 4️⃣
        weightedFields:
          "[title]": 3
          "[title.standard]": 2.5             # 5️⃣
          "[brand]": 2
          "[brand.standard]": 1.5
          "[category]": 2
          "[category.standard]": 1.7

1️⃣ The “condition” applies this query to all searches with 1 to 10 terms. The upper limit is arbitrary and can, naturally, be increased, depending on your users’ search patterns.

2️⃣ The “analyzer” setting applies the “standard” analyzer to the search terms. This means it applies stemming and stopwords. The analyzed terms are then searched within the various fields and subfields (see point #5 below). At the same time, the “quote analyzer” is set to “whitespace” so that quoted search phrases are matched exactly.

3️⃣ The option “isQueryWithShingles” is a unique feature we implemented in OCSS. It joins neighboring terms into a single compound term and searches for these variants alongside the original terms, with the joined term boosted to nearly double the weight. The goal is to find compound words in the data as well.

Example: “living room lamp” will result in “(living room lamp) OR (livingroom^2 lamp)^0.9 OR (living roomlamp^2)^0.9”.

4️⃣ “allowParallelSpellcheck” is set to false here because spellchecking costs extra time, which we don’t want to spend in the majority of cases where users spell their query correctly. If enabled, a parallel “suggest query” is sent to Elasticsearch. If the first try yields no results and it’s possible to correct some terms, the same query will be fired again using the corrected words.

5️⃣ As you can see here, subfields can be weighted individually according to their function.
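
The shingle expansion from point 3 and the spellcheck retry from point 4 can be sketched in a few lines of Python. This is only meant to illustrate the idea and is not how OCSS implements it: the function names are hypothetical, the boosts 2 and 0.9 are simply copied from the example above, and run_search and run_suggest are stand-ins for the real Elasticsearch requests.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def expand_shingles(query: str, shingle_boost: int = 2,
                    variant_weight: float = 0.9) -> str:
    """Build the original query plus one variant per pair of neighboring
    terms, where the pair is joined into a single boosted compound term."""
    terms = query.split()
    clauses = ["(" + " ".join(terms) + ")"]
    for i in range(len(terms) - 1):
        joined = f"{terms[i]}{terms[i + 1]}^{shingle_boost}"
        variant = terms[:i] + [joined] + terms[i + 2:]
        clauses.append("(" + " ".join(variant) + f")^{variant_weight}")
    return " OR ".join(clauses)


def search_with_spellcheck(query: str,
                           run_search: Callable[[str], List[str]],
                           run_suggest: Callable[[str], Dict[str, str]]) -> List[str]:
    """Send the search and the suggest request in parallel; retry once with
    corrected terms only if the first search came back empty."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        hits_future = pool.submit(run_search, query)
        suggest_future = pool.submit(run_suggest, query)
        hits = hits_future.result()
        corrections = suggest_future.result()  # maps a term to its correction
    if hits:
        return hits
    corrected = " ".join(corrections.get(term, term) for term in query.split())
    if corrected == query:
        return hits  # nothing to correct, so the empty result stands
    return run_search(corrected)


print(expand_shingles("living room lamp"))
# (living room lamp) OR (livingroom^2 lamp)^0.9 OR (living roomlamp^2)^0.9
```

The sketch also shows where the extra cost comes from: the suggest request is sent for every query, even though its answer is only needed when the first search returns nothing.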

How to configure additional query strategies

I will not go into any great detail regarding the following query stages configured within the “preset” configuration. They are all quite similar. Here are just a few notes concerning the additionally available query strategies.

  • DefaultQueryBuilder: This query tries to balance precision and recall using a minShouldMatch value of 80% and automatic fuzziness. Use it if you don’t have the time to configure a unique default query.
  • PredictionQuery: This is a special implementation that necessitates a blog post all its own. Simply put, this query performs an initial query against Elasticsearch to determine which terms match well. The final query is built based on the returned data. As a result, it might selectively remove terms that would, otherwise, lead to 0 results. Other optimizations are also performed, including shingle creation and spell correction. It’s most suitable for multi-term requests.
  • NgramQueryBuilder: This query builder divides the input terms into short chunks and searches for them in fields that are analyzed in the same manner. In this way, even partial matches can return results. This is a very sloppy approach to search and should only be used as a last resort to ensure products are shown instead of a no-results page.
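
For some intuition on why the n-gram approach still finds something when all other stages fail, here is a small Python sketch. The chunk length of 3 and the sliding-window chunking are assumptions made for illustration, and the function names are hypothetical; the actual settings used by the NgramQueryBuilder are not covered here.

```python
from typing import List, Set


def char_ngrams(term: str, n: int = 3) -> List[str]:
    """Split a term into overlapping chunks of n characters (assumed n=3)."""
    term = term.lower()
    if len(term) <= n:
        return [term]
    return [term[i:i + n] for i in range(len(term) - n + 1)]


def shared_chunks(query_term: str, indexed_term: str, n: int = 3) -> Set[str]:
    """Chunks that the query term and an indexed term have in common; any
    overlap is enough for a (sloppy) partial match."""
    return set(char_ngrams(query_term, n)) & set(char_ngrams(indexed_term, n))


print(char_ngrams("lamps"))            # ['lam', 'amp', 'mps']
print(shared_chunks("lamps", "lamp"))  # the two chunks 'lam' and 'amp'
print(shared_chunks("lamp", "sofa"))   # set()
```

Because a handful of shared chunks is enough for a hit, unrelated products that merely share a few letters can match as well, which is why this strategy belongs at the very end of the chain.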

How to configure my own query handling

Now, use the “application.search-service.yml” to configure your own query handling:
				
ocs:
  tenant-config:
    your-index-name:
      query-configuration:
        your-first-query:
          strategy: "ConfigurableQuery"
          condition:
            # ...
          settings:
            # ...
          weightedFields:
            # ...

As you can see, we are trying our best to give you a quick-start with OCSS. It already comes pre-packed with excellent queries, preset configurations, and the ability to use query relaxation without touching a single line of code. And that’s pretty sick! I’m looking forward to increasing the power behind the configuration and leveraging all Elasticsearch options.

Stay tuned for more insights into OCSS.

And if you haven’t noticed already, all the code is freely available. Don’t hesitate to get your hands dirty! We appreciate Pull Requests! 😀
