How-To Solr E-Commerce Search

Solr E-Commerce Search: A Best Practice Guide, Part 1

How do you get Solr e-commerce search just right? Well, imagine you want to drive to the mountains for a holiday. You take along your husband or wife and your two children (does it get any more stereotypical?). What kind of car would you take? The two-seater sports car or the station wagon? Easy choice, you say? Well, choosing Solr as your e-commerce search engine is a bit like taking the sports car on the family tour.

Part of the issue is how Solr was originally conceived. Initially, Solr was designed to perform as a full-text search engine for content, not products. Although it has evolved “a little” since then, there are still a few pitfalls that you should avoid.

That said, I’d like to show you some best practices and tips from one of my projects. In the end, I think Solr is good at getting the job done after all. 😉

How to Not Reinvent the Wheel When Optimizing Solr for E-commerce Search

First, don’t reinvent the wheel when integrating basic things like synonyms and boostings on the Lucene level. These are more easily managed with open-source add-ons like Querqy.
If you want to perform basic tasks such as eliminating specific keywords from consideration, replacing words with alternatives that better match your product data, or simply setting up synonyms and boostings, Querqy does the job with a minimum of effort.
Solr traditionally uses a scoring model called TF/IDF (Term Frequency/Inverse Document Frequency); since version 6 the default is BM25, a refinement of the same idea. In short, a document scores higher the more often it contains a search term, and a search term counts for less the more documents in the index contain it.

For general use cases, how often a search term resides in a text document may be important; for e-commerce search, however, this is most often not the case.

E-commerce search cares less about how often a search term occurs and more about where, that is, in which field, it is found.
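To make the problem concrete, here is a minimal sketch of a textbook TF/IDF score (raw term frequency times log(N / df)). It is a simplification, not Lucene's exact formula, and the three-product catalog is made up for illustration.

```python
import math

def tf_idf(term, doc_tokens, all_docs):
    """Simplified textbook TF/IDF: term frequency times log(N / df)."""
    tf = doc_tokens.count(term)                      # occurrences in this document
    df = sum(1 for d in all_docs if term in d)       # documents containing the term
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(len(all_docs) / df)

# Hypothetical mini catalog; each product is flattened into one token list.
phone = "smartphone x200 with 128 gb storage".split()
case = "case for smartphone fits every smartphone protects your smartphone".split()
cable = "usb c charging cable".split()
docs = [phone, case, cable]

phone_score = tf_idf("smartphone", phone, docs)
case_score = tf_idf("smartphone", case, docs)

# The accessory repeats the term three times, so it outranks the actual phone.
print(phone_score, case_score)
```

In this toy catalog the phone case beats the phone for the query "smartphone" simply because its text repeats the word, which is rarely what a shopper expects.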

How-To Teach Solr to Think Like an E-Commerce Search Manager

To help Solr account for this, set the “tie” option of your request handler to 0.0. As a result, only the best matching field contributes to a document’s score. The scores of all matching fields are no longer summed up, which could otherwise lead to a situation where the sum of several lower-weighted fields exceeds the score of your most important field.
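The effect of “tie” can be sketched in a few lines. A disjunction-max query combines per-field scores as the best field score plus tie times the sum of the remaining ones. The field names and score values below are hypothetical and only illustrate the arithmetic.

```python
def dismax_score(field_scores, tie):
    """Best field score plus tie times the sum of all other field scores."""
    if not field_scores:
        return 0.0
    best = max(field_scores.values())
    rest = sum(field_scores.values()) - best
    return best + tie * rest

# Hypothetical, already-boosted per-field scores for two products.
name_match = {"name": 10.0}
weak_matches = {"description": 4.0, "keywords": 4.0, "category_text": 4.0}

# tie=1.0 sums all fields: three weak matches beat the product name match.
summed_name = dismax_score(name_match, tie=1.0)
summed_weak = dismax_score(weak_matches, tie=1.0)

# tie=0.0 keeps only the best field: the product name match wins.
best_name = dismax_score(name_match, tie=0.0)
best_weak = dismax_score(weak_matches, tie=0.0)
```

With tie set to 0.0, a product can only rank high through its single best field, so the field boostings alone decide which kind of match matters most.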

How-To Fix Solr’s Similarity Issues for E-Commerce Search

Secondly, turn off similarity scoring by setting Querqy’s uq.similarityScore parameter to “off”:

					<float name="tie">0.0</float>
					<str name="uq.similarityScore">off</str>

This yields scoring that is more usable for e-commerce scenarios. With similarity scoring out of the way, result sorting also becomes more customer-centric and easier to understand, because matches in the product name field now score higher than matches in the description texts. Don’t forget to set up your field boostings correctly as well!

Give my previous blog post about search relevancy a read for more advice on what to consider for good scores.

Even with the best scoring and result sorting, the number of items returned can be overwhelming for the user, especially for generic queries like “smartphone,” “washing machine,” or “tv.”

How-To Do Facets Correctly in Solr

The logical answer to this problem is, of course, faceting.

Enabling your visitors to drill down to their desired products is critical.

While it may be simple to know upfront which facets are relevant to a particular category within a relatively homogeneous result set, the more heterogeneous the search results become, the greater the challenge. And, of course, you don’t want to waste CPU power and time on facets that are irrelevant to your current result set, especially if you have hundreds or even thousands of them.

So, wouldn’t it be nice to know which fields Solr should use as facets before calling it? It’s possible, though not quite THAT easy: you need to take a two-step approach.

For this to work, you have to store the names of all relevant facet fields for each product in a special field of that product. Let’s call it “facet_fields.” It contains an array of field names, for example:

Facets For Product 1 (tablet):

					"category", "brand", "price", "rating", "display_size", "weight""category", "brand", "price", "rating", "display_size", "weight"

Facets For Product 2 (freezer):

					"category", "brand", "price", "width", "height", "length", "cooling_volume”

Facets For Product 3 (tv):

					"category", "brand", "price", "display_size", "display_technology", "vesa_wall_mount"

If a user searches for a specific product type, e.g., “televisions,” you can now make an initial call to Solr with just ONE facet on the “facet_fields” field. It returns the facets available for the televisions found.

Additionally, you can significantly reduce overhead by not requesting any product data at this stage, since it is not needed yet.

This is also a good moment to check whether you got any matches at all or ended up on the zero-result page.

If that is the case, you can either try the “spellcheck” component of Solr to fix typos in your query or implement our SmartQuery technology to avoid these situations in most cases right from the start.

Now, in the second call to Solr, you use the information collected in the first call to request facets based on “category”, “brand”, “price”, “display_size”, “display_technology” and “vesa_wall_mount”.
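The two calls can be sketched as plain request parameters. The function names, the rows default, and the hand-written sample response are assumptions; the parameters themselves (rows, facet, facet.field, facet.mincount) are standard Solr faceting parameters, and the flat value/count list is Solr's default JSON facet layout.

```python
def facet_discovery_params(query):
    """First call: no product documents, one facet on 'facet_fields'."""
    return {
        "q": query,
        "rows": 0,                 # skip product data at this stage
        "facet": "true",
        "facet.field": "facet_fields",
        "facet.mincount": 1,       # only facet names that occur in the result
    }

def is_zero_result(response):
    """True if the first call found no products at all."""
    return response["response"]["numFound"] == 0

def relevant_facets(response):
    """Extract facet names from the flat [value, count, value, count, ...] list."""
    flat = response["facet_counts"]["facet_fields"]["facet_fields"]
    return flat[0::2]

def result_params(query, facets, rows=24):
    """Second call: the products plus only the relevant facets."""
    return {"q": query, "rows": rows, "facet": "true", "facet.field": list(facets)}

# Hand-written sample of what the first call might return for "televisions".
first_response = {
    "response": {"numFound": 42, "docs": []},
    "facet_counts": {"facet_fields": {"facet_fields": [
        "category", 42, "brand", 42, "price", 42,
        "display_size", 40, "display_technology", 38, "vesa_wall_mount", 25,
    ]}},
}

first_params = facet_discovery_params("televisions")
if not is_zero_result(first_response):
    second_params = result_params("televisions", relevant_facets(first_response))
```

An HTTP client such as requests sends the list under "facet.field" as repeated parameters, which is what Solr expects. Numeric fields like price would probably be requested as range facets in practice; that is left out here to keep the sketch short.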

How-To Reduce Load with Intelligent Facet-Rules!