Sustainable Development for Ecommerce Site Search

What’s your first association when you read “sustainable development”? Perhaps it conjures up some dry country in the Southern Hemisphere with lots of potential for present and future development? Maybe IT startups developing solutions that help reduce CO2 emissions are your thing? Or perhaps a Tesla Model S Plaid+ that “develops” you from 0-100 km/h in a sustainable 2.1 seconds?

© Photo: Johan Eriksson for Dollar Street 2015

My Association with Sustainable Development? Cache!

My association is cache. Not cash. Cache.

These days, everyone’s talking about website performance. The rapid increase in online traffic during the pandemic has put an even greater focus on the topic. And what better way to increase website performance in volatile times than an intelligent caching strategy? The idea is not new, but it bears frequent discussion.

A clever caching design not only increases the performance of your website. It’s also a smart way to build environmentally friendly applications that scale sustainably.

Before you continue with this post, it will serve you well to familiarize yourself with “Latency numbers every programmer should know.”

Wow, that’s impressive. One quick read from the server’s main memory (about 100 ns) is 1,500,000 times faster than a single network round trip from California to the Netherlands and back (about 150 ms), and an HTTP request over the internet needs at least one such round trip. Local memory is not only faster; it’s also more efficient and climate-friendly.
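The 1,500,000× figure comes straight from two entries in that latency list. A minimal check of the arithmetic, assuming the commonly quoted values of 100 ns for a main memory reference and 150 ms for a California to Netherlands round trip:

```python
# Approximate values from "Latency numbers every programmer should know",
# expressed in nanoseconds.
MAIN_MEMORY_REFERENCE_NS = 100
ROUND_TRIP_CA_NL_CA_NS = 150_000_000  # 150 ms packet round trip


def speedup(slow_ns: int, fast_ns: int) -> float:
    """How many times faster the fast operation is than the slow one."""
    return slow_ns / fast_ns


ratio = speedup(ROUND_TRIP_CA_NL_CA_NS, MAIN_MEMORY_REFERENCE_NS)
print(f"{ratio:,.0f}x")  # 1,500,000x
```

A real HTTP request adds connection setup, TLS, and server processing on top of the raw round trip, so this ratio is a lower bound.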

Tiny numbers aren’t your thing? The comparison becomes far more tangible once you scale it up to human time.
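One common way to make these latencies tangible is to pretend that every nanosecond lasts a full second. The sketch below applies that scale factor to three entries from the latency list; the selection, the labels, and the 365-day year are illustrative choices, not part of the original comparison.

```python
# Scale nanosecond latencies so that 1 ns of machine time becomes 1 s of human time.
LATENCIES_NS = {
    "L1 cache reference": 0.5,
    "Main memory reference": 100,
    "Round trip CA -> Netherlands -> CA": 150_000_000,
}

# Largest unit first; a year is taken as 365 days.
UNITS = [
    ("years", 365 * 24 * 3600),
    ("days", 24 * 3600),
    ("hours", 3600),
    ("minutes", 60),
    ("seconds", 1),
]


def humanize(seconds: float) -> str:
    """Express a duration in the largest unit that yields a value of at least 1."""
    for name, size in UNITS:
        if seconds >= size:
            return f"{seconds / size:.1f} {name}"
    return f"{seconds:.1f} seconds"


for label, ns in LATENCIES_NS.items():
    human_seconds = ns  # 1 ns -> 1 s
    print(f"{label}: {humanize(human_seconds)}")
```

On this scale, a memory read takes under two minutes, while a single transatlantic round trip takes almost five years.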

How to Sustainably Develop Ecommerce Infrastructure?

Over the last 20 years of building ecommerce applications, I have repeatedly stumbled over the same problem: website users want to see things other users have already seen, and this behavior has not changed one bit. Think of a product image, the detailed description of the newest Xphone, a search result page, or, in most cases, even the “in stock” status. So what’s a developer to do when tasked with delivering this information correctly to the customer?

List of Developer Tasks to Right the World

  • Images: Load the image from the media database (where it is usually stored in high resolution), scale it to the correct resolution for the customer’s device, and send it over the network.
  • Article texts: Load the texts from the PIM (product information management) system, where marketing staff can edit them at any time and expect to see the latest update online as soon as they hit the save button.
  • In-stock status: Send a request to the ERP system asking how many units are still available, and determine the “in stock” status from various information such as “already placed in other customers’ baskets,” “already ordered, but the customer has not finished the payment process yet, so I do not know whether this one item has finally been sold,” and maybe other fancy stuff.
  • Search results: Send the user query for “ihpone” to the search index, which will try to find products that are more or less similar to the query and, if you’re lucky, return some iPhones or matching accessories.

 

Congratulations to the development team. If they followed all the guidelines above, they built a rock-solid system that always shows customers real-time data. But it isn’t sustainable.

Why Isn’t Your Ecommerce Infrastructure Development Sustainable?

The type of development described above forces servers all over the planet to repeatedly calculate, or fetch via HTTP, the same data, even though not a single kilobyte has changed since the last time they (or some other server) calculated it.

You Need a Strategic Caching Approach

It’s all about the Cache

Let’s look at this practically.

  • If the source image has not changed, there is no need to ask your SaaS image-scaling service to scale it to a smaller resolution more than once. It will produce the same output the first time, the 10th time, or the 1,000th time.
  • Suppose your article texts have not been SEO-optimized within the last few minutes, and there has been no other activity connected with this specific product either. Why, then, should you bother the (possibly distant) database?
  • Or suppose no system has registered a status change for a particular article from “in stock” to “unavailable,” so no interested party has received a change notification. Why, then, keep asking the ERP like a three-year-old bugging their parents?
  • If your domain-specific language hasn’t changed, and as long as customers typing “ihpone” still mean “iPhone,” why should your search engine keep hunting for fuzzy matches all day long? 🤦‍♂️
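The first three items reduce to two familiar patterns: memoize a deterministic transformation under a fingerprint of its input, and keep a value until a change notification says otherwise. A minimal in-memory sketch follows. The scaler and loader functions are hypothetical stand-ins, not real PIM, ERP, or image-service APIs. A production setup would add eviction, a TTL as a safety net, and a shared cache.

```python
import hashlib
from typing import Callable, Dict, Tuple

# Pattern 1: memoize a deterministic transformation keyed by a hash of its input.
_scaled: Dict[Tuple[str, int], bytes] = {}


def scale_cached(source: bytes, width: int,
                 scale_image: Callable[[bytes, int], bytes]) -> bytes:
    """Call the (hypothetical) scaler only once per distinct source image and width."""
    key = (hashlib.sha256(source).hexdigest(), width)
    if key not in _scaled:
        _scaled[key] = scale_image(source, width)
    return _scaled[key]


# Pattern 2: keep values until a change event invalidates them.
class EventInvalidatedCache:
    def __init__(self, loader: Callable[[str], str]):
        self._loader = loader  # e.g. a PIM or ERP lookup (hypothetical)
        self._store: Dict[str, str] = {}
        self.backend_calls = 0

    def get(self, key: str) -> str:
        """Return the cached value, asking the backend only on a miss."""
        if key not in self._store:
            self.backend_calls += 1
            self._store[key] = self._loader(key)
        return self._store[key]

    def on_change(self, key: str) -> None:
        """Call this when a save or stock-change notification arrives."""
        self._store.pop(key, None)
```

Hashing the image bytes rather than using a file name means a re-uploaded image under the same name still gets rescaled, while identical content is never scaled twice.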

 

While the first three aspects are fairly obvious and widely implemented across the ecommerce landscape, the last one is not. Yet its impact is enormous!

What is the Impact of Poorly Cached Site Search?

Imagine a search index of product texts, which can easily contain 1,000,000 different words. When a user searches for a phrase, the index must, to some extent, compare each input word with each indexed word. As long as we are talking about exact matches (“iPhone” → “iPhone”) or matches made explicitly by analyzers such as stemmers (“iPhones” → “iPhone”), this is no concern. But as soon as we use more sophisticated fuzzy matching, the impact can be huge. Some algorithms are much less efficient in FACT than the ones used by Elastic; some say this is necessary to achieve higher precision.
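To make the difference concrete: an exact match is a single hash lookup in the term dictionary, while a naive fuzzy match computes an edit distance against every indexed term. The sketch below uses a tiny, made-up term list and a brute-force scan. Production engines such as Lucene, which powers Elasticsearch, avoid the full scan with Levenshtein automata, but fuzzy matching still does far more work than an exact lookup.

```python
from typing import List, Set


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def exact_match(query: str, terms: Set[str]) -> List[str]:
    """One hash lookup, independent of the index size."""
    return [query] if query in terms else []


def fuzzy_match(query: str, terms: Set[str], max_distance: int) -> List[str]:
    """Brute force: one edit-distance computation per indexed term."""
    return sorted(t for t in terms if levenshtein(query, t) <= max_distance)


TERMS = {"iphone", "ipad", "case", "charger"}  # illustrative index

print(exact_match("ihpone", TERMS))     # []
print(fuzzy_match("ihpone", TERMS, 1))  # []
print(fuzzy_match("ihpone", TERMS, 2))  # ['iphone']
```

Under plain Levenshtein distance, swapping two adjacent letters costs two edits, which is why “ihpone” only reaches “iphone” at distance 2. Elasticsearch’s fuzzy matching counts an adjacent transposition as a single edit by default (`fuzzy_transpositions`), so results there can differ.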

I, however, take a very different approach. Once you are sure how relevant a specific user input is for a particular product text, it is wise to remember that decision (or load all such decisions into memory) so that you don’t have to calculate it again next time. Here is a rough calculation of the effect this has on your server load. To keep it simple, I express all server costs as the milliseconds needed to perform the operation, along with the resulting CO2 emissions:

Server Load Relative to CO2 Output

| User input | Matching algorithm | Cost (ms) | Result | Cost per search (ms) | CO2 per search | CO2 per 1M searches |
|---|---|---|---|---|---|---|
| ihpone | Exact | 0.1 | unsuccessful | 0.1 | 0.001 mg | 1 g |
| ihpone | Levenshtein distance 1 | 1 | unsuccessful | 1 | 0.01 mg | 10 g |
| ihpone | Levenshtein distance 2 | 5 | iPhone | 5 | 0.05 mg | 50 g |
| ihpone | Sophisticated algorithm | 100 | iPhone | 100 | 1 mg | 1 kg |
| ihpone | Sophisticated algorithm with cached result + exact match | 100.1 | iPhone | 100 once, then 0.1 per search | decreasing | 1.001 g |

Calculations based on information found here.
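All columns of the table are linked by a single conversion rate: 0.1 ms of server time corresponds to 0.001 mg of CO2, that is, 0.01 mg per millisecond. The sketch below recomputes the last column from that rate. The rate is read off the table itself, not taken from an independent measurement. For the cached strategy, the expensive algorithm runs once and every search pays for an exact match.

```python
MG_CO2_PER_MS = 0.01  # rate implied by the table: 0.1 ms -> 0.001 mg


def grams_for(n_searches: int, ms_per_search: float, one_time_ms: float = 0.0) -> float:
    """Total CO2 in grams for n searches plus an optional one-time cost."""
    total_ms = one_time_ms + ms_per_search * n_searches
    return total_ms * MG_CO2_PER_MS / 1000  # mg -> g


N = 1_000_000
strategies = {
    "exact": grams_for(N, 0.1),
    "Levenshtein distance 1": grams_for(N, 1),
    "Levenshtein distance 2": grams_for(N, 5),
    "sophisticated algorithm": grams_for(N, 100),
    "sophisticated once + exact": grams_for(N, 0.1, one_time_ms=100),
}

for name, grams in strategies.items():
    print(f"{name}: {grams:,.3f} g")
```

Dividing the cached total, 100 + 0.1 × n ms, by the number of searches n shows why the per-search cost in the last row keeps falling: it approaches the 0.1 ms of an exact match as n grows.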

How Site Search Server Load Increases Your Shop’s CO2 Footprint

The first time someone types “ihpone” into your shop, your ecommerce application has to use a sophisticated algorithm to determine that the user intended to find an “iPhone.” Some search engines use sophisticated (i.e., load-intensive) algorithms by default. Admittedly, they are easy to use: simply provide enough server power to scale them horizontally, and they will return surprisingly good results.

Mind your ecological footprint

On the other hand, if we weigh server load strategically against its strain on the environment, the story looks dramatically different.

For example: how often do you think users’ search intent changes for an identically misspelled phrase over, say, hours, days, or even years? Although the product has changed since its 2007 debut, the “ihpone” typo and its intent have remained stable for the last 14 years. How many billions of search requests have been executed in that period, each requiring search engines to apply more or less sophisticated algorithms that force server CPUs to produce heat and, with it, CO2?

In an ideal world, only the typo’s first appearance needs an expensive algorithm to calculate a proper response. After that, every request uses exact (and cheap) matching.
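This “calculate once, match exactly afterwards” idea can be sketched as a correction cache in front of the index. The expensive matcher runs only on a cache miss, and every later query with the same input becomes a dictionary lookup. This is a minimal illustration, not searchHub’s implementation. Python’s `difflib.get_close_matches` stands in for the sophisticated algorithm, and the vocabulary is made up.

```python
import difflib
from typing import Dict, List, Optional


class CorrectionCache:
    """Remember how a raw user input maps to an indexed term."""

    def __init__(self, vocabulary: List[str]):
        self._vocabulary = list(vocabulary)   # candidates for the fuzzy matcher
        self._terms = set(vocabulary)         # for cheap exact lookups
        self._decisions: Dict[str, Optional[str]] = {}
        self.expensive_calls = 0

    def _expensive_correction(self, query: str) -> Optional[str]:
        # Stand-in for a costly fuzzy or ML-based matcher.
        self.expensive_calls += 1
        matches = difflib.get_close_matches(query, self._vocabulary, n=1, cutoff=0.6)
        return matches[0] if matches else None

    def resolve(self, query: str) -> Optional[str]:
        key = query.strip().lower()
        if key in self._terms:                # already an exact term
            return key
        if key not in self._decisions:        # first sighting: pay once
            # A "no match" decision (None) is cached as well.
            self._decisions[key] = self._expensive_correction(key)
        return self._decisions[key]           # afterwards: cheap lookup


cache = CorrectionCache(["iphone", "ipad", "case", "charger"])
for _ in range(1000):
    cache.resolve("ihpone")
print(cache.resolve("ihpone"), cache.expensive_calls)  # iphone 1
```

A thousand identical misspellings trigger the expensive path exactly once. In practice, such decisions would be reviewed, persisted, and shared across servers rather than kept in one process’s memory.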

With searchHub, we do our best not only to optimize search result quality but also to reduce the climate footprint of ecommerce search. By making exact search easy and using it frequently, we apply sophisticated calculations only once and reuse their results wherever possible.

American Carbon Footprint is relatively high – Townsquare Media

With 56 billion optimized search requests, we project that we saved roughly 30 tons of CO2 emissions within the last 12 months. That’s approximately the yearly CO2 output of four Belgians, or just over one American! True, this is a tiny drop on the hot rock we all call home, but maybe it will inspire you to rethink the caching strategies for your product or your ecommerce shop.
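Dividing the projection by the request count shows what it implies per search. This is a back-of-the-envelope sketch that assumes metric tons and reuses the 0.01 mg-per-millisecond rate from the cost table above. The 30-ton figure does not state either assumption.

```python
SAVED_TONNES = 30              # projected savings over 12 months (metric tons assumed)
SEARCHES = 56_000_000_000      # optimized search requests
MG_PER_TONNE = 1_000_000_000   # 1 t = 10^9 mg
MG_CO2_PER_MS = 0.01           # rate implied by the cost table (assumption)

mg_per_search = SAVED_TONNES * MG_PER_TONNE / SEARCHES
ms_per_search = mg_per_search / MG_CO2_PER_MS

print(f"{mg_per_search:.2f} mg CO2 saved per search")       # 0.54 mg CO2 saved per search
print(f"{ms_per_search:.0f} ms of server time per search")  # 54 ms of server time per search
```

At the table’s rate, the projection corresponds to avoiding roughly 54 ms of server time per search. That lies between the Levenshtein and “sophisticated algorithm” rows.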

Siegfried Schüle

CEO
