NeuralInfusion™ – a blueprint for relevant and efficient eCommerce vector retrieval – Part 2

The Ideas behind NeuralInfusion™

Building on the insights and challenges discussed in PART-1 regarding effective and efficient vector retrieval for eCommerce, we developed an entirely new solution that introduces a groundbreaking capability called NeuralInfusion™. Since the core principles and architectural choices behind this solution apply to any search system, we decided to share more details here in PART 2. In short, NeuralInfusion™ is centered around three key concepts that set it apart from anything we’ve seen before in how it addresses the identified areas of improvement:

The ideas behind NeuralInfusion™ (diagram).

In this second part, we will shed some light on areas 1 and 2, while area 3 will be addressed in the third post of the blog series.

1. Identifying queries with optimization potential

We already learned in PART-1 that recall in modern search stacks is primarily driven by two methods: keyword search and vector search. Each method has its strengths and weaknesses when retrieving documents: keyword search excels at exact term matching between query and document, while vector search shines at inferring the broader query context.

Recognizing that hybrid search approaches boost recall and precision primarily in specific contexts, we focused on identifying these opportunities.

Recall: To enhance recall, we looked for both clear and subtle indicators of optimization potential. Clear signs include zero-result queries, queries with few results, low engagement, or very low findability. Additionally, we identified a less obvious area: queries where product data errors or inconsistencies hinder keyword-based retrieval systems from returning all relevant items. We further leveraged the concept of query entropy to jointly optimize the recall-set-generation across the different retrieval methods.

Precision: To improve precision, we analyze user data and relevance judgments to find queries with low engagement, poor findability, frequent search bounces, and high exit rates. Moreover, the diversity and appeal of search results are crucial, particularly for branded, generic, and very specific queries. Specific markers or indicators can be used to test and refine these aspects.

Fortunately, our searchHub searchInsights feature already identifies promising queries for improvement, while the Query Experimentation (Query Testing) tool allows us to test various retrieval strategies to optimize their combined performance. This ongoing experimentation helps us determine the most effective approach for each query type over time.

Once we had identified queries with significant uplift potential from increased recall and/or precision, we needed a strategy to tune and balance the two. As a first step towards a more optimal retrieval strategy, we decided to build a model that adaptively tunes the recall set size of each retrieval mechanism based on the request context. We refer to this as the “Adaptive Precision Reduction Model” or APRM. Its primary goal is to determine when to increase recall and when to hold back. If increasing recall would significantly compromise precision, the model either adjusts the boost or avoids it altogether. This is especially helpful in zero-result scenarios, where a recall expansion might otherwise lead to completely irrelevant results.
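To make the idea concrete, here is a purely illustrative sketch of such an adaptive sizing rule in Python. The actual APRM is a model learned from the signals described above; the threshold, the precision-loss estimate, and all names below are hypothetical stand-ins:

```python
def adaptive_recall_size(base_boost: int, est_precision_loss: float,
                         max_acceptable_loss: float = 0.15) -> int:
    """Illustrative only: decide how many extra vector-retrieved
    candidates to admit, given an estimated precision cost.

    base_boost          -- candidate count the vector retriever proposes
    est_precision_loss  -- predicted precision drop if we admit them all
    max_acceptable_loss -- budget above which we skip the boost entirely
    """
    if est_precision_loss >= max_acceptable_loss:
        return 0  # hold back: the recall boost would hurt more than help
    # Otherwise scale the boost down in proportion to its expected cost.
    share = 1.0 - est_precision_loss / max_acceptable_loss
    return int(base_boost * share)


# e.g. a zero-result query where the expansion still looks safe:
extra = adaptive_recall_size(base_boost=50, est_precision_loss=0.03)  # -> 40
```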

We are still convinced that avoiding zero-results in all cases is not the best strategy, as it hides critical information about the gaps between user needs and the product assortment on offer. Therefore, precise adjustment and tuning of precision vs. recall is essential, but very difficult to achieve with a standard vector search based on approximate nearest neighbor algorithms. With the Adaptive Precision Reduction Model approach, search managers gain total control over precision vs. recall. To retain search platform independence, we decided to decouple the new advanced retrieval stack from the underlying search platform.

2. Decoupling advanced retrieval from the search platform

Search platform independence is central to searchHub’s philosophy. Our customers use a wide variety of search solutions, including vendors like Algolia, FactFinder, Bloomreach, Constructor.io, Solr, Elasticsearch, and OpenSearch. Some rely on search platforms integrated into their commerce platforms, while others build and fine-tune homegrown search engines using open-source software or custom implementations.

We design our enhancements to be platform-agnostic because we believe our customers should retain control over their core business logic. This encompasses critical components such as stock control, pricing, merchandising strategies (like retailMedia Ads), and personalization. Retaining control over these elements allows retailers to stay flexible and innovative, adapting swiftly to market changes, customer preferences, and evolving business goals. It also enables them to create unique experiences aligned with their brand and operational strategies, without being limited by the constraints of a particular platform. Recognizing that many of our customers have invested considerable time and resources in customizing their search platforms, we aimed to offer NeuralInfusion™ in a platform-agnostic manner, without requiring them to re-platform. The challenge was finding an efficient way to achieve this.

I’ll admit, we were well into the development of many parts of the system before we had a clear solution for decoupling. I have to point out how deeply grateful I am to my development team and our customers: over the better part of a year, they remained patient as we worked through this challenge. In the end, we developed a solution that is, in my humble opinion, both simple and elegant, though admittedly it seems straightforward only in hindsight.

The breakthrough came when we realized that direct integration of the different technologies was unnecessary. Instead, we focused on integrating the results, whether documents or products. Essentially, every search engine results page (SERP) is just a set of products or documents retrieved from a system in response to a given query, which can then be refined through filters, rankings, and additional layers of personalization.

This led us to find a way to add missing or more relevant documents to the search process and, by extension, to the search results. In this way, we would maintain platform independence while ensuring our customers benefit from cutting-edge neural retrieval capabilities, without the disruption of replatforming.

The high-level blueprint architecture:

Instead of requiring our customers to implement and manage additional systems like a vector retrieval system, we optimized and simplified the retrieval process by dividing it into distinct phases and decoupling them. Rather than applying multiple retrieval methods to every query in real-time, we perform hybrid retrieval asynchronously only for specific cases, such as clustered intents or predefined queries. This is especially true for queries that revealed clear opportunities for improvement, as mentioned above.

By separating the embedding model inference from query execution, we can precompute results and retrieve them through very efficient key-value lookups. At query time, we use the query intent as the key, and link it to a set of documents or product IDs as values, which are then incorporated into the search engine query.
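Conceptually, this precomputed layer behaves like a plain key-value store. Here is a minimal sketch in Python; the store, the intent keys, and the product IDs are hypothetical, and in production the mapping would live in a dedicated key-value store or cache rather than an in-memory dict:

```python
# Hypothetical precomputed infusion index: query intent -> product IDs.
# Populated asynchronously by the hybrid retrieval batch jobs.
INFUSION_INDEX: dict[str, list[str]] = {
    "trail running shoes": ["p-1042", "p-2210", "p-3305"],
    "waterproof hiking boots": ["p-0871", "p-1189"],
}

def infused_product_ids(query_intent: str) -> list[str]:
    """Cheap O(1) lookup at query time -- no model inference involved."""
    return INFUSION_INDEX.get(query_intent, [])
```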

NeuralInfusion™ architectural diagram.

Fortunately, 9 out of 10 search platforms support combining language queries with structured document or record ID lookups via their query APIs. This enables several specialized optimization use cases, such as adding, pinning, or removing results from the initial result set.

Here are two simple integration examples for the Elasticsearch search platform:
Add-case (increase recall):
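A minimal sketch of the add-case, assuming the Python Elasticsearch client (8.x); the endpoint, index, fields, query string, and product IDs are hypothetical. The precomputed IDs are merged into the organic keyword query as an extra should clause, so the matching documents enter the recall set:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # hypothetical endpoint

query_text = "trail running shoes"
extra_ids = infused_product_ids(query_text)  # precomputed lookup from above

resp = es.search(
    index="products",
    query={
        "bool": {
            "should": [
                # The customer's original keyword query stays untouched ...
                {"multi_match": {"query": query_text,
                                 "fields": ["title^2", "description"]}},
                # ... and the infused IDs simply widen the recall set.
                {"ids": {"values": extra_ids}},
            ],
            "minimum_should_match": 1,
        }
    },
)
```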
Pin-case (improve ranking):
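For the pin-case, Elasticsearch offers a pinned query that promotes selected IDs above the organically ranked results; again a sketch under the same hypothetical names as above:

```python
resp = es.search(
    index="products",
    query={
        "pinned": {
            # Infused IDs are promoted to the top, in the given order ...
            "ids": extra_ids,
            # ... while all other documents are ranked organically.
            "organic": {
                "multi_match": {"query": query_text,
                                "fields": ["title^2", "description"]}
            },
        }
    },
)
```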

This approach effectively tackles multiple challenges at once:

  • Latency and Efficiency: By decoupling the complex retrieval phases, we enhance speed and responsiveness. This shift moves expensive real-time vector look-ups into cost-efficient batch processing, leaving only cheap, CPU-based key-value lookups at query time, without significantly increasing response times.
  • Business Feature Support: We maintain essential business functionalities that retailers depend on, such as filtering or hiding products, implementing searchandising campaigns, applying KPI-based ranking strategies, and integrating third-party personalization capabilities.
  • Testable Outcomes: The system’s performance can be continuously measured and optimized on both a global scale and at the query/intent level.
  • Transparency and Control: Customers retain full visibility and control over the process and returned results, ensuring they can oversee and manage outcomes effectively.
  • Future-Proof Architecture: The decoupling keeps our retrieval options open. We can integrate or combine any current or future models or architectures into our execution pipelines without requiring our customers to adjust their APIs or degrading their response times.


For new or previously unseen queries, the system requires one initial attempt before reaching its full effectiveness. However, this limitation affects only a small portion of queries, as our system already maps over 85% of queries to known search intents, so we expect to capture more than 90% of the potential value immediately. We will evaluate the economic impact of this minor limitation in future analyses.

Summary

In this post, we outlined the careful design behind the Infusion component of our NeuralInfusion™ capability. By reframing what was once seen as a capability problem as a straightforward integration task, we’ve created a solution that requires minimal additional investment while preserving high agility for future needs. With this architecture, setup, and integration, our customers can realize 90-95% of the identified optimization potential, all with complete transparency, no added latency, and no increase in system complexity.

In the third and final part of this series, we’ll focus on the “Neural” part of our NeuralInfusion™ capability and how we found unique ways to overcome most challenges related to tokenization and domain adaptation.
