NeuralInfusion™ — A Blueprint Towards Relevant And Efficient Ecommerce Vector Retrieval — Part 3

Understanding Current Embedding Models And Retrieval Stacks

After fully decoupling our customers’ search platforms from the NeuralInfusion™ retrieval stack (see PART-2 of this blog series) and identifying the queries we aim to enhance, we were still confronted with the challenges outlined in PART-1 of this blog series: tokenization and vocabulary limitations, domain adaptation, multilingual understanding, and optimizing hybrid result combinations.

  • Domain Adaptation: Most models require specialized training to align with our specific use case and domain. While numerous embedding models exist, only a few are designed or fine-tuned for retail and ecommerce applications.
  • Multilingual Support: With 92% of our customers also operating in non-English-speaking regions, it is essential that models provide effective multilingual capabilities.
  • Tokenization: Tokenization and corpus size remain critical constraints for modern embedding models. Many of the models we analyzed impose corpus size limitations for efficiency, defaulting to substring tokenization when an exact token or string is missing. Our testing revealed that while vector retrieval occasionally yields good results, it frequently misinterprets query intent (the sketch below illustrates the substring fallback).
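
To make the substring fallback concrete, here is a small, hedged sketch using the Hugging Face CLIP tokenizer. The exact subword pieces depend on the model's vocabulary, and the domain-specific term is just an illustrative example:

```python
# Illustrative sketch: CLIP-style BPE tokenizers split terms that are missing
# from the vocabulary into subword fragments, which can distort query intent.
# Requires: pip install transformers
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

for query in ["polo shirt", "sensorarmatur"]:
    print(f"{query!r} -> {tokenizer.tokenize(query)}")

# Common English words map to whole tokens (e.g. 'polo</w>', 'shirt</w>'),
# while a domain term like 'sensorarmatur' typically breaks into several
# fragments whose combined embedding no longer reflects the original meaning.
```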

Most existing solutions … neglect core issues such as retrieval quality.

To tackle these challenges, our first step was to identify a robust vector retrieval stack for evaluation purposes. We evaluated a range of candidates, from hybrid search systems to API vendors, to optimize document and document-ID retrieval during the initial retrieval stage. However, we were surprised to find that most existing solutions focused primarily on embedding, indexing, and retrieval, while neglecting core issues such as retrieval quality.

Selecting A Retrieval Stack For NeuralInfusion

The only vendor showing substantial promise from the start was Marqo.ai with its open-source vector retrieval solution. Their focus on multi-modal embedding, indexing, and retrieval (with Vespa.ai as the backend retrieval engine), combined with a thoughtful approach to UX/UI challenges and fine-tuning solutions (Generalized Contrastive Learning), made them a strong candidate.
Our internal benchmarks indicated that Vespa.ai was an ideal backend retrieval engine, meeting our key criteria for performance and flexibility. Vespa offers extensive configurability, including custom data structures and choices between memory and disk storage, and supports multiphase ranking for optimizing search results.
Convinced that Vespa.ai was the most promising backend retrieval engine and that Marqo.ai provided the ideal tooling around it, we decided to utilize Marqo.ai to power the vector-based retrieval component of our new NeuralInfusion™ capability.
During production-data testing, we observed that while Marqo.ai’s multi-modal retrieval performs strongly, there are opportunities for further enhancement in three key areas: domain adaptation, multi-language support, and tokenization. Marqo.ai offers a paid service called Marqtune for domain adaptation (i.e., fine-tuning), which helps address some of these challenges. However, it requires large amounts of high-quality training data, GPU resources, and, in most cases, customer-specific fine-tuning. As a result, this approach is particularly well-suited for larger customers, but may present additional overhead for smaller-scale implementations.

Two caveats about our benchmarks are worth noting:

  1. When we conduct benchmarks, we do so with a specific use case in mind, not for generic evaluation. For example, in NeuralInfusion, our primary focus is on peak indexing performance and achieving optimal, consistent recall, since real-time retrieval isn’t required; all vector retrieval is performed in scheduled batches. As a result, the outcomes of our benchmarks may differ significantly from what you might observe in your use case.
  2. For us, the retrieval stack is purely infrastructure—a fundamental tool that enables us to implement various use cases while balancing financial and human resource constraints. This means our technology choices are driven by practical requirements and constraints, rather than marketing-driven benchmark comparisons.

Understanding The Impact Of Embedding Models On Your Retrieval Quality

Now that we’ve selected a retrieval stack (the engine), it’s time to focus on the data being indexed and identify key areas for improving the use of embedding models (the fuel).
We’ve already discussed the fundamental challenges of domain adaptation, multilingual support, and tokenization. These issues arise from the embedding model’s limitations in understanding and confidence—whether due to unknown tokens, insufficient statistical representation in the training corpus, or not enough context in the query itself. As a result, the model may struggle to properly position and differentiate words, phrases, and products within the embedding space.
In the retrieval process, the search query is also embedded into the vector space, and these vector queries are matched against documents in the embedding space. Generally, the input query is processed by the same machine learning model that created the indexed embeddings, producing a vector within the same space. Since similar vectors cluster together—the foundation of dense retrieval—finding relevant results becomes a matter of identifying the vectors closest to the query vector and retrieving the associated documents.
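
As a minimal, hedged sketch of that matching step (a brute-force cosine search over placeholder arrays, not any specific engine’s API):

```python
# Minimal sketch of dense retrieval: given a query vector produced by the
# same model as the document embeddings, return the k nearest documents.
import numpy as np

def top_k(query_vec: np.ndarray, doc_vectors: np.ndarray, k: int = 12):
    # Normalize so that the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vectors / np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    scores = d @ q                    # one cosine similarity per document
    idx = np.argsort(-scores)[:k]     # indices of the k closest vectors
    return idx, scores[idx]           # map idx back to document IDs downstream
```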

Simple embedding model for color representations

The quality of the retrieved results predominantly depends on two factors (not accounting for the quality of the underlying training dataset):

  1. The quality of document and query embeddings—which can be improved through fine-tuning.
  2. The effectiveness of the chosen retrieval algorithm, whether Approximate Nearest Neighbors (ANN) for efficiency or K-Nearest Neighbors (KNN) for exact matching; both are contrasted in the sketch below.
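
To make the ANN/KNN trade-off concrete, here is a hedged sketch using random stand-in vectors and the hnswlib library; the parameters are illustrative defaults, not our production settings:

```python
# Contrast exact KNN (brute force) with approximate NN via an HNSW index.
# Requires: pip install hnswlib numpy
import numpy as np
import hnswlib

dim, n_docs = 768, 50_000
rng = np.random.default_rng(0)
doc_vectors = rng.random((n_docs, dim), dtype=np.float32)  # stand-in embeddings
query = rng.random(dim, dtype=np.float32)

# Exact KNN: scans every document; guaranteed-correct neighbors, O(n) per query.
exact = np.argsort(-(doc_vectors @ query))[:12]

# Approximate NN: an HNSW graph trades a little recall for sub-linear queries.
index = hnswlib.Index(space="ip", dim=dim)   # "ip" = inner-product similarity
index.init_index(max_elements=n_docs, ef_construction=200, M=16)
index.add_items(doc_vectors)
index.set_ef(100)                            # higher ef = better recall, slower
approx, _ = index.knn_query(query, k=12)
```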

 

While this might seem obvious, we frequently see teams focusing exclusively on the technical aspects of retrieval algorithms—without giving enough attention to the quality of indexed data and embeddings, which are just as crucial for improving search quality.
In such scenarios, the go-to solution for most teams is to add some fine-tuning. The primary objective of fine-tuning multi-modal embedding models is to improve the alignment and relevance between different data modalities (e.g., text and images) within a specific domain or task.
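
For orientation, the sketch below shows the standard CLIP-style symmetric contrastive objective that most multi-modal fine-tuning builds upon (approaches like Marqo’s Generalized Contrastive Learning extend this idea); it is a generic illustration, not Marqtune’s actual implementation:

```python
# Standard CLIP-style contrastive loss: pull matching text/image pairs
# together in embedding space and push mismatched pairs apart.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(text_emb: torch.Tensor,
                          image_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    # L2-normalize both modalities so similarities are cosine similarities.
    text_emb = F.normalize(text_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)
    logits = text_emb @ image_emb.T / temperature      # (batch, batch) similarities
    targets = torch.arange(len(logits), device=logits.device)  # diagonal = matches
    loss_t = F.cross_entropy(logits, targets)          # text -> image direction
    loss_i = F.cross_entropy(logits.T, targets)        # image -> text direction
    return (loss_t + loss_i) / 2
```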

Silhouette of a pre-trained vs. a fine-tuned multi-modal CLIP model

Selecting An Embedding Model For NeuralInfusion

Given the three major challenges outlined at the beginning, we strongly believe that multi-modal embeddings are the most efficient approach to addressing the fundamental obstacles in ecommerce retrieval. To evaluate embedding quality, we selected the following two models:

  • open_clip/ViT-L-14/laion2b_s32b_b82k
  • Marqo-Ecommerce-B (fine-tuned model by marqo.ai)

As previously discussed, dense retrieval is built on the principle that similar documents should cluster together in vector space. Since our primary goal is relevant retrieval, our first step is to assess how effectively the embedding model utilizes indexed documents—ensuring they are positioned in vector space in a way that accurately reflects and differentiates their semantic meaning.

The Effect Of Document Embedding Quality On Retrieval

For the following example, we embedded a corpus of 50,000 fashion products using both embedding models, assigned each product the taxonomy leaf category that best represents its product type, and visualized the results using t-SNE.
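
For reference, here is a hedged sketch of this visualization step, with small random stand-ins for the real 50,000 product embeddings and their taxonomy labels:

```python
# Project product embeddings to 2-D with t-SNE and color by leaf category.
# Requires: pip install scikit-learn matplotlib
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(42)
embeddings = rng.normal(size=(5_000, 768)).astype(np.float32)  # stand-in vectors
categories = rng.choice(["jackets", "coats", "shirts", "polo shirts"], size=5_000)

coords = TSNE(n_components=2, metric="cosine", init="pca",
              perplexity=30, random_state=42).fit_transform(embeddings)

for cat in np.unique(categories):
    m = categories == cat
    plt.scatter(coords[m, 0], coords[m, 1], s=2, label=cat)
plt.legend(markerscale=4)
plt.title("Product embeddings by taxonomy leaf category (t-SNE)")
plt.show()
```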

Embedding silhouette with open_clip/ViT-L-14/laion2b_s32b_b82k
Embedding silhouette with Marqo-Ecommerce-B

This type of embedding visualization is highly valuable, as it allows for the quick identification of areas with significant semantic overlap (e.g., coats vs. jackets, shirts vs. jackets, and some shirts vs. polo shirts). As seen in the two figures, even a highly fine-tuned embedding model struggles to generate embeddings that form dense clusters with minimal semantic (taxonomic) overlap, although fine-tuning significantly improves the situation.

In terms of retrieval performance, let’s compare the results of the two multi-modal embedding models with the one already measured in PART-1 of this blog series:

Query recall analysis
Precision@12 analysis, with an average improvement of 11 percent

Even though others claim higher numbers, we saw a limited impact from fine-tuning, yielding up to an 11% improvement in the Precision@12 score. On the other hand, fine-tuning clearly enhances the embedding model’s ability to form denser clusters with minimal semantic overlap. However, achieving these gains requires a substantial amount of high-quality training data and significant GPU compute, making it crucial to carefully assess the trade-offs from a business perspective.

The Effect Of Query Embedding Quality On Retrieval

Now that we’ve detailed the impact of embedding model quality for the portion we control (the documents), let’s confirm what this means for the query side. The search query is embedded into the same vector space, so improvements in embedding quality directly affect which top-k closest documents are returned. A simple way to describe the search process is to think of the embedded query as an entry point into the embedding space. Again, it’s beneficial to visualize the process to understand the implication of each component:

Finding the best possible entry point into the embedded document space during query embedding is crucial. Otherwise, retrieval may return only the top-k closest documents—which, despite their proximity in vector space, might be completely irrelevant to the query.

Marqo-Ecommerce-B embedding of the query “polo shirt”

Since queries and documents are embedded using the same model, the primary limitation isn’t the document embeddings but rather the information richness (context) of the query. Our analysis revealed that the most significant opportunity for improving relevant vector retrieval doesn’t lie solely in the embedding model itself but in query enrichment: enhancing the user query to better capture intent, allowing the embedding model to interpret it more effectively. As the following figure shows, over 85% of all valid queries contain three or fewer tokens.

Cumulative share of information across our known queries
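
The underlying analysis is straightforward; here is a hedged sketch, with a toy query list standing in for the real query logs:

```python
# Cumulative share of queries by token count (toy data, not the real logs).
from collections import Counter

queries = ["trousers", "polo shirt", "sensorarmatur", "wand styropor leiste"]
token_counts = Counter(len(q.split()) for q in queries)
total = sum(token_counts.values())

cumulative = 0.0
for n_tokens in sorted(token_counts):
    cumulative += token_counts[n_tokens] / total
    print(f"<= {n_tokens} token(s): {cumulative:.1%} of queries")
```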

The hypothesis we formed: what if we could improve retrieval by 30-40% simply by enhancing the query embedding, without any fine-tuning? Leveraging our deep expertise in optimizing user search queries, we explored a novel approach: mapping user queries into an intermediate intent space.

This innovative intent-mapping technique effectively addresses domain adaptation, multi-language support, and tokenization, while remaining fully compatible with fine-tuning strategies.

Intent Mapping

The core idea behind intent mapping is simple: instead of directly embedding the user query, we first map it into an intermediate intent space, thereby significantly enriching the query’s information. The enriched query is then sent to the retrieval system, which retrieves the closest documents from the customer’s indexed document space. You can think of the query-intent model as a foundational tool that helps pinpoint a more precise entry point into the product embedding space during retrieval.

The main objective of intent mapping is to find a better entry point into the embedded document space and, by doing so, maximize the number of relevant documents retrieved.

Intent mapping illustrated with a polo shirt, a vector cloud, and customer keywords

Please understand that we cannot disclose the full details of how we build and enhance the intent-space model at this time, as we are exploring the possibility of patenting the process or parts of it. However, we’re still eager to share the underlying concept together with some example use cases.
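
With that caveat, here is a deliberately simplified toy sketch of the flow; the static dictionary below merely stands in for the actual, undisclosed intent-space model:

```python
# Toy illustration of intent mapping: query -> intent space -> enriched
# query -> embedding -> retrieval. The dictionary is a stand-in only.
INTENT_SPACE = {
    "trousers": "trousers, pants; apparel bottoms such as chinos, jeans, slacks",
}

def enrich_query(query: str) -> str:
    # Fall back to the raw query when no intent mapping is known.
    intent = INTENT_SPACE.get(query.strip().lower())
    return f"{query}. intent: {intent}" if intent else query

# The enriched string is then embedded with the *same* model as the documents
# and sent to the vector index, conceptually:
#   query_vec = model.encode(enrich_query("trousers"))
#   hits = vector_index.knn_query(query_vec, k=12)
```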

Take, for example, one of the most common queries across our customer base: “trousers”. On its own, this single-word query offers little contextual information to refine retrieval—right?
Not necessarily. With intent-space mapping and a multi-modal model, we can enrich the query dynamically. For instance, we can integrate recent user behavior, such as trending or highly engaged items from the past 1–2 weeks, simply by appending this data to the query. This allows for more context-aware retrieval, improving relevance without requiring any changes to the underlying embedding model.
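
A hedged sketch of this behavioral enrichment follows; the trending titles are invented examples, while the real signals would come from recent engagement data:

```python
# Append recent trending-item context to a sparse query before embedding it.
def enrich_with_trends(query: str, trending_titles: list[str], limit: int = 3) -> str:
    context = "; ".join(trending_titles[:limit])
    return f"{query}. trending: {context}" if context else query

print(enrich_with_trends(
    "trousers",
    ["slim-fit chino, beige", "high-waist wide-leg trousers", "cargo pants, olive"],
))
# -> trousers. trending: slim-fit chino, beige; high-waist wide-leg trousers; cargo pants, olive
```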

Benchmarking NeuralInfusion’s Vector Retrieval

As always, when you develop an approach that departs entirely from market trends, you are naturally curious how it performs against the current state of the art. We therefore compared its performance against what is currently considered state-of-the-art, as mentioned in PART-1 of this blog series, and also included a state-of-the-art multi-modal model (Marqo-Ecommerce-B) specifically fine-tuned for ecommerce data.

Recall analysis comparing query-type optimization methods with Marqo models
Query Precision@12 analysis, with an average improvement of up to 25 percent

Results:

Our standardized precision-versus-recall analysis confirmed that our “intent mapping” approach significantly outperforms all the evaluated state-of-the-art methods. Although recall is generally robust across existing approaches, combining intent mapping with a fine-tuned multi-modal model increased precision by more than 25% on average. Notably, it surpassed all other models even when paired with the non-fine-tuned CLIP model, a result we found particularly impressive.
Keeping Twyman’s law in mind (a result that looks too good to be true usually is), we also conducted extensive manual evaluations with our customers. Once the two versions were compared side by side, the boost in confidence was unmistakable. To illustrate this improvement, we have selected several impressive examples for you to review.
Keep in mind that these examples were produced without any additional re-ranking. No re-rankers, user feedback, or other business rules were applied; however, these could easily be layered on top in a follow-up ranking stage.

Some hand-picked examples

Pure Marqo-Ecommerce-B

query: sensorarmatur (German: “sensor faucet”)

case 1: assortment offers no perfect match but very close alternatives

Intent Mapping + Marqo-Ecommerce-B

query: sensorarmatur

case 1: assortment offers no perfect match but very close alternatives

Pure Marqo-Ecommerce-B

query: wand styropor leiste (German: “styrofoam wall molding”)

case 2: assortment offers more perfect matches

Intent Mapping + Marqo-Ecommerce-B

query: wand styropor leiste

case 2: assortment offers more perfect matches

Pure Marqo-Ecommerce-B

query: yatak odası dolapları (Turkish: “bedroom wardrobes”)

case 3: foreign language, tokens not included in the model vocabulary

Intent Mapping + Marqo-Ecommerce-B

query: yatak odası dolapları

case 3: foreign language, tokens not included in the model vocabulary

Pure Marqo-Ecommerce-B

query: bauplatten für feuchträume (German: “building boards for wet rooms”)

case 4: a query that needs some serious semantic understanding

Intent Mapping + Marqo-Ecommerce-B

query: bauplatten für feuchträume

case 4: a query that needs some serious semantic understanding


Recap

As this is the 3rd and final part of this blog series about NeuralInfusion, it’s only appropriate to review our main objectives laid out when we began work on this capability:

  1. To identify the most challenging problems of product retrieval in ecommerce and retail, which we established in PART-1.
  2. To devise an architecture that addresses these challenges, irrespective of the underlying search platform and technologies used. This architecture must be orders of magnitude cheaper than solutions considered state-of-the-art, while remaining simple to integrate and operate, as described in PART-2.
  3. To address the challenges of search relevance within the scope of dense retrieval. To this end, we introduced a new method to augment the user query, called “Intent Mapping”. It surpassed all expectations by far.

Why we think it matters

A picture is often worth more than 1000 words. And so, we would like to finish off this series with one:

NeuralInfusion blueprint - using a smart thin-client (smartQuery) in the same environment as your search engine which infuses pre-inferred docs into your search results (here as an example with elasticsearch as underlying search engine)

Rather than combining myriad retrieval methods for all queries at query time, we instead run asynchronously, and only for clustered intents or predefined queries. This method applies to scenarios such as zero-result queries, underperforming queries, or simply queries with few results that show adequate room for improvement.
By structuring our retrieval method this way, we decouple online embedding and inference from query runtime. This is possible because the results are precalculated and can be served by extremely efficient K/V lookups in our thin client, smartQuery. This eliminates the need for slow, blocking, or expensive inference, and it only needs to be applied to a single query within a search-intent cluster, adding practically no latency (< 2 ms at p99 average response times).
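
Conceptually, the hot path reduces to a dictionary lookup. Here is a minimal sketch; the names and data are illustrative, not the actual smartQuery implementation:

```python
# Offline batch job precomputes document IDs per search-intent cluster;
# the thin client answers queries with an O(1) lookup, no model inference.
precomputed_results: dict[str, list[str]] = {
    "sensorarmatur": ["doc_118", "doc_042", "doc_977"],
    "trousers": ["doc_310", "doc_007", "doc_555"],
}

def cluster_key(query: str) -> str:
    # Stand-in for mapping a raw query onto its search-intent cluster.
    return query.strip().lower()

def infuse(query: str) -> list[str]:
    return precomputed_results.get(cluster_key(query), [])
```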

This setup allows you to leverage the latest model improvements without the need to deploy, manage, and publicly expose them, while seamlessly integrating their benefits into your existing search engine and preserving all business rules, including filtering, sorting, and re-ranking. We have already demonstrated that NeuralInfusion is fully supported through the native APIs of elasticsearch, opensearch, solr, typesense, and bloomreach.
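
As one illustration of what such an infusion could look like on the engine side, here is a hedged sketch using Elasticsearch’s native pinned query, which promotes the given IDs above organic results; the index and field names are invented:

```python
# One possible infusion pattern: Elasticsearch's `pinned` query promotes the
# precomputed document IDs while keeping the organic lexical query intact.
es_query = {
    "query": {
        "pinned": {
            "ids": ["doc_118", "doc_042", "doc_977"],  # precalculated offline
            "organic": {"match": {"title": "sensorarmatur"}},
        }
    }
}
# Filters, sorting, and downstream re-ranking still apply as usual.
```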

This means you can significantly enhance the efficiency of your current retrieval system without replacing it or building any sort of side-car solution. Second, it provides unparalleled flexibility by allowing the curated results to be adjusted whenever required. This is particularly valuable when a model’s relevance performance is subpar: it enables the seamless inclusion of supplementary models, using either the entire query set or only a portion of it, a typical use case in visual merchandising activities.

Closing words

Although there is still much to achieve in terms of relevant and efficient retrieval and product discovery in ecommerce, our team—and our customers’ teams—remain dedicated to advancing what’s possible in this space.
We believe this blueprint is immensely valuable for search teams worldwide, helping them navigate resource constraints while maintaining ownership and continuously enhancing their retrieval stack.
