GSI Technology

Addressing the Memory Constraint of Billion-Scale Similarity Search

Author: Pat Lasserre

We have previously written several blog posts about approximate nearest neighbor (ANN) similarity search, with a particular focus on the billion-scale similarity search problem that many large e-commerce and social media companies are faced with.

The billion-scale issue stems from the fact that large e-commerce and social media companies such as eBay, Facebook, and Pinterest need to quickly and efficiently surface the most relevant recommendations (candidates) during the candidate generation stage of their similarity search pipeline, but they need to do so in a way that scales with their vast amount of data.

For example, in a presentation given last year, Pinterest explained that during their candidate generation stage, they need to quickly funnel down from billions of images in their visual search pipeline to the most relevant 10,000, or so, images when providing recommendations to their users.

In a blog post, Facebook described a similar challenge to Pinterest’s. Figure 1 below shows how Facebook needs to funnel down from billions of candidates to a much smaller number of relevant candidates. In Facebook’s example, they are looking to provide the most relevant recommendations to Instagram users.

Figure 1. Candidate Generation (Source: Facebook)

Another example is eBay who mentioned in a post that their “computer vision algorithm sifts through more than half a billion images and eBay’s 1.4 billion listings to find and show you the most relevant listings that are visually similar.”

In all of these examples, ANN similarity search is used in the candidate generation stage to quickly funnel down from billions of recommendations (candidates) to a much smaller number of relevant candidates.

The items to be searched against in these billion-scale databases are represented by high-dimensional feature vectors, which means that they require significant computation and memory resources. One way to address the computation and memory constraints is by compressing the high-dimensional feature vectors.

In this blog post, we will briefly introduce a model that compresses the feature vectors by learning to map a large-scale dataset in the original feature representation space to binary hash codes in the hamming space. This allows for fast and efficient approximate nearest-neighbor similarity search and helps address the memory constraint by providing a compressed representation of the original feature vectors.

Model Overview

Our model learns a binary hash that preserves local similarity by using a novel combination of a loss function and sampling method.

The model is based on a simple neural network and a training scheme that aims to preserve the locality relations between the original data points. It does this by optimizing the hamming-based similarity distribution relation to the original space cosine-based similarity distribution.

We introduce a loss function that translates the relational similarities in the original cosine space and the new hamming space to probability distributions — and optimizes the KL divergence between them. This results in distance preservation of the original cosine space in the new hamming space.

We also introduce a simple data sampling method by representing the database with randomly generated proxies, used as reference points to query points from the training set.

Figure 2 below shows the training model.

The Results

Experimenting with three publicly available standard ANN benchmarks (Sift1M, Deep1M, and BigANN1M, where Deep1M, and BigANN1M consist of the first million vectors of Deep1B, and BigANN respectively), our model provides significant improvement over other binary hashing methods — achieving an improvement of between 7% and 17%.

Unlike other methods, which report improvement in accuracy only over small code sizes (up to 128 bits), we show high performance in both low (64 bits) and high (768 bits) dimensional bit representation, offering increased accuracy when more resources are available and flexibility in choice of ANN strategy. No pre-computations or graphs are required for the training process.

To get the details about the model, please refer to the full paper, Hamming Space Locality Preserving Neural Hashing for Similarity Search, or attend the virtual BayLearn 2020 Bay Area Machine Learning Symposium where the author of the paper will give a virtual poster presentation.

High Performance, High Reliability, Low Power

Intelligence, Accelerated

Data to Knowledge to Insight to Action

Accelerating LLM & VLM Inference

Massive Search. Minimal Latency.

High performance compute-in-memory

Bringing real-time datacenter SAR processing to mobile applications

Next Gen Navigation

Can’t hide from the APU

Safer. Smarter. More Aware.

Safety and Security

Technology that benefits human health

Efficiently processing big data for A&D applications

Bringing real-time datacenter SAR processing to mobile applications

APU finds the needle in the haystack

Next Gen Navigation

Advanced threat detection

From stringent space-grade memory to AI acceleration

From stringent space-grade memory to AI acceleration

The highest performance, highest density monolithic SRAMs

From stringent space-grade memory to AI acceleration

From stringent space-grade memory to AI acceleration

Visionary thought leadership

AI’s Energy Dilemma

Accelerating AI Through Partnerships

Create, maintain, debug

Addressing the Memory Constraint of Billion-Scale Similarity Search

Model Overview

The Results

Company Info

AI & Compute

Aerospace & Defense

Memory

Quick Links

FOLLOW US