The Weizmann Institute of Science Reduces Molecular Search Time from Minutes to Milliseconds—a Case Study

 


The Nancy and Stephen Grand Israel National Center for Personalized Medicine (G-INCPM) at the Weizmann Institute of Science is always looking for new ways to lower the costs and time needed to discover new drugs.
 

One way they are looking to lower drug development time and cost is through virtual screening of drug candidates, where libraries of small molecules are searched to find molecules that are most likely to be biologically active and worth further evaluation. The problem they were struggling with, however, was the time it took them to search their small-molecule libraries.
 

Restrictive Similarity Search Options

In addition to slow molecule similarity searches, scalability was an issue for G-INCPM because of all the indexing required to build a large database with their previous solution.
 

Lack of flexibility in their similarity search solution was another challenge for them. As part of the G-INCPM’s virtual screening process, they initially set the molecule similarity threshold to 0.4 or below. This allows them to build a diverse small-molecule library to serve as the foundation for their virtual screening. Their previous options, however, either limited the threshold to 0.7 and above or were too slow to be of practical use when the threshold was below 0.7.
 

The Path to Scalability and Flexibility

G-INCPM realized that to take their drug discovery efforts to the next level, they needed a better solution. They turned to Gemini, GSI Technology’s first Associative Processing Unit (APU).
 

Gemini is a custom compute-in-memory chip that combines high-speed SRAM with programmable bit logic interleaved throughout the memory. It computes functions directly on the data, in place and in parallel.
 

Gemini allows the search threshold to be set flexibly. For example, G-INCPM was able to set the threshold below 0.4 with no impact on performance. This allows a diverse set of molecules to be returned from the initial similarity search.
 

With a diverse set of molecules now in hand, G-INCPM then performs a similarity search on a subset of those molecules (the ones that are determined to be hits through other biological assays). This similarity structure search uses a threshold of around 0.8 because, at this point, they want to find very similar molecules to the hits and to expand the hit space. The assumption here is that structurally similar molecules exhibit similar biological activities. This is one of the steps in building a SAR (Structure-Activity Relationship) table.
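
The effect of the two thresholds can be illustrated with a minimal Python sketch. The article does not name the similarity metric, so this assumes the Tanimoto coefficient, which is widely used for binary fingerprints. The function names, the toy 8-bit fingerprints, and the molecule names are all made up for illustration, and this is plain CPU code rather than anything specific to Gemini.

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient of two bit-vector fingerprints stored as ints."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return bin(a & b).count("1") / union


def threshold_search(query: int, library: dict, threshold: float) -> list:
    """Return (name, score) pairs with score >= threshold, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy 8-bit fingerprints (hypothetical, not real molecules).
library = {
    "mol_a": 0b11110000,
    "mol_b": 0b11111000,
    "mol_c": 0b11100001,
    "mol_d": 0b00001111,
}
query = 0b11110000

# Low threshold: a broader, more diverse set of molecules is returned.
broad = threshold_search(query, library, 0.4)
# High threshold: only close analogs of the query are returned.
narrow = threshold_search(query, library, 0.8)

print([name for name, _ in broad])
print([name for name, _ in narrow])
```

With these toy values, the 0.4 search returns three of the four molecules, while the 0.8 search keeps only the two closest to the query.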
 

Gemini also allows G-INCPM to efficiently scale to and explore larger databases. “Because the APU is so fast, it eliminates the need to index the database. This simplifies database management, and it makes adding compounds to the database easy,” explained Dr. Efrat Ben-Zeev, Computational Chemist and Cheminformatics Project Leader, Weizmann Institute of Science. Gemini reduced G-INCPM’s similarity search time from several minutes to a few hundred milliseconds.
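
To show why skipping the index simplifies database management, here is a rough software analogy in Python. It is only a sketch under stated assumptions: the `FlatLibrary` class is hypothetical, the scan runs sequentially on a CPU rather than in parallel in memory, and Tanimoto similarity is assumed because the article does not name the metric.

```python
class FlatLibrary:
    """Hypothetical index-free store: a plain list scanned on every query."""

    def __init__(self):
        self.names = []
        self.fingerprints = []

    def add(self, name, fingerprint):
        # No index to rebuild: adding a compound is a single append.
        self.names.append(name)
        self.fingerprints.append(fingerprint)

    def search(self, query, threshold):
        # Exhaustive scan: score every stored fingerprint against the query.
        hits = []
        for name, fp in zip(self.names, self.fingerprints):
            union = bin(query | fp).count("1")
            score = bin(query & fp).count("1") / union if union else 0.0
            if score >= threshold:
                hits.append((name, score))
        return hits


lib = FlatLibrary()
lib.add("cmpd_1", 0b1100)
lib.add("cmpd_2", 0b0011)
before = lib.search(0b1110, 0.5)

lib.add("cmpd_3", 0b1110)  # searchable immediately, no re-indexing step
after = lib.search(0b1110, 0.5)

print([name for name, _ in before])
print([name for name, _ in after])
```

The trade-off in this design is that every query touches every fingerprint, which is only practical when the scan itself is fast enough. That is the property the quote above attributes to the APU.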
 

Gemini also provides the flexibility to work with many different types of fingerprints (molecule representations used in search), such as MDL, ECFP, and FCFP. Additionally, it can work with longer fingerprints (e.g., 8192-bit fingerprints, which are better able to discriminate because they are more descriptive).
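
One reason longer fingerprints can discriminate better is that fewer distinct substructure features end up sharing the same bit. The Python sketch below assumes hashed fingerprints in which each feature identifier is folded into the bit vector by taking it modulo the fingerprint length. The feature identifiers and the `fold` helper are hypothetical and chosen only to make the collision visible.

```python
def fold(feature_ids, n_bits):
    """Map each feature id to bit (id mod n_bits); return the set of on-bits."""
    return {fid % n_bits for fid in feature_ids}


# Hypothetical hashed feature identifiers for two different molecules.
mol_x = [5, 2000]
mol_y = [1029, 2000]

# At 1024 bits, ids 5 and 1029 land on the same bit, so the two
# molecules become indistinguishable.
same_at_1024 = fold(mol_x, 1024) == fold(mol_y, 1024)

# At 8192 bits, the two ids keep separate bits and the molecules differ.
same_at_8192 = fold(mol_x, 8192) == fold(mol_y, 8192)

print(same_at_1024, same_at_8192)
```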
 

Looking ahead, G-INCPM is excited about the prospect of using Gemini for batch searches, interactive searches, and 3D structure similarity search.
 

The full case study can be found here.

 

Author:  Pat Lasserre