Molecular Similarity Search: A Simple But Powerful Drug Discovery Tool

Author: Pat Lasserre

According to an oft-cited Tufts Center for the Study of Drug Development (CSDD) study, the average cost to develop a new drug is roughly $2.6 billion. Additionally, 90% of new drugs fail to win approval. For the few that do win approval, it still takes at least 10 years to get them to market.

Given those sobering statistics, it’s no surprise that pharmaceutical companies are looking for ways to reduce both the cost and the time of drug discovery.

This blog post will briefly discuss molecular similarity search and how researchers are using it to help expedite drug discovery, while also lowering the associated costs.

Virtual Screening of Molecules

As previously mentioned, drug development is costly, time-consuming, and has a high failure rate. Thus, researchers are always looking for ways to develop drugs more cost-effectively and with a higher probability of success.

One way to help accomplish those goals is by using virtual screening — a computational technique used in drug discovery that searches databases of small molecules to help find leads early in a drug discovery project. Molecules that have the highest probabilities of activity, and thus success, are selected in silico for further, more-detailed study. This helps reduce the number of in vitro experiments — significantly reducing the time and costs of drug discovery.

Finding Molecules with Higher Probability of Activity

Researchers leverage the Similar Property Principle to find molecules that are likely to be active.

The Similar Property Principle states that molecules that are structurally similar to an active molecule are also likely to be active. Thus, finding molecules that are structurally similar to a known active molecule is one of the keys to successful drug discovery.

The structural similarity of two molecules is determined by comparing their molecular fingerprints.

Molecular Fingerprints

Molecular fingerprints are created by encoding molecular structural fragments into a binary vector of features, where each bit corresponds to the presence of a particular fragment.

If two molecular fingerprints have 1’s at the same position, then both molecules have the same fragment, and the more fragments they share, the more similar they are considered.
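
To make the encoding concrete, here is a minimal sketch in Python. The six-entry fragment vocabulary and the two example molecules are hypothetical, chosen only for illustration; real fingerprints are typically produced by cheminformatics software and are much longer.

```python
# Hypothetical fragment vocabulary: each position in the fingerprint
# stands for one structural fragment (the names are illustrative only).
FRAGMENTS = ["hydroxyl", "carboxyl", "amine", "benzene_ring", "carbonyl", "halogen"]


def make_fingerprint(present_fragments):
    """Return a list of 0/1 bits, one per entry in FRAGMENTS."""
    present = set(present_fragments)
    return [1 if name in present else 0 for name in FRAGMENTS]


def shared_positions(fp_a, fp_b):
    """Return the indices where both fingerprints have a 1."""
    return [i for i, (a, b) in enumerate(zip(fp_a, fp_b)) if a == 1 and b == 1]


# Two made-up molecules described by the fragments they contain.
molecule_a = make_fingerprint(["hydroxyl", "benzene_ring", "carbonyl"])
molecule_b = make_fingerprint(["benzene_ring", "carbonyl", "halogen"])

shared = shared_positions(molecule_a, molecule_b)
print(molecule_a)                      # [1, 0, 0, 1, 1, 0]
print(molecule_b)                      # [0, 0, 0, 1, 1, 1]
print([FRAGMENTS[i] for i in shared])  # ['benzene_ring', 'carbonyl']
```

The two fingerprints have 1’s in common at two positions, so under this toy vocabulary the molecules share two fragments.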

The Tanimoto Coefficient

The most popular way to measure the similarity of molecular fingerprints is by computing the Tanimoto coefficient.

As seen in the figure below, the Tanimoto coefficient is the number of fragment positions set in both fingerprints divided by the number of positions set in either fingerprint. The Tanimoto score ranges from 0 (no shared fragments) to 1 (identical fingerprints).

The Tanimoto coefficient measures molecular similarity. Source: CMBI
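
Written out, if a and b are the numbers of bits set in each fingerprint and c is the number of bits set in both, the coefficient is c / (a + b - c). The sketch below computes it for two short, made-up fingerprints. The function name `tanimoto` and the choice to return 0.0 when neither fingerprint has any bit set are assumptions made for this example, not part of the definition above.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two equal-length 0/1 fingerprints."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    # Positions set in both fingerprints (shared fragments).
    shared = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    # Positions set in at least one fingerprint.
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    if either == 0:
        # Assumption: treat two all-zero fingerprints as having no similarity.
        return 0.0
    return shared / either


# Made-up six-bit fingerprints, for illustration only.
fp_a = [1, 0, 0, 1, 1, 0]
fp_b = [0, 0, 0, 1, 1, 1]

score = tanimoto(fp_a, fp_b)
print(score)  # 2 shared positions / 4 positions set by either = 0.5
```

Comparing a fingerprint with itself gives 1, and two fingerprints with no positions in common give 0, matching the range described above.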


Collaborating on Cheminformatics Research

GSI Technology is collaborating with researchers from The Nancy and Stephen Grand Israel National Center for Personalized Medicine at the Weizmann Institute of Science on cheminformatics research.

Initially, the teams are exploring how GSI’s APU technology can be used for ultra-fast molecular structural similarity search, and they have found that the APU can speed up the search by orders of magnitude.
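
For context, the operation being accelerated can be sketched as a brute-force scan: score every fingerprint in a library against a query and keep the best matches. The code below is a plain Python baseline written for this post's reader, not GSI's APU interface or the teams' implementation. The library contents, the molecule names, and the helper names (`popcount`, `tanimoto_bits`, `top_k_similar`) are all hypothetical.

```python
def popcount(x):
    """Count the 1 bits in a non-negative integer."""
    return bin(x).count("1")


def tanimoto_bits(a, b):
    """Tanimoto coefficient of two fingerprints stored as integers."""
    either = popcount(a | b)
    # Assumption: two empty fingerprints score 0.0.
    return popcount(a & b) / either if either else 0.0


def top_k_similar(query, library, k=2):
    """Return the k (name, score) pairs most similar to the query."""
    scored = [(tanimoto_bits(query, fp), name) for name, fp in library.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(name, score) for score, name in scored[:k]]


# Hypothetical six-bit fingerprints; real libraries hold far more
# molecules with much longer fingerprints.
library = {
    "mol_1": 0b100110,
    "mol_2": 0b000111,
    "mol_3": 0b111000,
    "mol_4": 0b100111,
}
query = 0b100110  # fingerprint of a known active molecule (made up)

hits = top_k_similar(query, library, k=2)
print(hits)  # [('mol_1', 1.0), ('mol_4', 0.75)]
```

Every query touches every fingerprint in the library, so the cost of this scan grows with library size. That is why hardware that can compare many fingerprints in parallel is attractive for this workload.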


Drug discovery is a costly, time-consuming process that suffers from high failure rates. Thus, researchers are always looking for ways to improve their odds of finding active molecules at the lowest possible cost and with the highest probability of success.

A simple, but powerful, method that researchers are using to accomplish this task is molecular similarity search.

©2024 GSI Technology, Inc. All Rights Reserved