Machine learning (ML) has become steadily more applicable to drug discovery over the past decade, and all indications are that this trend will continue. Finding novel chemical motifs that are plausible, synthesizable and non-toxic is one of the most impactful tasks that an automated design system can perform.
Drug Discovery & ML
Drug discovery is a complex multi-parameter optimization process in which every design-make-test cycle informs the next one. The process is far from linear, because any molecular change affects each of the properties being optimized differently. A major challenge in the field has been that general ML-models for key properties, such as solubility or permeability, are only broadly predictive. More specific ML-models for families of compounds are potentially more accurate, but they require a substantial amount of data on that family. Applying ML-models is thus difficult in the early phase of a discovery project, and most of all for ML-models of biological activity on novel targets.
Furthermore, most de novo design methods are based primarily on 2D characteristics of molecules. They do not use what is at the heart of structure-based drug design: the 3D structural information of the ligand-protein complex, which can be captured with pharmacophore or docking models.
What has made ML-models more useful recently is their pairing with generative methods. Millions of permutations on a single molecule can be rapidly filtered by ML-models or other property filters. The best surviving molecules are then subjected to another round of permutations and filtered against progressively tougher criteria. These iterations lead to increasingly diverse molecules that fit the desired criteria. In the absence of ML-activity models, filtering and optimizing against the 3D ligand-protein interactions is a powerful way to generate novel molecules expected to be active on the relevant target(s).
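The idea of repeated permutation rounds filtered against progressively tougher criteria can be illustrated with a small toy sketch. It is not how GTD is implemented: candidates here are plain tuples of numeric descriptors rather than molecules, and the names `mutate`, `score` and `iterate_design`, as well as the threshold schedule, are hypothetical choices made only for illustration.

```python
import random


def mutate(candidate, rng):
    """Toy 'permutation': nudge one descriptor of the candidate (assumption)."""
    i = rng.randrange(len(candidate))
    changed = list(candidate)
    changed[i] += rng.uniform(-0.5, 0.5)
    return tuple(changed)


def score(candidate, target):
    """Toy property filter: negative squared distance to a desired profile."""
    return -sum((c - t) ** 2 for c, t in zip(candidate, target))


def iterate_design(seed_candidate, target, thresholds,
                   n_children=200, keep=10, seed=0):
    """Run one round per threshold; each round must clear a tougher cutoff."""
    rng = random.Random(seed)
    survivors = [seed_candidate]
    for threshold in thresholds:
        # Generate many variations of the current survivors.
        children = [mutate(rng.choice(survivors), rng) for _ in range(n_children)]
        pool = survivors + children
        # Filter against the criterion for this round.
        passed = [c for c in pool if score(c, target) >= threshold]
        if not passed:
            break  # nothing met the tougher criterion; keep the last survivors
        # Keep only the best few as parents for the next round.
        passed.sort(key=lambda c: score(c, target), reverse=True)
        survivors = passed[:keep]
    return survivors


if __name__ == "__main__":
    best = iterate_design((0.0, 0.0, 0.0), (1.0, 1.0, 1.0),
                          thresholds=[-3.0, -2.5, -2.0, -1.5, -1.0])
    print(best[0], score(best[0], (1.0, 1.0, 1.0)))
```

In a real setting the scoring function would be replaced by ML-models, property filters or a 3D fit to the ligand-protein interactions, and the mutation step by chemically meaningful transformations.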
While typical problems are commonly solved by expert medicinal and structural chemistry teams, it is often difficult to generate a sufficient number of suitable ideas as quickly as needed. BIOVIA Generative Therapeutics Design (GTD) demonstrated the ability to generate, overnight, a number of new ideas that satisfy all the desired interactions with the protein while staying within the desired range of other molecular properties. This capability is especially important for teams that work on novel targets or on many projects simultaneously.
Gilead and GTD
BIOVIA and Gilead Sciences are collaborating to optimize the combined use of 3D structural information on the targeted active site and other forms of structure-activity relationship (SAR) models. We explored both pharmacophore and docking as methods to incorporate information about the target, and found both methods to significantly increase the quality of the produced molecules. We ended up doing most of our designs with pharmacophore models because of their lower computational expense.
GTD’s evolutionary engine incorporates an iterative generate-filter-predict-prune cycle to find molecules meeting the input criteria. For new molecule generation, GTD incorporates an array of methods that cause varying extents of alteration from the input, which can be used together or selectively. These include: Matched Molecular Pair derived transformations, classic medicinal chemistry transformations, individual atomic and bond order changes, ring & chain replacement and others. These different approaches to molecule generation allow GTD to be used in numerous different stages of a project.
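A minimal sketch of a generate-filter-predict-prune cycle with selectable generation methods is shown below. It is a toy under stated assumptions, not GTD's engine or API: molecules are stand-in strings over a made-up alphabet, the three operators only loosely mimic the idea of different transformation types, and `GENERATORS`, `passes_filter`, `predict` and `evolve` are hypothetical names.

```python
import random

ALPHABET = "CNOSF"  # toy building blocks, not real chemistry (assumption)


def substitute(mol, rng):
    """Replace one position (loosely: an individual atomic change)."""
    i = rng.randrange(len(mol))
    return mol[:i] + rng.choice(ALPHABET) + mol[i + 1:]


def insert(mol, rng):
    """Add one building block (loosely: growing a chain)."""
    i = rng.randrange(len(mol) + 1)
    return mol[:i] + rng.choice(ALPHABET) + mol[i:]


def delete(mol, rng):
    """Remove one building block (loosely: trimming a chain)."""
    if len(mol) <= 1:
        return mol
    i = rng.randrange(len(mol))
    return mol[:i] + mol[i + 1:]


GENERATORS = {"substitute": substitute, "insert": insert, "delete": delete}


def passes_filter(mol):
    """Toy property filter: keep candidates within a size window."""
    return 3 <= len(mol) <= 12


def predict(mol, reference):
    """Toy 'model': fraction of positions that match a reference pattern."""
    matches = sum(a == b for a, b in zip(mol, reference))
    return matches / max(len(mol), len(reference))


def evolve(seeds, reference, enabled=("substitute", "insert", "delete"),
           generations=20, children_per_parent=20, population=25, seed=0):
    rng = random.Random(seed)
    ops = [GENERATORS[name] for name in enabled]  # methods used together or selectively
    pool = list(dict.fromkeys(seeds))  # de-duplicate, keep order
    for _ in range(generations):
        # Generate: apply a randomly chosen enabled operator to each parent.
        children = {rng.choice(ops)(parent, rng)
                    for parent in pool for _ in range(children_per_parent)}
        # Filter: drop candidates outside the allowed property window.
        candidates = [m for m in set(pool) | children if passes_filter(m)]
        # Predict: rank by the toy score (ties broken alphabetically).
        ranked = sorted(candidates, key=lambda m: (-predict(m, reference), m))
        # Prune: keep only the top of the ranking for the next cycle.
        pool = ranked[:population]
    return pool


if __name__ == "__main__":
    print(evolve(["CCCC"], reference="CNOCCS")[:5])
```

Restricting `enabled` to a single operator, for example `enabled=("substitute",)`, keeps the generated candidates close to the input, while enabling all operators allows larger departures. This mirrors, in a very simplified way, why a choice of generation methods is useful at different project stages.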
Using Gilead’s extensive medicinal chemistry experience and SAR data pertaining to the discovery of the spleen tyrosine kinase (SYK) inhibitors Entospletinib and Lanraplenib, several ML-assisted design exercises have been undertaken: first, a retrospective re-identification of drug candidate molecules based on data from an intermediate stage of the project; second, an attempt to morph one chemical lead series into another; third, the generation of molecules that would represent novel hybrids between two vastly different series.
“BIOVIA Generative Design provides ideation for researchers when activity is proven and the other aspects of the target product profile will not budge.”
William Shirley, Exec. Director Structural Biology & Chemistry, Gilead Sciences
In our work with Gilead Sciences on retrospective candidate re-identification and on bridging between different lead series, GTD generated the desired compounds along with a number of close relatives. In making novel hybrids, many of the generated ideas were at the very least inspirational, pushing the design teams into new avenues. Using 3D-inclusive ML-methods should be a tremendous help in paving the way to new directions and can only expedite drug discovery.
Our approach enables broad, accelerated compound ideation for many scenarios, ranging from the use of one ligand to larger SAR datasets. The 3D, shape-based aspects can be well represented by various pharmacophoric properties. We are now investigating how docking and scoring used in the learning cycle can optimize the fitness function and produce even better molecules. Stay tuned!
To learn more about BIOVIA Generative Therapeutics Design, download the datasheet!