AI in early phase drug development

AI technologies are currently attracting considerable attention in the pharmaceutical industry. Over the past year, we have covered extensively the impact these intelligent technologies can have on conventional drug discovery and development processes.

We charted how AI/ML technologies became part of drug discovery and development, their potential to scale this function, their ability to support drug research even in data-scarce specialities like rare diseases, and their power to transform a range of drug discovery and development tasks.

AI technologies can radically remake much of the drug discovery and development process, from research to clinical trials. Today we’ll delve a bit deeper into the transformational possibilities of these technologies in two foundational stages of the drug development process: early drug discovery and preclinical development.

Early Drug Discovery and Preclinical Development


Early drug discovery and preclinical development is a complex process that essentially determines the productivity and value of downstream development programs. Therefore, even incremental improvements in accuracy and efficiency during these early stages could dramatically improve the entire drug development value chain.

AI/ML in early drug discovery

The early drug discovery process flows, broadly, across target identification, lead identification, lead optimization and, finally, candidate selection. Currently, this time-consuming and resource-intensive process relies heavily on translational approaches and assumptions.

As a result, a new molecular entity (NME) may well enter late-stage development without adequate evidence that it will meet quality or performance expectations. Predicting drug-target interactions (DTIs), therefore, is a critical step to enhancing the success rate of new drug discovery.

Predicting drug-target interactions

Today, thanks to high-throughput sequencing and computer-aided drug design (CADD) methodologies, there are several publicly available databases with organized datasets of sequenced proteins and synthesized compounds. However, the pharmacological functions of most of these proteins and compounds are yet to be identified.

The limitations of conventional biomedical techniques to scale across the volume and complexity of data make ML techniques ideal for drug-target interaction prediction.

There are currently several state-of-the-art ML models available for DTI prediction. However, many conventional ML approaches regard DTI prediction either as a classification or a regression task, both of which can lead to bias and variance errors.

Newer approaches that balance bias and variance through a multi-task learning framework have, however, delivered superior performance and accuracy over even the state-of-the-art. In this approach, the classification and regression tasks share similar feature representations, learned by convolutional neural networks (CNNs) with a co-attention mechanism.
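To make the multi-task idea concrete, here is a minimal sketch of a joint loss in which one shared feature vector feeds both a classification head (does the pair interact?) and a regression head (how strong is the binding?). It is an illustration only, not the architecture described above: the CNN and co-attention layers are omitted, and the function names, the linear heads and the `alpha` weighting are our own assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(weights, features):
    return sum(w * x for w, x in zip(weights, features))

def multitask_loss(shared, w_cls, w_reg, interacts, affinity, alpha=0.5):
    """Weighted sum of a classification loss and a regression loss.

    `shared` stands in for the learned drug-target representation;
    `w_cls` and `w_reg` are hypothetical linear heads on top of it.
    """
    # Classification head: binary cross-entropy on "interacts or not".
    p = sigmoid(dot(w_cls, shared))
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
    bce = -(interacts * math.log(p) + (1 - interacts) * math.log(1.0 - p))

    # Regression head: squared error on the binding affinity.
    mse = (dot(w_reg, shared) - affinity) ** 2

    # alpha (an assumed hyperparameter) trades one task off against the other.
    return alpha * bce + (1.0 - alpha) * mse
```

Because both heads read the same representation, a gradient step on this loss updates the shared features using signal from both tasks, which is the intuition behind sharing representations in the first place.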

Even though ML techniques perform significantly better at DTI prediction than conventional approaches, the development of novel theoretical and computational models for this critical stage of drug development continues to be a significant area of focus.

Lead identification & optimization

This stage focuses on identifying promising molecules and optimising them. The challenge in this hit-to-lead generation phase is twofold. First, the search space to extract hit molecules from compound libraries extends to millions of molecules.

For instance, the ZINC database alone comprises 230 million purchasable compounds, and the universe of make-on-demand synthesis compounds can reach 10 billion. Second, the hit rate of conventional high-throughput screening (HTS) approaches, that is, the rate at which they yield a viable compound, is only around 0.1%.
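A quick back-of-the-envelope calculation shows what a yield of roughly 0.1% means at these library sizes. The sketch below simply multiplies the figures quoted above; the function name is our own, and applying a single rate to an entire library is a simplifying assumption, since real screens cover only a fraction of any library.

```python
from fractions import Fraction

# Roughly 0.1%, the approximate HTS yield quoted above.
HIT_RATE = Fraction(1, 1000)

def expected_hits(library_size, hit_rate=HIT_RATE):
    """Expected number of viable compounds if every molecule were screened."""
    return library_size * hit_rate

zinc_purchasable = 230_000_000    # purchasable compounds in ZINC
make_on_demand = 10_000_000_000   # make-on-demand universe

print(expected_hits(zinc_purchasable))  # 230000
print(expected_hits(make_on_demand))    # 10000000
print(1 / HIT_RATE)                     # 1000 compounds screened per hit
```

In other words, about a thousand compounds have to be screened for every viable one, which is why narrowing the search space before any wet-lab work matters so much.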

Over the years, there have been several initiatives to improve the productivity and efficiency of hit-to-lead generation. These include the use of high-content screening (HCS) techniques to complement HTS, and CADD virtual screening methodologies to reduce the number of compounds to be tested.
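As a rough illustration of how virtual screening cuts down the number of compounds to test, the sketch below ranks a library by Tanimoto similarity to a known active and keeps only those above a cutoff. This is one common ligand-based technique, chosen here purely as an example. The fingerprints are represented as plain sets of "on" bit indices, and the compound names, bit values and 0.5 threshold are all hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two fingerprint bit sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def virtual_screen(query, library, threshold=0.5):
    """Return (name, score) pairs at or above the threshold, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    kept = [(name, score) for name, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Hypothetical fingerprints: each set lists the bits that are switched on.
known_active = {1, 2, 3, 4}
library = {
    "cmpd_a": {1, 2, 3, 4},
    "cmpd_b": {1, 2, 5, 6},
    "cmpd_c": {1, 2, 3, 7},
}

shortlist = virtual_screen(known_active, library)
print([name for name, _ in shortlist])  # ['cmpd_a', 'cmpd_c']
```

Only the shortlisted compounds would then go forward to physical screening. In practice the fingerprints would come from a cheminformatics toolkit rather than being written by hand.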


The availability of huge volumes of high-quality data, combined with the ability of AI to parse and learn from this data, has the potential to improve this process further.

AI can add new value to small-molecule drug discovery in at least four ways: access to new biology, improved or novel chemistry, better success rates, and quicker and cheaper discovery processes.

AI technologies can be applied to a variety of discovery contexts and biological targets, and can play a critical role in redefining long-standing workflows and addressing many of the challenges of conventional techniques.

AI/ML in preclinical development

Preclinical development addresses a number of critical issues relevant to the success of new drug candidates. Preclinical studies are a regulatory prerequisite, generating the data that validate the safety of a drug for humans prior to clinical trials.

These studies also inform trial design and provide pharmacokinetic and pharmacodynamic information. Preclinical data also provides chemistry, manufacturing, and control information that will be crucial for clinical manufacturing.

And finally, it helps pharma companies identify candidates with the broadest potential benefits and the greatest chance of success.

It is estimated that only a small fraction of the candidates in preclinical studies actually make it to clinical trials. One reason for this extremely high attrition is the “imperfect nature” of many research models used in preclinical in-vivo studies. As a result, efficacy is often overestimated and safety remains hard to evaluate.

However, AI/ML technologies are increasingly being used to bridge the gap between preclinical discoveries and new therapeutics. For instance, a key approach to de-risking clinical development has been the use of translational biomarkers that demonstrate target modulation and target engagement, and confirm proof of mechanism (PoM).

In this context, AI techniques have been deployed to learn from large volumes of heterogeneous and high-dimensional omics data and provide valuable insights that streamline translational biomarker discovery. Similarly, ML algorithms that learn from problem-specific training data have been used successfully to predict bioactivity, ADMET-related endpoints, and physicochemical properties.
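To give a feel for what "learning from problem-specific training data" means for property prediction, here is a deliberately tiny sketch: a k-nearest-neighbour regressor that predicts an endpoint for a new compound by averaging the values of its most similar training compounds. The descriptors, endpoint values and choice of k are invented for illustration, and k-NN is just one simple stand-in for the far richer models used in practice.

```python
import math

def knn_predict(train, query, k=3):
    """Predict an endpoint as the mean over the k nearest training compounds.

    `train` is a list of (descriptor_vector, endpoint_value) pairs and
    `query` is the descriptor vector of the compound to predict.
    """
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return sum(value for _, value in nearest) / len(nearest)

# Made-up training set: two numeric descriptors per compound, one endpoint.
training_data = [
    ((0.0, 0.0), 1.0),
    ((1.0, 0.0), 2.0),
    ((10.0, 10.0), 9.0),
]

print(knn_predict(training_data, (0.5, 0.0), k=2))  # 1.5
```

The prediction is driven entirely by the closest examples, so a model like this is only as good as the coverage of its training data around the compound of interest. Real descriptors would also need to be scaled so that no single one dominates the distance.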

These technologies have also been used to successfully predict drug interactions during preclinical testing.

The age of data-driven drug discovery & development

Network-based approaches that enable a better understanding of the mechanisms underlying disease pathophysiology are increasingly becoming the norm in drug discovery and development.

This in turn has opened up a new era of data-driven drug development where the focus is on the integration of heterogeneous types of data, including molecular, clinical trial and drug label data.

AI technologies like natural language processing (NLP) are enabling the identification of novel targets and previously undiscovered drug-disease associations, based on insights extracted from unstructured data sources like biomedical literature, electronic medical records (EMRs) and insurance claims.
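The simplest version of this kind of literature mining is counting how often a drug and a disease are mentioned in the same sentence. The sketch below does exactly that, using placeholder sentences and placeholder entity names that we made up; it is a toy stand-in for real NLP pipelines, which use entity recognition and relation extraction rather than raw token matching.

```python
import re
from collections import Counter
from itertools import product

def cooccurrence(sentences, drugs, diseases):
    """Count sentences in which a drug name and a disease name both appear."""
    counts = Counter()
    for sentence in sentences:
        tokens = set(re.findall(r"\w+", sentence.lower()))
        for drug, disease in product(drugs, diseases):
            if drug in tokens and disease in tokens:
                counts[(drug, disease)] += 1
    return counts

# Placeholder text, not real literature.
sentences = [
    "Drug_A reduced symptoms of disease_x.",
    "No effect of drug_b on disease_x was seen.",
    "drug_a was tested in disease_x models.",
]

pairs = cooccurrence(sentences, ["drug_a", "drug_b"], ["disease_x"])
print(pairs.most_common())
```

Note that the second sentence reports no effect, yet it still counts as a co-occurrence. That limitation is why production systems go beyond counting and try to classify the relationship each sentence actually expresses.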

Sophisticated and intelligent computational tools combined with powerful ML/AI algorithms now enable the unified analysis of huge volumes of diverse datasets to autonomously reveal complex non-linear relationships that streamline and accelerate drug discovery and development.

Ultimately, the efficiency and productivity of early drug discovery and preclinical development processes will determine the value of the entire pharma R&D value chain. And that’s where AI/ML technologies have been gaining the most traction in recent years.

Originally published at .



BioStrand (a subsidiary of IPA)

Software and proprietary solutions for MULTI-omics data analysis. Effective research requires convenient and scalable tools.