Creating an AI-ready data foundation for successful AI-enabled drug discovery

May 4, 2023

Over the past year, we have looked at drug discovery and development from several different perspectives.

For instance, we looked at the Big Data frenzy in biopharma, as zettabytes of sequencing data, real-world data (RWD), and textual data pile up and stress the data integration and analytic capabilities of conventional solutions. We also discussed how the time-consuming, cost-intensive, low-productivity characteristics of the prevalent ROI-focused model of development have an adverse impact not just on commercial viability in the pharma industry but on the entire healthcare ecosystem. Then we saw how antibody drug discovery processes continued to be cited as the biggest challenge in therapeutic R&D even as the industry was pivoting to mAbs.

No matter the context or frame of reference, the focus inevitably turns to how AI technologies can transform the entire drug discovery and development process, from research to clinical trials.

Biopharma companies have traditionally been slow to adopt innovative technologies like AI and the cloud. Today, however, digital innovation has become an industry-wide priority, with drug development expected to be the area most impacted by smart technologies.

From application-centric to data-centric

AI technologies have a range of applications across the drug discovery and development pipeline, from opening up new insights into biological systems and diseases to streamlining drug design to optimizing clinical trials. Despite the wide-ranging potential of AI-driven transformation in biopharma, the process does entail some complex challenges.

The most fundamental challenge will be to make the transformative shift from an application-centric to a data-centric culture, where data and metadata are operationalized at scale and across the entire drug design and development value chain.

However, creating a data-centric culture in drug development comes with its own set of data-related challenges. To start with, there is the sheer scale of the data, which requires a scalable architecture in order to be handled efficiently and cost-effectively. Much of this data is distributed across disparate silos with differing storage practices, quality procedures, and naming and labeling conventions. Then there is the issue of different data modalities, from MR or CT scans to unstructured clinical notes, that have to be extracted, transformed, and curated at scale for unified analysis. And finally, the level of regulatory scrutiny on sensitive biomedical data means that there is a constant tension between enabling collaboration and ensuring compliance.
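As a minimal sketch of the silo problem described above, the following Python maps records from two sources with different naming conventions onto one shared schema. The source names (lab_a, lab_b), the field names, and the FIELD_MAP table are all hypothetical and chosen purely for illustration; a real pipeline would also have to reconcile units, data types, and quality rules.

```python
# Hypothetical per-source mappings from local field names to a shared schema.
FIELD_MAP = {
    "lab_a": {"PatientID": "patient_id", "Seq": "sequence", "Dx": "diagnosis"},
    "lab_b": {"pid": "patient_id", "aa_sequence": "sequence", "condition": "diagnosis"},
}


def harmonize(record: dict, source: str) -> dict:
    """Rename a record's fields to the shared schema and tag its origin."""
    mapping = FIELD_MAP[source]
    # Fields without a mapping are dropped rather than guessed at.
    unified = {mapping[key]: value for key, value in record.items() if key in mapping}
    unified["source"] = source  # keep the origin so the record stays traceable
    return unified


# Two illustrative records that describe the same kind of entity differently.
records = [
    harmonize({"PatientID": "P001", "Seq": "MKTAYIAK", "Dx": "RA"}, "lab_a"),
    harmonize({"pid": "P002", "aa_sequence": "MKVLAAGI", "condition": "RA"}, "lab_b"),
]
```

After harmonization, both records expose the same field names, so they can be analyzed together while the source tag preserves where each one came from.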

Therefore, creating a strong data foundation that accounts for all these complexities in biopharma data management and analysis will be critical to ensuring the successful adoption of AI in drug development.

Three key requisites for an AI-ready data foundation

Successful AI adoption in drug development will depend on the creation of a data foundation that addresses these key requirements.

Accessibility

Data accessibility is a key characteristic of AI leaders irrespective of sector. In order to ensure effective and productive data democratization, organizations need to enable access to data distributed across complex technology environments spanning multiple internal and external stakeholders and partners. A key caveat of accessibility is that the data provided should be contextual to the analytical needs of specific data users and consumers. A modern cloud-based and connected enterprise data and AI platform designed as a “one-stop-shop” for all drug design and development-related data products with ready-to-use analytical models will be critical to ensuring broader and deeper data accessibility for all users.

Data management and governance

The quality of any data ecosystem is determined by the data management and governance frameworks that ensure that relevant information is accessible to the right people at the right time. At the same time, these frameworks must also be capable of protecting confidential information, ensuring regulatory compliance, and facilitating the ethical and responsible use of AI. Therefore, the key focus of data management and governance will be to consistently ensure the highest quality of data across all systems and platforms as well as full transparency and traceability in the acquisition and application of data.
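To make the idea of access control plus traceability concrete, here is a small Python sketch in which every read of a dataset is checked against a role policy and recorded in an audit log. The roles, classifications, ACCESS_POLICY table, and GovernedDataset class are hypothetical illustrations, not part of any specific governance product or regulation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical policy: which roles may read which data classifications.
ACCESS_POLICY = {
    "public": {"researcher", "clinician", "data_steward"},
    "confidential": {"clinician", "data_steward"},
}


@dataclass
class GovernedDataset:
    name: str
    classification: str
    rows: list
    audit_log: list = field(default_factory=list)

    def read(self, user: str, role: str) -> list:
        """Return the rows if the role is permitted; log every attempt."""
        allowed = role in ACCESS_POLICY.get(self.classification, set())
        self.audit_log.append({
            "user": user,
            "role": role,
            "dataset": self.name,
            "allowed": allowed,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        if not allowed:
            raise PermissionError(f"{role} may not read {self.classification} data")
        return list(self.rows)  # return a copy so callers cannot alter the source


notes = GovernedDataset("clinical_notes", "confidential", ["note 1", "note 2"])
notes.read("alice", "clinician")       # permitted and logged
try:
    notes.read("bob", "researcher")    # denied, but still logged
    denied = False
except PermissionError:
    denied = True
```

Logging denied attempts as well as successful ones is a deliberate choice here: the audit trail then shows who tried to use the data, not only who succeeded.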

UX and usability

Successful AI adoption will require a data foundation that streamlines accessibility and prioritizes UX and usability. Apart from democratizing access, the emphasis should also be on ensuring that even non-technical users are able to use data effectively and efficiently. Different users often consume the same datasets from completely different perspectives. The key, therefore, is to provide a range of tools and features that help every user customize the experience to their specific roles and interests.

Apart from creating the right data foundation, technology partnerships can also help accelerate the shift from an application-centric to a data-centric approach to AI adoption. In fact, a 2018 report advised organizations to explore vendor offerings as a foundational approach to jump-start their efforts to make productive use of AI. More recently, pharma-technology partnerships have emerged as the fastest-moving model for externalizing innovation in AI-enabled drug discovery. According to a recent Roots Analysis report on the AI-based drug discovery market, partnership activity in the pharmaceutical industry grew at a CAGR of 50% between 2015 and 2021, with a majority of the deals focused on research and development.

So with that trend as background, here’s a quick look at how a data-centric, full-service biotherapeutic platform can accelerate biopharma’s shift to an AI-first drug discovery model.

Our approach to data-centric drug development

Our approach to biotherapeutic research places data at the very core of a dynamic network of biological and artificial intelligence technologies.

With our LENSai platform, we have created a Google-like solution for the entire biosphere, organizing it into a multidimensional network of 660 million data objects with multiple layers of information about sequence, syntax, and protein structure. This “one-stop-shop” model enables researchers to seamlessly access all raw sequence data. In addition, HYFTs®, our universal framework for organizing all biological data, allows easy, one-click integration of all other research-relevant data from across public and proprietary data repositories.

Researchers can then leverage the power of the LENSai Integrated Intelligence Platform to integrate unstructured data from text-based knowledge sources such as scientific journals, EHRs, clinical notes, etc. Here again, researchers have the ability to expand the core knowledge base, containing over 33 million abstracts from the PubMed biomedical literature database, by integrating data from multiple sources and knowledge domains, including proprietary databases.

Around this multi-source, multi-domain, data-centric core, we have designed next-generation AI technologies that can instantly and concurrently convert these vast volumes of text, sequence, and protein structure data into meaningful knowledge that can transform drug discovery and development.

Originally published at https://blog.biostrand.ai.