Data-Driven, High-Dimensional Design for Trustworthy Drug Discovery

Grant number: unknown


Key facts

  • Disease

    COVID-19
  • Funder

    C3.ai DTI
  • Principal Investigator

    Jennifer Listgarten, Sergey Levine
  • Research Location

    United States of America
  • Lead Research Institution

    University of California-Berkeley
  • Research Priority Alignment

    N/A
  • Research Category

    Therapeutics research, development and implementation

  • Research Subcategory

    N/A

  • Special Interest Tags

    N/A

  • Study Type

    Unspecified

  • Clinical Trial Details

    N/A

  • Broad Policy Alignment

    Pending

  • Age Group

    Not applicable

  • Vulnerable Population

    Not applicable

  • Occupations of Interest

    Not applicable

Abstract

Machine learning-based predictive modeling tools have been applied to a wide variety of tasks in computational biology and chemistry, such as predicting protein binding and stability, small molecule antibiotic properties, synthesizability, and drug-likeness. However, when such data-driven models are used to produce new designs, they are likely to encounter a major challenge. Learning-based design involves optimizing over the input to a predictive model. For example, a model that predicts how well a small molecule binds to a particular drug target takes as input some representation of the molecule and outputs the binding efficiency. Hence, finding the best small molecule involves optimizing over the input to the model so that its output is, say, as large as possible. We refer to this problem setting as "high-dimensional model inversion" (HDMI). Critically, by definition of the design problem, the predictive model will never have seen any molecules with precisely the desired property, and thus we are asking the model to extrapolate. What does it mean to extrapolate in this context? Can we extrapolate? How far can we extrapolate? How can we trust such decisions? We will develop a new formal framework and associated algorithms for solving HDMI with high-capacity models, such as neural networks, and high-dimensional inputs, which will enable us to answer these questions. We will draw on ideas from learning-based decision making (reinforcement learning), robust uncertainty estimation, and probabilistic modeling. We will focus on data-driven drug design, including a collaboration toward developing a therapeutic for COVID-19.
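
The model-inversion setting described in the abstract can be illustrated with a toy sketch. This is not the project's method: it assumes a linear least-squares predictor as a stand-in for a high-capacity model, synthetic 5-dimensional inputs in place of molecular representations, and hypothetical helper names (`fit_linear_model`, `invert_model`). It only shows how naively maximizing a model's output over its input pushes the design far outside the region covered by the training data, which is the extrapolation problem the abstract raises.

```python
import numpy as np


def fit_linear_model(X, y):
    """Least-squares fit of y ~ X @ w + b; returns (w, b). Toy stand-in model."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]


def invert_model(w, b, x0, steps=50, lr=0.5):
    """Gradient ascent on the input x to increase the prediction w @ x + b."""
    x = x0.copy()
    for _ in range(steps):
        x = x + lr * w  # the gradient of w @ x + b with respect to x is w
    return x


# Synthetic training data: every input coordinate lies in [-1, 1] (assumption).
rng = np.random.default_rng(0)
X_train = rng.uniform(-1.0, 1.0, size=(200, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y_train = X_train @ true_w + 0.1 * rng.normal(size=200)

w, b = fit_linear_model(X_train, y_train)
x_star = invert_model(w, b, x0=X_train[0])

# The optimized input leaves the training range, so the prediction at x_star
# is an extrapolation that the training data cannot vouch for.
print("largest |coordinate| seen in training:", np.abs(X_train).max())
print("largest |coordinate| of the design:   ", np.abs(x_star).max())
print("predicted value at the design:        ", w @ x_star + b)
```

In this toy case the unconstrained optimum does not even exist, since the predicted value grows without bound as the input moves along `w`. That is one reason a design procedure needs some notion of where the model can be trusted, for example through uncertainty estimates.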