Data-Driven, High-Dimensional Design for Trustworthy Drug Discovery
- Funded by C3.ai DTI
- Total publications: 0
Grant number: unknown
Key facts
Disease
COVID-19
Funder
C3.ai DTI
Principal Investigator
Jennifer Listgarten, Sergey Levine
Research Location
United States of America
Lead Research Institution
University of California-Berkeley
Research Priority Alignment
N/A
Research Category
Therapeutics research, development and implementation
Research Subcategory
N/A
Special Interest Tags
N/A
Study Type
Unspecified
Clinical Trial Details
N/A
Broad Policy Alignment
Pending
Age Group
Not applicable
Vulnerable Population
Not applicable
Occupations of Interest
Not applicable
Abstract
Machine learning-based predictive modeling tools have been applied to a wide variety of tasks in computational biology and chemistry, such as predicting protein binding and stability, small molecule antibiotic properties, synthesizability, and drug-likeness. However, when such data-driven models are used to produce new designs, they are likely to encounter a major challenge. Learning-based design involves optimizing over the input to a predictive model. For example, a model that predicts how well a small molecule binds to a particular drug target takes as input some representation of the molecule and outputs the binding efficiency. Hence, finding the best small molecule involves optimizing over the input to the model so that its output is, say, as large as possible. We refer to this problem setting as "high-dimensional model inversion" (HDMI). Critically, by definition of the design problem, the predictive model will never have seen any molecules with precisely the desired property, and thus we are asking the model to extrapolate. What does it mean to extrapolate in this context? Can we extrapolate? How far can we extrapolate? How can we trust such decisions? We will develop a new formal framework and associated algorithms for solving HDMI with high-capacity models such as neural networks and high-dimensional inputs, which will enable us to answer these questions. We will draw on ideas from learning-based decision making (reinforcement learning), robust uncertainty estimation, and probabilistic modeling. We will focus on data-driven drug design, including a collaboration toward developing a therapeutic for COVID-19.
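The model-inversion setting described in the abstract can be illustrated with a toy sketch. This is not the framework the project proposes. It assumes synthetic 8-dimensional feature vectors in place of molecule representations, swaps in a ridge regression on quadratic features for the neural network, and uses plain gradient ascent over the input. All names (`features`, `predict`, `invert_model`, `nearest_train_distance`) are hypothetical. The distance to the nearest training point serves only as a crude indicator of how far the optimized design has moved from the data the model was fit on.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for (molecule representation, measured property) pairs.
X_train = rng.normal(size=(200, 8))
true_w = rng.normal(size=8)
y_train = (X_train @ true_w
           - 0.5 * np.sum(X_train ** 2, axis=1)
           + 0.1 * rng.normal(size=200))

def features(x):
    """Quadratic feature map [1, x, x^2] for one input or a batch."""
    x = np.atleast_2d(x)
    return np.hstack([np.ones((x.shape[0], 1)), x, x ** 2])

# Fit the surrogate model (ridge regression, an assumption of this sketch).
Phi = features(X_train)
lam = 1e-3
theta = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]),
                        Phi.T @ y_train)

def predict(x):
    """Predicted property for one input or a batch of inputs."""
    return features(x) @ theta

def grad_predict(x):
    """Analytic gradient of the surrogate with respect to a single input."""
    d = x.shape[0]
    w_lin, w_quad = theta[1:1 + d], theta[1 + d:]
    return w_lin + 2.0 * w_quad * x

def invert_model(x0, steps=200, lr=0.05):
    """Gradient ascent over the model input, starting from x0."""
    x = x0.copy()
    for _ in range(steps):
        x = x + lr * grad_predict(x)
    return x

def nearest_train_distance(x):
    """Euclidean distance from x to its closest training point."""
    return np.min(np.linalg.norm(X_train - x, axis=1))

# Start from the best observed design and optimize the input.
x0 = X_train[np.argmax(y_train)]
x_star = invert_model(x0)

print("predicted score at start:   ", predict(x0)[0])
print("predicted score at optimum: ", predict(x_star)[0])
print("best observed training score:", y_train.max())
print("distance to nearest training point:", nearest_train_distance(x_star))
```

The optimized input typically ends up away from every training point, so the surrogate's prediction there is an extrapolation. Whether such a prediction can be trusted is the question the abstract raises.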