Virtual Compound Screening Using Gene Expression

PROJECT SUMMARY

Current technologies can profile thousands of gene expression features for diseases and drugs at very low cost. This proposal, entitled "Virtual Compound Screening Using Gene Expression," aims to develop novel data science approaches that leverage emerging gene expression profiles to discover new drugs. Previously, we developed a scoring function called RGES that quantifies a drug's potency to reverse disease gene expression, based on the drug and disease expression profiles. We observed that RGES correlates with drug efficacy. Using this idea, we and others identified drugs that could be repurposed to treat a number of diseases. However, this approach does not currently support novel compound screening or lead optimization.

Applying this approach to large-scale screening of a compound library first requires gene expression profiles of the library compounds. Because such profiles are not available at scale for new compounds, virtual compound screening was impossible until recent efforts, including ours, demonstrated the feasibility of predicting gene expression solely from chemical structure. The objective of this project is thus to develop novel machine learning methods that improve the performance of drug-induced gene expression prediction, and to use the predicted profiles in practical drug discovery. To achieve these goals, we have assembled a team of experts in computational drug discovery, machine learning, drug screening, and medicinal chemistry.

First, we will develop a robust, high-performance, and generalizable data-driven chemical structure embedding method to enhance drug-induced gene expression prediction. With the predicted profiles, we will deploy RGES to score compounds against given disease profiles. We will evaluate performance by screening compounds for liver cancer inhibitors, SARS-CoV-2 inhibitors, and cell reprogramming regulators. Finally, we will apply the approach to lead optimization. Our previous drug repurposing efforts identified and validated two candidates: niclosamide in liver cancer and mycophenolic acid in DIPG. However, the poor solubility of niclosamide and the poor brain penetration of mycophenolic acid hindered their further development. Accordingly, we will develop a deep reinforcement learning framework to optimize these two drugs. In parallel, domain experts will propose new analogs. We will synthesize the analogs and compare the performance of the domain experts with that of the AI model. We expect this work to unleash the power of emerging omics data in drug discovery.
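The scoring step of the screening workflow described above (score each compound's predicted expression profile against a disease profile, then rank) can be illustrated with a minimal sketch. This is not the RGES implementation: as a stand-in, it uses a Spearman rank correlation between the disease signature and each compound signature, where a more negative value suggests stronger reversal. The function names, compound names, and toy values are all hypothetical.

```python
# Illustrative stand-in for a reversal score; NOT the actual RGES method.
# Assumption: each signature is a list of per-gene expression changes,
# with genes in the same order across the disease and all compounds.

def ranks(values):
    """Return 0-based ranks (ties broken by position, adequate for a sketch)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    for rank, index in enumerate(order):
        out[index] = float(rank)
    return out

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def reversal_score(disease_sig, compound_sig):
    """Spearman correlation; negative means the compound opposes the disease."""
    return pearson(ranks(disease_sig), ranks(compound_sig))

def rank_compounds(disease_sig, predicted_profiles):
    """Sort compounds from strongest predicted reversal to weakest."""
    scores = {name: reversal_score(disease_sig, profile)
              for name, profile in predicted_profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1])

# Toy data (hypothetical): five genes, three compounds.
disease = [2.0, 1.5, -1.0, -2.5, 0.3]
profiles = {
    "compound_A": [-1.8, -1.2, 0.9, 2.1, -0.2],  # opposes the disease signature
    "compound_B": [1.9, 1.4, -0.8, -2.2, 0.1],   # mimics the disease signature
    "compound_C": [0.2, -1.0, 1.1, -0.3, 0.5],   # weakly related
}
ranking = rank_compounds(disease, profiles)
for name, score in ranking:
    print(f"{name}: {score:+.2f}")
```

In a real screen, the compound profiles would come from the structure-based expression predictor rather than being typed in by hand, and the scoring function would be RGES itself.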
