Identifying critical protein-protein interactions with ML methods
- Funded by National Institutes of Health (NIH)
- Total publications: 0 publications
Grant number: 3T32GM142616-04S1
Key facts
Disease
COVID-19
Start & end year
2021-2026
Known Financial Commitments (USD)
$65,484
Funder
National Institutes of Health (NIH)
Principal Investigator
ASSISTANT PROFESSOR James Gumbart
Research Location
United States of America
Lead Research Institution
GEORGIA INSTITUTE OF TECHNOLOGY
Research Priority Alignment
N/A
Research Category
Pathogen: natural history, transmission and diagnostics
Research Subcategory
Pathogen morphology, shedding & natural history
Special Interest Tags
N/A
Study Type
Non-Clinical
Clinical Trial Details
N/A
Broad Policy Alignment
Pending
Age Group
Not Applicable
Vulnerable Population
Not applicable
Occupations of Interest
Not applicable
Abstract
Project summary
The cloud module proposed here focuses on applying machine learning (ML) methods to the analysis of large datasets generated from molecular dynamics (MD) simulations of biomolecules. ML has become a common set of tools in many areas of scientific research, although barriers to implementation remain, due in part to a relative dearth of training materials. Thus, the proposed module is especially timely. The dataset used in the module is derived from long-timescale MD simulations of the SARS-CoV-2 or SARS-CoV spike protein receptor binding domain (RBD) bound to ACE2, the human receptor on the cell surface. The ML approaches covered are logistic regression, random forest, and multilayer perceptron (a type of neural network). These methods will be used to help identify the key residues responsible for the increase in binding affinity of SARS-CoV-2 relative to SARS-CoV. The module will guide scientists and researchers through the steps of analyzing a large amount of data with ML approaches and gleaning meaningful insights from the results. The aim is to lower the barrier for students, scientists, and researchers with a nascent interest in applying ML to problems in quantitative biology. The skills and concepts learned through the module will facilitate further implementation of ML approaches in the user's own research using a cloud environment. Users can extend these approaches to the analysis of large datasets produced in other areas of research, including experimental ones. The design of the module is based on tutorials developed for a recent workshop whose participants spanned the full gamut of education levels and coding experience, illustrating its adaptability to the needs of all users.
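The general idea of using a classifier to point at key residues can be illustrated with a minimal sketch. It is not the module's actual code or data: the features are synthetic stand-ins for per-frame quantities such as residue-pair distances, the labels (0 for SARS-CoV, 1 for SARS-CoV-2) and the choice of feature 2 as the informative one are assumptions made for illustration, and all function names are hypothetical. A small logistic regression is written in plain Python so the example has no dependencies; in practice a library implementation would likely be used.

```python
import math
import random


def make_synthetic_frames(n_per_class=200, n_features=5, informative=2, seed=0):
    """Build toy 'MD frames': Gaussian noise features, with one feature
    shifted for class 1. Purely synthetic; not real simulation data."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):  # 0 = SARS-CoV, 1 = SARS-CoV-2 (assumed labels)
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            row[informative] += 2.0 * label  # only this feature separates classes
            X.append(row)
            y.append(label)
    return X, y


def standardize(X):
    """Scale each feature to zero mean and unit variance so that the
    fitted weights are comparable across features."""
    n, d = len(X), len(X[0])
    means = [sum(r[j] for r in X) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in X) / n) or 1.0
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in X]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Fit logistic regression by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - target
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rank_features(w):
    """Order feature indices by absolute weight, largest first."""
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)


X, y = make_synthetic_frames()
w, b = fit_logistic(standardize(X), y)
ranking = rank_features(w)
print("features ranked by |weight|:", ranking)
```

Ranking by absolute weight is only meaningful here because the features are standardized first. Random forests and multilayer perceptrons would need their own importance measures, which this sketch does not cover.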