Identifying critical protein-protein interactions with ML methods

  • Funded by National Institutes of Health (NIH)
  • Total publications: 0

Grant number: 3T32GM142616-04S1

Key facts

  • Disease

    COVID-19
  • Start & end year

    2021
    2026
  • Known Financial Commitments (USD)

    $65,484
  • Funder

    National Institutes of Health (NIH)
  • Principal Investigator

    ASSISTANT PROFESSOR James Gumbart
  • Research Location

    United States of America
  • Lead Research Institution

    GEORGIA INSTITUTE OF TECHNOLOGY
  • Research Priority Alignment

    N/A
  • Research Category

    Pathogen: natural history, transmission and diagnostics

  • Research Subcategory

    Pathogen morphology, shedding & natural history

  • Special Interest Tags

    N/A

  • Study Type

    Non-Clinical

  • Clinical Trial Details

    N/A

  • Broad Policy Alignment

    Pending

  • Age Group

    Not applicable

  • Vulnerable Population

    Not applicable

  • Occupations of Interest

    Not applicable

Abstract

Project summary: The cloud module proposed here focuses on applying machine learning (ML) methods to the analysis of large datasets generated from molecular dynamics (MD) simulations of biomolecules. ML has become a common set of tools in many areas of scientific research, although barriers to its implementation remain, due in part to a relative dearth of training materials. The proposed module is therefore especially timely. The dataset used in the module is derived from long-timescale MD simulations of the spike protein receptor binding domain (RBD) of SARS-CoV-2 or SARS-CoV bound to ACE2, the human receptor on the cell surface. The ML approaches covered are logistic regression, random forest, and multilayer perceptron (a type of neural network). These methods will be used to help identify the key residues responsible for the increase in binding affinity of SARS-CoV-2 relative to SARS-CoV. The module will guide scientists and researchers through the steps of analyzing a large amount of data with ML approaches and gleaning meaningful insights from it. The aim is to lower the barrier for students, scientists, and researchers with a nascent interest in applying ML to problems in quantitative biology. The skills and concepts learned through the module will facilitate the further implementation of ML approaches in users' own research in a cloud environment. Users can extend these approaches to analyzing large datasets produced in other areas of research, including datasets obtained experimentally. The design of the module is based on tutorials developed for a recent workshop whose participants spanned the full gamut of education levels and coding experience, illustrating its adaptability in meeting the needs of all users.
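The idea of using a classifier to point at key residues can be illustrated with a minimal sketch. It is not the module's code: the data are synthetic, the per-frame features are hypothetical stand-ins for quantities such as residue-residue distances, and the function names (make_synthetic_frames, fit_logistic_regression, rank_features) are invented for this example. It fits a logistic regression by plain gradient descent, using only the Python standard library, to separate two labelled systems, then ranks features by the magnitude of their learned weights.

```python
import math
import random


def make_synthetic_frames(n_frames=400, n_features=5, informative=2, seed=0):
    """Return (X, y): hypothetical per-frame features and a 0/1 system label."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_frames):
        label = i % 2  # placeholder labels: 0 = "SARS-CoV", 1 = "SARS-CoV-2"
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        # Assumption: only one feature differs between the two systems.
        row[informative] += 2.0 * label - 1.0
        X.append(row)
        y.append(label)
    return X, y


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(row, w, b):
    """Probability that a frame belongs to class 1."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b)


def fit_logistic_regression(X, y, lr=0.1, epochs=200):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            err = predict_proba(row, w, b) - target
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rank_features(w):
    """Feature indices sorted from largest to smallest absolute weight."""
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)


X, y = make_synthetic_frames()
weights, bias = fit_logistic_regression(X, y)
ranking = rank_features(weights)
accuracy = sum(
    (predict_proba(row, weights, bias) >= 0.5) == bool(target)
    for row, target in zip(X, y)
) / len(X)

print("features ranked by |weight|:", ranking)
print("training accuracy:", round(accuracy, 3))
```

Comparing weights this way assumes the features are on comparable scales, as they are in the synthetic data here. Random forest and multilayer perceptron models, the other two approaches named in the abstract, would need their own importance measures.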