MFB: Better Homologous Folding using Computational Linguistics and Deep Learning
- Funded by National Science Foundation (NSF)
- Total publications: 2
Grant number: 2330737
Key facts
Disease
COVID-19
Start & end year
2024-2027
Known Financial Commitments (USD)
$1,453,104
Funder
National Science Foundation (NSF)
Principal Investigator
Liang Huang; David Mathews
Research Location
United States of America
Lead Research Institution
Oregon State University
Research Priority Alignment
N/A
Research Category
Pathogen: natural history, transmission and diagnostics
Research Subcategory
Pathogen genomics, mutations and adaptations
Special Interest Tags
N/A
Study Type
Unspecified
Clinical Trial Details
N/A
Broad Policy Alignment
Pending
Age Group
Not Applicable
Vulnerable Population
Not applicable
Occupations of Interest
Not applicable
Abstract
Ribonucleic acid (RNA) is of utmost importance in our daily lives because it plays essential roles in every living cell. Furthermore, our world was recently turned upside down by an RNA virus, which was then partially contained by an RNA vaccine. Contrary to common wisdom, RNA is not just an intermediate "messenger" between the better-known DNA and proteins; it can also have profound biological functions such as controlling gene expression. These functions are determined by RNA structures (the "shapes" of the RNAs), so accurate modeling of these structures is critical for understanding RNA functions and for designing vaccines, test kits, and drugs. However, existing experimental methods for determining RNA structure are extremely expensive and often limited to short sequences, and existing computational tools are slow and not fully accurate. This slowness hinders their application to full-length viral genomes such as the coronavirus genome (about 30,000 nucleotides, or "letters"). There is therefore a critical need for RNA structure prediction methods that are more accurate, more efficient, and scalable to longer sequences such as whole genomes. Advances in this direction could improve our understanding of RNA viruses (which include the viruses behind the common cold, influenza, rabies, Ebola, polio, and measles, as well as HIV) and increase our readiness to fight the next pandemic. This project develops efficient algorithms for predicting the structures of multiple related ("homologous") RNA sequences, such as SARS-CoV-2 variants. These algorithms will scale linearly in both the average sequence length and the number of sequences, and this linear scaling will enable whole-genome applications. The researchers aim to achieve these goals with ideas from two branches of artificial intelligence (AI): natural language processing and deep learning.
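As a rough illustration of why linear scaling matters for whole genomes, assume a classic cubic-time dynamic-programming folding algorithm as the baseline (the abstract does not name the existing tools it has in mind). Ignoring constant factors, the ratio of work between a cubic and a linear algorithm for a coronavirus-length genome is:

```latex
\[
\frac{n^{3}}{n} = n^{2} = \left(3 \times 10^{4}\right)^{2} = 9 \times 10^{8}
\qquad \text{for } n \approx 30{,}000 \text{ nucleotides.}
\]
```

This is an order-of-magnitude comparison under the stated assumption, not a runtime estimate for any particular tool.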
Specifically, this project will improve three types of homologous folding algorithms and adapt them to structure discovery: (1) align-then-fold, which first aligns the homologous sequences and then predicts the consensus structure of the aligned sequences; (2) iterative align-and-fold, which alternates between sequence alignment and structure prediction; and (3) simultaneous align-and-fold, which jointly predicts the alignment and the structures. The team will adapt these fast methods to discover conserved structures through global structure prediction on RNA viral genomes and transcripts. This research will make it possible to discover new RNA structures and functions and will aid the design of vaccines, test kits, and drugs. This project is supported by the Division of Information and Intelligent Systems, the Division of Chemistry, and the Division of Chemistry's Chemical Theory, Models, and Computational Methods Program. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
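To make approach (1) more concrete, the sketch below illustrates only its second step: folding an alignment that is already given into one consensus structure in dot-bracket notation. It is a toy under stated assumptions, not the project's method. It swaps in a Nussinov-style rule (maximize the summed fraction of sequences that can form a Watson-Crick or G-U pair at each pair of alignment columns) in place of the energy models and learned scores real tools use. The names `consensus_fold`, `column_pair_score`, `PAIRS`, `MIN_LOOP`, and `min_support` are hypothetical, and gap characters simply count as unable to pair. The toy also runs in cubic time in the alignment length, unlike the linear-time algorithms the project targets.

```python
from typing import List, Tuple

# Toy pairing rule (assumption): Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # hypothetical minimum number of unpaired columns in a hairpin loop


def column_pair_score(msa: List[str], i: int, j: int) -> float:
    """Fraction of aligned sequences whose residues at columns i and j can pair.

    Gap characters ('-') never appear in PAIRS, so they count as non-pairing.
    """
    return sum((seq[i], seq[j]) in PAIRS for seq in msa) / len(msa)


def consensus_fold(msa: List[str], min_support: float = 0.5) -> Tuple[float, str]:
    """Fold a given alignment into one consensus structure (dot-bracket).

    Nussinov-style dynamic programming over alignment columns; cubic in length.
    """
    L = len(msa[0])
    assert all(len(seq) == L for seq in msa), "sequences must be aligned"

    def pair_value(i: int, k: int, j: int) -> float:
        """Best score on columns i..j if column k pairs with column j (-inf if disallowed)."""
        s = column_pair_score(msa, k, j)
        if s < min_support:
            return float("-inf")
        left = score[i][k - 1] if k > i else 0.0
        return left + s + score[k + 1][j - 1]

    # score[i][j]: best total support for a nested structure on columns i..j.
    # Spans shorter than MIN_LOOP + 1 cannot hold a pair and stay at 0.0.
    score = [[0.0] * L for _ in range(L)]
    for span in range(MIN_LOOP + 1, L):
        for i in range(L - span):
            j = i + span
            best = score[i][j - 1]  # column j left unpaired
            for k in range(i, j - MIN_LOOP):  # column j paired with column k
                best = max(best, pair_value(i, k, j))
            score[i][j] = best

    # Traceback: recover one optimal set of column pairs.
    structure = ["."] * L
    stack = [(0, L - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue  # too short to contain any pair
        if score[i][j] == score[i][j - 1]:
            stack.append((i, j - 1))  # column j is unpaired in this optimum
            continue
        for k in range(i, j - MIN_LOOP):
            if abs(pair_value(i, k, j) - score[i][j]) < 1e-9:
                structure[k], structure[j] = "(", ")"
                stack.append((k + 1, j - 1))
                if k > i:
                    stack.append((i, k - 1))
                break
    return score[0][L - 1], "".join(structure)


if __name__ == "__main__":
    # The second sequence swaps the middle G-C pair (columns 1 and 7) for C-G.
    alignment = ["GGGAAACCC", "GCGAAACGC"]
    print(consensus_fold(alignment))  # prints (3.0, '(((...)))')
```

In the example, both sequences can still pair at columns 1 and 7 despite the G-C to C-G swap, so the consensus keeps that pair. Tools built for this step typically also reward such compensatory changes explicitly; the toy score does not.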