MFB: Better Homologous Folding using Computational Linguistics and Deep Learning

Ribonucleic acid (RNA) is of utmost importance in our daily life because it plays essential roles in every living cell. Furthermore, our world was recently turned upside down by an RNA virus, which was then partially contained by an RNA vaccine. Contrary to common wisdom, RNA is not just an intermediate "messenger" between the better-known DNA and protein; it can also have profound biological functions such as controlling gene expression. These functions are determined by RNA structures (the "shapes" of the RNAs), and therefore accurate modeling of these structures is critical for understanding RNA functions and for designing vaccines, test kits, and drugs. However, existing experimental methods for determining RNA structure are extremely expensive and often limited to short sequences, and existing computational tools are rather slow and not completely accurate. This slowness hinders their application to full-length viral genomes such as those of coronaviruses (about 30,000 nucleotides or "letters"). Therefore, there is a critical need to develop better computational methods for predicting RNA structures: methods that are more accurate, more efficient, and scalable to longer sequences such as whole genomes. Advances in this direction could improve our understanding of RNA viruses (which include the common cold, influenza, rabies, HIV, Ebola, polio, measles, and more) and increase our readiness to fight the next pandemic. This project develops efficient algorithms for predicting the structures of multiple related ("homologous") RNA sequences such as SARS-CoV-2 variants. These algorithms will scale linearly in both the average sequence length and the number of sequences. This linear scaling will enable whole-genome applications. The researchers aim to achieve these goals with ideas from two branches of artificial intelligence (AI): natural language processing and deep learning.
Specifically, this project will improve three types of homologous folding algorithms and adapt them to structure discovery: (1) align-then-fold: first align the homologous sequences and then predict the consensus structure for the aligned sequences; (2) iteratively align-and-fold: iterate between sequence alignment and structure prediction; and (3) simultaneous align-and-fold: jointly predict alignment and structures. The team will adapt these fast methods to discover conserved structures using global structure prediction for RNA viral genomes and transcripts. This research will make it possible to discover new RNA structures and functions, and will help the design of vaccines, test kits, and drugs. This project is supported by the Divisions of Information and Intelligent Systems and of Chemistry and the Chemical Theory, Models, and Computational Methods Program in the Division of Chemistry. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
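To make the first strategy (align-then-fold) concrete, the following is a minimal sketch of the fold step only, assuming the homologous sequences have already been aligned into equal-length rows with "-" for gaps. It uses a simple Nussinov-style dynamic program that runs in cubic time in the alignment length, so it illustrates the idea of a consensus structure but is not the linear-time method the project proposes. The function names, the pairing rule, and the `min_loop` and `threshold` parameters are illustrative assumptions, not taken from the project.

```python
from typing import List

# Assumption: only canonical and G-U wobble pairs count; gaps never pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def column_pair_score(alignment: List[str], i: int, j: int) -> float:
    """Fraction of aligned sequences whose bases in columns i and j can pair."""
    hits = sum((seq[i], seq[j]) in PAIRS for seq in alignment)
    return hits / len(alignment)


def consensus_fold(alignment: List[str], min_loop: int = 3,
                   threshold: float = 0.5) -> str:
    """Toy consensus structure for an alignment, in dot-bracket notation.

    Maximizes the summed column-pair scores over nested pairs, keeping at
    least `min_loop` unpaired columns inside every pair and ignoring column
    pairs supported by fewer than `threshold` of the sequences.
    """
    n = len(alignment[0])
    if any(len(seq) != n for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    # best[i][j]: best total score for columns i..j (inclusive).
    best = [[0.0] * n for _ in range(n)]
    # choice[i][j]: column paired with j in the best solution, or None.
    choice = [[None] * n for _ in range(n)]

    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best[i][j] = best[i][j - 1]          # case 1: column j unpaired
            for k in range(i, j - min_loop):     # case 2: column k pairs with j
                score = column_pair_score(alignment, k, j)
                if score < threshold:
                    continue
                left = best[i][k - 1] if k > i else 0.0
                total = left + best[k + 1][j - 1] + score
                if total > best[i][j]:
                    best[i][j] = total
                    choice[i][j] = k

    # Trace back the chosen pairs into a dot-bracket string.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        k = choice[i][j]
        if k is None:
            stack.append((i, j - 1))
        else:
            structure[k], structure[j] = "(", ")"
            stack.append((i, k - 1))
            stack.append((k + 1, j - 1))
    return "".join(structure)


if __name__ == "__main__":
    # Two made-up homologs; columns 2 and 7 carry a compensatory mutation
    # (G-C in the first sequence, C-G in the second).
    toy_alignment = ["GGGAAAACCC", "GGCAAAAGCC"]
    print(consensus_fold(toy_alignment))
```

In the toy alignment, the third stem position differs between the two sequences but pairs in both, which is the kind of covariation signal that consensus folding over homologs is meant to exploit. The iterative and simultaneous strategies, (2) and (3), would additionally revise or jointly search the alignment itself and are not covered by this sketch.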

6 Publications linked via Europe PMC

Probabilistic RNA designability via interpretable ensemble approximation and dynamic decomposition.
Motif server: web server for undesignable RNA motifs and structures.
SamplingDesign: RNA design via continuous optimization with coupled variables and Monte-Carlo sampling.
Theory, Algorithms, and Applications for Identification of Undesignable RNA Secondary Structures and Motifs.
EnsembleDesign: messenger RNA design minimizing ensemble free energy via probabilistic lattice parsing.
LinearAlifold: Linear-time consensus structure prediction for RNA alignments.