Fast and flexible Bayesian phylogenetics via modern machine learning
- Funded by National Institutes of Health (NIH)
- Total publications: 0
Grant number: 5R01AI162611-05
Key facts
Disease
N/A
Start & end year
2021 to 2026
Known Financial Commitments (USD)
$744,770
Funder
National Institutes of Health (NIH)
Principal Investigator
Frederick Matsen
Research Location
United States of America
Lead Research Institution
FRED HUTCHINSON CANCER CENTER
Research Priority Alignment
N/A
Research Category
Pathogen: natural history, transmission and diagnostics
Research Subcategory
Pathogen genomics, mutations and adaptations
Special Interest Tags
N/A
Study Type
Non-Clinical
Clinical Trial Details
N/A
Broad Policy Alignment
Pending
Age Group
Not Applicable
Vulnerable Population
Not applicable
Occupations of Interest
Not applicable
Abstract
Project Abstract/Summary
The SARS-CoV-2 pandemic underlines both our susceptibility to and the toll of a global pathogen outbreak. Phylogenetic analysis of viral genomes provides key insight into disease pathophysiology, spread, and potential control. However, if these methods are to be used in a viral control strategy, they must reliably account for uncertainty and be able to perform inference on thousands of genomes in actionable time. Scaling Bayesian phylogenetics to meet this need is a grand challenge that is unlikely to be met by optimizing existing algorithms.
We will meet this challenge with a radically new approach: Bayesian variational inference for phylogenetics (VIP) using flexible distributions on phylogenetic trees that are fit using gradient-based methods analogous to how one efficiently trains massive neural networks. By taking a variational approach, we will also be able to integrate phylogenetic analysis into very powerful open-source modeling frameworks such as TensorFlow and PyTorch. This will open up new classes of models, such as neural network models, to integrate data such as sampling location and migration patterns with phylogenetic inference. These flexible models will inform strategies for viral control.
In Aim 1 we will develop the theory necessary for scalable and reliable VIP, including subtree marginalization, local gradient updates needed for online algorithms, convergence diagnostics, and parameter support estimates. We will implement these algorithms in our C++ foundation library for VIP.
In Aim 2 we will develop a flexible TensorFlow-based modeling platform for phylogenetics, enabling a whole new realm of phylogenetic models based on neural networks to learn phylodynamic heterogeneity with minimal programming effort. We will provide efficient gradients to this implementation via our C++ library.
In Aim 3 we will use the fact that VIP posteriors are durable and extensible descriptions of the full data posterior to enable dynamic online computation of variational posteriors, including divide-and-conquer Bayesian phylogenetics. This work will enable a cloud-based viral phylogenetics solution to rapidly update our current estimate of the posterior distribution when new data arrive or the model is modified.
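The gradient-based variational fitting described in the abstract can be illustrated with a toy sketch. This is not the project's method or its C++ library: it assumes a hypothetical posterior over only three candidate tree topologies, with made-up unnormalized log joint values, and fits a softmax-parameterized categorical distribution by plain gradient ascent on the evidence lower bound (ELBO). All function names and numbers below are illustrative assumptions. Real tree spaces are far too large to enumerate, which is why structured variational families are needed in practice.

```python
import math


def softmax(logits):
    """Convert unconstrained logits into a probability vector."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def elbo(q, log_joint):
    """ELBO = sum_i q_i * (log p(data, tree_i) - log q_i)."""
    return sum(qi * (lj - math.log(qi)) for qi, lj in zip(q, log_joint))


def fit_topology_weights(log_joint, steps=2000, lr=0.5):
    """Gradient ascent on the ELBO with respect to the softmax logits.

    For q = softmax(logits), the exact gradient is
    dELBO/dlogit_j = q_j * ((log_joint_j - log q_j) - ELBO).
    """
    logits = [0.0] * len(log_joint)  # start from a uniform distribution
    for _ in range(steps):
        q = softmax(logits)
        bound = elbo(q, log_joint)
        grad = [qi * ((lj - math.log(qi)) - bound)
                for qi, lj in zip(q, log_joint)]
        logits = [p + lr * g for p, g in zip(logits, grad)]
    return softmax(logits)


# Hypothetical unnormalized log joint values for three candidate topologies.
log_joint = [-1000.0, -1000.7, -1001.8]

q = fit_topology_weights(log_joint)
exact = softmax(log_joint)  # exact posterior, available only in this toy case

print("variational:", [round(x, 4) for x in q])
print("exact:      ", [round(x, 4) for x in exact])
```

Because the categorical family here can represent the exact posterior, the fitted distribution should match the normalized posterior closely. In realistic settings the variational family is restricted and the match is approximate.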