Fast and flexible Bayesian phylogenetics via modern machine learning

  • Funded by National Institutes of Health (NIH)
  • Total publications: 0

Grant number: 5R01AI162611-03

Key facts

  • Disease

    N/A

  • Start & end year

    2021
    2026
  • Known Financial Commitments (USD)

    $744,770
  • Funder

    National Institutes of Health (NIH)
  • Principal Investigator

    Frederick Matsen
  • Research Location

    United States of America
  • Lead Research Institution

    FRED HUTCHINSON CANCER CENTER
  • Research Priority Alignment

    N/A
  • Research Category

    Pathogen: natural history, transmission and diagnostics

  • Research Subcategory

    Pathogen genomics, mutations and adaptations

  • Special Interest Tags

    N/A

  • Study Type

    Non-Clinical

  • Clinical Trial Details

    N/A

  • Broad Policy Alignment

    Pending

  • Age Group

    Not Applicable

  • Vulnerable Population

    Not applicable

  • Occupations of Interest

    Not applicable

Abstract

The SARS-CoV-2 pandemic underlines both our susceptibility to and the toll of a global pathogen outbreak. Phylogenetic analysis of viral genomes provides key insight into disease pathophysiology, spread and potential control. However, if these methods are to be used in a viral control strategy they must reliably account for uncertainty and be able to perform inference on 1,000s of genomes in actionable time. Scaling Bayesian phylogenetics to meet this need is a grand challenge that is unlikely to be met by optimizing existing algorithms.

We will meet this challenge with a radically new approach: Bayesian variational inference for phylogenetics (VIP) using flexible distributions on phylogenetic trees that are fit using gradient-based methods analogous to how one efficiently trains massive neural networks. By taking a variational approach we will also be able to integrate phylogenetic analysis into very powerful open-source modeling frameworks such as TensorFlow and PyTorch. This will open up new classes of models, such as neural network models, to integrate data such as sampling location and migration patterns with phylogenetic inference. These flexible models will inform strategies for viral control.

In Aim 1 we will develop the theory necessary for scalable and reliable VIP, including subtree marginalization, local gradient updates needed for online algorithms, convergence diagnostics, and parameter support estimates. We will implement these algorithms in our C++ foundation library for VIP.

In Aim 2 we will develop a flexible TensorFlow-based modeling platform for phylogenetics, enabling a whole new realm of phylogenetic models based on neural networks to learn phylodynamic heterogeneity with minimal programming effort. We will provide efficient gradients to this implementation via our C++ library.

In Aim 3 we will use the fact that VIP posteriors are durable and extensible descriptions of the full data posterior to enable dynamic online computation of variational posteriors, including divide-and-conquer Bayesian phylogenetics. This work will enable a cloud-based viral phylogenetics solution to rapidly update our current estimate of the posterior distribution when new data arrive or the model is modified.