Bayesian Inference under the Structured Coalescent Model

  • Funded by UK Research and Innovation (UKRI)
  • Total publications: 0

Grant number: 2435782


Key facts

  • Disease

    Ebola
  • Start & end year

    2020
    2024
  • Known Financial Commitments (USD)

    $0
  • Funder

    UK Research and Innovation (UKRI)
  • Principal Investigator

    N/A

  • Research Location

    N/A
  • Lead Research Institution

    N/A
  • Research Priority Alignment

    N/A
  • Research Category

    Pathogen: natural history, transmission and diagnostics

  • Research Subcategory

    Pathogen genomics, mutations and adaptations

  • Special Interest Tags

    N/A

  • Study Type

    Non-Clinical

  • Clinical Trial Details

    N/A

  • Broad Policy Alignment

    Pending

  • Age Group

    Not Applicable

  • Vulnerable Population

    Not applicable

  • Occupations of Interest

    Not applicable

Abstract

The coalescent is a population genetics model of how genetic material is inherited over time. Genetic sequences are sampled at known times, and the genetic similarity between them is used to estimate how long ago pairs of lineages coalesced at a common ancestor. The ordinary coalescent does not account for spatial constraints, which are an important factor: if two lineages were geographically separated at some time in the past, they cannot coalesce at a common ancestor, going backwards in time, until both lineages occupy the same location. This motivates an extension known as the structured coalescent, which incorporates spatial constraints. Individuals are assumed to live in a fixed, and possibly unknown, number of distinct demes, with migrations between demes occurring at fixed rates backwards in time. A dataset consists of genomes sampled at various time points and from various demes. From such data we would like to infer several evolutionary parameters of the structured coalescent, including the migration rates between demes and the effective population size of each deme. The migration history that led to the sampled locations is often also of interest. Uncertainty in at least some of these quantities is likely to be important, which motivates a Bayesian approach to inference. Current methods for inferring these parameters are either computationally expensive or replace the full structured coalescent with approximations, which can introduce significant biases (Muller et al., 2017). To address this lack of scalable inference methods for the structured coalescent, I intend to construct a reversible jump Markov chain Monte Carlo (MCMC) algorithm that infers migration histories and evolutionary parameters for a fixed coalescent genealogy.
Several robust methods are already available for inferring a genealogy from genomic data, including BEAST (Suchard et al., 2018), LSD (To et al., 2016) and TreeTime (Sagulenko et al., 2018). My work will build on the MCMC schemes proposed by Drummond et al. (2002) for the coalescent and by Ewing et al. (2004) for the structured coalescent. I will also release an implementation of my algorithm as an open-source R package. The correctness and computational efficiency of the algorithm will be assessed by benchmarking on simulated datasets. Applications to state-of-the-art real datasets from infectious disease pathogens will demonstrate its usefulness, for example a global dataset of cholera genomes from the seventh pandemic (Didelot et al., 2015) and a collection of Ebola genomes from the 2013-2016 West African epidemic (Dudas et al., 2017). I anticipate that this project will contribute to advances in the accuracy of statistical methods for genetic sequences. It will also be relevant to generic MCMC methods on constrained and non-Euclidean spaces, which have applications across the applied sciences and engineering.
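
To make the structured coalescent described in the abstract more concrete, the following is a minimal Python sketch that simulates one history backwards in time. It is an illustration only and not the proposed reversible jump MCMC algorithm or the planned R package. Several simplifying assumptions are made that do not come from the abstract: all lineages are sampled at time 0, only the number of lineages per deme is tracked (not the tree topology), a pair of lineages in deme i coalesces at rate 1 / pop_sizes[i], and the function name and argument names are hypothetical.

```python
import random


def simulate_structured_coalescent(sample_demes, pop_sizes, mig_rates, seed=None):
    """Simulate one structured coalescent history backwards in time.

    Illustrative sketch with assumed parameterisation:
      sample_demes: deme index of each sampled lineage (all sampled at time 0).
      pop_sizes:    effective population size of each deme; a pair of lineages
                    in deme i coalesces at rate 1 / pop_sizes[i].
      mig_rates:    mig_rates[i][j] is the backward-in-time rate at which a
                    lineage in deme i moves to deme j.
    Returns a list of (time, kind, from_deme, to_deme) events.
    Assumes migration eventually allows all lineages to meet in one deme.
    """
    rng = random.Random(seed)
    n_demes = len(pop_sizes)

    # Number of extant lineages currently in each deme.
    counts = [0] * n_demes
    for deme in sample_demes:
        counts[deme] += 1

    time = 0.0
    history = []
    while sum(counts) > 1:
        events, rates = [], []
        for i in range(n_demes):
            k = counts[i]
            # Coalescence is only possible between lineages in the same deme.
            if k >= 2:
                events.append(("coalescence", i, i))
                rates.append(k * (k - 1) / 2.0 / pop_sizes[i])
            # Each lineage in deme i migrates to deme j at rate mig_rates[i][j].
            for j in range(n_demes):
                if i != j and k > 0 and mig_rates[i][j] > 0:
                    events.append(("migration", i, j))
                    rates.append(k * mig_rates[i][j])
        if not events:
            raise ValueError("remaining lineages are isolated and cannot coalesce")

        # Waiting time to the next event is exponential with the total rate;
        # the event type is chosen with probability proportional to its rate.
        time += rng.expovariate(sum(rates))
        kind, i, j = rng.choices(events, weights=rates)[0]
        counts[i] -= 1
        if kind == "migration":
            counts[j] += 1
        history.append((time, kind, i, j))
    return history


if __name__ == "__main__":
    # Two demes, two samples from each, symmetric migration (values are arbitrary).
    for event in simulate_structured_coalescent(
        [0, 0, 1, 1], [1.0, 2.0], [[0.0, 0.5], [0.5, 0.0]], seed=1
    ):
        print(event)
```

With n sampled lineages, every simulated history contains exactly n - 1 coalescence events, and lineages that start in different demes must undergo at least one migration before they can coalesce, which is the spatial constraint that the ordinary coalescent ignores.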