Accelerating Bayesian Dimension Reduction for Dynamic Network Data with Many Observations

  • Funded by National Science Foundation (NSF)
  • Total publications: 0

Grant number: 2152774

Key facts

  • Disease

    COVID-19
  • Start & end year

    2022
    2025
  • Known Financial Commitments (USD)

    $300,000
  • Funder

    National Science Foundation (NSF)
  • Principal Investigator

    Andrew Holbrook
  • Research Location

    United States of America
  • Lead Research Institution

    University of California-Los Angeles
  • Research Priority Alignment

    N/A
  • Research Category

    Epidemiological studies

  • Research Subcategory

    Disease transmission dynamics

  • Special Interest Tags

    N/A

  • Study Type

    Unspecified

  • Clinical Trial Details

    N/A

  • Broad Policy Alignment

    Pending

  • Age Group

    Not Applicable

  • Vulnerable Population

    Not applicable

  • Occupations of Interest

    Not applicable

Abstract

Global viral epidemics produce vast amounts of high-dimensional spatiotemporal data. Scientists, businesses, governments and independent organizations want to learn from this data so they can understand basic biological mechanisms, invest capital, allocate aid and design coherent policy in a changing world. Analyzing spatial associations within viral contagion is, unsurprisingly, an area of immense scientific interest, but the task requires accounting for the dynamic and multiscale transportation networks that shape the global economy. This project seeks to advance knowledge of statistical inference from stochastic process models in the context of massive amounts of dynamic and network-indexed data. The proposed research will avoid costly direct representations of network structure and instead use Bayesian dimension reduction to probabilistically map network dynamics to a continuous domain. The project combines theoretical and methodological developments in scalable Bayesian dimension reduction; develops efficient algorithms into open-source, high-performance computing (HPC) software; and applies them to the high-impact analysis of viruses including, but not limited to, SARS-CoV-2. The project will emphasize the combination of rigorous statistical methodology with parallel computing techniques available to any scientist with moderate resources.


The project will combine theory, methods and applications in advancing knowledge of statistical inference for network-indexed processes. Bayesian multidimensional scaling (BMDS) stands as an established tool for probabilistic dimension reduction of network data, but the method's quadratic computational complexity prohibits big-data applications. The project will extend BMDS to the analysis of millions of data points using a multipronged approach. From a theoretical standpoint, the investigators will show that the classical BMDS model is strictly equivalent to a modified BMDS model with sparse couplings between observations. This 'free lunch' result will amount to a linear reduction in the computational complexity of the classical algorithm, but its use will require an upper bound on the rank of the traditional BMDS distance matrix. A jointly methodological and theoretical investigation will develop a cutting-edge rank estimation procedure for Euclidean distance matrices (EDMs) and derive non-asymptotic and asymptotic bounds for the rank estimation error and its impact on the modified BMDS posterior. Bayesian inference with the developed sparse BMDS (S-BMDS) will amount to simulating a massive N-body problem with sparse pairwise couplings. A primary methodological investigation will develop fast parallel algorithms for computing (1) the S-BMDS likelihood and gradient, and (2) the EDM rank in ways that efficiently use multi-core and vectorized central processing units (CPUs) and multiple graphics processing units (GPUs). The investigators will then allow trends in Google mobility data to inform effective distances between viruses and use the developed machinery to model the spread of, e.g., SARS-CoV-2 through global mobility space. The project also includes an expansive plan for educational, outreach and mentoring activities and will actively disseminate the research findings in the form of open-source HPC software.
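
To make the quadratic cost mentioned above concrete, the following is a minimal Python sketch and not the project's method. It assumes a truncated-normal BMDS likelihood, a commonly used formulation that the abstract does not itself specify, and it uses a hypothetical banded coupling pattern (each observation is coupled only to its next `band` neighbors) as a stand-in for "sparse couplings". The function names and the `band` parameter are illustrative assumptions. The sketch only shows how the number of pairwise terms falls from order N^2 to order N; it does not reproduce the exact equivalence the abstract describes.

```python
import math
import random


def log_norm_cdf(x):
    # Log of the standard normal CDF, written with the complementary error function.
    return math.log(0.5 * math.erfc(-x / math.sqrt(2.0)))


def bmds_loglik(X, D, sigma, band=None):
    """Assumed truncated-normal BMDS log-likelihood over pairs (i, j), i < j.

    X: list of latent locations (each a list of floats).
    D: N x N nested list of observed dissimilarities.
    band=None sums over all N(N-1)/2 pairs (the dense case);
    band=B keeps only pairs with j - i <= B (hypothetical sparse pattern).
    Returns (log-likelihood, number of pairwise terms evaluated).
    """
    n = len(X)
    max_offset = n - 1 if band is None else min(band, n - 1)
    total, n_pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, min(i + max_offset, n - 1) + 1):
            dist = math.dist(X[i], X[j])  # latent Euclidean distance
            resid = D[i][j] - dist
            total += (
                -0.5 * (resid / sigma) ** 2
                - math.log(sigma)
                - 0.5 * math.log(2.0 * math.pi)
                - log_norm_cdf(dist / sigma)  # truncation of the normal to positive values
            )
            n_pairs += 1
    return total, n_pairs


# Synthetic, noise-free example data (illustrative only).
random.seed(0)
X = [[random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)] for _ in range(50)]
D = [[math.dist(a, b) for b in X] for a in X]

dense_ll, dense_pairs = bmds_loglik(X, D, sigma=1.0)
sparse_ll, sparse_pairs = bmds_loglik(X, D, sigma=1.0, band=3)
print(dense_pairs, sparse_pairs)
```

With 50 points the dense sum visits 1225 pairs while the banded sum visits 144. The banded count grows linearly in the number of observations for a fixed band, which is the kind of saving the sparse model is meant to deliver.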

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.