Accelerating Bayesian Dimension Reduction for Dynamic Network Data with Many Observations
- Funded by National Science Foundation (NSF)
- Total publications: 0
Grant number: 2152774
Key facts
Disease
COVID-19
Start & end year
2022-2025
Known Financial Commitments (USD)
$300,000
Funder
National Science Foundation (NSF)
Principal Investigator
Andrew Holbrook
Research Location
United States of America
Lead Research Institution
University of California-Los Angeles
Research Priority Alignment
N/A
Research Category
Epidemiological studies
Research Subcategory
Disease transmission dynamics
Special Interest Tags
N/A
Study Type
Unspecified
Clinical Trial Details
N/A
Broad Policy Alignment
Pending
Age Group
Not Applicable
Vulnerable Population
Not applicable
Occupations of Interest
Not applicable
Abstract
Global viral epidemics produce vast amounts of high-dimensional spatiotemporal data. Scientists, businesses, governments and independent organizations want to learn from these data so they can understand basic biological mechanisms, invest capital, allocate aid and design coherent policy in a changing world. Analyzing spatial associations within viral contagion is, unsurprisingly, an area of immense scientific interest, but the task requires accounting for the dynamic and multiscale transportation networks that shape the global economy. This project seeks to advance knowledge of statistical inference from stochastic process models in the context of massive amounts of dynamic and network-indexed data. The proposed research will avoid costly direct representations of network structure and instead use Bayesian dimension reduction to probabilistically map network dynamics to a continuous domain. The project combines theoretical and methodological developments in scalable Bayesian dimension reduction; develops efficient algorithms into open-source, high-performance computing (HPC) software; and applies them to the high-impact analysis of viruses including, but not limited to, SARS-CoV-2. The project will emphasize the combination of rigorous statistical methodology with parallel computing techniques available to any scientist with moderate resources.
The project will combine theory, methods and applications in advancing knowledge of statistical inference for network-indexed processes. Bayesian multidimensional scaling (BMDS) stands as an established tool for probabilistic dimension reduction of network data, but the method's quadratic computational complexity prohibits its application to big data. The project will extend BMDS to the analysis of millions of data points using a multipronged approach. From a theoretical standpoint, the investigators will show that the classical BMDS model is strictly equivalent to a modified BMDS model with sparse couplings between observations. This 'free lunch' result will amount to a linear reduction in the computational complexity of the classical algorithm, but its use will require an upper bound on the rank of the traditional BMDS distance matrix. A jointly methodological and theoretical investigation will develop a cutting-edge rank estimation procedure for Euclidean distance matrices (EDMs) and derive non-asymptotic and asymptotic bounds for the rank estimation error and its impact on the modified BMDS posterior. Bayesian inference with the developed sparse BMDS (S-BMDS) will amount to simulating a massive N-body problem with sparse pairwise couplings. A primary methodological investigation will develop fast parallel algorithms for computing (1) the S-BMDS likelihood and gradient, and (2) the EDM rank in ways that efficiently use multi-core and vectorized central processing units (CPUs) and multiple graphics processing units (GPUs). The investigators will then allow trends in Google mobility data to inform effective distances between viruses and use the developed machinery to model the spread of, e.g., SARS-CoV-2 through global mobility space. The project also includes an expansive plan for educational, outreach and mentoring activities and will actively disseminate the research findings in the form of open-source HPC software.
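To make the quadratic cost of BMDS and the effect of sparse couplings concrete, the following is a minimal sketch and not the investigators' S-BMDS. It assumes the commonly used BMDS likelihood in which each observed dissimilarity is normal around the latent Euclidean distance, truncated to be positive. The banded coupling pattern (each observation paired only with its next `band` neighbours) is an assumption chosen for illustration; the abstract does not specify which sparsity pattern the project uses, and the sketch does not demonstrate the claimed equivalence between the dense and sparse models. All function names are hypothetical.

```python
import math
import random


def log_norm_cdf(x):
    """Log of the standard normal CDF via erfc (used here only for x >= 0)."""
    return math.log(0.5 * math.erfc(-x / math.sqrt(2.0)))


def pair_term(d_obs, delta, sigma):
    """Log density of one observed dissimilarity d_obs under a normal
    centred at the latent distance delta, truncated to positive values."""
    z = (d_obs - delta) / sigma
    return (-0.5 * math.log(2.0 * math.pi * sigma ** 2)
            - 0.5 * z * z
            - log_norm_cdf(delta / sigma))


def bmds_loglik(D, X, sigma, band=None):
    """Sum pairwise terms over i < j.

    band=None visits all n(n-1)/2 pairs (the dense, quadratic case).
    band=b visits only pairs with j - i <= b (an ASSUMED sparsity pattern).
    Returns (log-likelihood, number of pairs visited).
    """
    n = len(X)
    total, n_pairs = 0.0, 0
    for i in range(n):
        j_max = n if band is None else min(n, i + band + 1)
        for j in range(i + 1, j_max):
            delta = math.dist(X[i], X[j])  # latent Euclidean distance
            total += pair_term(D[i][j], delta, sigma)
            n_pairs += 1
    return total, n_pairs


def simulate(n, dim=2, sigma=0.1, seed=0):
    """Toy data: latent points plus noisy distances. Taking abs() keeps the
    dissimilarities positive; it is not an exact truncated-normal draw."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = abs(math.dist(X[i], X[j]) + rng.gauss(0.0, sigma))
            D[i][j] = D[j][i] = d
    return D, X


if __name__ == "__main__":
    D, X = simulate(200)
    print("dense pairs: ", bmds_loglik(D, X, 0.1)[1])
    print("banded pairs:", bmds_loglik(D, X, 0.1, band=5)[1])
```

In this sketch the dense sum touches n(n-1)/2 pairs, while the banded sum touches at most n times `band` pairs, which is where a linear-in-n saving would come from if a sparse model of this kind can stand in for the dense one. Each pair term is independent of the others, so the loop is the part that a multi-core or GPU implementation would parallelize.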
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.