CAREER: Designing Optimal Sampling Strategies for Epidemiological Models

Emerging infectious diseases are an inevitable part of our world. Mathematical models can inform scientific understanding of these complex and ever-changing systems, as well as public health policy. In particular, mechanistic models (mathematical models that describe the key processes driving dynamic, real-world systems), combined with empirical data, can be useful for assessing the efficacy of different control strategies or testing different hypotheses about the underlying biology. The COVID-19 pandemic has underscored both the importance of mechanistic models and the challenges of implementing these models to inform policy. One of these challenges, which spans disciplines, arises when there is a mismatch between the data available and the data required to inform these models and reduce uncertainty in model outputs and predictions. A model-data mismatch could lead to erroneous conclusions about which control policies will be most effective at achieving a particular goal. This project aims to create a methodology that finds the most cost-effective approach to data collection for a given mathematical model, such that the model produces reliable outputs and measures of uncertainty. Applying these methods to recent case studies like COVID-19, Zika, and Ebola can inform appropriate responses to, and improve preparedness for, future pandemics. These methods will also be applicable to areas of ecology and physiology, where the overarching goal of integrating modeling and empirical studies is critical to progress. Integral to the research objectives is a new education program targeting undergraduate students with unrecognized potential in STEM. A series of educational modules will engage students in the fundamentals of mathematical modeling, coding, and visualization, using a collaborative and inquiry-based approach. Participants will hone these skills by working with epidemiological models related to the proposed project, and will use their creativity and unique perspectives as novice modelers with diverse backgrounds to promote model literacy in the general public.

One way to address the problem of a model-data mismatch is to first evaluate the practical identifiability of a model, that is, the ability to estimate model parameters unambiguously from a particular data set. Conclusions derived from a practically unidentifiable model may not be robust. Furthermore, commonly used methods to determine practical identifiability of epidemiological models rely on simple likelihood models (functions proportional to the probability of the observed data given the model parameters) that make unrealistic assumptions about epidemiological data, which in reality are highly correlated and complex. A poor approximation to the true likelihood model can lead to inaccurate estimates of uncertainty in model outputs. Using appropriate identifiability metrics and improved (yet tractable) likelihood models, control theory can be leveraged to derive data sampling strategies, constrained by finite public health resources, that render a model practically identifiable. The approach is to find sampling protocols that jointly minimize the identifiability metric and the cost of the sampling strategy. These new methods will be tested using synthetic data and applied to recent outbreaks to determine what data sampling strategies would have been sufficient to reduce the uncertainty in model parameter estimates and, therefore, in model outputs.
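To make the idea of jointly minimizing an identifiability metric and a sampling cost concrete, the following is a minimal sketch and not the project's method. It assumes a basic SIR model, independent Gaussian observation noise (exactly the kind of simple likelihood the abstract cautions against, used here only for brevity), the trace of the inverse Fisher information matrix as the identifiability metric, and a flat per-sample cost. All function names, parameter values, and sampling schedules are hypothetical.

```python
import numpy as np

def sir_prevalence(beta, gamma, times, i0=0.01, dt=0.01):
    """Forward-Euler SIR model; returns the infected fraction at each (sorted) time."""
    s, i, t = 1.0 - i0, i0, 0.0
    out = []
    for t_obs in times:
        while t < t_obs - 1e-9:
            new_inf = beta * s * i * dt
            new_rec = gamma * i * dt
            s -= new_inf
            i += new_inf - new_rec
            t += dt
        out.append(i)
    return np.array(out)

def fisher_information(theta, times, sigma=0.01, rel_step=1e-4):
    """Fisher information for (beta, gamma), ASSUMING i.i.d. Gaussian noise with sd sigma."""
    theta = np.asarray(theta, dtype=float)
    cols = []
    for k in range(theta.size):
        h = rel_step * theta[k]
        up, down = theta.copy(), theta.copy()
        up[k] += h
        down[k] -= h
        # Central finite-difference sensitivity of the output to parameter k.
        cols.append((sir_prevalence(*up, times) - sir_prevalence(*down, times)) / (2 * h))
    J = np.column_stack(cols)  # shape: (number of samples, number of parameters)
    return J.T @ J / sigma**2

def objective(theta, times, cost_per_sample=1e-5):
    """Identifiability metric plus sampling cost; smaller is better."""
    fim = fisher_information(theta, times)
    uncertainty = np.trace(np.linalg.inv(fim))  # sum of lower bounds on parameter variances
    return uncertainty + cost_per_sample * len(times)

theta_true = (0.5, 0.2)               # hypothetical beta, gamma
spread = np.arange(5.0, 55.0, 5.0)    # 10 samples across the growth and decline phases
late = np.arange(80.0, 100.0, 2.0)    # 10 samples after the outbreak has largely ended
score_spread = objective(theta_true, spread)
score_late = objective(theta_true, late)
print(f"spread schedule: {score_spread:.3e}, late schedule: {score_late:.3e}")
```

Both schedules use the same number of samples, so they carry the same cost under this assumed cost model; any difference in the objective comes from how informative the chosen sampling times are about the parameters. A full treatment would replace the Gaussian likelihood with a more realistic one and search over schedules rather than compare two fixed candidates.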

This award is jointly funded by the MPS Division of Mathematical Sciences (DMS) through the Mathematical Biology Program and the BIO Division of Environmental Biology through the Population and Community Ecology (PCE) Cluster.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
