Identifying and understanding drivers of selection bias and information bias in clinical COVID-19 data
- Funded by National Institutes of Health (NIH)
- Total publications: 1
Grant number: 1R21LM013645-01
Key facts
Disease
COVID-19
Start & end year
2021 to 2023
Known Financial Commitments (USD)
$200,970
Funder
National Institutes of Health (NIH)
Principal Investigator
ASSISTANT PROFESSOR Nicole Weiskopf
Research Location
United States of America
Lead Research Institution
OREGON HEALTH & SCIENCE UNIVERSITY
Research Priority Alignment
N/A
Research Category
Epidemiological studies
Research Subcategory
Disease susceptibility
Special Interest Tags
Data Management and Data Sharing
Study Type
Clinical
Clinical Trial Details
Not applicable
Broad Policy Alignment
Pending
Age Group
Not Applicable
Vulnerable Population
Not applicable
Occupations of Interest
Not applicable
Abstract
During the COVID-19 pandemic, there is an immediate need for high-quality data for studies that support patient care, predict outcomes, identify and evaluate treatments, allocate resources, and inform operations and policy decisions. While prospective research produces higher-quality evidence, retrospective studies that reuse clinical data can be executed in a shorter time frame and at lower cost, both of which are crucial for research in a pandemic. Unfortunately, it has been shown that the usefulness and validity of available COVID-19 data are constrained by various forms of selection bias and information bias, which may lead to invalid findings in research and analytics and to disparities in resulting healthcare practices. The objective of the proposed work is to study the selection and information biases present in clinically derived COVID-19 datasets by integrating COVID-19 datasets from OHSU and the National COVID Cohort Collaborative with novel and traditional sources of clinical, epidemiological, social media, and citizen-generated data. From each data source we will extract data indicating COVID-19, as well as a set of social determinants of health that are commonly associated with healthcare utilization and access. To test for the presence of selection bias, we will construct and compare categorical probability distributions for each social determinant across COVID-19 cases in each data source. Differences in these distributions will indicate selection bias in one or more of the data sources. Next, we will assess information bias by extending and adapting tests for missingness and other forms of information bias in the COVID-19 datasets to determine whether the quantity and quality of these data vary with respect to clinical factors and factors related to social determinants of health.
This proposal therefore addresses a significant gap in knowledge: understanding not just the disparities in who is impacted by COVID-19, but also who is represented in the data we have available for learning more about the disease. Identifying and estimating the influence of social determinants of health on selection bias and information bias in COVID-19 data can guide the use of statistical and analytic approaches that improve the external and internal validity of research and analytics that rely on these data, including estimates of disease prevalence, understanding of the natural course of COVID-19, and identification of patients who are at risk for severe disease.
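The two checks described in the abstract can be illustrated with a minimal Python sketch. It is not the project's actual method: the function names, the field names (`insurance`, `bmi`), the category labels, and all records below are hypothetical and made up for illustration. Total variation distance is used here as one simple way to compare two categorical distributions; the abstract does not say which comparison statistic the project uses.

```python
from collections import Counter


def categorical_distribution(values, categories):
    """Share of non-missing records that fall in each category."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no non-missing values to summarize")
    counts = Counter(observed)
    return {c: counts[c] / len(observed) for c in categories}


def total_variation_distance(p, q):
    """Half the summed absolute difference between two distributions.

    0 means identical distributions; 1 means no overlap at all.
    """
    return 0.5 * sum(abs(p[c] - q[c]) for c in p)


def missingness_rate_by_group(records, field, group_field):
    """Fraction of records in each group where `field` is missing (None)."""
    totals, missing = Counter(), Counter()
    for record in records:
        group = record[group_field]
        totals[group] += 1
        if record.get(field) is None:
            missing[group] += 1
    return {g: missing[g] / totals[g] for g in totals}


# Hypothetical data: insurance status of COVID-19 cases in two sources.
CATEGORIES = ["private", "public", "uninsured"]
ehr_cases = ["private"] * 6 + ["public"] * 3 + ["uninsured"] * 1
survey_cases = ["private"] * 4 + ["public"] * 3 + ["uninsured"] * 3

ehr_dist = categorical_distribution(ehr_cases, CATEGORIES)
survey_dist = categorical_distribution(survey_cases, CATEGORIES)
tvd = total_variation_distance(ehr_dist, survey_dist)

# Hypothetical records: is a clinical field (bmi) missing more often
# for some insurance groups than for others?
records = [
    {"insurance": "private", "bmi": 24.1},
    {"insurance": "private", "bmi": 27.3},
    {"insurance": "private", "bmi": None},
    {"insurance": "private", "bmi": 22.8},
    {"insurance": "uninsured", "bmi": None},
    {"insurance": "uninsured", "bmi": 30.2},
]
missing_rates = missingness_rate_by_group(records, "bmi", "insurance")

if __name__ == "__main__":
    print("TVD between sources:", round(tvd, 3))
    print("Missingness of bmi by insurance group:", missing_rates)
```

A large distance between sources would suggest that at least one of them over- or under-represents some group, and uneven missingness rates would suggest that data quantity varies with the social determinant. In practice both comparisons would need a formal statistical test and adequate sample sizes before drawing any conclusion.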