Identifying and understanding drivers of selection bias and information bias in clinical COVID-19 data

  • Funded by National Institutes of Health (NIH)
  • Total publications: 0

Grant number: 5R21LM013645-02

Key facts

  • Disease

    COVID-19
  • Start & end year

    2021
    2024
  • Known Financial Commitments (USD)

    $167,475
  • Funder

    National Institutes of Health (NIH)
  • Principal Investigator

    ASSISTANT PROFESSOR Nicole Weiskopf
  • Research Location

    United States of America
  • Lead Research Institution

    OREGON HEALTH & SCIENCE UNIVERSITY
  • Research Priority Alignment

    N/A
  • Research Category

    Epidemiological studies

  • Research Subcategory

    Disease susceptibility

  • Special Interest Tags

    Data Management and Data Sharing

  • Study Type

    Clinical

  • Clinical Trial Details

    Not applicable

  • Broad Policy Alignment

    Pending

  • Age Group

    Not applicable

  • Vulnerable Population

    Not applicable

  • Occupations of Interest

    Not applicable

Abstract

During the COVID-19 pandemic, there is an immediate need for high-quality data for studies that support patient care, predict outcomes, identify and evaluate treatments, allocate resources, and inform operations and policy decisions. While prospective research produces higher-quality evidence, retrospective studies that reuse clinical data can be executed in a shorter time frame and at lower cost, both of which are crucial for research in a pandemic. Unfortunately, it has been shown that the usefulness and validity of available COVID-19 data are constrained by various forms of selection bias and information bias, which may lead to invalid findings in research and analytics and to disparities in the resulting healthcare practices.
The objective of the proposed work is to study the selection and information biases present in clinically derived COVID-19 datasets by integrating COVID-19 datasets from OHSU and the National COVID Cohort Collaborative with novel and traditional sources of clinical, epidemiological, social media, and citizen-generated data. From each data source we will extract data indicating COVID-19, as well as a set of social determinants of health that are commonly associated with healthcare utilization and access. To test for the presence of selection bias, we will construct and compare categorical probability distributions for each social determinant across COVID-19 cases in each data source. Differences in these distributions will indicate selection bias in one or more of the data sources. Next, we will assess information bias by extending and adapting tests for missingness and other forms of information bias in the COVID-19 datasets, to determine whether the quantity and quality of these data vary with clinical factors and with factors related to social determinants of health.
This proposal therefore addresses a significant gap in knowledge: understanding not just the disparities in who is impacted by COVID-19, but also who is represented in the data we have available for learning more about the disease. Identifying and estimating the influence of social determinants of health on selection bias and information bias in COVID-19 data can guide the use of statistical and analytic approaches that improve the external and internal validity of research and analytics that rely on these data, including estimates of disease prevalence, understanding of the natural course of COVID-19, and identification of patients who are at risk for severe disease.
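
The two checks described in the abstract can be illustrated with a minimal Python sketch. It is not the project's method: the abstract does not say which statistic is used to compare distributions, so total variation distance is swapped in here as one simple choice, and per-group missingness rates stand in for the missingness tests. The function names, the field names (`insurance`, `spo2`), and all data values are hypothetical toy examples.

```python
from collections import Counter


def categorical_distribution(values):
    """Return {category: probability} over the non-missing values."""
    observed = [v for v in values if v is not None]
    counts = Counter(observed)
    total = len(observed)
    return {category: n / total for category, n in counts.items()}


def total_variation_distance(p, q):
    """Half the sum of absolute probability differences (0 = identical, 1 = disjoint).

    Assumption: this statistic is chosen for illustration only.
    """
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


def missingness_by_group(records, group_key, field):
    """Return {group: fraction of records in that group where `field` is missing}."""
    totals = Counter()
    missing = Counter()
    for record in records:
        group = record.get(group_key)
        totals[group] += 1
        if record.get(field) is None:
            missing[group] += 1
    return {group: missing[group] / totals[group] for group in totals}


# Hypothetical toy data: one social determinant (insurance status) among
# COVID-19 cases drawn from two different data sources.
ehr_cases = ["private"] * 6 + ["public"] * 3 + ["uninsured"] * 1
survey_cases = ["private"] * 4 + ["public"] * 4 + ["uninsured"] * 2

ehr_dist = categorical_distribution(ehr_cases)
survey_dist = categorical_distribution(survey_cases)
selection_gap = total_variation_distance(ehr_dist, survey_dist)

# Hypothetical toy records: does a clinical measurement go unrecorded more
# often for one group than another?
records = [
    {"insurance": "private", "spo2": 97},
    {"insurance": "private", "spo2": 95},
    {"insurance": "public", "spo2": None},
    {"insurance": "public", "spo2": 96},
]
missing_rates = missingness_by_group(records, "insurance", "spo2")

print(f"Distribution gap between sources: {selection_gap:.2f}")
print(f"Missingness of spo2 by insurance: {missing_rates}")
```

A larger gap between sources suggests that at least one of them over- or under-represents some group, and unequal missingness rates across groups suggest that data quantity varies with the social determinant. With real data, either comparison would need a formal test and adequate sample sizes before drawing conclusions.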