Addressing Algorithmic Unreliability and Dataset Shift in EHR-based Risk Prediction Models
- Funded by National Institutes of Health (NIH)
- Total publications: 0 publications
Grant number: 5F31LM014282-02
Key facts
Disease
COVID-19
Start & end year
2023-2026
Known Financial Commitments (USD)
$48,974
Funder
National Institutes of Health (NIH)
Principal Investigator
MD-PHD STUDENT Likhitha Kolla
Research Location
United States of America
Lead Research Institution
UNIVERSITY OF PENNSYLVANIA
Research Priority Alignment
N/A
Research Category
Secondary impacts of disease, response & control measures
Research Subcategory
Indirect health impacts
Special Interest Tags
N/A
Study Type
Clinical
Clinical Trial Details
Not applicable
Broad Policy Alignment
Pending
Age Group
Unspecified
Vulnerable Population
Unspecified
Occupations of Interest
Unspecified
Abstract
Project Summary: Predictive analytic algorithms built on electronic health record (EHR) inputs, such as patient characteristics, administrative codes, and lab values, are increasingly used in health care settings to direct resources to high-risk patients. Data play an indispensable role in the development and deployment of effective predictive models. The greatest, yet understudied, challenge in maintaining these tools is dataset shift, in which the distribution of the training data differs from that of the population on which the algorithm is deployed, leading to model deterioration and inaccurate risk predictions. Dataset shift is a pervasive cause of algorithmic unreliability in EHR-based models because inevitable changes in physician behaviors and health system operations alter (1) the input distribution (covariate drift) and (2) the relationship between predictors and outcome (concept drift). Sudden changes in healthcare utilization during the COVID-19 pandemic may have impacted the data generation process and the performance of clinical predictive models. Our preliminary study showed that decreased collection of patient labs during the COVID-19 quarantine period led to sparse data for important predictors of a single-institution EHR-based mortality risk prediction algorithm, which then underpredicted risk for patients with advanced cancers. Despite the increasing use of predictive tools in high-stakes clinical applications and growing recognition of dataset shift, we lack a framework for reasoning about shift and its effects on care delivery and for proactively addressing shift to maintain performance over time. In Aim 1, we propose to extend prior work on shift to a nationally deployed risk prediction algorithm, the VA Care Assessment Need (CAN) model, used on millions of VA beneficiaries each year.
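The distinction between covariate drift and concept drift can be illustrated with a small simulation. The sketch below is not the project's method: it uses one synthetic predictor, a hypothetical logistic outcome model, and a "frozen" model that is assumed to know the training-period coefficients exactly (no fitting step). All function names and parameter values are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def simulate(rng, n, x_mean, slope, intercept=-2.0):
    """Draw a synthetic predictor x and a binary outcome y with
    P(y = 1 | x) = sigmoid(intercept + slope * x). Purely illustrative."""
    x = rng.normal(x_mean, 1.0, size=n)
    y = rng.binomial(1, sigmoid(intercept + slope * x))
    return x, y

def standardized_mean_difference(a, b):
    """Difference in means scaled by the pooled standard deviation."""
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2.0)
    return (b.mean() - a.mean()) / pooled_sd

def calibration_gap(x, y, slope, intercept=-2.0):
    """Mean predicted risk from the frozen model minus the observed event rate.
    A negative value means the frozen model underpredicts risk."""
    return sigmoid(intercept + slope * x).mean() - y.mean()

rng = np.random.default_rng(0)
n = 20_000

# Training period: x ~ N(0, 1), slope 1 (assumed values).
x_train, y_train = simulate(rng, n, x_mean=0.0, slope=1.0)
# Covariate drift: the input distribution moves, P(y | x) is unchanged.
x_cov, y_cov = simulate(rng, n, x_mean=1.0, slope=1.0)
# Concept drift: inputs look the same, but the x-to-y relationship changes.
x_con, y_con = simulate(rng, n, x_mean=0.0, slope=2.0)

smd_cov = standardized_mean_difference(x_train, x_cov)
smd_con = standardized_mean_difference(x_train, x_con)
gap_cov = calibration_gap(x_cov, y_cov, slope=1.0)  # frozen model uses slope 1
gap_con = calibration_gap(x_con, y_con, slope=1.0)

print(f"covariate drift: SMD={smd_cov:.2f}, calibration gap={gap_cov:.3f}")
print(f"concept drift:   SMD={smd_con:.2f}, calibration gap={gap_con:.3f}")
```

In this toy setting, covariate drift shows up in the input summary (a large standardized mean difference) while the frozen model stays calibrated, whereas concept drift leaves the inputs looking unchanged but makes the frozen model underpredict risk. Real EHR data typically involve both mechanisms at once, along with a misspecified model, so the two are rarely this cleanly separable.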
The VA CAN model predicts the likelihood of hospitalization within 90 days or 1 year after a primary care encounter to identify high-risk patients who would benefit from additional outpatient interventions. We also investigate covariate and concept drift as two possible mechanisms for COVID-19-associated dataset shift. In Aim 2, we apply an interrupted time series design to study the association between sudden shift at the onset of the pandemic and case-management decisions. Current solutions to dataset shift have primarily been reactive (i.e., model retraining with recent data) and fail to be robust in new testing environments. In Aim 3, we consider revision of the VA CAN model via machine learning and inclusion of variables that reflect potential drivers of shift. This project is innovative as it is the first to leverage a rigorous statistical framework to study the extent and mechanisms of shift and to develop proactive guidelines for model maintenance. The training plan is rigorous for Ms. Kolla, an MD-PhD student in biostatistics. She is strongly supported by her department and institution as well as her two highly qualified sponsors: Dr. Jinbo Chen, an expert in EHR-based risk prediction modeling, and Dr. Ravi Parikh, an expert in implementation of predictive analytics. The proposed research and career development plan will be an essential step towards Ms. Kolla's development as an interdisciplinary and independent physician-scientist.
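An interrupted time series design, as named in Aim 2, is commonly analyzed with segmented regression. The abstract does not specify the model, so the following is only a minimal sketch under stated assumptions: a synthetic monthly outcome series, a single known interruption month, ordinary least squares, and no adjustment for autocorrelation or seasonality. The function name, coefficient labels, and all numeric values are hypothetical.

```python
import numpy as np

def fit_segmented_regression(y, interruption_index):
    """Fit y_t = b0 + b1*t + b2*post_t + b3*(t - t0)*post_t by least squares.
    b2 is the immediate level change at the interruption and b3 is the
    change in slope afterwards."""
    t = np.arange(len(y), dtype=float)
    post = (t >= interruption_index).astype(float)
    time_since = (t - interruption_index) * post
    X = np.column_stack([np.ones_like(t), t, post, time_since])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    names = ["baseline_level", "pre_slope", "level_change", "slope_change"]
    return dict(zip(names, coef))

# Synthetic series: 48 months with an interruption at month 24 (assumed values).
rng = np.random.default_rng(1)
n_months, t0 = 48, 24
t = np.arange(n_months, dtype=float)
post = (t >= t0).astype(float)
y = (100.0 + 0.5 * t             # pre-interruption level and trend
     - 15.0 * post               # sudden drop at the interruption
     + 1.0 * (t - t0) * post     # partial recovery afterwards
     + rng.normal(0.0, 1.0, n_months))

its = fit_segmented_regression(y, t0)
for name, value in its.items():
    print(f"{name}: {value:.2f}")
```

The fitted `level_change` and `slope_change` terms estimate the abrupt and gradual effects of the interruption relative to the pre-period trend. An analysis of real case-management data would likely also need to account for serial correlation and seasonal patterns, which this sketch omits.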