A simulation-based pre-training framework for building more robust and trustworthy machine learning-based clinical prediction models
- Funded by Department of Health and Social Care / National Institute for Health and Care Research (DHSC-NIHR)
- Total publications: 1
Grant number: NIHR173695
Key facts
Disease
COVID-19
Start & end year
2025-2028
Known Financial Commitments (USD)
$585,693.25
Funder
Department of Health and Social Care / National Institute for Health and Care Research (DHSC-NIHR)
Principal Investigator
N/A
Research Location
United Kingdom
Lead Research Institution
University of Oxford
Research Priority Alignment
N/A
Research Category
N/A
Research Subcategory
N/A
Special Interest Tags
N/A
Study Type
Non-Clinical
Clinical Trial Details
N/A
Broad Policy Alignment
Pending
Age Group
Not Applicable
Vulnerable Population
Not applicable
Occupations of Interest
Not applicable
Abstract
Research Question -- Can we provide a general framework for building more robust machine learning-based clinical prediction models?
Background -- Machine learning-based clinical prediction models can learn the relationship between predictors and a future outcome. They do this by looking for patterns in historical data. However, the historical data may not be fully representative of the wider population, may be biased, or may change over time. As a consequence, prediction models may fail to perform under conditions for which they have not been trained. There is a pressing need for methodologies to build more robust machine learning-based clinical prediction models.
Aims and objectives -- Our aim is to develop a general pre-training framework for clinical prediction models that augments real historical data with simulated data at training time. The simulated data will describe challenging scenarios or constraints not seen in the historical data. By introducing these synthetic examples at training time, we aim to make the prediction models more robust, so that when they encounter unusual data in real-world use they already have resilient mechanisms to handle such situations. Our objectives are to (i) embed approaches to simulate complex high-dimensional data types to enable a richer range of applications, and to demonstrate how the framework can be used to build improved prediction models in the presence of (ii) data drift and (iii) algorithmic fairness constraints.
Methods -- We will explore deep learning-based techniques for synthetic high-dimensional data generation and, using both molecular and image data, develop an example use of our pre-training framework to construct an ovarian cancer prognosis model that has improved out-of-distribution consistency compared to the original published model. We will review and embed recent approaches to data drift simulation in our framework and, using a published COVID-19 prediction modelling example, demonstrate how a prediction model can be made resilient to different forms of data drift for longer. Finally, we will explore the algorithmic fairness literature to identify common fairness constraints and build these into prediction models as pre-trained properties. We will illustrate the utility of this pre-training for gender and ethnicity-related fairness using a recent Welsh Childhood Mental Health study example.
Timelines for delivery -- This is a 36-month project and we will broadly address each of the three objectives consecutively in 12-month blocks.
Anticipated impact and dissemination -- We have already developed two application examples that fit within this pre-training framework, and our objectives seek to develop three further examples to demonstrate the broad utility of the framework. We will also seek to develop training materials and toolkits for these techniques and serve them through a national learning platform. Our aim is to encourage wider adoption of these techniques by prediction model developers.
1 Publication linked via Europe PMC
Last Updated: 6 days ago