Using data to improve public health: COVID-19 secondment

  • Funded by UK Research and Innovation (UKRI)
  • Total publications: 3

Grant number: MR/W021455/1

Key facts

  • Disease

    COVID-19
  • Start & end year

    2021
    2022
  • Known Financial Commitments (USD)

    $160,015.37
  • Funder

    UK Research and Innovation (UKRI)
  • Principal Investigator

    Dr. Francisco Perez Reche
  • Research Location

    United Kingdom
  • Lead Research Institution

    University of Aberdeen
  • Research Priority Alignment

    N/A
  • Research Category

    Epidemiological studies

  • Research Subcategory

    Disease susceptibility

  • Special Interest Tags

    N/A

  • Study Type

    Clinical

  • Clinical Trial Details

    Not applicable

  • Broad Policy Alignment

    Pending

  • Age Group

    Unspecified

  • Vulnerable Population

    Unspecified

  • Occupations of Interest

    Unspecified

Abstract

This project will develop computational methods to predict the severity and duration of COVID-19 using machine learning and data on metabolic biomarkers from cohort studies. Highly accurate predictions are crucial to identify the individuals who are most at risk of serious effects of COVID-19. The data to be used consist of sociodemographic information (age, sex, ethnicity, etc.), information on health conditions before COVID-19, and metabolic markers from biofluids including blood, urine and faeces. Incorporating metabolomic data into the analysis is expected to significantly enhance our ability to predict the severity of COVID-19 compared to methods that focus on, e.g., sociodemographic data only.

The project will study both the severity of COVID-19 and the duration of symptoms. The specific aims of the project are the following:

Aim 1. To identify metabolic biomarkers associated with severe COVID-19 and long COVID.

Aim 2. To train computer programs to predict the susceptibility of individuals to severe COVID-19 and long COVID.

In practice, the aims will be addressed separately for the severity of COVID-19 and the duration of symptoms. The overall aim of the project, however, is to integrate the results for both characteristics and provide a general view on how metabolomics can help understand the manifestations of COVID-19.

The severity of COVID-19 will be quantified in terms of whether or not patients show symptoms. For Aim 1, associations between the characteristics of individuals and the presence or absence of symptoms will be explored using statistical methods, including graphical visualisation, hypothesis testing and logistic regression. Feature selection and dimensionality reduction strategies will be used to identify the features most relevant to symptoms. For Aim 2, machine learning models will be trained to automatically classify individuals into symptomatic and asymptomatic classes. A variety of machine learning techniques will be implemented; partial least squares discriminant analysis, support vector machines and artificial neural networks are expected to be particularly suitable for dealing with the high dimensionality and correlated character of metabolomic data.

Several descriptions of the duration of symptoms will be considered, each requiring a different degree of statistical power to be feasible. If the data give enough statistical power, the most natural approach will be to treat the duration as a continuous random variable. In this case, Aim 1 will be fulfilled by using regression methods to assess the statistical significance of the different predictor variables for each individual. A range of machine learning methods will be explored to train a predictor for the duration of symptoms. Suitable candidates may include partial least squares regression, principal component regression and artificial neural networks.

An alternative description that requires less statistical power consists of discretising the duration into several categories, for example short (≤10 days) and long (>10 days), to describe short and long COVID, respectively. In this case, Aims 1 and 2 can be achieved using methods similar to those described above for the analysis of the presence or absence of symptoms.
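
The discretised description of durations can be illustrated with a minimal sketch. It is not the project's pipeline: the data are synthetic, the predictors `marker` and `age` are hypothetical standardised variables, and a plain gradient-descent logistic regression stands in for the classifiers named in the abstract. Only the 10-day cut-off comes from the text above.

```python
import math
import random

THRESHOLD_DAYS = 10  # cut-off from the abstract: short (<=10 days), long (>10 days)


def discretise_duration(days, threshold=THRESHOLD_DAYS):
    """Return 1 for a long duration (> threshold days), 0 for a short one."""
    return 1 if days > threshold else 0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.1, epochs=500):
    """Fit logistic regression by batch gradient descent on the log-loss."""
    n, n_features = len(xs), len(xs[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Predict 1 (long) when the fitted probability is at least 0.5."""
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0


def make_synthetic_cohort(n=200, seed=0):
    """Synthetic, illustrative data only: duration depends on `marker`, not `age`."""
    rng = random.Random(seed)
    xs, durations = [], []
    for _ in range(n):
        marker = rng.gauss(0, 1)  # hypothetical standardised biomarker level
        age = rng.gauss(0, 1)     # hypothetical standardised age
        days = max(1.0, 10 + 5 * marker + rng.gauss(0, 1))
        xs.append([marker, age])
        durations.append(days)
    return xs, durations


xs, durations = make_synthetic_cohort()
ys = [discretise_duration(d) for d in durations]
w, b = fit_logistic(xs, ys)
accuracy = sum(predict(w, b, x) == y for x, y in zip(xs, ys)) / len(ys)
print(f"weights={w}, intercept={b:.3f}, training accuracy={accuracy:.2f}")
```

The reported accuracy is measured on the training data, so it overstates how well such a model would generalise; a real analysis would use held-out data or cross-validation, and real metabolomic data would have far more, and more strongly correlated, features than this toy example.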

Publications linked via Europe PMC

Impact of heterogeneity on infection probability: Insights from single-hit dose-response models.

ESPClust: unsupervised identification of modifiers for the effect size profile in omics association studies.

Age-specific all-cause mortality trends in the UK: Pre-pandemic increases and the complex impact of COVID-19.