Ecology or genetics? Adapting machine learning approaches to understand determinants of cross-species transmission and virulence in RNA viruses

  • Funded by UK Research and Innovation (UKRI)
  • Total publications: 8

Grant number: MR/T027355/1

Key facts

  • Disease

    Disease X
  • Start & end year

  • Known Financial Commitments (USD)

  • Funder

    UK Research and Innovation (UKRI)
  • Principal Investigator

  • Research Location

    United Kingdom, Europe
  • Lead Research Institution

    University of Liverpool
  • Research Category

    Pathogen: natural history, transmission and diagnostics

  • Research Subcategory

    Pathogen morphology, shedding & natural history

  • Special Interest Tags


  • Study Subject


  • Clinical Trial Details


  • Broad Policy Alignment


  • Age Group

    Not Applicable

  • Vulnerable Population

    Not applicable

  • Occupations of Interest

    Not applicable


Emerging infectious diseases, such as those caused by Ebola virus and Zika virus, remain a prominent threat to global health. In 2015, the WHO designated 'Disease X' to indicate the serious potential of previously unknown emerging pathogens to cause public health crises. Though zoonotic RNA viruses are known to present higher risks of emergence, the detailed determinants of cross-species transmission remain unclear. Zoonotic viruses also vary widely in their capability to cause severe disease. To predict the public health impacts of 'Disease X', a better understanding of which traits drive this variation in infectivity and virulence is urgently needed.

Whilst previous approaches have focused on ecological predictors, these traditional frameworks have been unable to capture the information within increasingly available RNA virus sequences. This research aims to capitalise upon the potential power within large genetic data resources and to quantify the comparative influences of genetic versus ecological traits of RNA viruses and hosts upon cross-species transmission dynamics. To fully integrate novel, high-dimensional genetic data, new analytical approaches are needed. I will apply machine learning as a state-of-the-art statistical methodology, comparing several advanced approaches, e.g. gradient boosting, a method of gradual model learning which outperforms traditional methods. Models will span all known mammalian and avian RNA viruses (22 families) using the exceptional breadth of EID2, a large host-virus infectivity dataset. This project will additionally develop further text-mining tools to capture and integrate virulence data within EID2.

The proposed models will allow tests of evolutionary theory across a range of RNA viruses. Quantified model outputs will contribute to public health risk assessments by informing prioritisation for novel viruses and advancing frameworks for emergence predictions, moving towards a 'smarter', empirically driven strategy to prevent future disease burden.

Publications linked via Europe PMC

Past and future uses of text mining in ecology and evolution.

Tracking changes between preprint posting and journal publication during a pandemic.

Mammal virus diversity estimates are unstable due to accelerating discovery effort.

The science of the host-virus network.

Impact of climatic, demographic and disease control factors on the transmission dynamics of COVID-19 in large cities worldwide.

Lessons from the influx of preprints during the early COVID-19 pandemic.

The evolving role of preprints in the dissemination of COVID-19 research and their impact on the science communication landscape.