Understanding the evolution and diversity of viral pathogens using next generation sequencing technologies

Infectious agents such as viruses are a main cause of animal and human disease. In this project we wish to study the genetic material of these pathogens. Genetic material is encoded as ordered 'sequences' of nucleotides. This information determines a virus's biological properties and its response to the host immune system, and thus the success of veterinary or medical treatments, whether they are vaccine- or drug-based. Until very recently pathogen genetic material was characterized using Sanger sequencing, a technique invented in the late 1970s. More recently, new sequencing technologies have become available that permit extremely large numbers of sequence fragments, called 'reads', to be generated. Many refer to this as a revolution in sequencing because it permits small groups of researchers to tackle projects previously only possible at sequencing centres, while sequencing centres can tackle truly massive sequencing projects, for example the initiative to sequence 1,000 human genomes. This introduces the potential to explore pathogen genetic diversity on an unprecedented scale. However, there is a downside. The amount of data being generated is outstripping our ability to analyse it routinely, let alone carry out sophisticated evolutionary analysis. For pathogens in particular, data sets can be generated for which no suitable computational tools exist. This is exactly what happened in the preliminary analysis for this project: HIV data important for understanding drug resistance were generated, but no software was available to analyse them. This lack of software arises because most research effort is directed at assembling single complete genomes from next generation sequence data. With pathogens, however, the interesting questions concern the diversity of sequences within a sample, which is studied by so-called 'ultra-deep' sequencing.
As a consequence, in this project we propose to develop reliable, easy-to-use software that will be generically useful for all types of pathogen data sets. This will involve exploiting both the error information that is intrinsic to the new sequencing platforms and our considerable knowledge of the pathogen systems that we wish to analyse. Combined, these will permit us to develop software that summarises the variation in a sample of sequences and provides confidence in the sequence changes observed. Just as importantly, our computer-based approach will permit sophisticated analysis of properties of the data in the hunt for clues to understanding a pathogen's biology. We will use this software in conjunction with next-generation sequence data to provide detailed insight into the intra-host dynamics of RNA viral populations. Particular focus will be given to genome diversity when the selective landscape within the host is altered, for example following transmission between individuals, during disease progression, or on the initiation or alteration of drug treatments. Additionally, our approach will be generically applicable to a wide range of research areas where understanding genetic variation is key.
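
To illustrate what "exploiting the error information" and "providing confidence in the sequence changes observed" could look like in practice, here is a minimal sketch. It is not the project's software. It assumes that each base call at one aligned position comes with a Phred quality score Q (error probability 10^(-Q/10)), that an error is equally likely to produce any of the three wrong bases, and that a simple binomial test is an adequate model. The function names (phred_to_error_prob, variant_p_value) and the data layout are hypothetical, chosen only for this example.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def _log_binom_pmf(n, i, p):
    """Log of the Binomial(n, p) probability mass at i, for 0 < p < 1."""
    log_comb = math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
    return log_comb + i * math.log(p) + (n - i) * math.log1p(-p)


def variant_p_value(base_calls, variant_base):
    """Test whether a variant is seen more often than sequencing error predicts.

    base_calls is a list of (base, phred_quality) pairs covering a single
    alignment column (hypothetical layout, assumed for this sketch).
    Returns (variant_count, depth, p_value), where p_value is the chance of
    seeing at least variant_count copies of variant_base from errors alone.
    """
    depth = len(base_calls)
    if depth == 0:
        raise ValueError("no reads cover this position")
    variant_count = sum(1 for base, _ in base_calls if base == variant_base)
    # Average error rate over the column; assumed split evenly across the
    # three possible wrong bases.
    mean_error = sum(phred_to_error_prob(q) for _, q in base_calls) / depth
    p = mean_error / 3
    # Upper tail of the binomial: P(X >= variant_count).
    tail = sum(
        math.exp(_log_binom_pmf(depth, i, p))
        for i in range(variant_count, depth + 1)
    )
    return variant_count, depth, min(1.0, tail)


if __name__ == "__main__":
    # 1,000 reads at Q30, of which 20 carry a 'G' where the rest carry 'A'.
    column = [("A", 30)] * 980 + [("G", 30)] * 20
    count, depth, p_value = variant_p_value(column, "G")
    print(f"{count}/{depth} reads carry the variant, p = {p_value:.3g}")
```

Under these assumptions, a variant present in 2% of 1,000 high-quality reads is very unlikely to be explained by sequencing error alone, whereas a variant seen in a single read is not distinguishable from error. Real platforms have context-dependent error profiles, so a production tool would need a richer error model than this one.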

Publications linked via Europe PMC

Bayesian phylogenetics with BEAUti and the BEAST 1.7.

Analysis of high-depth sequence data for studying viral diversity: a comparison of next generation sequencing platforms using Segminator II.

BEAGLE: an application programming interface and high-performance computing library for statistical phylogenetics.

The evolutionary analysis of emerging low frequency HIV-1 CXCR4 using variants through time--an ultra-deep approach.