Adapting genetic clustering techniques to SARS-CoV-2

One of the positive outcomes of the ongoing SARS-CoV-2 pandemic has been the rapid collection and sharing of virus genomic data. Today, there are over 9 million SARS-CoV-2 genomes from around the world in public databases. This abundance of data has also created tremendous new challenges for genomic epidemiology, the use of genetic sequences to reconstruct the spread and adaptation of an infectious disease. The purpose of this project is to contribute to the global effort to update the computational toolkit of genomic epidemiology for the SARS-CoV-2 pandemic, with a focus on clustering methods. Genetic clustering is a fundamental category of methods for analyzing sequences, in which we collect similar observations into groups. Clusters are intuitive and have a broad range of applications. In the study and management of infectious diseases, for example, we use clusters to detect outbreaks, to find associations between risk factors and the spread of disease, and to reconstruct how different infections are related back in time. Clusters are also a useful device for reducing large data sets while preserving the essential information. Many of the standard clustering methods used for infectious diseases were developed and honed on HIV-1 sequences, not only because of the enormous global health burden of this disease, but also because these data are abundant around the world. Our specific objectives are to: (1) adapt methods from network science to partition large databases of SARS-CoV-2 genomes into clusters that are calibrated to measure the impact of age, location and other risk factors on transmission rates; (2) develop fast, approximate methods to extract epidemiological information, such as the number of unsampled infections, from cluster-based trees updated in real time; and (3) adapt a method from dynamic social network analysis to reconstruct the role of recombination (the exchange of fragments between genomes) in the evolutionary history of coronaviruses.
