Adapting genetic clustering techniques to SARS-CoV-2
- Funded by Canadian Institutes of Health Research (CIHR)
- Total publications: 1
Grant number: 202203PJT
Key facts
Disease
COVID-19
Start & end year
2022-2027
Known Financial Commitments (USD)
$459,459
Funder
Canadian Institutes of Health Research (CIHR)
Principal Investigator
N/A
Research Location
Canada
Lead Research Institution
Western University
Research Priority Alignment
N/A
Research Category
Pathogen: natural history, transmission and diagnostics
Research Subcategory
Pathogen genomics, mutations and adaptations
Special Interest Tags
N/A
Study Type
Non-Clinical
Clinical Trial Details
N/A
Broad Policy Alignment
Pending
Age Group
Not Applicable
Vulnerable Population
Not applicable
Occupations of Interest
Not applicable
Abstract
One of the positive outcomes of the ongoing SARS-CoV-2 pandemic has been the rapid collection and sharing of virus genomic data. Today, there are over 9 million SARS-CoV-2 genomes from around the world in public databases. This abundance of data has also created tremendous new challenges for genomic epidemiology - the use of genetic sequences to reconstruct the spread and adaptation of an infectious disease. The purpose of this project is to contribute to the global effort to update the computational toolkit of genomic epidemiology for the SARS-CoV-2 pandemic by focusing on clustering methods. Genetic clustering is a fundamental category of methods for analyzing sequences, in which similar observations are collected into groups. Clusters are intuitive and have a broad range of applications. For the study and management of infectious diseases, for example, we use clusters to detect outbreaks, to find associations between risk factors and the spread of disease, and to reconstruct how different infections are related back in time. Clusters are also a useful device for reducing large data sets while preserving the essential information. Many of the standard clustering methods used for infectious disease were developed and honed on HIV-1 sequences, not only because of the enormous global health burden of this disease, but also because these data are abundant around the world. Our specific objectives are to: (1) adapt methods from network science to partition large databases of SARS-CoV-2 genomes into clusters that are calibrated to measure the impact of age, location and other risk factors on transmission rates; (2) develop fast, approximate methods to extract epidemiological information, such as the number of unsampled infections, from cluster-based trees updated in real time; and (3) adapt a method from dynamic social network analysis to reconstruct the role of recombination (the exchange of fragments between genomes) in the evolutionary history of coronaviruses.
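As a rough illustration of what "collecting similar observations into groups" can mean for sequences, the sketch below links any two aligned sequences whose pairwise mismatch count falls at or below a cutoff and reports the connected components as clusters. This is a generic distance-threshold approach shown for orientation only; it is not the method proposed in this project. The function names, the toy sequences, and the cutoff value are all hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Count mismatched positions between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def threshold_clusters(seqs, max_dist):
    """Return clusters as connected components of the graph that links
    every pair of sequences whose Hamming distance is <= max_dist."""
    labels = list(seqs)
    parent = {k: k for k in labels}  # union-find forest, one node per sequence

    def find(k):
        # Walk to the root, shortening the path along the way.
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    # Compare every pair; merge the two components when the pair is close enough.
    for a, b in combinations(labels, 2):
        if hamming(seqs[a], seqs[b]) <= max_dist:
            parent[find(a)] = find(b)

    clusters = {}
    for k in labels:
        clusters.setdefault(find(k), []).append(k)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy alignment (not real viral data).
toy = {
    "s1": "ACGTACGT",
    "s2": "ACGTACGA",
    "s3": "ACGTACAA",
    "s4": "TTGTCCGT",
}
clusters = threshold_clusters(toy, max_dist=1)
print(clusters)
```

Note that s1 and s3 differ at two positions yet land in the same cluster because each is within one mismatch of s2. The all-pairs comparison grows quadratically with the number of sequences, which is one reason approaches of this kind become costly on databases holding millions of genomes.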
Publications linked via Europe PMC