Tuning big data analysis infrastructure for HIV research

Key facts

Disease
COVID-19
Start & end year
2020 to 2022
Known Financial Commitments (USD)
$374,737
Funder
National Institutes of Health (NIH)
Principal Investigator
ANTON NEKRUTENKO
Research Location
United States of America
Lead Research Institution
PENNSYLVANIA STATE UNIVERSITY-UNIV PARK
Research Priority Alignment
N/A

Research Category
Pathogen: natural history, transmission and diagnostics
Research Subcategory
Pathogen genomics, mutations and adaptations
Special Interest Tags
N/A
Study Type
Non-Clinical
Clinical Trial Details
N/A
Broad Policy Alignment
Pending
Age Group
Not Applicable
Vulnerable Population
Not applicable
Occupations of Interest
Not applicable

Summary
The COVID‐19/SARS‐CoV‐2 pandemic is a once‐in‐a‐generation, "all‐hands‐on‐deck" event for the scientific community. This pandemic is also the first in which real‐time genomic data are available, e.g. via GISAID [1], where genomic sequences are deposited daily. Vital insights about the virus and the epidemic depend on rapid and reliable genomic analysis of diverse viral sample sequences by multiple laboratories. Yet we repeatedly encounter the same avoidable shortcomings early in viral investigations, including COVID‐19: lack of reproducibility, rigor, and data/analytic sharing. Only about 10% of the published genomes have quality metrics, primary data (read files), or any level of detail on analytics, making these data irreproducible and unverifiable; over 40% of GISAID submissions to date provide no information about how the sequences were generated. Essential questions about the extent of intra‐host genomic variability (indicative of adaptation or multiple infection), viral evolution (selection, recombination), and transmission (phylogenetic and phylogeographic) cannot be answered reliably if researchers cannot trust or replicate the source data and analytical approaches. One of the key goals and deliverables of this supplement will be open analytic workflows that can be used to curate and standardize genomic data, together with high‐quality annotated variation data.
