Tuning big data analysis infrastructure for HIV research
- Funded by National Institutes of Health (NIH)
- Total publications: 0
Grant number: unknown
Key facts
Disease
COVID-19
Start & end year
2020–2022
Known Financial Commitments (USD)
$374,737
Funder
National Institutes of Health (NIH)
Principal Investigator
ANTON NEKRUTENKO
Research Location
United States of America
Lead Research Institution
PENNSYLVANIA STATE UNIVERSITY-UNIV PARK
Research Priority Alignment
N/A
Research Category
Pathogen: natural history, transmission and diagnostics
Research Subcategory
Pathogen genomics, mutations and adaptations
Special Interest Tags
N/A
Study Type
Non-Clinical
Clinical Trial Details
N/A
Broad Policy Alignment
Pending
Age Group
Not Applicable
Vulnerable Population
Not applicable
Occupations of Interest
Not applicable
Abstract
Summary
The COVID-19/SARS-CoV-2 pandemic is a once-in-a-generation, "all-hands-on-deck" event for the scientific community. This pandemic is also the first in which real-time genomic data are available, e.g. via GISAID [1], where genomic sequences are deposited daily. Vital insights about the virus and the epidemic depend on rapid and reliable genomic analysis of diverse viral sample sequences by multiple laboratories. Yet we repeatedly encounter the same avoidable shortcomings early in viral investigations, including COVID-19: lack of reproducibility, lack of rigor, and lack of data and analysis sharing. Only about 10% of the published genomes have quality metrics, primary data (read files), or any level of detail on the analytics, making these data irreproducible and unverifiable; over 40% of GISAID submissions to date provide no information about how the sequences were generated. Essential questions about the extent of intra-host genomic variability (indicative of adaptation or multiple infection), viral evolution (selection, recombination), and transmission (phylogenetic and phylogeographic) cannot be answered reliably if researchers cannot trust or replicate the source data and analytical approaches. One of the key goals and deliverables of this supplement will be open analytic workflows that can be used to curate and standardize genomic data, together with high-quality annotated variation data.