
Faculty, Staff and Student Publications
Publication Date
11-10-2022
Journal
Scientific Reports
Abstract
Detection of viral transmission clusters using molecular epidemiology is critical to the response pillar of the Ending the HIV Epidemic initiative. Here, we studied whether inference with an incomplete dataset would influence the accuracy of the reconstructed molecular transmission network. We analyzed viral sequence data available from ~13,000 individuals with diagnosed HIV (2012-2019) from Houston Health Department surveillance data with 53% completeness (n = 6852 individuals with sequences). We extracted random subsamples and compared the resulting reconstructed networks with the full-size network. Increasing simulated completeness was associated with an increase in the number of detected clusters. We also subsampled based on each node's influence on transmission of the virus, measured as the Expected Force (ExF) of each node in the network. We simulated the removal of nodes with the highest and then the lowest ExF from the full dataset and found that 4.7% and 60% of priority clusters were detected, respectively. These results highlight the non-uniform impact of capturing high-influence nodes in identifying transmission clusters. Although increasing sequence reporting completeness is the way to fully detect HIV transmission patterns, reaching high completeness has remained challenging in the real world. Hence, we suggest taking a network science approach to enhance the performance of molecular cluster detection, augmented by node influence information.
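The random-subsampling comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes the links between individuals are already available as a plain edge list (in practice they would be inferred from the sequence data), it treats a cluster as any connected component of at least two nodes, and the function names `count_clusters` and `subsample_cluster_counts` are hypothetical.

```python
import random
from collections import defaultdict


def count_clusters(nodes, edges, min_size=2):
    """Count connected components with at least `min_size` nodes.

    Only edges whose two endpoints are both in `nodes` are used, which
    mimics dropping individuals without a sequence from the network.
    """
    nodes = set(nodes)
    adjacency = defaultdict(set)
    for u, v in edges:
        if u in nodes and v in nodes:
            adjacency[u].add(v)
            adjacency[v].add(u)

    seen = set()
    clusters = 0
    for start in nodes:
        if start in seen:
            continue
        # Depth-first traversal of the component containing `start`.
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        if size >= min_size:
            clusters += 1
    return clusters


def subsample_cluster_counts(nodes, edges, fractions, seed=0):
    """Map each simulated completeness fraction to a detected-cluster count.

    Assumption: one random draw per fraction; a real analysis would
    repeat the draw many times and summarize the distribution.
    """
    rng = random.Random(seed)
    ordered = sorted(nodes)
    counts = {}
    for fraction in fractions:
        keep = rng.sample(ordered, round(fraction * len(ordered)))
        counts[fraction] = count_clusters(keep, edges)
    return counts


# Toy data (invented for illustration, not from the study).
toy_nodes = ["a", "b", "c", "d", "e", "f"]
toy_edges = [("a", "b"), ("b", "c"), ("d", "e")]
toy_counts = subsample_cluster_counts(toy_nodes, toy_edges, [0.5, 1.0])
```

Targeted removal follows the same pattern: instead of a random draw, pass `count_clusters` the node set that remains after dropping the nodes ranked highest or lowest by an influence score computed separately.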
Keywords
Humans, HIV Infections, Cluster Analysis, Molecular Epidemiology, Molecular Sequence Data, Epidemics, Phylogeny
DOI
10.1038/s41598-022-21924-8
PMID
36357480
PMCID
PMC9648870
PubMedCentral® Posted Date
11-10-2022
PubMedCentral® Full Text Version
Post-print
Published Open-Access
yes
Included in
Diseases Commons, Genetic Phenomena Commons, Medical Genetics Commons, Public Health Commons