An integrated statistical model for subtle allelic imbalance with SNP arrays
Abstract
Detection of DNA aberrations in a tumor sample is often complicated by contamination with DNA from normal tissue. Numerous statistical methods have been proposed to address this, but most require tumor purities to exceed 10-15%. Here, we present a more sensitive statistical model that gains power by jointly modeling germline haplotypes. Our model consists of two a priori independent hidden Markov models, one each for the germline and tumor genomes. The combined model offers a powerful approach to dealing with low tumor purities, with applications such as monitoring tumor remission and early detection. Our joint model for germline haplotypes and acquired DNA aberrations is flexible, allowing for a large number of chromosomal alteration types, including balanced and imbalanced losses and gains, copy-neutral loss-of-heterozygosity (LOH), and tetraploidy. We found our model (which we term J-LOH) to be superior for localizing rare aberrations in a simulated 3% mixture sample. We then sought to increase sensitivity by accommodating repeat measurements from multiple array experiments, either from the same sample (technical replicates) or from samples of varying tumor purities. Since the array data are correlated across experiments, even when conditioning on aberration type and tumor purity, we used a multivariate normal distribution for the B-allele frequencies and log R ratios. Application of this extension to mixtures of DNA from paired tumor and normal cell lines resulted in a small but noticeable improvement. Finally, a limitation of models of this sort is the assumption of tumor homogeneity. We attempted to address this by expanding the aberration state space in J-LOH and by replacing the global mixture proportion with a vector of effective subclone proportions. In both simulated and real microarray data, we successfully identified subclonal aberrant regions at a 5% subclonal mixture proportion.
In summary, we present a tool for detecting DNA carrying potentially complex combinations of chromosomal aberrations in samples of varying tumor purity.
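To give a sense of why low tumor purities are difficult, the following is a minimal sketch of the textbook mixture arithmetic for a germline-heterozygous SNP. It is not the J-LOH model itself. The function names `expected_baf` and `expected_lrr`, the idealized log2 copy-number ratio (real array log R ratios are typically compressed), and the assumption of a diploid normal genome are all illustrative assumptions rather than details taken from the dissertation.

```python
import math


def expected_baf(purity, tumor_b, tumor_total, normal_b=1, normal_total=2):
    """Idealized B-allele frequency at a germline-heterozygous (AB) SNP.

    purity      : fraction of cells that are tumor (0..1)
    tumor_b     : copies of the B allele in tumor cells
    tumor_total : total copies of the locus in tumor cells
    The normal genome is assumed diploid and heterozygous (1 B copy of 2).
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    b_copies = purity * tumor_b + (1.0 - purity) * normal_b
    all_copies = purity * tumor_total + (1.0 - purity) * normal_total
    return b_copies / all_copies


def expected_lrr(purity, tumor_total, normal_total=2):
    """Idealized log R ratio: log2 of mean copy number relative to diploid."""
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    mean_copies = purity * tumor_total + (1.0 - purity) * normal_total
    return math.log2(mean_copies / normal_total)


if __name__ == "__main__":
    purity = 0.03  # the 3% mixture mentioned in the abstract
    scenarios = {
        "hemizygous loss (A)": (0, 1),
        "copy-neutral LOH (AA)": (0, 2),
        "balanced gain (AABB)": (2, 4),
    }
    for name, (b, total) in scenarios.items():
        baf = expected_baf(purity, b, total)
        lrr = expected_lrr(purity, total)
        print(f"{name:24s} BAF shift = {abs(baf - 0.5):.4f}, LRR = {lrr:+.4f}")
```

Under these assumptions, a 3% mixture shifts the expected B-allele frequency by only about 0.015 from 0.5 for copy-neutral LOH, and a balanced gain leaves it at exactly 0.5, so only the log R ratio carries signal there. Shifts this small at any single SNP are consistent with the motivation for borrowing strength across neighboring markers through haplotype information.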
Subject Area
Biostatistics|Genetics
Recommended Citation
Xia, Rui, "An integrated statistical model for subtle allelic imbalance with SNP arrays" (2014). Texas Medical Center Dissertations (via ProQuest). AAI3665068.
https://digitalcommons.library.tmc.edu/dissertations/AAI3665068