Author ORCID Identifier
0000-0002-1797-2303
Date of Graduation
8-2020
Document Type
Dissertation (PhD)
Program Affiliation
Biostatistics, Bioinformatics and Systems Biology
Degree Name
Doctor of Philosophy (PhD)
Advisor/Committee Chair
Nicholas E. Navin
Committee Member
Ken Chen
Committee Member
Wenyi Wang
Committee Member
Mary Edgerton
Committee Member
Luay Nakhleh
Abstract
Tumor cells have heterogeneous genotypes. Such genetic intratumor heterogeneity plays a role in clonal evolution, the process that underlies tumor progression and treatment resistance. Single-cell DNA sequencing is a promising experimental method for studying intratumor heterogeneity, but it poses unique statistical challenges in interpreting the resulting data. Researchers lack methods to determine whether enough cells have been sampled from a tumor. In addition, there are no proven computational methods for determining the ploidy of a cell, a necessary step in determining copy number. In this work, software for calculating probabilities from a multinomial distribution was written to estimate the number of cells that must be sequenced (chapter 2). Two new methods were developed for predicting the number of mutations that would be discovered by additional single-cell sequencing of a tumor (chapter 3). Theoretical reasoning suggested that additional single-cell sequencing will always yield additional mutation discoveries, demonstrating the need for a different approach to judging whether enough tumor cells have been sequenced. To test computational methods for inferring ploidy from single-cell whole-genome sequencing data, their estimates were compared with fluorescence-based measurements of DNA content (chapter 4). Previously proposed quantum estimation methods were found to correctly infer the ploidy of most cells, enabling precise inference of copy number in copy number aberrations. Additionally, a weighting procedure based on a probabilistic model of sequencing read counts (described in chapter 3) reduced the error rate of ploidy inference in high-ploidy samples. The lessons learned and methodology proposed in this work may be useful in research and clinical applications of single-cell DNA sequencing.
Keywords
quantum estimation, quantum model, multinomial distribution, species capture problem, cancer, clonal evolution, ploidy, copy number aberrations, single-cell sequencing, whole genome sequencing