Author ORCID Identifier

0000-0002-1797-2303

Date of Graduation

8-2020

Document Type

Dissertation (PhD)

Program Affiliation

Biostatistics, Bioinformatics and Systems Biology

Degree Name

Doctor of Philosophy (PhD)

Advisor/Committee Chair

Nicholas E. Navin

Committee Member

Ken Chen

Committee Member

Wenyi Wang

Committee Member

Mary Edgerton

Committee Member

Luay Nakhleh

Abstract

Tumor cells have heterogeneous genotypes. This genetic intratumor heterogeneity plays a role in the clonal evolution that underlies tumor progression and treatment resistance. Single-cell DNA sequencing is a promising experimental method for studying intratumor heterogeneity, but it brings unique statistical challenges in interpreting the resulting data. Researchers lack methods to determine whether sufficiently many cells have been sampled from a tumor. In addition, there are no proven computational methods for determining the ploidy of a cell, a necessary step in the determination of copy number. In this work, software for calculating probabilities from a multinomial distribution was written to estimate the number of cells that must be sequenced (chapter 2). Two new methods were developed for predicting the number of mutations that would be discovered in additional single-cell sequencing of a tumor (chapter 3). Theoretical reasoning suggested that additional single-cell sequencing will always result in additional mutation discoveries, demonstrating the need for a different approach to guide judgments of whether sufficiently many tumor cells were sequenced. To test computational methods for inferring ploidy from single-cell whole genome sequencing data, estimates were compared with fluorescence-based measurements of DNA content (chapter 4). Previously proposed methods for quantum estimation were found to correctly infer ploidy for most cells, enabling inference of precise copy number in copy number aberrations. Additionally, a weighting procedure based on a probabilistic model of sequencing read counts (described in chapter 3) reduced the error rate of ploidy inference in high-ploidy samples. The lessons learned and methodology proposed in this work may be useful in research and clinical applications of single-cell DNA sequencing.
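
To illustrate the kind of multinomial calculation mentioned for chapter 2, the following is a minimal sketch and not the dissertation's software. It assumes cells are sampled independently, so that clone counts follow a multinomial distribution, and it assumes the clone frequencies are known in advance. The function names `prob_all_clones_seen` and `min_cells` are hypothetical. The probability that every clone of interest appears at least once is computed by inclusion-exclusion over the subsets of clones that could be missed.

```python
from itertools import combinations


def prob_all_clones_seen(freqs, n_cells):
    """Probability that every clone in `freqs` is sampled at least once.

    Assumes independent sampling of `n_cells` cells, with each cell
    belonging to clone i with probability freqs[i]. The frequencies may
    sum to less than 1; the remainder is treated as "all other cells".
    """
    total = 0.0
    # Inclusion-exclusion: alternate signs over subsets of missed clones.
    for size in range(len(freqs) + 1):
        for subset in combinations(freqs, size):
            # Chance that a single sampled cell avoids every clone in subset.
            avoid = max(1.0 - sum(subset), 0.0)
            total += (-1) ** size * avoid ** n_cells
    return total


def min_cells(freqs, target=0.95, max_cells=100_000):
    """Smallest number of cells giving at least `target` probability
    of observing every clone in `freqs` (illustrative linear search)."""
    for n in range(1, max_cells + 1):
        if prob_all_clones_seen(freqs, n) >= target:
            return n
    raise ValueError("target probability not reached within max_cells")


if __name__ == "__main__":
    # Hypothetical example: one major clone and two rare subclones.
    clone_freqs = [0.80, 0.15, 0.05]
    print(min_cells(clone_freqs, target=0.95))
```

The subset enumeration grows exponentially with the number of clones, so this sketch is only practical for a small number of clones.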

Keywords

quantum estimation, quantum model, multinomial distribution, species capture problem, cancer, clonal evolution, ploidy, copy number aberrations, single-cell sequencing, whole genome sequencing
