Author ORCID Identifier

Date of Graduation


Document Type

Dissertation (PhD)

Program Affiliation

Cancer Biology

Degree Name

Doctor of Philosophy (PhD)

Advisor/Committee Chair

Xifeng Wu

Committee Member

Jaffer A. Ajani

Committee Member

Paul J. Chiao

Committee Member

Guillermo Garcia-Manero

Committee Member

Jeffrey N. Myers

Committee Member

Yuanqing Ye


Recent advances in technologies, including next-generation sequencing and production-scale throughput qPCR, have revolutionized the identification of biomarkers in epidemiology. In response to the vast amount of data generated by high-throughput technologies, novel methods from computer science have been applied to analyze these data. The current study demonstrates the application of such technologies in a variety of scenarios.

The first project describes how targeted and whole-exome sequencing were used to identify somatic mutations that mark the differences between colorectal adenomas and adenocarcinomas. A statistical test based on the unique clustering patterns of mutations in tumor suppressor genes and oncogenes was employed to locate driver mutations. A random forest algorithm was applied to find the somatic mutations that best classify samples as adenoma or adenocarcinoma. Twenty important mutated genes (TP53, KRAS, APC, PIK3CA, SMAD4, FBXW7, CTNNB1, SYNE1, CDC27, CSMD1, NRAS, RYR3, NALCN, LRP1B, FAT4, ATM, TMPRSS13, SOX9, CSMD3, MED12) that consistently served to separate adenomas from adenocarcinomas were discovered.

The second project focused on exploring differentially expressed genes (DEGs) and the pathways enriched with such genes in colorectal adenomas and adenocarcinomas. Fold changes of paired premalignant/malignant lesions compared to normal adjacent tissues from the same patient were analyzed. The ratio of the 20-gene panel identified in the first project was also found to differ between colorectal adenomas and cancers.
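The paired fold-change comparison described above can be illustrated with a minimal sketch. It assumes expression values are available per gene for a lesion and for normal adjacent tissue from the same patient. The function name, the pseudocount, the gene labels, and all numbers are hypothetical and are not taken from the dissertation.

```python
import math


def paired_log2_fold_changes(lesion, normal, pseudocount=1.0):
    """Return per-gene log2(lesion / normal) for one patient.

    `lesion` and `normal` map gene names to expression values measured in a
    lesion and in normal adjacent tissue from the same patient. The
    pseudocount (an assumption of this sketch) avoids division by zero.
    """
    return {
        gene: math.log2((lesion[gene] + pseudocount) / (normal[gene] + pseudocount))
        for gene in lesion
        if gene in normal
    }


# Hypothetical expression values for a single patient (illustration only).
normal = {"GENE_A": 7.0, "GENE_B": 31.0}
adenoma = {"GENE_A": 15.0, "GENE_B": 31.0}
carcinoma = {"GENE_A": 63.0, "GENE_B": 7.0}

adenoma_lfc = paired_log2_fold_changes(adenoma, normal)
carcinoma_lfc = paired_log2_fold_changes(carcinoma, normal)

for gene in sorted(normal):
    print(gene, adenoma_lfc[gene], carcinoma_lfc[gene])
```

Using the patient's own normal tissue as the reference removes much of the between-patient variation, so the adenoma and carcinoma fold changes can be compared gene by gene.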

The last project in the dissertation demonstrated the potential of serum microRNA (miR) as a non-invasive prognostic factor for non-muscle invasive bladder cancer (NMIBC). Drawing on a large number of miR profiles, we identified two panels in the overall population (miR-16/miR-378 + miR-24/miR-331-3p for recurrence and miR-16/miR-21 + miR-24/miR-375 for progression) and two panels in the BCG-treated population (miR-16/miR-378 + miR-24/miR-331-3p for recurrence and miR-16/miR-21 + miR-24/miR-375 for progression).
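As an illustration of how a miR ratio such as miR-16/miR-378 might be computed from qPCR data, the sketch below assumes that each miR is measured as a cycle threshold (Ct) value and that amplification efficiency is 100%, so that the log2 ratio of two miRs equals the difference of their Ct values. The Ct values and the function name are hypothetical. The abstract does not state how the two ratios in a panel are combined, so the sketch stops at the individual ratios.

```python
def log2_expression_ratio(ct_numerator, ct_denominator):
    """Return log2(numerator / denominator) abundance from two Ct values.

    Assumes 100% PCR efficiency, i.e. abundance is proportional to 2 ** (-Ct),
    so a lower Ct means higher abundance.
    """
    return ct_denominator - ct_numerator


# Hypothetical Ct values for one serum sample (illustration only).
ct = {"miR-16": 20.0, "miR-378": 26.0, "miR-24": 22.0, "miR-331-3p": 27.0}

ratio_16_378 = log2_expression_ratio(ct["miR-16"], ct["miR-378"])
ratio_24_331 = log2_expression_ratio(ct["miR-24"], ct["miR-331-3p"])

print("log2(miR-16/miR-378):", ratio_16_378)
print("log2(miR-24/miR-331-3p):", ratio_24_331)
```

A ratio of two miRs measured in the same sample does not depend on a separate reference gene, which is one common reason for reporting circulating miR biomarkers as ratios.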

Taken together, these projects showcase the discovery of tissue and circulating biomarkers with cutting-edge technologies. These biomarkers could lead to better-informed allocation of limited medical resources for monitoring clinical outcomes, and could serve as a starting point for future studies deciphering the intricate mechanisms underlying tumorigenesis, host response, and patient survival.


colorectal cancer, adenoma, bladder, microRNA, somatic mutation, transcriptome

Included in

Epidemiology Commons