Author ORCID Identifier

0000-0002-5340-1890

Date of Graduation

12-2025

Document Type

Dissertation (PhD)

Program Affiliation

Quantitative Sciences

Degree Name

Doctor of Philosophy (PhD)

Advisor/Committee Chair

Han Liang

Committee Member

Traver Hart

Committee Member

Li Ma

Committee Member

Peng Wei

Committee Member

Bing Zhang

Abstract

The development of high-throughput technologies greatly facilitates precision oncology in the era of big data. With the growing size of pan-cancer genomic, transcriptomic, and proteomic profiling data, there is an imperative need for efficient integrative analysis of molecular and clinical information. Here, we conducted omics analyses in three different cancer studies and developed a large language model (LLM)-based bioinformatics chatbot, DrBioRight, that can perform cancer omics data mining based on natural language. We identified ultraconserved elements (UCEs) that can act as enhancers of tumor suppressors and silencers of oncogenes in colon cancer via whole-genome and targeted UCE sequencing of two cohorts. We characterized the tumor microenvironment of melanoma brain metastasis (MBM) during anti-PD1 therapy using single-cell RNA sequencing (scRNA-seq) of patients’ cerebrospinal fluid (CSF) samples. We profiled functional proteomics of kidney tumor samples using reverse phase protein arrays (RPPA) and formulated an MTOR score that improves prognostic prediction of recurrence risk given Leibovich risk scoring. Finally, we selected high-quality user queries to fine-tune the LLM modules of DrBioRight, evaluated its responses, and improved its performance. Our work provides not only unique resources for the cancer research community but also an online bioinformatics chatbot that helps users access cancer omics data and conduct integrative analysis on it efficiently.

Keywords

RPPA, genomics, scRNA-seq, cancer, LLM

Available for download on Friday, December 04, 2026
