Author ORCID Identifier

Date of Graduation


Document Type

Dissertation (PhD)

Program Affiliation

Biostatistics, Bioinformatics and Systems Biology

Degree Name

Doctor of Philosophy (PhD)

Advisor/Committee Chair

Liang Li

Committee Member

Yu Shen

Committee Member

Ying Yuan

Committee Member

Ya-Chen Tina Shih

Committee Member

Jian Wang


Establishing causality between exposures and outcomes in observational studies is often complicated by confounders, which create imbalance in covariate distributions between treatment groups and bias treatment effect inference. The propensity score (PS) has been widely used to adjust for this covariate imbalance in observational data. However, propensity score analysis methods rely on a correctly specified parametric PS model. When the model is misspecified, covariate imbalance may remain, which leads to biased estimation of the treatment effect. Therefore, it is necessary to study how to make propensity score analysis more robust to model misspecification.

My Ph.D. dissertation consists of three aims. In Aim 1, we examined whether optimizing global balance (the mean balance of covariates or their transformations in the overall study population) can circumvent the need for correct propensity score model specification, and whether using a propensity score model further improves estimation performance compared with methods that do not model the propensity score. In Aim 2, we developed a propensity score analysis framework, the propensity score with local balance (PSLB), which incorporates nonparametric propensity score models and improves the balancing property of the estimated propensity score compared with existing methods that optimize only the global balance. In Aim 3, we developed a subgroup analysis method that is robust to propensity score model misspecification. Specifically, we proposed a new algorithm, the guaranteed subgroup balancing propensity score (G-SBPS), to ensure exact subgroup balance (the mean balance of covariates in each subgroup). In addition, we implemented kernel methods in G-SBPS to improve the propensity score model fitting. For each aim, we provided theoretical and simulation-based justification for the research question or the proposed methodology, and applied the proposed methods to either the right heart catheterization (RHC) data, to estimate the length of hospital stay, or the diabetes self-management training (DSMT) data, to evaluate the hospitalization rate within three years.
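As a rough illustration of what global mean balance means in this setting, the sketch below fits a parametric logistic propensity score model on simulated data, forms inverse probability weights, and compares the standardized mean difference of each covariate before and after weighting. It is not the dissertation's PSLB or G-SBPS method. The simulated data, the coefficients 0.8 and -0.5, and the helper names `fit_logistic` and `standardized_mean_diff` are assumptions made only for this example.

```python
import numpy as np


def fit_logistic(X, t, n_iter=25):
    """Fit logistic regression by Newton-Raphson; returns coefficients, intercept first."""
    Z = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        grad = Z.T @ (t - p)                        # score vector
        hess = (Z * (p * (1.0 - p))[:, None]).T @ Z  # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def standardized_mean_diff(x, t, w):
    """Weighted treated-minus-control mean of x, scaled by the overall SD of x."""
    m1 = np.average(x[t == 1], weights=w[t == 1])
    m0 = np.average(x[t == 0], weights=w[t == 0])
    return (m1 - m0) / x.std()


# Simulated observational data (hypothetical): treatment depends on both covariates.
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 2))
true_ps = 1.0 / (1.0 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))
t = rng.binomial(1, true_ps)

# Estimated propensity score from a correctly specified parametric model.
beta = fit_logistic(X, t)
ps = 1.0 / (1.0 + np.exp(-(beta[0] + X @ beta[1:])))

# Inverse probability weights: 1/ps for treated, 1/(1 - ps) for controls.
w = np.where(t == 1, 1.0 / ps, 1.0 / (1.0 - ps))

smd_raw = [standardized_mean_diff(X[:, j], t, np.ones(n)) for j in range(2)]
smd_ipw = [standardized_mean_diff(X[:, j], t, w) for j in range(2)]
print("unweighted SMD:", np.round(smd_raw, 3))
print("IPW SMD:       ", np.round(smd_ipw, 3))
```

In this sketch the propensity score model is correctly specified, so weighting should pull the mean differences toward zero. The abstract's concern is the opposite case: under a misspecified model, the same diagnostic may show that imbalance remains.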


Observational study, Propensity score, Nonparametric modeling, Inverse probability weighting, Covariate balancing, Subgroup analysis

Available for download on Thursday, July 25, 2024