Statistical Methods to Correct for Sampling Bias in Prevalent Cohorts and Meta Analysis

Jin Piao, The University of Texas School of Public Health


Biased sampling is a commonly encountered sampling mechanism in which subjects in a target population do not have equal chances of being selected into a study, either accidentally through natural circumstances or intentionally by design. The major challenge is that the distribution of the observed data is not the same as that of the target population. In this dissertation, we consider two specific kinds of biased sampling: sampling bias in prevalent cohort studies and publication bias in meta-analysis.

Despite treatment advances in breast cancer, some patients will have cancer recurrence within 5 years of their initial treatment. To better understand the relationship between patient characteristics at diagnosis and survival after an intermediate event such as local and regional cancer recurrence, it is of interest to analyze ordered bivariate survival data. A registry database reflects the real-world patient population and provides a valuable resource for investigating these associations. One challenge in analyzing registry data is that the observed bivariate times tend to be longer than those in the target population because of the sampling scheme. In paper 1, we jointly model the ordered bivariate survival data using a copula model while appropriately adjusting for the sampling bias. We develop an estimating procedure that simultaneously estimates the parameters of the marginal survival functions and the association parameter of the copula model, using a two-stage expectation-maximization (EM) algorithm. Using empirical process theory, we prove that the estimators are strongly consistent and asymptotically normal. We conduct simulation studies to evaluate the finite sample performance of the proposed method. We apply the proposed method to a cohort of patients with relapsing breast cancer from the Surveillance, Epidemiology and End Results (SEER)-Medicare linked data to evaluate the association between patient characteristics and residual survival.

Publication bias occurs when published research results are systematically unrepresentative of the population of studies that have been conducted, and it is a potential threat to meaningful meta-analysis. The Copas selection model provides a flexible framework for correcting estimates and offers considerable insight into publication bias. However, maximizing the observed likelihood under the Copas selection model is challenging because the observed data contain very little information on the latent variable. In paper 2, we study a Copas-like selection model and propose an EM algorithm for estimation based on the full likelihood. Simulation studies show that the EM algorithm and its associated inferential procedure perform well and avoid the non-convergence problem encountered when maximizing the observed likelihood. In paper 3, we extend the Copas selection model from univariate outcomes to bivariate outcomes to correct for publication bias when the probability of a study being published can depend on its sensitivity, specificity, and the associated standard errors. We develop an EM algorithm for maximum likelihood estimation under the proposed selection model. We investigate the finite sample performance of the proposed method through simulation studies and illustrate the method with a meta-analysis of 17 published studies of a rapid diagnostic test for influenza.
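
As an illustration of how selective publication can distort the observed distribution of study results, the following minimal Python sketch simulates a simplified Copas-type selection mechanism. It is not the estimation procedure developed in the dissertation. The function name simulate_copas, the parameter values (theta, a, b, rho), and the range of standard errors are assumptions chosen for illustration, and between-study heterogeneity is omitted for brevity.

```python
import math
import random
import statistics


def simulate_copas(n_studies, theta, a, b, rho, seed=0):
    """Simulate effect estimates under a simplified Copas-type selection model.

    Each study has estimate y = theta + s * eps and a latent publication
    propensity z = a + b / s + delta, with corr(eps, delta) = rho.
    A study is "published" only when z > 0.
    Returns (all_estimates, published_estimates).
    """
    rng = random.Random(seed)
    all_estimates, published = [], []
    for _ in range(n_studies):
        s = rng.uniform(0.1, 0.8)  # within-study standard error (assumed range)
        eps = rng.gauss(0.0, 1.0)
        # Build delta as a standard normal correlated with eps at level rho.
        delta = rho * eps + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        y = theta + s * eps        # observed effect estimate
        z = a + b / s + delta      # latent propensity to be published
        all_estimates.append(y)
        if z > 0:
            published.append(y)
    return all_estimates, published


# Illustrative values only: true effect 0, strong positive selection (rho = 0.8).
all_est, published = simulate_copas(20000, theta=0.0, a=-1.5, b=0.3, rho=0.8)
print(f"studies conducted: {len(all_est)}, published: {len(published)}")
print(f"mean of all studies:       {statistics.mean(all_est):.3f}")
print(f"mean of published studies: {statistics.mean(published):.3f}")
```

With a positive rho, studies with larger estimates are more likely to be published, and small, imprecise studies are published less often, so the mean of the published estimates drifts away from the true effect. A selection model aims to undo this distortion by modeling the publication mechanism explicitly.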

Recommended Citation

Piao, Jin, "Statistical Methods to Correct for Sampling Bias in Prevalent Cohorts and Meta Analysis" (2017). Texas Medical Center Dissertations (via ProQuest). AAI10606247.