Statistical tests for homogeneity using parametric and semiparametric models with applications to meta-analysis and statistical genetics

Chuan Hong, The University of Texas School of Public Health


Publication bias occurs when the publication of research results depends not only on the quality of the research but also on the direction, magnitude, or statistical significance of the results. The consequence is that published studies may not represent all valid studies undertaken, and this bias may threaten the validity of systematic reviews and meta-analyses. However, both detecting and accounting for publication bias are challenging in a multivariate meta-analysis setting because some studies may be completely unpublished while others may selectively report only some of their multiple outcomes. In this paper, we propose a pseudolikelihood-based score test for detecting publication bias in multivariate random-effects meta-analysis. To the best of our knowledge, this is the first test for detecting publication bias in a multivariate meta-analysis setting. Two detailed case studies are given to show the limitations of univariate tests and to illustrate the advantage of the proposed test in practice. Through simulation studies, the proposed test is found to be more powerful than the existing univariate tests. In addition, by empirically evaluating 169 systematic reviews with multiple outcomes from the Cochrane Database, the proposed multivariate test is shown to identify more studies with publication bias than existing univariate tests.
Motivated by analyses of DNA methylation data, we propose a semiparametric mixture model, namely the generalized exponential tilt mixture model, to account for heterogeneity between differentially methylated and non-differentially methylated subjects in the cancer group, and to capture the differences in higher-order moments (e.g., mean and variance) between subjects in the cancer and normal groups. A pairwise pseudolikelihood is constructed to eliminate the unknown nuisance function. To circumvent the boundary and non-identifiability problems that also arise in parametric mixture models, we modify the pseudolikelihood by adding a penalty function. In addition, a test with a simple asymptotic distribution has computational advantages over permutation tests for high-dimensional genetic and epigenetic data. We propose a pseudolikelihood-based expectation-maximization test and show that the proposed test statistic follows a simple chi-squared limiting distribution. Simulation studies show that the proposed test controls Type I errors well and has better power than several current tests. In particular, the proposed test outperforms the commonly used tests under all simulation settings considered, especially when there are variance differences between the two groups. The proposed test is applied to a real data set to identify differentially methylated sites between ovarian cancer subjects and normal subjects.
Quantitative trait locus analysis has been used as an important tool to identify markers where the phenotype or quantitative trait is linked with the genotype. Most existing tests for single-locus association with quantitative traits aim to detect mean differences across genotypic groups. However, recent research has revealed functional genetic loci that affect the variance of traits, known as variability-controlling quantitative trait loci. In addition, it has been suggested that many genotypes have both mean and variance effects, while the mean effects or variance effects alone may not be strong enough to be detected. Existing methods that account for unequal variances include Levene's test, the Lepage test, and the D-test, but they suffer from a lack of robustness or a lack of power. We propose a semiparametric model and a novel pairwise conditional likelihood ratio test. Specifically, the semiparametric model is designed to identify the combined differences in higher moments among genotypic groups. The pairwise likelihood is constructed by a conditioning procedure that eliminates the unknown reference distribution. We show that the proposed pairwise likelihood ratio test has a simple asymptotic chi-squared distribution, which does not require permutation or bootstrap procedures. Simulation studies show that the proposed test performs well in controlling Type I errors and has competitive power in identifying the differences across genotypic groups. In addition, the proposed test is robust to certain model misspecifications. The proposed test is illustrated by an example of identifying both mean and variance effects in body mass index using the Framingham Heart Study data.
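
To illustrate how conditioning can eliminate an unknown reference distribution, the following is a minimal sketch and not the dissertation's implementation. It assumes two groups only, with densities related by an exponential tilt g(x) = f(x) exp(a + b1*x + b2*x^2), so that b1 and b2 carry the mean and variance differences. Conditional on the unordered pair of values formed by one observation from each group, the probability of the observed group assignment depends only on (b1, b2); both f and a cancel. The function names and the choice of tilt terms (x, x^2) are assumptions made for illustration.

```python
import math

def tilt_features(x):
    # Assumed tilt terms: x captures mean shifts, x^2 captures variance shifts.
    return (x, x * x)

def pairwise_loglik(beta, ref, other):
    """Pairwise conditional log-likelihood for a two-group exponential tilt.

    For each pair (a from ref, b from other), the conditional probability
    of the observed assignment is 1 / (1 + exp(-eta)), where
    eta = beta . (t(b) - t(a)). The reference density f and the
    normalizing constant do not appear.
    """
    total = 0.0
    for a in ref:
        ta = tilt_features(a)
        for b in other:
            tb = tilt_features(b)
            eta = sum(bk * (v - u) for bk, u, v in zip(beta, ta, tb))
            # Numerically stable evaluation of log(1 + exp(-eta)).
            total -= max(-eta, 0.0) + math.log1p(math.exp(-abs(eta)))
    return total

# Under no tilt (beta = 0) every pair contributes log(1/2).
null_value = pairwise_loglik((0.0, 0.0), [0.0, 1.0, 2.0], [1.0, 3.0])
```

Maximizing this function over beta and comparing the maximum with the value at beta = (0, 0) gives a likelihood-ratio-type statistic. Because the pairs share observations, such a statistic generally needs a calibration that accounts for the dependence between pairs. The abstract states that the proposed test has a simple chi-squared limit but does not give the details, so they are not reproduced here.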

Recommended Citation

Hong, Chuan, "Statistical tests for homogeneity using parametric and semiparametric models with applications to meta-analysis and statistical genetics" (2016). Texas Medical Center Dissertations (via ProQuest). AAI10126796.