
Faculty, Staff and Student Publications
Publication Date
6-1-2023
Journal
Biometrics
Abstract
Sample sizes vary substantially across tissues in the Genotype-Tissue Expression (GTEx) project, where considerably fewer samples are available from certain inaccessible tissues, such as the substantia nigra (SSN), than from accessible tissues, such as blood. This severely limits power for identifying tissue-specific expression quantitative trait loci (eQTL) in undersampled tissues. Here we propose Surrogate Phenotype Regression Analysis (Spray) for leveraging information from a correlated surrogate outcome (eg, expression in blood) to improve inference on a partially missing target outcome (eg, expression in SSN). Rather than regarding the surrogate outcome as a proxy for the target outcome, Spray jointly models the target and surrogate outcomes within a bivariate regression framework. Unobserved values of either outcome are treated as missing data. We describe and implement an expectation conditional maximization algorithm for performing estimation in the presence of bilateral outcome missingness. Spray estimates the same association parameter estimated by standard eQTL mapping and controls the type I error even when the target and surrogate outcomes are truly uncorrelated. We demonstrate analytically and empirically, using simulations and GTEx data, that in comparison with marginally modeling the target outcome, jointly modeling the target and surrogate outcomes increases estimation precision and improves power.
Keywords
Quantitative Trait Loci, Phenotype, Algorithms, Regression Analysis, EM algorithm, genetic association analysis, missing data, multivariate analysis, surrogate outcomes
DOI
10.1111/biom.13629
PMID
35218565
PMCID
PMC11023615
PubMedCentral® Posted Date
4-17-2024
PubMedCentral® Full Text Version
Author MSS
Published Open-Access
yes
Included in
Bioinformatics Commons, Biomedical Informatics Commons, Genetic Phenomena Commons, Medical Genetics Commons, Oncology Commons