Deterministic Bayesian variable selection developments for binary outcomes

Matthew D Koslovsky, The University of Texas School of Public Health


In behavioral public health research, investigators often use variable selection methods to identify or reaffirm relations between potential risk factors and unhealthy behaviors, such as smoking. Unfortunately, the available suites of variable selection methods are typically not designed to handle the complex data structures (e.g., related covariates, intensive longitudinal data) found in smoking behavior research or the higher-level model classes (e.g., multistate Markov models) used to analyze behavioral data. Thus, novel variable selection methods are necessary to appropriately extract the information captured in these data sets. In this study, we take advantage of the efficiency and flexibility of a deterministic Bayesian variable selection method, expectation maximization variable selection (EMVS), to identify and reaffirm risk factors associated with binary smoking outcomes. The main contributions of this work are threefold. First, we extend EMVS to handle related covariates (i.e., qualitative covariates and interaction terms) in logistic regression models, evaluate its performance against competing methods in various simulation settings, and apply it to data from the Mexican American Tobacco Use in Children (MATCh) study. In this application, we use EMVS for logistic regression models to reaffirm genetic and non-genetic risk factors previously associated with smoking experimentation and to simultaneously investigate potential interactions between risk factors in a cohort of Mexican heritage adolescents. Second, we extend EMVS to perform variable selection in multistate Markov models. After validating its use in various simulation settings and comparing it to alternative methods in the special case of equally spaced assessment times, we apply our method to data from the PREVAIL study. Here, we use EMVS for multistate Markov models to identify risk factors for transitioning between smoking states in a cohort of socioeconomically disadvantaged smokers who were interested in quitting. Third, we demonstrate how EMVS for multistate Markov models can be used in conjunction with other novel methods for intensive longitudinal data analysis to provide a deeper understanding of the temporal relation between risk factors and smoking behaviors in the critical period around a quit attempt.

Subject Area

Biostatistics|Public health

Recommended Citation

Koslovsky, Matthew D, "Deterministic Bayesian variable selection developments for binary outcomes" (2016). Texas Medical Center Dissertations (via ProQuest). AAI10241540.