Multicollinearity, effect modification, and missing data in regression analyses conducted in epidemiologic studies

Kristina Petrova Vatcheva, The University of Texas School of Public Health

Abstract

Regression analysis is widely used in epidemiological research to investigate associations between a specific exposure and an outcome. Correctly specified regression models can provide reliable estimates of the regression coefficient for each variable in the model. These estimated coefficients directly affect how a researcher interprets the data and ultimately answers the study questions. The consequences of ignoring multicollinearity, unidentified effect modification and interaction, and missing data in regression are well documented in the statistical literature but require greater attention in epidemiologic practice, where these issues are often ignored or improperly addressed. Failure to identify or correctly handle any of these three issues can lead to biased, inconsistent, and less precise findings, to misinterpretation of the data, and in some key instances to incorrect policy decisions. The objective of the dissertation was to highlight the effects of multicollinearity, unidentified effect modification and interaction, and missing data on the results of regression analyses in epidemiologic studies, and to encourage researchers to use diagnostics for each of these issues as a major step in the regression analysis process. Examples from the epidemiological literature were used to illustrate the impact of the three issues on findings from regression analyses. Simulation studies were conducted to evaluate the impact of ignoring multicollinearity, unidentified effect modification and interaction, and missing data in regression analyses used to investigate an association between a specific exposure and an outcome. Specifically, simulated datasets were generated to demonstrate the potential differences in the findings from regression models when each of the aforementioned issues is present. Analyses illustrating these points were conducted using the Cameron County Hispanic Cohort, which will serve to elucidate new disease patterns, risk factors, and unique aspects of disease conditions and provide information for future studies in this population.
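
As a rough illustration of the kind of multicollinearity diagnostic the abstract refers to, the sketch below computes variance inflation factors (VIFs) on a small simulated dataset. It is not taken from the dissertation: the function name `variance_inflation_factors`, the simulated variables `x1`, `x2`, `x3`, the sample size, and the noise scale are all assumptions chosen for illustration, and the VIF is only one of several diagnostics that could be used.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return the VIF of each column of X (n observations x p predictors).

    Each predictor is regressed on the remaining predictors (with an
    intercept); VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared
    of that auxiliary regression.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        ss_res = resid @ resid
        ss_tot = ((y - y.mean()) ** 2).sum()
        r_squared = 1.0 - ss_res / ss_tot
        vifs.append(1.0 / (1.0 - r_squared))
    return np.array(vifs)

# Hypothetical simulated predictors: x2 is nearly a copy of x1,
# while x3 is generated independently of both.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)
x3 = rng.normal(size=n)

vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(dict(zip(["x1", "x2", "x3"], np.round(vifs, 1))))
```

In this simulated setting the two nearly collinear predictors should show large VIFs while the independent predictor stays close to 1. A commonly cited rule of thumb treats values above roughly 10 as a warning sign, though any such cutoff is a convention and not a strict test.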

Subject Area

Biostatistics|Epidemiology

Recommended Citation

Vatcheva, Kristina Petrova, "Multicollinearity, effect modification, and missing data in regression analyses conducted in epidemiologic studies" (2015). Texas Medical Center Dissertations (via ProQuest). AAI3731982.
https://digitalcommons.library.tmc.edu/dissertations/AAI3731982
