Assessment of the effect on statistical power of regression model misspecification by using techniques of mathematical statistics and simulation study

Hongyun Dong, The University of Texas School of Public Health

Abstract

Objectives. This paper assesses the effect of regression model misspecification on statistical power in a variety of situations.

Methods and results. The effect of misspecification in regression can be approximated by the correlation between the correct and the misspecified specification of the outcome variable (Harris 2010). Three misspecified models (linear, categorical and fractional polynomial) were considered. In the first section, a mathematical method for calculating the correlation between correct and misspecified models with simple mathematical forms was derived and demonstrated. In the second section, data from the National Health and Nutrition Examination Survey (NHANES 2007-2008) were used to examine such correlations. Compared with the linear and categorical models, the fractional polynomial models had higher correlations and provided a better approximation of the true relationship, as illustrated by LOESS regression. In the third section, we present simulation studies showing that, overall, misspecification in regression can produce marked decreases in power with small sample sizes. However, despite its lower correlation, the categorical model had the greatest power of the three models, ranging from 0.877 to 0.936 depending on the sample size and the outcome variable used. The power of the fractional polynomial model was close to that of the linear model, ranging from 0.69 to 0.83, and appeared to be affected by the additional degrees of freedom of this model.

Conclusion. Correlations between alternative model specifications can provide a good approximation of the effect of misspecification on statistical power when the sample size is large. When model specifications have known, simple mathematical forms, such correlations can be calculated mathematically. Actual public health data from NHANES 2007-2008 were used as examples to demonstrate situations in which the correct model specification is unknown or complex. Simulation of power for misspecified models confirmed the results based on correlation methods but also illustrated the effect of model degrees of freedom on power.
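The abstract does not give its formulas, so the following is only a minimal sketch under stated assumptions, not the dissertation's code. It assumes a hypothetical true model Y = βX^k + ε with X ~ Uniform(0, 1) and treats a straight-line fit in X as the misspecified model; categorical and fractional polynomial specifications are not covered. For this simple form the correlation between the correct and misspecified predictors has a closed form, ρ = √(3(2k + 1)) / (k + 2), obtained from the moments E[X^m] = 1/(m + 1). A common heuristic reading, used here only as an assumption, is that for small effects the misspecified test behaves roughly like the correct one with the sample size multiplied by ρ². The code checks the closed form by Monte Carlo and compares simulated power of the one-degree-of-freedom slope test under both specifications. The function names (`rho_linear_vs_power`, `pearson`, `slope_test_rejects`, `simulated_power`) and all parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist


def rho_linear_vs_power(k: float) -> float:
    """Closed-form corr(X, X**k) for X ~ Uniform(0, 1) and k > 0.

    Uses E[X**m] = 1 / (m + 1), which simplifies to
    rho = sqrt(3 * (2k + 1)) / (k + 2).
    """
    return math.sqrt(3 * (2 * k + 1)) / (k + 2)


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def slope_test_rejects(predictor, y, z_crit):
    """Two-sided test of a zero slope in a one-predictor regression.

    The slope t statistic equals r * sqrt((n - 2) / (1 - r**2)); comparing
    it with a normal critical value is a large-sample approximation.
    """
    r = pearson(predictor, y)
    t = r * math.sqrt((len(y) - 2) / (1 - r * r))
    return abs(t) > z_crit


def simulated_power(n, k, beta, sigma, reps, alpha=0.05, seed=1):
    """Monte Carlo power under a hypothetical model Y = beta * X**k + e.

    X ~ Uniform(0, 1) and e ~ N(0, sigma**2). The correct model uses X**k
    as the predictor; the misspecified model uses X itself. Both tests
    have one degree of freedom and are run on the same simulated data.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits_correct = hits_linear = 0
    for _ in range(reps):
        x = [rng.random() for _ in range(n)]
        y = [beta * xi ** k + rng.gauss(0, sigma) for xi in x]
        hits_correct += slope_test_rejects([xi ** k for xi in x], y, z_crit)
        hits_linear += slope_test_rejects(x, y, z_crit)
    return hits_correct / reps, hits_linear / reps


if __name__ == "__main__":
    k = 4
    rng = random.Random(0)
    xs = [rng.random() for _ in range(100_000)]
    print(f"analytic rho    = {rho_linear_vs_power(k):.4f}")
    print(f"Monte Carlo rho = {pearson(xs, [v ** k for v in xs]):.4f}")
    p_correct, p_linear = simulated_power(n=40, k=k, beta=2.0, sigma=1.0, reps=2000)
    print(f"power: correct = {p_correct:.3f}, linear (misspecified) = {p_linear:.3f}")
```

With k = 4 the closed form gives ρ = √3/2 ≈ 0.866, so ρ² = 0.75. Because both tests here use a single degree of freedom, the sketch isolates the loss due to lower correlation; it does not reproduce the degrees-of-freedom effect that the abstract reports for the fractional polynomial model.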

Subject Area

Public health

Recommended Citation

Dong, Hongyun, "Assessment of the effect on statistical power of regression model misspecification by using techniques of mathematical statistics and simulation study" (2010). Texas Medical Center Dissertations (via ProQuest). AAI1483185.
https://digitalcommons.library.tmc.edu/dissertations/AAI1483185
