Comparing quantile imputation with other selected imputation methods for missing data with application to assess factors associated with self efficacy of physical activity in breast cancer survivors

Ye Lin, The University of Texas School of Public Health

Abstract

Missing data are a problem that almost every researcher encounters at some point in their work. Understanding why data may be missing, and the appropriate way to recover the lost data, becomes the major question for the data analyst. To avoid the bias caused by an incomplete data set and to preserve power, it is crucial to impute missing values properly during data analysis. Imputation methods can thus help researchers analyze data better and yield more accurate results. Missing values are a common occurrence in the self-reported measurements and questionnaires of behavior studies. In many social behavior studies, self efficacy is one of the key concepts. Self efficacy refers to a sense of confidence in one's personal ability to perform specific tasks. Research on self efficacy related to physical activity can be very helpful in addressing the factors needed to personalize patients' treatment, since physical activity can help cancer survivors relieve long-term sequelae such as decreased physical function, psychological distress and pain. In such cases, missing-data methods become important because most study results are collected through questionnaires and surveys that often end up with missing values. Using a simulated dataset, we compared the advantages and disadvantages of missing-data methods for detecting factors associated with self efficacy of physical activity in breast cancer survivors. This study focused on comparing the performance of single imputation using the mean, single imputation using quantiles, and multiple imputation using MCMC. From the simulations we computed both power and mean squared error to measure the performance of each imputation method. We also used all three methods to detect factors associated with physical activity-related self efficacy in breast cancer survivors, using data from the University of Texas M.D. Anderson Cancer Center, the Houston chapter of the Sisters' Network, The Rose, and Lyndon B. Johnson Harris County Hospital District General Hospital in Houston, Texas from January to October 2002.
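
The kind of simulation comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's actual design: the abstract does not define its quantile imputation procedure, so the sketch assumes it means filling each missing value with a chosen quantile of the observed values (the median, q = 0.5, by default). The normal data-generating model, the missing-completely-at-random mechanism, the sample sizes, and the function names `impute_constant` and `simulate_mse` are all hypothetical choices made for illustration. Multiple imputation using MCMC and the power calculation are omitted.

```python
import numpy as np


def impute_constant(x, fill_value):
    """Return a copy of x with every NaN replaced by fill_value."""
    out = x.copy()
    out[np.isnan(out)] = fill_value
    return out


def simulate_mse(n=200, missing_rate=0.2, n_reps=500, q=0.5, seed=0):
    """Average squared error of imputed values against the true values.

    Assumptions (not taken from the abstract): normally distributed data,
    values missing completely at random, and quantile imputation defined
    as filling with the q-th quantile of the observed values.
    """
    rng = np.random.default_rng(seed)
    sq_err = {"mean": [], "quantile": []}
    for _ in range(n_reps):
        truth = rng.normal(loc=50.0, scale=10.0, size=n)
        mask = rng.random(n) < missing_rate  # True where a value is deleted
        if mask.all() or not mask.any():
            continue  # skip degenerate replicates
        observed = truth.copy()
        observed[mask] = np.nan
        fills = {
            "mean": np.nanmean(observed),
            "quantile": np.nanquantile(observed, q),
        }
        for name, fill in fills.items():
            imputed = impute_constant(observed, fill)
            # Compare only the entries that were actually imputed.
            sq_err[name].append(np.mean((imputed[mask] - truth[mask]) ** 2))
    return {name: float(np.mean(errs)) for name, errs in sq_err.items()}


results = simulate_mse()
for method, mse in results.items():
    print(f"{method} imputation: MSE of imputed values = {mse:.2f}")
```

Changing `q` or replacing the normal draw with a skewed distribution shows how the two single-imputation rules can diverge when the mean and the chosen quantile differ.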

Subject Area

Statistics

Recommended Citation

Lin, Ye, "Comparing quantile imputation with other selected imputation methods for missing data with application to assess factors associated with self efficacy of physical activity in breast cancer survivors" (2015). Texas Medical Center Dissertations (via ProQuest). AAI1597537.
https://digitalcommons.library.tmc.edu/dissertations/AAI1597537
