Assessment of the Performance of Imputation Techniques in Observational Studies with Two Measurements
DOI: https://doi.org/10.6000/1929-6029.2015.04.03.1
Keywords: HRQoL, Imputation, Missing data, Pre-post design
Abstract
In observational studies with two measurements in which the outcome is a health-related quality of life (HRQoL) variable, one aim of the research may be to identify potential predictors of the mean change in the outcome of interest. Missing data are very common in such studies and can bias the results. Different imputation techniques have been proposed to cope with missing data in outcome variables. We compared five analysis approaches (Complete Case, Available Case, K-Nearest Neighbour, Propensity Score, and a Markov Chain Monte Carlo (MCMC) algorithm) to assess their performance when handling missing data at different missingness rates and under different missingness mechanisms: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). These strategies were applied to a pre-post study of patients with Chronic Obstructive Pulmonary Disease. We analyzed the relationship between the change in subjects' HRQoL over one year and their clinical and socio-demographic characteristics. A simulation study was also performed to illustrate the performance of the imputation methods. Relative and standardized bias were assessed in each scenario. For all missingness mechanisms, analysis without imputation and the MCMC method, both combined with mixed-model analysis, showed the lowest standardized bias. Conversely, Propensity Score showed the worst bias values. When the missingness mechanism is MCAR or MAR and the missingness rate is small, we recommend using mixed models. Nevertheless, when the missingness percentage is high, MCMC is preferred in order to gain sample size and statistical power, although it shows no difference in bias compared with mixed models without imputation. For an MNAR scenario, a further sensitivity analysis should be performed.
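The simulation design summarized above can be illustrated with a small sketch. It is not the authors' code: the data-generating values, the function names (`simulate_pre_post`, `delete_follow_up`, `run_scenario`), and the way each mechanism is induced are all assumptions made for illustration. The two bias measures follow a commonly used definition (bias as a percentage of the true value, and bias as a percentage of the empirical standard deviation of the estimates); the abstract does not state the exact formulas used in the paper. Only the Complete Case approach is shown.

```python
import random
import statistics


def simulate_pre_post(n, rng, true_change=5.0):
    """Generate hypothetical (baseline, follow-up) scores with a known mean change."""
    pairs = []
    for _ in range(n):
        pre = rng.gauss(50.0, 10.0)
        post = pre + true_change + rng.gauss(0.0, 8.0)
        pairs.append((pre, post))
    return pairs


def delete_follow_up(pairs, mechanism, rate, rng):
    """Set follow-up values to None so that about `rate` of them are missing.

    MCAR: constant deletion probability.
    MAR:  probability depends on the always-observed baseline.
    MNAR: probability depends on the follow-up value itself.
    (Assumed mechanisms; requires rate <= 2/3 so probabilities stay <= 1.)
    """
    pre_median = statistics.median(p[0] for p in pairs)
    post_median = statistics.median(p[1] for p in pairs)
    result = []
    for pre, post in pairs:
        if mechanism == "MCAR":
            prob = rate
        elif mechanism == "MAR":
            prob = 1.5 * rate if pre > pre_median else 0.5 * rate
        elif mechanism == "MNAR":
            prob = 1.5 * rate if post > post_median else 0.5 * rate
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        result.append((pre, None if rng.random() < prob else post))
    return result


def complete_case_mean_change(pairs):
    """Mean change using only subjects with both measurements observed."""
    return statistics.mean(post - pre for pre, post in pairs if post is not None)


def relative_bias(estimates, true_value):
    """Bias of the mean estimate as a percentage of the true value."""
    return 100.0 * (statistics.mean(estimates) - true_value) / true_value


def standardized_bias(estimates, true_value):
    """Bias of the mean estimate as a percentage of the estimates' empirical SD."""
    return 100.0 * (statistics.mean(estimates) - true_value) / statistics.stdev(estimates)


def run_scenario(mechanism, rate, n_reps=200, n=300, true_change=5.0, seed=1):
    """Repeat simulate -> delete -> estimate and collect the estimates."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_reps):
        full = simulate_pre_post(n, rng, true_change)
        observed = delete_follow_up(full, mechanism, rate, rng)
        estimates.append(complete_case_mean_change(observed))
    return estimates


if __name__ == "__main__":
    for mechanism in ("MCAR", "MAR", "MNAR"):
        est = run_scenario(mechanism, rate=0.3)
        print(mechanism, round(relative_bias(est, 5.0), 1),
              round(standardized_bias(est, 5.0), 1))
```

In this toy setup the change score does not depend on the baseline, so a complete-case estimate of the mean change is only expected to drift under the MNAR mechanism. A fuller comparison would swap `complete_case_mean_change` for each imputation method under study and tabulate both bias measures per mechanism and rate.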
License
Copyright (c) 2015 Urko Aguirre, Inmaculada Arostegui, Cristóbal Esteban, Jose María Quintana
This work is licensed under a Creative Commons Attribution 4.0 International License.