Multiple Imputation by Fully Conditional Specification for Dealing with Missing Data in a Large Epidemiologic Study

Authors

  • Yang Liu, Division of Analysis, Research, and Practice Integration, National Center for Injury Prevention and Control, U.S. Centers for Disease Control and Prevention, Atlanta, GA 30341, USA
  • Anindya De, Division of Global HIV/AIDS, Center for Global Health, U.S. Centers for Disease Control and Prevention, Atlanta, GA 30333, USA

DOI:

https://doi.org/10.6000/1929-6029.2015.04.03.7

Keywords:

Missing data, multiple imputation, fully conditional specification, complete case analysis, blood utilization

Abstract

Missing data commonly occur in large epidemiologic studies. Ignoring incompleteness or handling the data inappropriately may bias study results, reduce power and efficiency, and alter important risk/benefit relationships. Standard ways of dealing with missing values, such as complete case analysis (CCA), are generally inappropriate due to the loss of precision and risk of bias. Multiple imputation by fully conditional specification (FCS MI) is a powerful and statistically valid method for creating imputations in large data sets which include both categorical and continuous variables. It specifies the multivariate imputation model on a variable-by-variable basis and offers a principled yet flexible method of addressing missing data, which is particularly useful for large data sets with complex data structures. However, FCS MI is still rarely used in epidemiology, and few practical resources exist to guide researchers in the implementation of this technique. We demonstrate the application of FCS MI in support of a large epidemiologic study evaluating national blood utilization patterns in a sub-Saharan African country. A number of practical tips and guidelines for implementing FCS MI based on this experience are described.
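
To make the variable-by-variable idea concrete, the following is a minimal sketch of the general FCS cycle on a toy data set with two continuous variables. It is not the authors' implementation: the function names (`fit_line`, `fcs_impute`), the simulated data, the missingness pattern, and the choice of five imputations are all assumptions made for illustration. The sketch also simplifies in two ways that a real analysis should not: it holds the regression coefficients fixed rather than drawing them from their posterior distribution, and it handles only continuous variables, whereas FCS would typically use a logistic or similar model for categorical ones.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / max(len(xs) - 2, 1)) ** 0.5
    return a, b, sd


def fcs_impute(rows, n_iter=10, rng=None):
    """Return one completed copy of rows ([x, y] pairs, None = missing)."""
    if rng is None:
        rng = random.Random(0)
    data = [list(r) for r in rows]
    missing = [[v is None for v in r] for r in rows]

    # Starting values: fill each gap with the observed mean of its variable.
    for j in range(2):
        fill = statistics.fmean(r[j] for r in rows if r[j] is not None)
        for i, r in enumerate(data):
            if missing[i][j]:
                r[j] = fill

    # Cycle through the variables, re-imputing each one from the other.
    for _ in range(n_iter):
        for j in range(2):
            k = 1 - j  # index of the predictor variable
            obs = [i for i in range(len(data)) if not missing[i][j]]
            a, b, sd = fit_line([data[i][k] for i in obs],
                                [data[i][j] for i in obs])
            for i, r in enumerate(data):
                if missing[i][j]:
                    # Simplification: (a, b, sd) are treated as known;
                    # only residual noise is drawn.
                    r[j] = a + b * r[k] + rng.gauss(0, sd)
    return data


# Hypothetical toy data: y depends linearly on x, with some values removed.
sim = random.Random(42)
rows = []
for i in range(200):
    x = sim.gauss(0, 1)
    y = 2 * x + sim.gauss(0, 0.5)
    rows.append([None if i % 5 == 0 else x,
                 None if i % 7 == 3 else y])

# Multiple imputation: several completed data sets from different seeds.
completed = [fcs_impute(rows, rng=random.Random(seed)) for seed in range(5)]
mean_y = [statistics.fmean(r[1] for r in d) for d in completed]
print(mean_y)
```

Each completed data set would then be analyzed separately and the estimates combined; the spread of `mean_y` across the five data sets reflects the uncertainty that the missing values contribute.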

Published

2015-08-19

How to Cite

Liu, Y., & De, A. (2015). Multiple Imputation by Fully Conditional Specification for Dealing with Missing Data in a Large Epidemiologic Study. International Journal of Statistics in Medical Research, 4(3), 287–295. https://doi.org/10.6000/1929-6029.2015.04.03.7

Section

General Articles