Comparison of Methods for Clustered Data Analysis in a Non-Ideal Situation: Results from an Evaluation of Predictors of Yellow Fever Vaccine Refusal in the Global TravEpiNet (GTEN) Consortium

Authors

  • Sowmya R. Rao Department of Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA, Atlanta, GA, USA
  • Regina C. LaRocque Center for Healthcare Organization and Implementation Research (CHOIR), Bedford VA Medical Center, Bedford, MA, Atlanta, GA, USA
  • Emily S. Jentes Division of Global Migration and Quarantine, Centers for Disease Control and Prevention, Atlanta, GA, USA
  • Stefan H.F. Hagmann Division of Pediatric Infectious Diseases, Bronx Lebanon Hospital Center, Bronx, NY, USA
  • Edward T. Ryan Department of Medicine, Harvard Medical School, Boston, MA, Atlanta, GA, USA
  • Pauline V. Han Division of Global Migration and Quarantine, Centers for Disease Control and Prevention, Atlanta, GA, USA
  • David G. Kleinbaum Division of Healthcare Quality and Promotion, Centers for Disease Control and Prevention, Atlanta, GA, USA
  • Global TravEpiNet Consortium

DOI:

https://doi.org/10.6000/1929-6029.2014.03.03.1

Keywords:

Clustering, cluster size, cluster imbalance, data analysis

Abstract

Not accounting for clustering in data collected at multiple centers can bias both effect estimates and their standard errors, potentially leading to incorrect inferences. We fit 15 models that differed in correlation structure and in whether they adjusted for the small number of clusters, including unadjusted logistic regression, population-averaged models (generalized estimating equations), cluster-specific models (linear and non-linear, with a random intercept) and survey data analysis methods. We used these models to study the association of candidate predictors with the probability of declining yellow fever vaccine among patients seeking pre-travel health consultations at 18 US practices in the Global TravEpiNet Consortium from 1 January 2009 to 6 June 2012. Results varied by the method chosen. In general, when the odds ratio estimates were similar across methods, adjusting for clustering and for the small number of clinics increased the standard errors. We judged the random intercept model with the Morel, Bokossa and Neerchal (MBN) adjustment to be the preferred method for the GTEN dataset, because it was among the more conservative models and it accounted for clustering, small sample sizes and the random effect of site. Investigators should not ignore clustering and should consider the adjustments appropriate to their studies.
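The abstract contrasts an analysis that ignores clustering, cluster-robust (sandwich) standard errors, and the MBN small-sample adjustment. The Python sketch below shows how these three variance estimates relate for a logistic model fitted under an independence working correlation. It is not the authors' code: the toy data are invented purely for illustration (they are not GTEN data), the function names are hypothetical, and the MBN formula follows our reading of Morel, Bokossa and Neerchal (2003), with the tuning constants r = 1 and d = 2 that we understand to be the usual defaults. It covers only the population-averaged case; the random intercept model the authors preferred would additionally require a mixed-model fitter.

```python
import math

# Hypothetical toy data (NOT the GTEN data): (cluster id, covariate x, outcome y).
# Five "clinics" of unequal size, mimicking few and imbalanced clusters.
DATA = [
    (1, 0, 0), (1, 1, 1), (1, 1, 0), (1, 0, 0), (1, 1, 1), (1, 0, 1),
    (2, 1, 1), (2, 1, 1), (2, 0, 0),
    (3, 0, 0), (3, 0, 1), (3, 1, 0), (3, 1, 1), (3, 0, 0), (3, 1, 1), (3, 0, 0), (3, 1, 0),
    (4, 1, 0), (4, 0, 0), (4, 0, 1), (4, 1, 1),
    (5, 0, 1), (5, 1, 1), (5, 1, 0), (5, 0, 0), (5, 1, 1),
]


def inverse(m):
    """Gauss-Jordan inverse of a small square matrix (list of lists)."""
    n = len(m)
    a = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        pv = a[col][col]
        a[col] = [v / pv for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def score_info(data, beta):
    """Information matrix A and per-cluster score vectors U_i at beta."""
    p = len(beta)
    info = [[0.0] * p for _ in range(p)]
    cluster_scores = {}
    for cid, x, y in data:
        row = (1.0, float(x))  # intercept + one covariate
        mu = 1.0 / (1.0 + math.exp(-sum(b * r for b, r in zip(beta, row))))
        u = cluster_scores.setdefault(cid, [0.0] * p)
        for j in range(p):
            u[j] += row[j] * (y - mu)
            for k in range(p):
                info[j][k] += mu * (1.0 - mu) * row[j] * row[k]
    return info, list(cluster_scores.values())


def fit(data, iters=25):
    """Newton-Raphson logistic fit; equals the GEE estimate under independence."""
    beta = [0.0, 0.0]
    for _ in range(iters):
        info, scores = score_info(data, beta)
        total = [[sum(u[j] for u in scores)] for j in range(len(beta))]
        step = matmul(inverse(info), total)
        beta = [b + s[0] for b, s in zip(beta, step)]
    return beta


def variances(data, beta, r=1.0, d=2.0):
    """Naive, cluster-robust sandwich, and MBN-corrected covariance matrices."""
    n, p = len(data), len(beta)
    info, scores = score_info(data, beta)
    k = len(scores)                                   # number of clusters
    a_inv = inverse(info)                             # ignores clustering
    meat = [[sum(u[j] * u[l] for u in scores) for l in range(p)] for j in range(p)]
    sandwich = matmul(matmul(a_inv, meat), a_inv)
    c = (n - 1) / (n - p) * k / (k - 1)               # small-sample factor
    meat_c = [[c * v for v in row] for row in meat]
    prod = matmul(a_inv, meat_c)
    delta = min(1.0 / d, p / (k - p))                 # requires k > p
    phi = max(r, sum(prod[j][j] for j in range(p)) / p)
    mbn = matmul(prod, a_inv)
    mbn = [[mbn[j][l] + delta * phi * a_inv[j][l] for l in range(p)] for j in range(p)]
    return a_inv, sandwich, mbn


if __name__ == "__main__":
    beta = fit(DATA)
    naive, robust, mbn = variances(DATA, beta)
    for name, v in (("naive", naive), ("sandwich", robust), ("MBN", mbn)):
        print(f"{name:9s} OR={math.exp(beta[1]):.3f} SE(log OR)={math.sqrt(v[1][1]):.3f}")
```

Because the factor c exceeds 1 and the added term is a positive multiple of the model-based variance, the MBN variance in this sketch is always larger than the uncorrected sandwich variance, consistent with the abstract's observation that adjusting for the small number of clinics made the models more conservative.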

Author Biographies

Sowmya R. Rao, Department of Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA, Atlanta, GA, USA

Regina C. LaRocque, Center for Healthcare Organization and Implementation Research (CHOIR), Bedford VA Medical Center, Bedford, MA, Atlanta, GA, USA

Division of Infectious Diseases

Emily S. Jentes, Division of Global Migration and Quarantine, Centers for Disease Control and Prevention, Atlanta, GA, USA

Stefan H.F. Hagmann, Division of Pediatric Infectious Diseases, Bronx Lebanon Hospital Center, Bronx, NY, USA

Edward T. Ryan, Department of Medicine, Harvard Medical School, Boston, MA, Atlanta, GA, USA

Pauline V. Han, Division of Global Migration and Quarantine, Centers for Disease Control and Prevention, Atlanta, GA, USA

David G. Kleinbaum, Division of Healthcare Quality and Promotion, Centers for Disease Control and Prevention, Atlanta, GA, USA

References

Liang K-Y, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika 1986; 73: 13-22. DOI: https://doi.org/10.1093/biomet/73.1.13

Zeger SL, Liang K-Y. Longitudinal data analysis for discrete and continuous outcomes. Biometrics 1986; 121-30. DOI: https://doi.org/10.2307/2531248

Fitzmaurice GM, Laird NM, Ware JH. Applied longitudinal analysis. Hoboken, New Jersey: John Wiley & Sons, Inc. 2004.

Kleinbaum DG, Klein M. Logistic Regression: A Self-Learning Text (Chapters 14-16). Third Edition. New York Dordrecht Heidelberg London: Springer Publishers 2010. DOI: https://doi.org/10.1007/978-1-4419-1742-3

Horton NJ, Lipsitz SR. Review of software to fit generalized estimating equation regression models. Am Stat 1999; 53: 160-9. DOI: https://doi.org/10.1080/00031305.1999.10474451

Breslow NE, Clayton DG. Approximate inference in generalized linear mixed models. J Am Stat Assoc 1993; 88: 9-25. DOI: https://doi.org/10.1080/01621459.1993.10594284

Binder DA. On the variances of asymptotically normal estimators from complex surveys. Int Stat Rev 1983; 279-92. DOI: https://doi.org/10.2307/1402588

Morel JG, Neerchal NK. Overdispersion models in SAS. SAS Institute 2012.

Mancl LA, DeRouen TA. A Covariance Estimator for GEE with Improved Small-Sample Properties. Biometrics 2001; 57: 126-34. DOI: https://doi.org/10.1111/j.0006-341X.2001.00126.x

Fay MP, Graubard BI. Small-Sample Adjustments for Wald-Type Tests Using Sandwich Estimators. Biometrics 2001; 57: 1198-206. DOI: https://doi.org/10.1111/j.0006-341X.2001.01198.x

Kauermann G, Carroll RJ. A note on the efficiency of sandwich covariance matrix estimation. J Am Stat Assoc 2001; 96: 1387-96. DOI: https://doi.org/10.1198/016214501753382309

Morel JG. Logistic regression under complex survey designs. Surv Methodol 1989; 15: 203-23.

Morel JG, Bokossa MC, Neerchal NK. Small sample correction for the variance of GEE estimators. Biom J 2003; 45: 395-409. DOI: https://doi.org/10.1002/bimj.200390021

LaRocque RC, Rao SR, Lee J, Ansdell V, Yates JA, Schwartz BS, et al. Global TravEpiNet: a national consortium of clinics providing care to international travelers—analysis of demographic characteristics, travel destinations, and pretravel healthcare of high-risk US international travelers, 2009–2011. Clin Infect Dis 2012; 54: 455-62. DOI: https://doi.org/10.1093/cid/cir839

WHO | WHO regional offices [Internet]. WHO. [cited 2013 Jun 15]. Available from: http://www.who.int/about/regions/en/index.html. Accessed 15 January 2014.

Indices & Data | Human Development Reports (HDR) | United Nations Development Programme (UNDP) [Internet]. [cited 2013 Jun 15]. Available from: http://hdr.undp.org/en/statistics/. Accessed 15 January 2014.

Keystone JS. Immigrants returning home to visit friends and relatives (VFRs). Health Inf Int Travel 2012; 547-51.

SAS User's Guide, Version 9.2. Cary, NC, USA: SAS Institute Inc. 2008.

SUDAAN Language Manual, Volumes 1 and 2, Release 11. Research Triangle Park, NC, USA: Research Triangle Institute 2012.

Published

2014-08-15

How to Cite

Rao, S. R., LaRocque, R. C., Jentes, E. S., Hagmann, S. H. F., Ryan, E. T., Han, P. V., Kleinbaum, D. G., & Global TravEpiNet Consortium. (2014). Comparison of Methods for Clustered Data Analysis in a Non-Ideal Situation: Results from an Evaluation of Predictors of Yellow Fever Vaccine Refusal in the Global TravEpiNet (GTEN) Consortium. International Journal of Statistics in Medical Research, 3(3), 215–223. https://doi.org/10.6000/1929-6029.2014.03.03.1

Issue

Vol. 3 No. 3 (2014)

Section

General Articles