A Simple Approach to Sample Size Calculation for Count Data in Matched Cohort Studies

Authors

  • Dexiang Gao, Department of Pediatrics, School of Medicine, University of Colorado Denver, USA
  • Gary K. Grunwald, Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Denver, Colorado, USA
  • Stanley Xub, Institute for Health Research, Kaiser Permanente Colorado, Denver, Colorado, USA

DOI:

https://doi.org/10.6000/1929-6029.2014.03.03.11

Keywords:

Clustered Poisson data, Overdispersion, Subject heterogeneity, Statistical power, Sample size.

Abstract

In matched cohort studies, exposed and unexposed individuals are matched on certain characteristics to form clusters, which reduces potential confounding. Because of the matching, data from these studies are clustered and therefore dependent. When the outcome is a Poisson count, specialized methods have been proposed for sample size estimation. In practice, however, the variance of the counts often exceeds the mean (i.e., the counts are overdispersed), so Poisson-based methods do not apply. We propose a simple approach for calculating statistical power and sample size for clustered Poisson data when the proportion of exposed subjects in a cluster is constant across clusters. We then extend the approach to clustered count data with overdispersion. We evaluate these approaches in simulation studies and apply them to a matched cohort study examining the association of parental depression with health care utilization. Simulation results show that the methods for estimating power and sample size perform reasonably well under the scenarios examined and are robust in the presence of mixed exposure proportions up to 30%.
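To make the setting concrete, the following is a minimal simulation sketch of power estimation for matched count data. It is not the closed-form approach proposed in the article. It assumes 1:1 matched pairs (so the exposure proportion is constant at 0.5 in every cluster), a shared gamma-distributed cluster effect with mean 1, and a standard conditional test of a rate ratio of 1 based on the normal approximation to the binomial. The function names and the parameters `base_rate`, `rate_ratio`, and `shape` are illustrative assumptions, and the sketch does not model extra-Poisson variation within a cluster.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_pairs(rng, n_pairs, base_rate, rate_ratio, shape):
    """Return (exposed_total, unexposed_total) summed over matched pairs.

    Assumption: each pair shares a gamma cluster effect with mean 1 and
    variance 1/shape, which makes the two counts in a pair dependent.
    """
    exposed = unexposed = 0
    for _ in range(n_pairs):
        effect = rng.gammavariate(shape, 1.0 / shape)  # (shape, scale) -> mean 1
        unexposed += poisson(rng, base_rate * effect)
        exposed += poisson(rng, base_rate * rate_ratio * effect)
    return exposed, unexposed


def rejects(exposed, unexposed, z_crit=1.96):
    """Conditional test of rate ratio = 1.

    Given the overall total, the exposed total is Binomial(total, 1/2)
    under the null, because the shared cluster effect cancels within a pair.
    """
    total = exposed + unexposed
    if total == 0:
        return False
    z = (exposed - total / 2) / math.sqrt(total / 4)
    return abs(z) > z_crit


def estimated_power(n_pairs, base_rate, rate_ratio, shape, n_sims=2000, seed=1):
    """Fraction of simulated studies in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    hits = sum(
        rejects(*simulate_pairs(rng, n_pairs, base_rate, rate_ratio, shape))
        for _ in range(n_sims)
    )
    return hits / n_sims
```

Under these assumptions, a call such as `estimated_power(n_pairs=50, base_rate=2.0, rate_ratio=1.5, shape=2.0)` gives a Monte Carlo power estimate, and a required sample size can be approximated by increasing `n_pairs` until the estimate reaches the target power. Setting `rate_ratio=1.0` gives a rough check that the rejection rate stays near the nominal 5% level.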

Published

2014-08-25

How to Cite

Gao, D., Grunwald, G. K., & Xub, S. (2014). A Simple Approach to Sample Size Calculation for Count Data in Matched Cohort Studies. International Journal of Statistics in Medical Research, 3(3), 321–330. https://doi.org/10.6000/1929-6029.2014.03.03.11

Section

General Articles