Establishing Reliability When Multiple Examiners Evaluate a Single Case-Part II: Applications to Symptoms of Post-Traumatic Stress Disorder (PTSD)
DOI: https://doi.org/10.6000/1929-6029.2015.04.01.4

Keywords: Multiple Raters, Single Case, Nomothetic, Ideographic, PTSD, Statistical Significance, Clinical Significance

Abstract
In an earlier article, the authors assessed the clinical significance of each of the 19 Clinician-Administered PTSD Scale (CAPS-1) items and of the composite scores when 12 clinicians evaluated a Vietnam-era veteran [1]. A second patient, evaluated by the same 12 clinicians, served for cross-validation purposes [2]. The objectives of this follow-up research are: (1) to describe and apply novel biostatistical methods for establishing the statistical significance of these reliability estimates when the same 12 examiners evaluated each of the two Vietnam-era patients, an approach framed within the broader contexts of the ideographic and nomothetic conceptualizations of science and of the interplay between statistical and clinical (practical) significance; (2) to detail the steps for applying the new methodology; and (3) to investigate whether the quality of the symptoms (frequency, intensity), the item content, or the specific clinician affects the level of rater reliability. The more typical (nomothetic) reliability research design focuses on group averages and broad biomedical principles rather than on the individual case (the ideographic approach); both research designs are incorporated in this follow-up research.
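The abstract only summarizes the methodology; as a minimal sketch of the general problem (not the authors' actual procedure), the Python snippet below computes the observed pairwise agreement among 12 clinicians rating a single case on one item and estimates a Monte Carlo p-value under a simplifying uniform-chance null. The ratings shown and the uniform null are illustrative assumptions.

```python
import itertools
import random

def pairwise_agreement(ratings):
    """Proportion of rater pairs assigning the same score to the single case."""
    pairs = list(itertools.combinations(ratings, 2))
    return sum(a == b for a, b in pairs) / len(pairs)

def null_p_value(ratings, categories, n_sim=10_000, seed=0):
    """Monte Carlo p-value: probability that raters guessing uniformly at
    random over the categories agree at least as well as observed.
    NOTE: the uniform null is a simplifying assumption for illustration;
    it is not the null model used in the article."""
    rng = random.Random(seed)
    observed = pairwise_agreement(ratings)
    k = len(ratings)
    hits = 0
    for _ in range(n_sim):
        simulated = [rng.choice(categories) for _ in range(k)]
        if pairwise_agreement(simulated) >= observed:
            hits += 1
    return hits / n_sim

# Hypothetical data: 12 clinicians score one CAPS item on its 0-4 scale.
ratings = [3, 3, 2, 3, 3, 3, 2, 3, 3, 3, 3, 2]
print(f"observed pairwise agreement: {pairwise_agreement(ratings):.3f}")
print(f"Monte Carlo p-value: {null_p_value(ratings, categories=[0, 1, 2, 3, 4]):.4f}")
```

A full analysis along the lines the authors describe would replace the uniform null with chance agreement derived from empirical base rates and would report a chance-corrected coefficient such as kappa, alongside its clinical-significance interpretation.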
References
Blake DD, Weathers FW, Nagy LM, Kaloupek DG, Gusman FD, Charney DS, et al. The development of a clinician-administered PTSD scale (CAPS). J Traumatic Stress 1995; 8: 75-90. DOI: https://doi.org/10.1002/jts.2490080106
Cicchetti DV, Fontana A, Showalter D. Evaluating the reliability of multiple assessments of PTSD symptomatology: Multiple examiners, one patient. Psychiat Res 2009; 166: 269-280. DOI: https://doi.org/10.1016/j.psychres.2008.01.014
Windelband W, Oakes G. History and natural science. Hist Theory 1894/1980; 19: 165-168. DOI: https://doi.org/10.2307/2504797
Windelband W. A history of philosophy. New Jersey: Paper Tiger 1901/2001.
Allport G. The functional autonomy of motives. Am J Psychol 1937; 50: 141-156. DOI: https://doi.org/10.2307/1416626
Cicchetti DV. On the psychometrics of neuropsychological measurement: A biostatistical perspective. In: Oxford handbook of neuropsychology. New York, NY: Oxford University Press 1989.
Robinson OC. The ideographic/nomothetic dichotomy: Tracing historical origins of contemporary confusions. Hist Phil Psychol 2011; 13: 32-39. DOI: https://doi.org/10.53841/bpshpp.2011.13.2.32
Holschuh N. Randomization and design: I. In: Fienberg SE, Hinkley DV, Eds. R.A. Fisher: An appreciation. Lecture notes in statistics. New York, NY: Springer 1980; pp. 35-45. DOI: https://doi.org/10.1007/978-1-4612-6079-0_5
Kelley K, Preacher KJ. On effect size. Psychol Meth 2012; 17: 137-152. DOI: https://doi.org/10.1037/a0028086
Cohen J. Statistical power analysis for the behavioral sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates 1988.
Borenstein M. The shift from significance testing to effect size estimation. In: Bellack AS, Hersen M, series Eds.; Schooler N, volume Ed. Comprehensive clinical psychology, Vol. 3: Research and methods. New York, NY: Pergamon 1998; pp. 313-349. DOI: https://doi.org/10.1016/B0080-4270(73)00209-1
Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960; 20: 37-46. DOI: https://doi.org/10.1177/001316446002000104
Cohen J. Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull 1968; 70: 213-220. DOI: https://doi.org/10.1037/h0026256
Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977; 33: 159-174. DOI: https://doi.org/10.2307/2529310
Fleiss J. Statistical methods for rates and proportions. 2nd ed. New York, NY: Wiley 1981.
Fleiss J, Levin B, Paik MC. Statistical methods for rates and proportions. 3rd ed. New York, NY: Wiley 2003. DOI: https://doi.org/10.1002/0471445428
Cicchetti DV, Sparrow SS. Developing criteria for establishing inter-rater reliability of specific items: Applications to assessment of adaptive behavior. Am J Mental Deficiency 1981; 86: 127-137.
Cicchetti DV, Volkmar F, Klin A, Showalter D. Diagnosing autism using ICD-10 criteria: A comparison of neural networks and standard multivariate procedures. Child Neuropsychol 1995; 1: 26-37. DOI: https://doi.org/10.1080/09297049508401340
Cicchetti DV, Bronen R, Spencer S, Haut S, Berg A, Oliver P, Tyrer P. Rating scales, scales of measurement, issues of reliability: Resolving some critical issues for clinicians and researchers. J Nervous Mental Disease 2006; 194: 557-564. DOI: https://doi.org/10.1097/01.nmd.0000230392.83607.c5
Szalai JP. The statistics of agreement on a single item or object by multiple raters. Percept Motor Skills 1993; 77: 377-378. DOI: https://doi.org/10.2466/pms.1993.77.2.377
Szalai JP. Kappa SC: A measure of agreement on a single rating category for a single item or object rated by multiple raters. Psychol Reports 1998; 82: 1321-1322. DOI: https://doi.org/10.2466/pr0.1998.82.3c.1321
Cicchetti DV. Assessing inter-rater reliability for rating scales: Resolving some basic issues. Br J Psychiat 1976; 129: 452-456. DOI: https://doi.org/10.1192/bjp.129.5.452
Fleiss J, Cohen J. The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educ Psychol Meas 1973; 33: 613-619. DOI: https://doi.org/10.1177/001316447303300309
Bronen RA, Chan S, Cicchetti DV, Berg AT, Spencer S. Inter-rater agreement for MRI interpretation of epilepsy surgery patients. Am Epilep Soc 2004.
Cicchetti DV, Lord C, Koenig K, Klin A, Volkmar FR. Reliability of the ADI-R for the single case-Part II: Clinical versus statistical significance. J Autism Develop Disord 2014; 44: 3154-3160. DOI: https://doi.org/10.1007/s10803-014-2177-8
License
Copyright (c) 2015 Domenic Cicchetti, Alan Fontana, Donald Showalter
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Policy for Journals/Articles with Open Access
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of its authorship and initial publication in this journal.
- Authors are permitted and encouraged to post links to their work online (e.g., in institutional repositories or on their websites) prior to and during the submission process, as this can lead to productive exchanges as well as earlier and greater citation of the published work.