Modeling Association plus Agreement among Multi-Raters for Ordered Categories
Keywords: Global agreement, Partial agreement, Uniform association, Non-uniform association, Log-linear model, Ordinal scales
In square contingency tables, the analysis of agreement between the row and column classifications is often of interest. In such tables, kappa-like statistics are used as measures of reliability. In addition to the kappa coefficients, several authors have discussed agreement in terms of log-linear models. Log-linear agreement models have been suggested for summarizing the degree of agreement between nominal variables. To analyze agreement between ordinal categories, association models with an agreement parameter can be used. Recent studies pay increasing attention to the assessment of agreement among more than two raters' decisions, especially in the medical and behavioral sciences. This article focuses on approaches to the study of uniform and non-uniform association combined with inter-rater agreement for multiple raters with ordered categories. We propose several modifications of association plus agreement models and illustrate the use of the approaches with two numerical examples.
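The two-rater building block behind these models can be sketched numerically. As a minimal illustration (not the authors' implementation), the Python snippet below fits a uniform association plus global agreement log-linear model, log m_ij = lambda + lambda_i^R + lambda_j^C + beta u_i v_j + delta I(i = j), in the spirit of Goodman (1979) and Agresti (1988), to a hypothetical 4x4 table; the counts and parameter names are invented for the example.

```python
import numpy as np

# Hypothetical 4x4 cross-classification of two raters on a 4-level ordinal
# scale (counts invented for illustration, heavy on the main diagonal).
counts = np.array([
    [22,  2,  2,  0],
    [ 5,  7, 14,  0],
    [ 0,  2, 36,  0],
    [ 0,  1, 17, 10],
], dtype=float)

n = counts.shape[0]
y = counts.ravel()
rows, cols = np.divmod(np.arange(n * n), n)

# Design matrix for log m_ij = lambda + lambda_i^R + lambda_j^C
#                              + beta * u_i * v_j + delta * I(i = j):
# uniform (linear-by-linear) association plus a single global agreement
# parameter attached to the main-diagonal cells.
X = np.column_stack([
    np.ones(n * n),                  # intercept
    np.eye(n)[rows][:, 1:],          # row main effects (level 1 as baseline)
    np.eye(n)[cols][:, 1:],          # column main effects
    (rows + 1.0) * (cols + 1.0),     # linear-by-linear association score
    (rows == cols).astype(float),    # agreement indicator for diagonal cells
])

# Fit the Poisson log-linear model by Newton-Raphson on the log-likelihood.
theta = np.zeros(X.shape[1])
theta[0] = np.log(y.mean())          # reasonable starting intercept
for _ in range(30):
    mu = np.exp(X @ theta)
    theta += np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (y - mu))

mu = np.exp(X @ theta)
deviance = 2.0 * np.sum(np.where(y > 0, y * np.log(y / mu), 0.0) - (y - mu))
beta_assoc, delta_agree = theta[-2], theta[-1]
print(f"association beta = {beta_assoc:.3f}, agreement delta = {delta_agree:.3f}")
print(f"residual deviance = {deviance:.2f} on {n * n - X.shape[1]} df")
```

A delta above zero suggests agreement beyond what the association term explains; the non-uniform variants discussed in the article replace the single beta with category-specific association parameters.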
A. AGRESTI (1984). Analysis of Ordinal Categorical Data. John Wiley & Sons, New York.
A. AGRESTI (1988). A model for agreement between ratings on an ordinal scale. Biometrics, 44, no. 2, pp. 539–548.
M. ATTANASIO, M. ENEA, L. RIZZO (2010). Some issues concerning the statistical evaluation of a screening test: The ARFI ultrasound case. Statistica, 70, no. 3, pp. 311–322.
S. I. BANGDIWALA (1988). The agreement chart. Technical report, University of North Carolina at Chapel Hill, Department of Biostatistics, Institute of Statistics Mimeo Series.
M. P. BECKER, A. AGRESTI (1992). Log-linear modelling of pairwise interobserver agreement on a categorical scale. Statistics in Medicine, 11, no. 1, pp. 101–114.
P. W. MIELKE, K. J. BERRY, J. E. JOHNSTON (2007). The exact variance of weighted kappa with multiple raters. Psychological Reports, 101, pp. 655–660.
P. W. MIELKE, K. J. BERRY, J. E. JOHNSTON (2008). Resampling probability values for weighted kappa with multiple raters. Psychological Reports, 102, pp. 606–613.
R. L. BRENNAN, D. J. PREDIGER (1981). Coefficient kappa: Some uses, misuses, and alternatives. Educational and Psychological Measurement, 41, pp. 687–699.
D. V. CICCHETTI, T. ALLISON (1971). A new procedure for assessing reliability of scoring EEG sleep recordings. American Journal of EEG Technology, 11, pp. 101–109.
D. V. CICCHETTI, A. R. FEINSTEIN (1990). High agreement but low kappa: II. Resolving the paradoxes. Journal of Clinical Epidemiology, 43, no. 6, pp. 551–558.
J. COHEN (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, no. 1, pp. 37–46.
J. COHEN (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70, no. 4, pp. 213–220.
A. R. FEINSTEIN, D. V. CICCHETTI (1990). High agreement but low kappa: I. The problems of two paradoxes. Journal of Clinical Epidemiology, 43, no. 6, pp. 543–549.
J. L. FLEISS (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76, no. 5, pp. 378–382.
J. L. FLEISS, J. COHEN (1973). The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement, 33, pp. 613–619.
L. A. GOODMAN (1979). Simple models for the analysis of association in cross classifications having ordered categories. Journal of the American Statistical Association, 74, no. 367, pp. 537–552.
K. L. GWET (2012). Handbook of Inter-Rater Reliability. Advanced Analytics, LLC, Maryland.
N. S. HOLMQUIST, C. A. MCMAHON, O. D. WILLIAMS (1967). Variability in classification of carcinoma in situ of the uterine cervix. Archives of Pathology, 84, pp. 334–345.
L. HUBERT (1977). Kappa revisited. Psychological Bulletin, 84, no. 2, pp. 289–297.
J. KOTTNER, L. AUDIGE, S. BRORSON, A. DONNER, B. J. GAJEWSKI, A. HROBJARTSSON, C. ROBERTS, M. SHOUKRI, D. L. STREINER (2011). Guidelines for reporting reliability and agreement studies (GRRAS) were proposed. Journal of Clinical Epidemiology, 64, pp. 96–106.
H. L. KUNDEL, M. POLANSKY (2003). Measurement of observer agreement. Radiology, 228, no. 2, pp. 303–308.
J. R. LANDIS, G. G. KOCH (1977a). An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics, 33, no. 2, pp. 363–374.
J. R. LANDIS, G. G. KOCH (1977b). The measurement of observer agreement for categorical data. Biometrics, 33, no. 1, pp. 159–174.
B. LAWAL (2003). Categorical Data Analysis with SAS and SPSS Applications. Lawrence Erlbaum Associates Inc, New Jersey.
R. J. LIGHT (1971). Measures of response agreement for qualitative data: Some generalizations and alternatives. Psychological Bulletin, 76, pp. 365–377.
B. M. MELIA, M. DIENER-WEST (1994). Modeling Inter-Rater Agreement for Pathologic Features of Choroidal Melanoma. John Wiley & Sons, New York.
P. QUATTO (2004). Testing agreement among multiple raters. Statistica, 64, no. 1, pp. 145–151.
T. SARACBASI (2011). Agreement models for multiraters. Turkish Journal of Medical Sciences, 41, no. 5, pp. 939–944.
R. R. SOKAL, F. J. ROHLF (1981). Biometry. Freeman, New York.
M. A. TANNER, M. A. YOUNG (1985). Modeling agreement among raters. Journal of the American Statistical Association, 80, no. 389, pp. 175–180.
J. S. UEBERSAX (1992). Modeling approaches for the analysis of observer agreement. Investigative Radiology, 27, no. 9, pp. 738–743.
F. VALET, C. GUINOT, J. Y. MARY (2007). Log-linear non-uniform association models for agreement between two ratings on an ordinal scale. Statistics in Medicine, 26, pp. 647–662.
A. VON EYE, E. Y. MUN (2005). Analyzing Rater Agreement: Manifest Variable Methods. Lawrence Erlbaum Associates Inc, New Jersey.
M. J. WARRENS (2010). Inequalities between multi-rater kappas. Advances in Data Analysis and Classification, 4, pp. 271–286.