Cohen’s Kappa in Excel tutorial
This tutorial shows how to compute and interpret Cohen’s Kappa to measure the agreement between two assessors, in Excel using XLSTAT.
Dataset to compute and interpret Cohen’s Kappa
Two doctors separately evaluated the presence or absence of a disease in 62 patients. The results were gathered in a crosstab, or contingency table, crossing two qualitative variables (doctor 1: healthy or diseased; doctor 2: healthy or diseased). 34 patients were diagnosed as healthy by both doctors, 21 as diseased by both doctors, and 4 as diseased by doctor 1 but healthy by doctor 2; the remaining 3 were diagnosed as healthy by doctor 1 but diseased by doctor 2. The data are fictitious and were created for this tutorial.
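For readers who want to check the figures outside Excel, the contingency table can be sketched in a few lines of Python. This is only an illustration, not part of XLSTAT. The variable names are hypothetical, and the count of 3 (healthy for doctor 1, diseased for doctor 2) is not stated directly in the text: it is inferred from the total of 62 patients and the diagonal counts 34 and 21.

```python
# Contingency table for the two doctors.
# Rows: doctor 1, columns: doctor 2, both in the order (healthy, diseased).
# The off-diagonal 3 is inferred: 62 - 34 - 21 - 4 = 3.
labels = ["healthy", "diseased"]
table = [
    [34, 3],   # doctor 1: healthy
    [4, 21],   # doctor 1: diseased
]

row_totals = [sum(row) for row in table]        # doctor 1 marginal counts
col_totals = [sum(col) for col in zip(*table)]  # doctor 2 marginal counts
n = sum(row_totals)                             # total number of patients

# Print the table with its margins.
print(f"{'':>12}{labels[0]:>10}{labels[1]:>10}{'total':>8}")
for label, row, total in zip(labels, table, row_totals):
    print(f"{label:>12}{row[0]:>10}{row[1]:>10}{total:>8}")
print(f"{'total':>12}{col_totals[0]:>10}{col_totals[1]:>10}{n:>8}")
```

The marginal totals (37 and 25 for doctor 1, 38 and 24 for doctor 2) are what the chance-agreement correction in Cohen’s Kappa is based on.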
Goal of this tutorial on computing and interpreting Cohen’s Kappa
The goal of this tutorial is to measure the agreement between the two doctors on the diagnosis of a disease. This is also called inter-rater reliability. To measure agreement, one could simply compute the percentage of cases for which both doctors agree (the cases on the diagonal of the contingency table), that is (34 + 21)*100 / 62 ≈ 89%. This statistic has an important weakness: it does not account for agreement that occurs by chance. In contrast, Cohen’s Kappa measures agreement after removing the agreement expected by chance alone, so it is not inflated by a category that both doctors happen to use often.
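The chance correction can be made concrete with a short Python sketch of the standard Cohen’s Kappa formula, kappa = (p_observed - p_expected) / (1 - p_expected). It is an illustration under stated assumptions rather than XLSTAT’s own implementation: the table layout and variable names are hypothetical, and the off-diagonal count of 3 is inferred from the total of 62 patients.

```python
# Rows: doctor 1, columns: doctor 2, both in the order (healthy, diseased).
# The 3 is inferred from the total: 62 - 34 - 21 - 4 = 3.
table = [[34, 3], [4, 21]]

n = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]        # doctor 1 marginals
col_totals = [sum(col) for col in zip(*table)]  # doctor 2 marginals

# Observed agreement: proportion of patients on the diagonal.
p_observed = sum(table[i][i] for i in range(len(table))) / n

# Expected agreement if the two doctors rated independently,
# each keeping their own marginal proportions.
p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2

# Cohen's Kappa: agreement beyond chance, relative to the maximum possible.
kappa = (p_observed - p_expected) / (1 - p_expected)

print(f"observed agreement: {p_observed:.3f}")
print(f"expected by chance: {p_expected:.3f}")
print(f"Cohen's Kappa:      {kappa:.2f}")
```

About 52% agreement would be expected by chance alone with these marginals, which is why the raw 89% overstates how well the two doctors agree.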
Setting up Cohen’s Kappa statistic in XLSTAT
Once XLSTAT is activated, select the XLSTAT / Correlation/Association tests / Tests on contingency tables command. The dialog box then appears. Activate the Contingency Table option, and select your data in the Contingency Table field. In the Outputs tab, make sure the Association Coefficients option is activated.
Interpreting Cohen’s Kappa coefficient
After you have clicked on the OK button, the results appear, including several association coefficients. Like Pearson’s correlation coefficient, Cohen’s Kappa varies between -1 and +1, with:
- -1 reflecting total disagreement
- 0 reflecting agreement no better than chance
- +1 reflecting total agreement
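These three reference values can be reproduced with small made-up tables. The sketch below uses a hypothetical helper, `cohen_kappa`, that applies the standard formula to a square contingency table; the three example tables are invented for illustration and are not taken from the tutorial’s dataset.

```python
def cohen_kappa(table):
    """Cohen's Kappa for a square contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    p_observed = sum(table[i][i] for i in range(len(table))) / n
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)

# Illustrative tables (made up), 20 cases each.
perfect = [[10, 0], [0, 10]]    # the raters always agree
opposite = [[0, 10], [10, 0]]   # the raters always disagree
chance = [[5, 5], [5, 5]]       # ratings are independent of each other

print(cohen_kappa(perfect))
print(cohen_kappa(opposite))
print(cohen_kappa(chance))
```

Note that -1 is only reached when the two raters’ marginal totals are balanced, as in this example; with unbalanced marginals the lowest attainable value lies above -1.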
Good agreement thresholds change from one field or question to another. However, Landis and Koch (1977) proposed the scale below to describe agreement quality according to Kappa values:
- < 0: no agreement
- 0 - 0.2: slight agreement
- 0.2 - 0.4: fair agreement
- 0.4 - 0.6: moderate agreement
- 0.6 - 0.8: substantial agreement
- 0.8 - 1: almost perfect agreement
In our case, Cohen’s Kappa value is 0.76, which indicates substantial agreement according to the above scale.
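The scale can be expressed as a small lookup, sketched here in Python. The function name `landis_koch_label` is hypothetical. Because the bands above share their boundary values, the sketch assumes that each upper bound belongs to the lower band (so 0.2 counts as "slight"). The label "slight" for the 0 to 0.2 band follows Landis and Koch’s original wording.

```python
def landis_koch_label(kappa):
    """Map a Kappa value to the Landis and Koch (1977) agreement label.

    Assumption: each band includes its upper bound.
    """
    if kappa < 0:
        return "no agreement"
    bands = [
        (0.2, "slight"),
        (0.4, "fair"),
        (0.6, "moderate"),
        (0.8, "substantial"),
        (1.0, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("Kappa cannot exceed 1")

print(landis_koch_label(0.76))
```

As the text notes, these thresholds are conventions rather than universal rules, so the cut-offs may need adjusting for a given field.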
Going further: Gage R&R for attributes
The Gage R&R (Reproducibility & Repeatability) analysis for attributes uses Cohen’s Kappa, notably to measure how consistently each assessor agrees with their own repeated assessments.