Cohen’s Kappa in Excel tutorial

This tutorial shows how to compute and interpret Cohen’s Kappa in Excel using XLSTAT, in order to measure the agreement between two assessors.

Dataset to compute and interpret Cohen’s Kappa

Two doctors separately evaluated the presence or the absence of a disease in 62 patients. As shown below, the results were gathered in a crosstab or contingency table, crossing two qualitative variables (doctor 1: healthy or diseased; doctor 2: healthy or diseased). 34 patients were diagnosed as healthy by both doctors; 4 were diagnosed as diseased by doctor 1 and healthy by doctor 2, and so on.

[Figure: Cohen's Kappa in XLSTAT, dataset]

Data are fictitious and were created for this tutorial.
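Outside XLSTAT, the same kind of crosstab can be tabulated from paired ratings. The Python sketch below is only an illustration: the per-patient list is hypothetical and built to match the tutorial's counts, the count of 3 patients rated healthy by doctor 1 and diseased by doctor 2 is inferred from the total (62 - 34 - 4 - 21), and putting doctor 1 in rows is an assumption about the layout.

```python
from collections import Counter

# Hypothetical per-patient ratings as (doctor 1, doctor 2) pairs,
# built to match the counts given in the tutorial.
# The 3 (healthy, diseased) cases are inferred: 62 - 34 - 4 - 21 = 3.
pairs = (
    [("healthy", "healthy")] * 34
    + [("diseased", "healthy")] * 4
    + [("healthy", "diseased")] * 3
    + [("diseased", "diseased")] * 21
)

counts = Counter(pairs)
labels = ["healthy", "diseased"]

# Assumed layout: rows = doctor 1, columns = doctor 2.
table = [[counts[(row, col)] for col in labels] for row in labels]

for label, row in zip(labels, table):
    print(f"{label:>8}: {row}")
```

The diagonal cells (34 and 21) hold the patients on whom both doctors agree.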

Goal of this tutorial on computing and interpreting Cohen’s Kappa

The goal of this tutorial is to measure the agreement between the two doctors on the diagnosis of a disease. This is also called inter-rater reliability. To measure agreement, one could simply compute the percentage of cases on which both doctors agree (the cases on the contingency table’s diagonal), that is (34 + 21) * 100 / 62 = 89%. This statistic has an important weakness: it does not account for agreement that occurs by chance. In contrast, Cohen’s Kappa corrects the observed agreement for the agreement expected by chance, which makes it a more robust measure of inter-rater agreement.
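To make the chance correction concrete, here is a minimal Python sketch of the standard Cohen's Kappa formula applied to the tutorial's table. It is not XLSTAT's implementation. The variable names are illustrative, and the off-diagonal count of 3 is inferred from the total of 62 patients rather than stated in the text.

```python
# 2x2 contingency table, categories ordered (healthy, diseased).
# Assumed layout: rows = doctor 1, columns = doctor 2.
# The 3 is inferred from the total: 62 - 34 - 4 - 21 = 3.
table = [[34, 3],
         [4, 21]]

n = sum(sum(row) for row in table)

# Observed agreement: share of patients on the diagonal.
p_o = sum(table[i][i] for i in range(len(table))) / n

# Agreement expected by chance, from the row and column totals.
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2

# Cohen's Kappa: observed agreement corrected for chance agreement.
kappa = (p_o - p_e) / (1 - p_e)

print(f"observed agreement: {p_o:.3f}")
print(f"chance agreement:   {p_e:.3f}")
print(f"Cohen's kappa:      {kappa:.2f}")
```

With these counts, roughly 52% agreement would be expected by chance alone, which is why the Kappa value (0.76) is noticeably lower than the raw 89% agreement.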

Setting up Cohen’s Kappa statistic in XLSTAT

Once XLSTAT is activated, select the XLSTAT / Correlation/Association tests / Tests on contingency tables command (see below).

[Figure: XLSTAT correlation/association tests menu]

Once you have clicked on the button, the dialog box appears.

[Figure: XLSTAT tests on contingency tables dialog box, general tab]

Activate the Contingency Table option, and select your data in the Contingency Table field. In the Outputs tab, make sure you activate the Association Coefficients option.

[Figure: XLSTAT tests on contingency tables dialog box, outputs tab]

Interpreting Cohen’s Kappa coefficient

After you have clicked on the OK button, the results, including several association coefficients, appear:

[Figure: XLSTAT Cohen's Kappa results]

Like Pearson’s correlation coefficient, Cohen’s Kappa varies between -1 and +1, with:

- -1 reflecting total disagreement
- +1 reflecting total agreement
- 0 reflecting agreement no better than expected by chance
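These reference values follow from the standard definition of Cohen's Kappa, which the tutorial does not spell out. The symbols are chosen here for illustration: p_o is the observed proportion of agreement, p_e the proportion of agreement expected by chance, n_ii the diagonal counts, and n_i+ and n_+i the row and column totals of an n-case table.

```latex
\kappa = \frac{p_o - p_e}{1 - p_e},
\qquad
p_o = \frac{1}{n} \sum_{i} n_{ii},
\qquad
p_e = \frac{1}{n^{2}} \sum_{i} n_{i+}\, n_{+i}
```

When the raters always agree, p_o = 1 and Kappa equals +1. When they agree exactly as often as chance predicts, p_o = p_e and Kappa equals 0. When they agree less often than chance predicts, p_o < p_e and Kappa is negative.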

Thresholds for good agreement vary from one field or question to another. However, Landis and Koch (1977) established the scale below to describe agreement quality according to Kappa values:

- < 0: no agreement
- 0 - 0.2: slight agreement
- 0.2 - 0.4: fair agreement
- 0.4 - 0.6: moderate agreement
- 0.6 - 0.8: substantial agreement
- 0.8 - 1: almost perfect agreement

In our case, Cohen’s Kappa is 0.76, which indicates substantial agreement according to the above scale.
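As a small illustration, the scale can be written as a lookup function in Python. The function name `landis_koch_label` is hypothetical, and treating each upper bound as inclusive (so that 0.2 falls in the lower band) is an assumption, since the scale above does not say which band a boundary value belongs to.

```python
def landis_koch_label(kappa: float) -> str:
    """Map a Kappa value to a Landis and Koch (1977) agreement label.

    Assumption: each upper bound is inclusive, e.g. 0.2 falls in the
    0 - 0.2 band rather than the 0.2 - 0.4 band.
    """
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie between -1 and +1")
    if kappa < 0:
        return "no agreement"
    bands = [
        (0.2, "slight"),
        (0.4, "fair"),
        (0.6, "moderate"),
        (0.8, "substantial"),
        (1.0, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    # Not reachable for valid input; kept as a safeguard.
    raise ValueError("kappa must lie between -1 and +1")


print(landis_koch_label(0.76))
```

For the tutorial's value of 0.76 this returns "substantial", in line with the interpretation above.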

Going further: Gage R&R for attributes

The Gage R&R (Repeatability & Reproducibility) analysis for attributes uses Cohen’s Kappa, notably to measure how consistently each assessor agrees with their own repeated assessments.
