# Cohen’s Kappa in Excel tutorial

2017-10-20
This tutorial shows how to compute and interpret Cohen’s Kappa in Excel using XLSTAT, in order to measure the agreement between two assessors.

## Dataset to compute and interpret Cohen’s Kappa

Two doctors separately evaluated the presence or absence of a disease in 62 patients. The results were gathered in a crosstab (contingency table) crossing two qualitative variables (doctor 1: healthy or diseased; doctor 2: healthy or diseased).
34 patients were diagnosed as healthy by both doctors, 21 as diseased by both doctors, and 4 as diseased by doctor 1 but healthy by doctor 2. The remaining 3 patients (62 - 34 - 21 - 4) were diagnosed as healthy by doctor 1 but diseased by doctor 2.

Data are fictitious and were created for this tutorial.

## Goal of this tutorial on computing and interpreting Cohen’s Kappa

The goal of this tutorial is to measure the agreement between the two doctors on the diagnosis of a disease. This is also called inter-rater reliability.
To measure agreement, one could simply compute the percentage of cases for which both doctors agree (the cases on the diagonal of the contingency table), that is (34 + 21) * 100 / 62 ≈ 89%.
This statistic has an important weakness: it does not account for agreement that occurs by chance. In contrast, Cohen’s Kappa measures agreement after correcting for the agreement expected by chance, which makes it a more reliable indicator of how consistent the two doctors really are.
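To make the chance correction concrete, here is a minimal Python sketch of the usual definition, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the proportion of agreement expected by chance from the row and column totals. This is an illustration only, not XLSTAT’s implementation. The function name `cohen_kappa` is hypothetical, and the off-diagonal count of 3 is inferred from the total of 62 patients rather than stated explicitly in the text.

```python
def cohen_kappa(table):
    """Return (observed agreement, chance agreement, kappa) for a square table.

    `table` is a list of rows of counts; rows are one rater's categories,
    columns are the other rater's categories, in the same order.
    """
    size = len(table)
    total = sum(sum(row) for row in table)

    # Observed agreement: share of cases on the diagonal.
    p_observed = sum(table[i][i] for i in range(size)) / total

    # Chance agreement: product of matching row and column marginals.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(size)) for j in range(size)]
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2

    kappa = (p_observed - p_expected) / (1 - p_expected)
    return p_observed, p_expected, kappa


# Rows: doctor 1 (healthy, diseased); columns: doctor 2 (healthy, diseased).
# The value 3 is inferred as 62 - 34 - 21 - 4 (assumption, see lead-in).
table = [[34, 3],
         [4, 21]]

p_o, p_e, kappa = cohen_kappa(table)
print(f"observed agreement: {p_o:.3f}")
print(f"chance agreement:   {p_e:.3f}")
print(f"kappa:              {kappa:.2f}")
```

With these counts, a little over half of the observed agreement would be expected by chance alone, which is why Kappa comes out noticeably lower than the raw 89% figure.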

## Setting up Cohen’s Kappa statistic in XLSTAT

Once XLSTAT is activated, select the XLSTAT / Correlation/Association tests / Tests on contingency tables command.

Clicking this command opens the dialog box.

Activate the Contingency Table option, and select your data in the Contingency Table field.
In the Outputs tab, make sure you activate the Association Coefficients option.

## Interpreting Cohen’s Kappa coefficient

After you click the OK button, the results appear, including several association coefficients.

Like Pearson’s correlation coefficient, Cohen’s Kappa ranges from -1 to +1, with:
• -1 reflecting total disagreement
• +1 reflecting total agreement
• 0 reflecting agreement no better than what chance alone would produce
What counts as good agreement varies from one field or question to another. However, Landis and Koch (1977) proposed the scale below to describe agreement quality according to Kappa values:
• < 0: poor (no agreement)
• 0 - 0.2: slight
• 0.2 - 0.4: fair
• 0.4 - 0.6: moderate
• 0.6 - 0.8: substantial
• 0.8 - 1: almost perfect
In our case, Cohen’s Kappa is 0.76, which indicates substantial agreement according to the scale above.
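For readers who want to apply the scale programmatically, the lookup can be sketched as below. This is only an illustration: the function name `landis_koch_label` is hypothetical, the labels follow Landis and Koch’s wording ("poor" below zero, "slight" for the lowest positive band), and treating each upper bound as inclusive is an assumption about how to handle values that fall exactly on a threshold.

```python
def landis_koch_label(kappa):
    """Map a Kappa value to a Landis and Koch (1977) agreement label.

    Assumption: each band includes its upper bound (e.g. 0.2 is "slight").
    """
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie between -1 and +1")
    if kappa < 0:
        return "poor"

    # (upper bound, label) pairs, checked from the lowest band upward.
    bands = [
        (0.2, "slight"),
        (0.4, "fair"),
        (0.6, "moderate"),
        (0.8, "substantial"),
        (1.0, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label


print(landis_koch_label(0.76))  # value obtained in this tutorial
```

Because the thresholds are conventions rather than hard rules, a lookup like this is best treated as a descriptive aid, not as a pass/fail criterion.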

## Going further: Gage R&R for attributes

The Gage R&R (Repeatability & Reproducibility) analysis for attributes uses Cohen’s Kappa to measure, among other things, how consistently each assessor agrees with their own repeated assessments.
