What is a McNemar test?
The McNemar test, also known as the test of difference between two correlated proportions, is a special case of Cochran's Q test (a nonparametric test to compare 2 or more treatments with binary responses for randomized complete blocks) in which there are exactly 2 treatments.
As a reminder, the Cochran's Q test is itself a special case of the Friedman test (nonparametric test to compare 2 or more treatments with ordinal or continuous responses for randomized complete blocks) where the responses are binary.
All these tests are used on randomized complete blocks, meaning that the data are paired (each block is submitted to all treatments) and that there is no missing data (complete). A block is a homogeneous unit (a patient in medicine, a consumer in marketing) whose effect is of no interest for the specific study. Treatments are the different categories of the factor of interest (for example a drug, or a product to evaluate). Randomization is necessary so that, apart from the controlled factor (the block) and the factor of interest (the treatments), no nuisance factor influences the study. For example, if the order in which the treatments are applied could have an effect, then one should randomize the order in which the blocks are submitted to the treatments.
Dataset to run a McNemar test in Excel using XLSTAT
An Excel sheet with both the data and the results can be downloaded by clicking here.
The data used in this tutorial come from the book "Nonparametric Statistical Methods" by Hollander and Wolfe (1999). They were extracted from a study by Andrews (Bodily shame as a mediator between abusive experiences and depression. Journal of Abnormal Psychology, Vol 104(2), May 1995, 277-285). The goal of the analysis below is to determine whether, within a group of 101 women in Islington (London, England), abuse in childhood (sexual or physical) has an effect on depression. Statistically speaking, we have two treatments, abuse and depression, and 101 blocks.
The proportion of women who experienced depression in adulthood after abuse in childhood (17/31=0.55) is higher than for women who were not abused (22/70=0.31). Notice that the proportions here are not independent but “correlated”, because we have blocks. They would be independent if, for example, we were comparing the proportion of depressed women between two countries. Now, is this difference statistically significant?
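As a quick check of the arithmetic, the 2x2 table can be rebuilt from the counts quoted above. This is only a sketch: the variable names are illustrative, and the two cells not quoted directly (14 and 48) are obtained by subtraction from the group sizes 31 and 70.

```python
# Cell counts reconstructed from the proportions quoted in the text (17/31 and 22/70).
abused_depressed = 17
abused_not_depressed = 31 - 17        # 14, derived by subtraction
not_abused_depressed = 22
not_abused_not_depressed = 70 - 22    # 48, derived by subtraction

n_women = (abused_depressed + abused_not_depressed
           + not_abused_depressed + not_abused_not_depressed)

# Proportion of depressed women within each abuse group.
p_dep_given_abuse = abused_depressed / (abused_depressed + abused_not_depressed)
p_dep_given_no_abuse = not_abused_depressed / (not_abused_depressed + not_abused_not_depressed)

print(f"women in the study:         {n_women}")
print(f"depressed | abused:         {p_dep_given_abuse:.2f}")
print(f"depressed | not abused:     {p_dep_given_no_abuse:.2f}")
```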
The null hypothesis of the McNemar test is that there is no treatment effect. It can be reworded as "there is no association between the treatments", meaning in our case that abuse would have no influence on depression, or equivalently as "there is no difference between the correlated proportions".
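For readers who want to see what lies behind this null hypothesis, here is a minimal sketch of the classical chi-square form of the McNemar statistic, which uses only the two discordant cells of the 2x2 table. The function name `mcnemar_chi2` is hypothetical, and the cell counts are derived from the figures in this tutorial (31 - 17 = 14 women abused but not depressed, 22 women depressed but not abused). Whether XLSTAT applies a continuity correction or an exact computation is not stated here, so the p-value printed below may differ from the one in the XLSTAT report.

```python
import math

def mcnemar_chi2(b, c, continuity=False):
    """Asymptotic McNemar test from the two discordant counts b and c.

    Returns the chi-square statistic (1 degree of freedom) and its
    two-sided p-value. With continuity=True, the usual correction of 1
    is subtracted from |b - c| before squaring.
    """
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)
    stat = diff ** 2 / (b + c)
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

b = 31 - 17   # abused, not depressed (derived from the text)
c = 22        # not abused, depressed

stat, p_value = mcnemar_chi2(b, c)
stat_cc, p_value_cc = mcnemar_chi2(b, c, continuity=True)

print(f"uncorrected: chi2 = {stat:.3f}, p = {p_value:.3f}")
print(f"corrected:   chi2 = {stat_cc:.3f}, p = {p_value_cc:.3f}")
```

The concordant cells (women with both abuse and depression, or with neither) do not enter the statistic: only the pairs in which the two responses differ carry information about a difference between the correlated proportions.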
Setting up a McNemar test
After opening XLSTAT, select the XLSTAT / Nonparametric tests / McNemar test command, or click on the corresponding button of the Nonparametric tests toolbar.
Once you've clicked the button, the dialog box appears.
For the “data format” option, we choose 2x2 and then select the table together with its labels.
In the options tab, we choose the two-sided alternative hypothesis, which means that the null hypothesis of no association between abuse and depression is tested against a difference in either direction. A one-sided alternative hypothesis could be used instead if we wanted to know whether one of the two, abuse or depression, occurs significantly more often than the other.
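To illustrate how the choice of alternative hypothesis changes the p-value, here is a minimal sketch based on the exact binomial form of the McNemar test: under the null hypothesis, each discordant pair is equally likely to fall in either discordant cell. The helper `binom_cdf_half` is hypothetical, the counts are derived from the figures in this tutorial, and this is not necessarily the computation XLSTAT performs.

```python
from math import comb

def binom_cdf_half(k, n):
    """P(X <= k) for X ~ Binomial(n, 0.5), computed from exact integer sums."""
    return sum(comb(n, i) for i in range(k + 1)) / 2 ** n

b = 31 - 17   # abused, not depressed (derived from the text)
c = 22        # not abused, depressed
n = b + c     # number of discordant pairs

# Two-sided alternative: the two events differ in frequency, in either direction.
p_two_sided = min(1.0, 2 * binom_cdf_half(min(b, c), n))

# One-sided alternative: depression is more frequent than abuse (large c).
# P(X >= c) equals P(X <= n - c) by symmetry of Binomial(n, 0.5).
p_depression_more = binom_cdf_half(n - c, n)

# One-sided alternative: abuse is more frequent than depression (small c).
p_abuse_more = binom_cdf_half(c, n)

print(f"two-sided:                  p = {p_two_sided:.3f}")
print(f"one-sided (depression more): p = {p_depression_more:.3f}")
print(f"one-sided (abuse more):      p = {p_abuse_more:.3f}")
```

With these counts the two-sided p-value is exactly twice the one-sided p-value in the direction suggested by the data, which is why a one-sided test should only be chosen when the direction is fixed before looking at the results.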
The computations begin once you have clicked the OK button. The results are then displayed.
Interpreting the results of a McNemar test
Then, the results of the McNemar test and their interpretation are displayed.
We see that the null hypothesis is not rejected at a significance level of 0.05. As a reminder, choosing a significance level of 0.05 means accepting a 5% risk of rejecting H0 when it is in fact true.
So, although the proportion of women who experienced depression in adulthood after abuse in childhood is higher than for women who were not abused, the difference is not statistically significant.