Shapiro-Wilk and other normality tests in Excel
Why do we need to run a normality test?
Normality tests enable you to know whether your dataset follows a normal distribution. Moreover, normality of residuals is a required assumption in common statistical modeling methods.
Normality tests involve the null hypothesis that the variable from which the sample is drawn follows a normal distribution. Thus, a low p-value indicates a low risk of being wrong when stating that the data are not normal. In other words, if p-value < alpha risk threshold, the data are significantly not normal.
How do normality tests work?
We calculate the test statistic below on our dataset :
If its values are below the bounds defined in the Shapiro-Wilk table for a set alpha threshold, then the associated p-value is less than alpha and the null hypothesis is rejected and the data does not follow a normal distribution.
Dataset for Shapiro-Wilk and other normality tests
The data represent two samples each containing the average math score of 1000 students.
Setting up a Shapiro-Wilk and other normality tests
We then want to test the normality of the two samples. Select the XLSTAT / Describing data / Normality tests, or click on the corresponding button of the Describing data menu.
Once you've clicked on the button, the dialog box appears. Select the two samples in the Data field.
The Q-Q plot option is activated to allow us to visually check the normality of the samples.
The computations begin once you have clicked on the OK button, and the results are displayed on a new sheet.
Interpreting the results of a Shapiro-Wilk and other normality tests
The results are first displayed for the first sample and then for the second sample.
The first result displayed is the Q-Q plot for the first sample. The Q-Q plot allows us to compare the cumulative distribution function (cdf) of the sample (abscissa) to the cumulative distribution function of the normal distribution with the same mean and standard deviation (ordinates). In the case of a sample following a normal distribution, we should observe an alignment with the first bisecting line. In the other cases some deviations from the bisecting line should be observed.
We can see here that the empiric cdf is very close to the bisecting line. The Shapiro-Wilk and Jarque-Bera confirm that we cannot reject the normality assumption for the sample. We notice that with the Shapiro-Wilk test, the risk of being wrong when rejecting the null assumption is smaller than with the Jarque-Bera test.
The following results are for the second sample. Contrary to what we have observed for the first sample, we notice here on the Q-Q plot that there are two strong deviations indicating that the distribution is most probably not normal.
This gap is confirmed by the normality tests (see below) which allow us to assert with no hesitation that we need to reject the hypothesis that the sample might be normally distributed.
In conclusion, in this tutorial, we have seen how to test two samples for normality using Shapiro-Wilk and Jarque-Bera tests. These tests did not reject the normality assumption for the first sample and allowed us to reject it for the second sample.
Was this article useful?