This tutorial will help you run and interpret a Kolmogorov-Smirnov test to compare two distributions in Excel using XLSTAT.
Not sure this is the statistical test you are looking for? Check out this guide.
Dataset for running a Kolmogorov-Smirnov test to compare two distributions
An Excel sheet with both the data and the results can be downloaded by clicking here.
The data correspond to scores (0 – 30) measuring the quality of two brands of shoes (brand A and brand B). Scores were computed based on a survey addressed to customers using either brand. 15 customers answered for brand A and 8 different clients for brand B.
Goal of this tutorial
This tutorial is divided into two parts:
In the first part we compare the distributions of the two samples without making assumptions on underlying theoretical distributions (normal distribution for example). We use the non-parametric Kolmogorov-Smirnov test, which is well suited in this case.
In the second part, we use the Kolmogorov-Smirnov test to compare the distribution of one sample to a theoretical distribution.
Part 1: Running a Kolmogorov-Smirnov test to compare two observed distributions in Excel
Here, we are interested in comparing the distributions of the two samples.
First of all, what do these distributions look like? Histograms are a good tool to visualise continuous distributions: XLSTAT / Visualizing data / Histograms. In the General tab, select both samples inside the Data cell range. In the Options tab, activate the minimum option and enter 0 in the box. This will force histograms to have the same lower bound on the x axis making their comparison easier. Click on the OK button.
The histograms appear in the results sheet:
Without making any theoretical assumption, we may say that the distribution of sample B is more skewed towards low values compared to the distribution of sample A. We use the Kolmogorov-Smirnov non-parametric test to compare the two distributions: XLSTAT / Nonparametric tests / Comparison of two distributions.
Select the Brand A column in Sample 1 and the Brand B column in sample 2. The Kolmogorov-Smirnov test allows samples to be unbalanced such as in our data: sample B contains fewer scores than sample A. In the Options tab, notice it is possible to select a one-tailed alternative hypothesis and/or an exact computation of the p-value. In the Charts tab, activate the Cumulative histograms option. Click on the OK button.
The results sheet contains the Kolmogorov-Smirnov statistic (0.475) that can be easily extracted (see further, the cumulative histograms chart). This statistic is associated to a p-value (0.133) indicating that the two distributions are not significantly different at alpha = 0.05.
The cumulative distributions chart presents the studied variable (survey scores) on the x axis. For a given point on the x axis, a brand’s cumulative relative frequency is the proportion of scores smaller than this point among the scores of the brand. Thus, as previously suggested by the histograms, brand B seems to start cumulating scores earlier than brand A along the x axis. Let’s take a look at the medians, which are the scores corresponding to a cumulative relative frequency of 0.5. The median score for brand B (~20) seems to be higher than the median score for brand A (~17).
Kolmogorov-Smirnov’s D statistic is the highest deviation occurring between the two curves. In our example, this deviation falls inside the median region, but this may not necessarily be the case when using other data. The higher the D statistic, the lower the p-value and the more significant the difference between the two distributions.
Part 2: Running a Kolmogorov-Smirnov test to compare an observed distribution to a theoretical one in Excel
Suppose that the quality scores of brand A were obtained in France. For US customers, this score follows a normal distribution with a mean of 21.5 and a standard deviation of 2.3. We may ask ourselves if the French scores distribution is significantly different from the theoretical distribution of the US scores. Here again, we use the Kolmogorov-Smirnov test. The only difference with the previous part is that we aim at comparing an observed distribution to a theoretical one instead of comparing two different distributions.
XLSTAT / Nonparametric tests / Distribution fitting. In the General tab, select the brand A data, the normal distribution, activate the Enter option and enter the following parameters : µ = 21.5 and sigma = 2.3. In the Charts tab, activate the Cumulative histograms option. Click on the OK button.
In the results sheet, the histogram (on the left below) shows that the observed distribution lays on low score values compared to the theoretical curve reflecting the US scores distribution (red line).
The Kolmogorov-Smirnov test is associated to a p-value of 0.000 suggesting that the observed distribution is significantly different from the theoretical one at alpha = 0.05.
Not sure you chose the right test? This guide will let you know.