Your data analysis solution

Kolmogorov-Smirnov test in Excel tutorial

2017-10-20

This tutorial will help you run and interpret a Kolmogorov-Smirnov test to compare two distributions in Excel using XLSTAT.
Not sure this is the statistical test you are looking for? Check out this guide.

Dataset for running a Kolmogorov-Smirnov test to compare two distributions

An Excel sheet with both the data and the results can be downloaded by clicking here.

The data correspond to scores (0 – 30) measuring the quality of two brands of shoes (brand A and brand B). Scores were computed based on a survey addressed to customers using either brand. 15 customers answered for brand A and 8 different clients for brand B.  

Goal of this tutorial

This tutorial is divided into two parts:

In the first part we compare the distributions of the two samples without making assumptions on underlying theoretical distributions (normal distribution for example). We use the non-parametric Kolmogorov-Smirnov test, which is well suited in this case.

In the second part, we use the Kolmogorov-Smirnov test to compare the distribution of one sample to a theoretical distribution.

Part 1: Running a Kolmogorov-Smirnov test to compare two observed distributions in Excel

Here, we are interested in comparing the distributions of the two samples.

First of all, what do these distributions look like? Histograms are a good tool to visualize continuous distributions: XLSTAT / Visualizing data / Histograms.
XLSTAT Function Histogram
In the General tab, select both samples inside the Data cell range. In the Options tab, activate the minimum option and enter 0 in the box. This will force histograms to have the same lower bound on the x axis making their comparison easier. Click on the OK button.

 Preliminary histogram setting-up

The histograms appear in the results sheet: 
Histograms for Brand A and Brand B

Without making any theoretical assumption, we may say that the distribution of sample B is more skewed towards low values compared to the distribution of sample A. We will now use the Kolmogorov-Smirnov non-parametric test to compare the two distributions. Go to  XLSTAT / Nonparametric tests / Comparison of two distributions.
XLSTAT Function Comparison of two distributions

Kolmogorov-Smirnov test setting up the dialog box

Select the Brand A column in Sample 1 and the Brand B column in sample 2. The Kolmogorov-Smirnov test allows samples to be unbalanced such as in our data: sample B contains fewer scores than sample A. In the Options tab, notice it is possible to select a one-tailed alternative hypothesis and/or an exact computation of the p-value. In the Charts tab, activate the Cumulative histograms option. Click on the OK button.

The results sheet contains the Kolmogorov-Smirnov statistic (0.475) that can be easily extracted (see further, the cumulative histograms chart). This statistic is associated to a p-value (0.133) indicating that the two distributions are not significantly different at alpha = 0.05.

Kolmogorov 2 samples summary statistics

Kolmogorov-Smirnov test: interpreting the results

The cumulative distributions chart presents the studied variable (survey scores) on the x axis. For a given point on the x axis, a brand’s cumulative relative frequency is the proportion of scores smaller than this point among the scores of the brand. Thus, as previously suggested by the histograms, brand B seems to start cumulating scores earlier than brand A along the x axis. Let’s take a look at the medians, which are the scores corresponding to a cumulative relative frequency of 0.5. The median score for brand B (~20) seems to be higher than the median score for brand A (~17).

Kolmogorov-Smirnov’s D statistic is the highest deviation occurring between the two curves. In our example, this deviation falls inside the median region, but this may not necessarily be the case when using other data. The higher the D statistic, the lower the p-value and the more significant the difference between the two distributions.

Part 2: Running a Kolmogorov-Smirnov test to compare an observed distribution to a theoretical one in Excel

Suppose that the quality scores of brand A were obtained in France. For US customers, this score follows a normal distribution with a mean of 21.5 and a standard deviation of 2.3. We may ask ourselves if the French scores distribution is significantly different from the theoretical distribution of the US scores. Here again, we will use the Kolmogorov-Smirnov test. The only difference with the previous part is that we aim at comparing an observed distribution to a theoretical one instead of comparing two different distributions.

Go to XLSTAT / Nonparametric tests / Distribution fitting.
XLSTAT Function Distribution fitting
In the General tab, select the brand A data, the normal distribution, activate the Enter option and enter the following parameters: µ = 21.5 and sigma = 2.3. In the Charts tab, activate the Cumulative histograms option. Click on the OK button.

Kolmogorov mono-sample setting up

In the results sheet, the histogram (on the left below) shows that the observed distribution lays on low score values compared to the theoretical curve reflecting the US scores distribution (red line).

Kolmogorov mono-sample charts

Kolmogorov-Smirnov test for one sample: p-value

The Kolmogorov-Smirnov test is associated to a p-value of 0.000 suggesting that the observed distribution is significantly different from the theoretical one at alpha = 0.05.

Not sure you chose the right test? This guide will let you know.

1c26995d494fb3061dd0ae8571ffc0a4@xlstat.desk-mail.com
https://cdn.desk.com/
false
desk
Loading
seconds ago
a minute ago
minutes ago
an hour ago
hours ago
a day ago
days ago
about
false
Invalid characters found
/customer/portal/articles/autocomplete
9283