Box-Cox transformation tutorial in Excel
Dataset for variable transformation
In this tutorial, we show how to create and transform a variable to be closer to the Normal distribution.
The dataset contains the measurements of waste in production for 47 batches. We would like to make a regression with several process variables, but the hypothesis of Normality of the variable Level of waste is not acceptable.
The results of the Normality test are displayed below.
We need to make a transformation of this variable before attempting a multilinear regression. After showing different options to transform data, we will use the Box-Cox transformation of XLSTAT.
Variable transformation in XLSTAT
There are several ways to transform data in XLSTAT.
Variable transformation with Microsoft Excel tools
First, you can take advantage of Microsoft Excel and use the available function in the software.
Place the cursor where you would like to have the results displayed.
Click on the fx icon above the spreadsheet to access the menu Insert Function.
Select one of the functions listed under either Financial, Math & Trig, Statistical, Database or XLSTAT (last entry).
Variable transformation with XLSTAT tools
In XLSTAT, we offer you the opportunity to use some more specific functions. You will find them in the option Preparing data / Variables transformation.
Setting up a Box-Cox transformation
Select the XLSTAT / Preparing data / Variables transformation. The dialog box pops up.
Select the variable Level of waste to transform it.
Tick the Column labels options.
Select the batch identifications in column A as observation labels.
Tick the option Other.
Select the option Box-Cox transformation, as we are trying to get the variable “Level of waste” closer to a Normal distribution.
Select the option Optimize to let XLSTAT find the best Lambda.
Click on OK.
Results of the Box-Cox transformation
In the result sheet called Variables transformation, you will find the transformed data with the value of Lambda used.
A general formula is displayed, as well as the formula used to transform this particular variable.
You can now compute the Normality test on those transformed data. As you can see below, the transformed variable Level of waste is following a Normal distribution.
Was this article useful?