This tutorial shows how to perform a Johnson transformation on your data in Excel using the XLSTAT statistical software.
Included inXLSTAT-Base XLSTAT-Sensory XLSTAT-Marketing XLSTAT-Forecast XLSTAT-Biomed XLSTAT-Ecology XLSTAT-Psy XLSTAT-Quality XLSTAT-Premium
Goal of this tutorial
Many statistical methods assume that either the original data or some outputs are normally distributed. While some phenomena naturally follow a normal distribution, this is most of the time, not the case. While this might not always be a problem for making decisions when the conditions on normality are mild or if the decisions are not critical, this can become a real problem if the decision can be of high impact. In such cases, a transformation is required. In this example, we will use the Johnson transformation.
Dataset for variable transformation
An Excel sheet with both the data and the results can be downloaded by clicking on the button below:
Download the data
The dataset contains the measurements of waste in the production for 47 batches. We would like to perform a regression with several process variables but the hypothesis of Normality of the variable Level of waste is violated. This can be easily verified using the Normality test and the Histogram XLSTAT functions. The results are displayed below.
We, therefore, need to make a transformation of this variable before attempting a linear regression.
Setting up a Johnson transformation within XLSTAT
Once XLSTAT is activated, select the XLSTAT / Preparing data / Variables transformation command.
The Variables transformation dialog box opens. In the General tab, select the variables you wish to transform in the Data field. In this example, we select the variable Level of waste in column B.
The most general transformation is an unbiased standardization (Standardize (n-1)). However, there are more transformations available when you activate the option Other.
In the Transformations tab, select the option Johnson transformation. Set the alpha equal to 0.05 and select the Shapiro-Wilk test.
For the Johnson transformation, the algorithm described by Chou et al. (1998) based on the percentiles has been implemented in XLSTAT. The method allows to both find the best form of the three possible Johnson distributions and to estimate good values of the parameters. While the authors use the Shapiro Wilk test to assess the normality, XLSTAT allows you to use this same test or the Anderson-Darling test.
Click on OK to start the computations.
Results of the Johnson transformation
In the result sheet called Variables transformation, you will find the Transformed data table.
Let's check now the Normality tests and the histograms on the transformed data. The p-value of all four normality tests are higher than 0.05 thus we cannot reject the null hypothesis that the variable from which the sample was extracted follows a Normal distribution. The histogram also indicates that the transformed variable Level of waste follows a Normal distribution.