# Johnson transformation tutorial in Excel

This tutorial shows how to perform a Johnson transformation on your data in Excel using the XLSTAT statistical software.

## Goal of this tutorial

Many statistical methods assume that either the original data or some outputs are normally distributed. While some phenomena naturally follow a normal distribution, this is most of the time, not the case. While this might not always be a problem for making decisions when the conditions on normality are mild or if the decisions are not critical, this can become a real problem if the decision can be of high impact. In such cases, a transformation is required. In this example, we will use the Johnson transformation.

## Dataset for variable transformation

The dataset contains the measurements of waste in production for 47 batches. We would like to perform a regression with several process variables, but the hypothesis of Normality of the variable Level of waste is violated. This can be easily verified using the Normality test and the Histogram XLSTAT functions. The results are displayed below. We, therefore, need to make a transformation of this variable before attempting a linear regression.

## Setting up a Johnson transformation within XLSTAT

• Open XLSTAT.

• Select the XLSTAT / Preparing data / Variables transformation.

• Select the variables you wish to transform in the Data field. In this example, we select the variable Level of waste in column B.

• Click on Other.

• Select the option Johnson transformation.

• Set the alpha equal to 0.05 and select the Shapiro-Wilk test.

• Select the Shapiro-Wilk test. (N.B.: While the authors use the Shapiro-Wilk test to assess the normality, XLSTAT allows you to use this same test or the Anderson-Darling test.)

• Click on OK. ## Results of the Johnson transformation

In the result sheet called Variables transformation, you will find the Transformed data table and the formula used to transform this variable.  Let's check now the Normality tests and the histograms on the transformed data. The p-value of all four normality tests are higher than 0.05 thus we cannot reject the null hypothesis that the variable from which the sample was extracted follows a Normal distribution. The histogram also indicates that the transformed variable Level of waste follows a Normal distribution.  