Breusch-Pagan & White heteroscedasticity tests in Excel

This tutorial shows how to run and interpret two heteroscedasticity tests, the Breusch-Pagan test and the White test, in Excel using the XLSTAT software.

Dataset for running Breusch-Pagan and White heteroscedasticity tests in XLSTAT

For this tutorial we use an artificial dataset built specifically to compare a homoscedastic model with a strongly heteroscedastic one. The data correspond to an experiment testing the effect of age (measured in days) on the sugar content and the size of a new fruit variety.

Two simple linear regressions were carried out with age as the explanatory variable: sugar content is the dependent variable in the first regression, and size in the second. The residuals of both regressions are included in the dataset.
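For readers who want to reproduce this step outside Excel, the residuals of a simple linear regression are the observed values minus the fitted values. The following is a minimal Python/NumPy sketch; `ols_residuals` is a hypothetical helper, and the numbers are made up for illustration, not taken from the tutorial's dataset.

```python
import numpy as np

def ols_residuals(x, y):
    """Residuals of the simple linear regression y = b0 + b1 * x fitted by OLS."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.polyfit returns coefficients from the highest degree down: [slope, intercept]
    slope, intercept = np.polyfit(x, y, deg=1)
    fitted = intercept + slope * x
    return y - fitted

# Made-up illustration values (hypothetical), not the tutorial's dataset
age = [1, 2, 3, 4, 5]
sugar = [2.1, 3.9, 6.2, 7.8, 10.1]
print(ols_residuals(age, sugar))
```

Because the model includes an intercept, these residuals sum to zero and are uncorrelated with Age, whatever their spread looks like.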

The first regression (sugar content) has homoscedastic residuals, whereas the residuals of the second (size) are strongly heteroscedastic.

The last section of this tutorial explains how the data were generated.

Goal of this tutorial

The aim of this tutorial is to check whether the variability of a dependent variable (here, sugar content or size) changes with an explanatory variable (age) in a linear regression. Technically, we ask whether the regression residuals are unevenly spread along the explanatory variable. If they are, we speak of heteroscedasticity. The size of an organism, for instance, often becomes more variable with age: babies have relatively “standard” heights, whereas adult heights vary widely. This is a typical case of heteroscedasticity.
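Before running a formal test, the idea can be checked informally by sorting the residuals by the explanatory variable, cutting them into groups, and comparing their spread. This is a minimal Python/NumPy sketch on simulated data; `residual_spread_by_group` and the number of groups are illustrative assumptions, not part of XLSTAT, and this comparison is not itself a significance test.

```python
import numpy as np

def residual_spread_by_group(x, residuals, n_groups=3):
    """Sort observations by x, split them into n_groups consecutive groups,
    and return the sample standard deviation of the residuals in each group.
    Each group needs at least two observations (ddof=1)."""
    x = np.asarray(x, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    order = np.argsort(x)
    groups = np.array_split(residuals[order], n_groups)
    return [float(np.std(g, ddof=1)) for g in groups]

# Simulated (hypothetical) ages and a standard normal error
rng = np.random.default_rng(1)
age = np.linspace(1, 100, 300)
e = rng.normal(0.0, 1.0, size=age.size)
print(residual_spread_by_group(age, e))        # roughly constant spread
print(residual_spread_by_group(age, age * e))  # spread grows with age
```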

We will use the Breusch-Pagan and White heteroscedasticity tests to show how these tests work in two extreme situations: homoscedasticity and strong heteroscedasticity.

Breusch-Pagan and White heteroscedasticity tests: what hypothesis are we testing?

Heteroscedasticity tests involve the following two hypotheses:

H0 (null hypothesis): the residuals are homoscedastic, i.e. their variance is constant along the explanatory variable.

Ha (alternative hypothesis): the residuals are heteroscedastic.

Therefore, if the p-value of a heteroscedasticity test falls below a chosen significance level (0.05, for example), we reject H0 and conclude that the residuals are significantly heteroscedastic.
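The decision rule amounts to a single comparison. This small Python sketch encodes it; the function name `is_heteroscedastic` is hypothetical, and the example p-values are the ones this tutorial reports for the Sugar content residuals.

```python
def is_heteroscedastic(p_value, alpha=0.05):
    """Return True when H0 (homoscedasticity) is rejected at level alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return p_value < alpha

# p-values reported in this tutorial for the Sugar content residuals
print(is_heteroscedastic(0.322))  # Breusch-Pagan test
print(is_heteroscedastic(0.296))  # White test
```

Both calls return False, so H0 is not rejected for Sugar content at the 0.05 level.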

Performing Breusch-Pagan and White heteroscedasticity tests in XLSTAT

Open the XLSTAT menu and click on Time / Tests for heteroscedasticity. Select the Residuals(Sugar) column in the Residuals box, and the Age column in the explanatory variables box. Check the White test checkbox and launch the analysis by clicking on the OK button. The results of this first analysis are displayed in a new sheet.
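Both tests regress the squared residuals on the explanatory variable (plus its square, for the White test) and compare n·R² of that auxiliary regression with a chi-square distribution. The Python sketch below uses only NumPy and the standard library. It assumes the Koenker (studentized) form of the Breusch-Pagan statistic; XLSTAT may compute a different variant, so its values need not match these exactly. The function names and the simulated data are hypothetical.

```python
import math
import numpy as np

def _r_squared(y, X):
    """R² of the OLS regression of y on an intercept plus the columns of X."""
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(y.size), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    ss_res = np.sum((y - design @ beta) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def breusch_pagan(residuals, x):
    """Koenker (studentized) Breusch-Pagan test with one explanatory variable.
    LM = n * R² from regressing e² on x; chi-square with 1 df under H0."""
    e2 = np.asarray(residuals, dtype=float) ** 2
    lm = max(e2.size * _r_squared(e2, np.asarray(x, dtype=float)), 0.0)
    p_value = math.erfc(math.sqrt(lm / 2.0))  # upper tail of chi-square(1)
    return lm, p_value

def white(residuals, x):
    """White test with one explanatory variable: regress e² on x and x²
    (a single regressor has no cross products); chi-square with 2 df."""
    x = np.asarray(x, dtype=float)
    e2 = np.asarray(residuals, dtype=float) ** 2
    lm = max(e2.size * _r_squared(e2, np.column_stack([x, x ** 2])), 0.0)
    p_value = math.exp(-lm / 2.0)  # upper tail of chi-square(2)
    return lm, p_value

# Simulated (hypothetical) data: constant-variance vs Age-proportional residuals
rng = np.random.default_rng(0)
age = rng.uniform(1, 100, size=300)
e = rng.normal(size=300)
for label, res in [("Sugar-like", e), ("Size-like", age * e)]:
    print(label, "BP p =", breusch_pagan(res, age)[1], "White p =", white(res, age)[1])
```

For real analyses, statsmodels provides `het_breuschpagan` and `het_white` in `statsmodels.stats.diagnostic`.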

Heteroscedasticity tests dialog box

Repeat the same procedure with the Residuals(Size) column selected in the Residuals box.

Interpretation

For the Sugar content variable, the residuals / Age chart shows a relatively homogeneous distribution of residuals along the Age variable.

Heteroscedasticity: Sugar residuals/Age chart

Furthermore, both tests give high p-values (0.322 for the Breusch-Pagan test and 0.296 for the White test), so we cannot reject the null hypothesis that the residuals are homoscedastic.

Heteroscedasticity: BP test result for sugar content

For the Size variable, the residuals / Age chart shows that residuals clearly become more variable as the fruits grow older. This cone-like shape is a very common case of heteroscedasticity.

Heteroscedasticity: Size residuals/Age chart

In addition, the p-values of both tests are far below the 0.05 significance level. We therefore reject the null hypothesis that the residuals are homoscedastic, which matches what the chart suggests.

Heteroscedasticity: BP test result for size

Additional information: how the dataset of this tutorial was generated

The Sugar content dependent variable is constructed as twice the Age variable plus a random normal error centered on zero. This error represents the residuals, and it is a typical case of independent and identically distributed residuals. For the second dependent variable (Size), the residuals are the product of Age and the random normal error. These residuals are no longer identically distributed: their variance is proportional to the square of Age, which produces the cone shape seen in the residuals / Age chart. For further information, please see the additional info sheet included in the tutorial’s dataset.
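The generating process described above can be sketched in Python. The sample size, the Age range, the error standard deviation, and the deterministic part of Size (taken here as 2 * Age, like Sugar content) are assumptions made for illustration only; the actual settings are in the dataset's additional info sheet.

```python
import numpy as np

def make_tutorial_like_data(n=100, seed=42):
    """Hypothetical re-creation of the data-generating process.
    n, the Age range, the error scale and the mean part of Size are assumptions."""
    rng = np.random.default_rng(seed)
    age = rng.uniform(1, 100, size=n)      # assumed Age range, in days
    e = rng.normal(0.0, 1.0, size=n)       # random normal error centered on zero
    sugar = 2 * age + e                    # i.i.d. errors: homoscedastic
    size = 2 * age + age * e               # assumed mean part; error scaled by Age
    return age, sugar, size

age, sugar, size = make_tutorial_like_data()
# The Size errors are the Sugar errors multiplied by Age, so their spread is much larger
print(np.std(sugar - 2 * age), np.std(size - 2 * age))
```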
