Elastic net regression in Excel
This tutorial explains how to set up and interpret Elastic regression in Excel using the XLSTAT statistical software.
Dataset for Elastic net regression
The original data are from Osborne and al. (1984). The dataset contains, for 30 cookies, the discretizations of near infrared spectra. Each spectrum is observed at wavelengths ranging from 1100 to 2460 nanometers, and spaced by 40 nanometers, which leads to 35 explanatory variables in the dataset. The data table also contains the flour composition of each cookie.
The goal is to predict the flour composition of the different cookies.
Goal of this tutorial
The goal of this tutorial is to set up and interpret an Elastic net regression on the Cookies dataset.
Setting up an Elastic net regression in XLSTAT
Once XLSTAT is open, click on Modeling data / Elastic net regression as shown below:
The Elastic net regression dialog box appears.
You can then select the data on the Excel sheet. Here, the Dependent variable (or variable to model) is the flour composition of the cookies.
The Quantitative explanatory variables correspond to all columns numbered L1 to L35. Here we want to explain the variability of the flour content in the cookies as a function of the spectrum.
Only the first 20 cookies of the dataset are selected to be featured in the modeling, the others will be used for the prediction.
As the name of each variable is displayed at the top of the table, we must check the Variable labels checkbox.
You can also check the Observation labels box and select the name of the different cookies.
In the Options tab, we use Cross-validation to find the optimal Alpha and Lambda parameters. We choose to form 5 folds for the cross-validation and to test 100 values of each parameter.
In the Prediction tab, select the data on which you want to get a prediction. Here we select the last 10 cookies in the dataset.
Configure the Outputs tab as follows:
The computations begin once you have clicked on the OK button.
Interpreting an Elastic net régression
The table “Cross-validation results” gives the values of the parameters retained by the cross-validation: the value of the pair (Lambda ; Alpha) selected by XLSTAT is the one that minimizes the MSE, here the pair (0.012 ; 0.616)
The table “Model parameters” gives details on the model. This table is helpful when predictions are needed or to analyze the importance of variables in the model. We see that only 15 of the 35 variables were kept in the modeling, the others have a zero coefficient.
The last table gives, following the estimation of the model, the predictions of the flour composition of the 10 cookies using in the prediction dataset.
Conclusion on Elastic net regression
Finally, Elastic net regression has modeled the flour composition of the different cookies with only 15 of the 35 available variables: the other variables were judged not explanatory enough or as containing information too similar to one of the already selected variables. Given the same dataset, the LASSO regression would have chosen to keep 11 variables in the model, while the Ridge regression would have kept all 35 variables available. This fact confirms that the Elastic net regression is a compromise method between the Ridge and LASSO regressions.
Was this article useful?