# RIDGE regression in Excel

This tutorial explains how to set up and interpret Ridge regression in Excel using the XLSTAT statistical software.

**Dataset for Ridge regression**

The original data are from Osborne and al. (1984). The dataset contains, for 30 cookies, the discretizations of near infrared spectra. Each spectrum is observed at wavelengths ranging from 1100 to 2460 nanometers, and spaced by 40 nanometers, which leads to 35 explanatory variables in the dataset. The data table also contains the water composition of each cookie.

**Goal of this tutorial**

The goal of this tutorial is to set up and interpret a Ridge regression in order to predict the water composition of the different cookies.

**Setting up a Ridge regression in XLSTAT**

Once XLSTAT is open, click on **Modeling data** / **Ridge regression** as shown below:

The Ridge regression dialog box appears.

You can then select the data on the Excel sheet. Here, the **Dependent variable** (or variable to model) is the water composition of the cookies.

The **Quantitative explanatory variables** correspond to all columns numbered L1 to L35. Here we want to explain the variability of the water content in the cookies as a function of the spectrum.

Only the first 20 cookies of the dataset are selected to be featured in the modeling, the others will be used for the prediction.

As the name of each variable is displayed at the top of the table, we must check the **Variable labels** checkbox.

You can also check the **Observation labels** box and select the name of the different cookies.

In the **Options** tab, we use **Cross-validation** to find the optimal lambda regularization parameter. We choose to form 5 folds for the cross-validation and to test 100 lambda values.

In the **Prediction** tab, select the data on which you want to get a prediction. Here we select the last 10 cookies in the dataset.

Configure the **Outputs** tab as follows:

In the **Charts** tab, you can activate the option **Evolution of the MSE (Cross-validation)** which allows you to have access to the evolution of the MSE (Mean Squared Error) depending on the regularization parameter.

The computations begin once you have clicked on the **OK** button.

**Interpreting a Ridge regression**

The table “**Model parameters**” gives details on the model. This table is helpful when predictions are needed or to analyze the importance of variables in the model. We see that all 35 variables were kept in the modeling with low-value and relatively homogeneous coefficients.

The following chart shows the **evolution of the MSE** depending on the Lambda regularization parameter. The Lambda value selected by XLSTAT is the one that minimizes the MSE, that is to say here approximately **0.133.**

The last table gives, following the estimation of the model, the predictions of the water composition of the 10 cookies using in the prediction dataset.

**Conclusion on Ridge regression**

Finally, distinct from LASSO regression, Ridge regression has modeled the water composition of the different cookies with all 35 variables available. At the same time, unlike the LASSO regression, no variable really stands out with a large coefficient in the modeling constructed by the Ridge regression.

Was this article useful?

- Yes
- No