# MISSING DATA IMPUTATION USING EM ALGORITHM IN EXCEL

This tutorial shows how to impute missing data in Excel using the EM (Expectation-Maximization) algorithm in the XLSTAT software.

## Dataset for completing missing data with EM algorithm

The dataset used to illustrate missing-value imputation with the EM algorithm is the well-known Fisher iris dataset, into which missing values have been introduced at random.
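The tutorial does not say exactly how the gaps were drawn. The sketch below assumes that cells are removed completely at random (MCAR) at a chosen rate, so that missingness does not depend on any value in the data. The function name `mask_completely_at_random`, the 10% rate and the random normal data standing in for the four iris measurements are all hypothetical.

```python
import numpy as np

def mask_completely_at_random(X, rate, seed=0):
    """Return a copy of X with a fraction `rate` of its cells set to NaN.

    Cells are drawn uniformly without replacement, so whether a cell is
    missing does not depend on any value in the data (MCAR).
    """
    X = np.array(X, dtype=float)            # copy: the complete data stay intact
    rng = np.random.default_rng(seed)
    k = int(round(rate * X.size))           # number of cells to blank out
    flat_idx = rng.choice(X.size, size=k, replace=False)
    X.flat[flat_idx] = np.nan
    return X

# Hypothetical stand-in for the 150 x 4 iris measurements.
complete = np.random.default_rng(1).normal(size=(150, 4))
with_gaps = mask_completely_at_random(complete, rate=0.1)
print(int(np.isnan(with_gaps).sum()))       # 60 of the 600 cells removed
```

Keeping `complete` alongside `with_gaps` is what later allows imputed values to be compared with the true ones.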

## Setting up missing data imputation in XLSTAT

Select the **XLSTAT / Preparing data / Missing data** feature.

The **Missing data dialog box** appears.

In the **Quantitative data** field, select columns H to K, which contain the dataset with the randomly introduced missing values. Choose **EM algorithm** as the method for estimating the missing data. Click **OK** to start the calculations; the results are then displayed.
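This tutorial does not state which model XLSTAT's EM routine assumes. A common formulation, sketched here as an assumption, treats each row as a draw from a multivariate normal distribution. The E-step replaces each missing value with its conditional expectation given the observed values in the same row; the M-step re-estimates the mean vector and covariance matrix, adding the conditional covariance of the missing entries so that their uncertainty is not ignored. `em_impute` and its parameters are hypothetical names, not XLSTAT functions.

```python
import numpy as np

def em_impute(X, max_iter=100, tol=1e-6):
    """Fill NaNs by EM under a multivariate normal model (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)
    mu = np.nanmean(X, axis=0)                  # start from column means
    filled = np.where(miss, mu, X)
    sigma = np.cov(filled, rowvar=False, bias=True)

    for _ in range(max_iter):
        new = filled.copy()
        correction = np.zeros((p, p))           # sum of conditional covariances
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            if o.any():
                # E-step: conditional mean of missing entries given observed ones.
                s_mo = sigma[np.ix_(m, o)]
                coef = s_mo @ np.linalg.pinv(sigma[np.ix_(o, o)])
                new[i, m] = mu[m] + coef @ (X[i, o] - mu[o])
                cond_cov = sigma[np.ix_(m, m)] - coef @ s_mo.T
            else:                               # whole row missing
                new[i, m] = mu[m]
                cond_cov = sigma[np.ix_(m, m)]
            correction[np.ix_(m, m)] += cond_cov
        # M-step: update mean and covariance from expected sufficient statistics.
        mu_new = new.mean(axis=0)
        centred = new - mu_new
        sigma_new = (centred.T @ centred + correction) / n
        done = (np.abs(mu_new - mu).max() < tol
                and np.abs(sigma_new - sigma).max() < tol)
        mu, sigma, filled = mu_new, sigma_new, new
        if done:
            break
    return filled, mu, sigma

# Usage on synthetic, strongly correlated data (not the iris values).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
truth = np.column_stack([x, 2 * x + rng.normal(scale=0.05, size=200)])
data = truth.copy()
data[::10, 1] = np.nan                          # hide every tenth value of column 2
filled, mu, sigma = em_impute(data)
print(np.abs(filled[::10, 1] - truth[::10, 1]).max())
```

The pseudo-inverse (`np.linalg.pinv`) is used instead of a plain inverse so that a near-singular covariance submatrix does not stop the iterations; observed values are never modified, only the missing cells are re-estimated.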

## Results of the EM imputation process in XLSTAT

A **chart** is displayed in which the missing values appear in red. No particular pattern appears in how the missing values are distributed across the dataset.
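The same check can be made outside XLSTAT by counting missing values per variable and tallying the distinct row-wise missingness patterns. When the gaps are spread over the variables and no pattern dominates, this is consistent with values removed at random, although it does not prove it. The helper `missing_pattern` and the toy values below are hypothetical.

```python
import numpy as np
from collections import Counter

def missing_pattern(X):
    """Summarise where NaNs occur: counts per column and per row pattern."""
    miss = np.isnan(np.asarray(X, dtype=float))
    per_column = miss.sum(axis=0)
    # Encode each row as a string, e.g. "..X." = third variable missing.
    patterns = Counter("".join("X" if m else "." for m in row) for row in miss)
    return per_column, patterns

X = np.array([[5.1, np.nan, 1.4, 0.2],
              [4.9, 3.0, np.nan, 0.2],
              [np.nan, 3.2, 1.3, 0.2],
              [4.6, 3.1, 1.5, 0.2]])
per_column, patterns = missing_pattern(X)
print(per_column)                               # [1 1 1 0]
```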

**Descriptive statistics** before and after imputation follow.
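Comparing summary statistics before and after imputation is a quick sanity check. The sketch below assumes that "before" means statistics computed on the observed values only; `describe` is a hypothetical helper. Mean replacement is used as the imputation here purely to show what the check can reveal: the mean of a variable is unchanged, but its standard deviation shrinks because every gap receives the same value.

```python
import numpy as np

def describe(X):
    """Per-column count, mean and standard deviation, ignoring NaNs."""
    X = np.asarray(X, dtype=float)
    return {
        "n": (~np.isnan(X)).sum(axis=0),
        "mean": np.nanmean(X, axis=0),
        "std": np.nanstd(X, axis=0, ddof=1),
    }

X = np.array([[1.0, 10.0], [2.0, np.nan], [3.0, 30.0], [4.0, 40.0]])
before = describe(X)

# Mean replacement, for illustration only.
mean_filled = np.where(np.isnan(X), np.nanmean(X, axis=0), X)
after = describe(mean_filled)

print(before["mean"], after["mean"])            # identical means
print(before["std"], after["std"])              # second column's spread shrinks
```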

Finally, the **completed data set** is provided, with the initially missing values displayed in bold.
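Highlighting only the initially missing cells requires keeping the missing-value mask from before imputation, since the completed table no longer contains any gaps. A minimal sketch that renders a completed array as a Markdown table with those cells in bold; the helper name `to_markdown`, the column labels and the imputed value 34.2 are made up for illustration.

```python
import numpy as np

def to_markdown(completed, missing_mask, header, fmt="{:.1f}"):
    """Render a completed data set as a Markdown table, bolding imputed cells."""
    lines = ["| " + " | ".join(header) + " |",
             "|" + "---|" * len(header)]
    for row, mask_row in zip(completed, missing_mask):
        cells = [f"**{fmt.format(v)}**" if m else fmt.format(v)
                 for v, m in zip(row, mask_row)]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)

original = np.array([[51.0, np.nan], [49.0, 30.0]])
completed = np.array([[51.0, 34.2], [49.0, 30.0]])   # hypothetical imputed value
table = to_markdown(completed, np.isnan(original), ["Sepal length", "Sepal width"])
print(table)
```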

Comparing the imputed data with the original data set without missing values shows that the imputed values are close to the true ones. For example, the imputed value for the first observation is 32.8, against a true value of 33. In this case, EM imputation is therefore much more appropriate than simpler approaches such as replacing each missing value with the mean of its variable.
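The comparison can be made quantitative by computing the root-mean-square error over the cells that were removed. The sketch below uses synthetic data (the iris values are not reproduced here) and contrasts mean replacement with a least-squares regression on a correlated, fully observed variable. The regression is a stand-in for any method that, like EM under a normal model, exploits correlations between variables; it is not XLSTAT's procedure, and all names are hypothetical.

```python
import numpy as np

def rmse_on_missing(truth, imputed, mask):
    """Root-mean-square error over the cells that were originally missing."""
    diff = (np.asarray(imputed) - np.asarray(truth))[mask]
    return float(np.sqrt(np.mean(diff ** 2)))

rng = np.random.default_rng(0)
x = rng.normal(size=300)
truth = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=300)])
mask = np.zeros(truth.shape, dtype=bool)
mask[::5, 1] = True                             # hide every fifth value of column 2
data = np.where(mask, np.nan, truth)

# Mean replacement: every gap in column 2 gets the same value.
by_mean = np.where(mask, np.nanmean(data, axis=0), data)

# Regression of column 2 on column 1, fitted on complete rows only.
obs = ~mask[:, 1]
slope, intercept = np.polyfit(data[obs, 0], data[obs, 1], 1)
by_regression = data.copy()
by_regression[mask[:, 1], 1] = intercept + slope * data[mask[:, 1], 0]

err_mean = rmse_on_missing(truth, by_mean, mask)
err_reg = rmse_on_missing(truth, by_regression, mask)
print(err_mean, err_reg)    # mean replacement errs by about the spread of column 2
```

Mean replacement ignores the relationship between variables, so its error is on the order of the variable's own spread; a method that uses the correlated variable can get close to the noise level.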
