MISSING DATA IMPUTATION USING EM ALGORITHM IN EXCEL
This tutorial shows how to impute missing data in Excel using the EM (Expectation Maximization) algorithm with the XLSTAT software.
Dataset for completing missing data with EM algorithm
The dataset used to illustrate missing-value imputation with the EM algorithm is the well-known Fisher iris dataset, into which missing values have been introduced at random.
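To reproduce this kind of test dataset outside Excel, one option is to blank out cells independently at random. The sketch below is only an illustration: the function name `introduce_missing`, the 10% rate, and the sample values are assumptions, not the procedure or the data used to build the tutorial's dataset.

```python
import random

def introduce_missing(rows, rate, seed=0):
    """Return a copy of `rows` where each cell is replaced by None
    independently with probability `rate` (missing completely at random)."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    return [[None if rng.random() < rate else value for value in row]
            for row in rows]

# Hypothetical measurements with four numeric columns (illustrative values only).
complete = [
    [51.0, 34.0, 15.0, 2.0],
    [48.0, 31.0, 16.0, 2.0],
    [63.0, 28.0, 47.0, 14.0],
    [66.0, 30.0, 44.0, 13.0],
    [72.0, 32.0, 60.0, 18.0],
]

incomplete = introduce_missing(complete, rate=0.10)
for row in incomplete:
    print(row)
```

Keeping the complete table alongside the incomplete one is what makes it possible to check afterwards how close the imputed values are to the true ones.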
Setting up missing data imputation in XLSTAT
Select the XLSTAT / Preparing data / Missing data feature.
The Missing data dialog box appears.
In the Quantitative data field, select columns H to K, which contain the dataset with the randomly introduced missing values. Choose to estimate the missing data using the EM algorithm. Click the OK button to start the calculations; the results are then displayed.
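For readers curious about what an EM imputation computes, here is a minimal sketch in Python with NumPy. It assumes the rows are independent draws from a multivariate normal distribution, which is the textbook setting for EM imputation; it is not XLSTAT's implementation, and the function name `em_impute`, the starting values, and the stopping rule are choices made for this illustration.

```python
import numpy as np

def em_impute(X, max_iter=200, tol=1e-6):
    """Fill NaN cells of X with their conditional means under a
    multivariate normal model fitted by EM (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    missing = np.isnan(X)

    # Starting point: column means and the covariance of the mean-filled data.
    mu = np.nanmean(X, axis=0)
    filled = np.where(missing, mu, X)
    sigma = np.cov(filled, rowvar=False, bias=True) + 1e-6 * np.eye(p)

    for _ in range(max_iter):
        previous = filled.copy()
        cond_cov_sum = np.zeros((p, p))

        # E-step: replace each missing block by its conditional expectation.
        for i in range(n):
            m = missing[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():
                # Nothing observed in this row: fall back to the current mean.
                filled[i] = mu
                cond_cov_sum += sigma
                continue
            s_oo = sigma[np.ix_(o, o)]
            s_mo = sigma[np.ix_(m, o)]
            coef = np.linalg.solve(s_oo, s_mo.T).T  # equals s_mo @ inv(s_oo)
            filled[i, m] = mu[m] + coef @ (X[i, o] - mu[o])
            # Residual uncertainty of the missing block, needed in the M-step.
            cond_cov_sum[np.ix_(m, m)] += sigma[np.ix_(m, m)] - coef @ s_mo.T

        # M-step: update the mean and covariance from the completed data.
        mu = filled.mean(axis=0)
        centered = filled - mu
        sigma = (centered.T @ centered + cond_cov_sum) / n

        if np.max(np.abs(filled - previous)) < tol:
            break
    return filled

# Hypothetical two-column example (illustrative values only).
demo = np.array([[1.0, 2.1], [2.0, 3.9], [3.0, np.nan],
                 [4.0, 8.2], [np.nan, 10.1]])
print(em_impute(demo))
```

Each missing value is predicted from the values observed in the same row, using the correlations estimated so far, and those estimates are then refreshed from the completed table. This use of the relationships between variables is what distinguishes the approach from filling every gap in a column with the same number.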
Results of the EM imputation process in XLSTAT
The first result is a chart of the dataset in which missing values are shown in red. The missing values show no particular pattern.
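The visual check can be complemented by a simple count. The sketch below, which uses a hypothetical helper named `missing_summary` and illustrative values, counts missing cells per column and tabulates which combinations of columns are missing together. Roughly even counts with no dominant combination are consistent with values missing at random.

```python
from collections import Counter

def missing_summary(rows):
    """Return (missing count per column, Counter of row-wise missing patterns).
    A pattern is a tuple of booleans, True where the cell is missing (None)."""
    n_cols = len(rows[0])
    per_column = [sum(row[j] is None for row in rows) for j in range(n_cols)]
    patterns = Counter(tuple(value is None for value in row) for row in rows)
    return per_column, patterns

# Hypothetical incomplete table (illustrative values only).
rows = [
    [51.0, None, 14.0, 2.0],
    [49.0, 30.0, 14.0, None],
    [47.0, 32.0, 13.0, 2.0],
    [None, 31.0, 15.0, 2.0],
    [50.0, 36.0, 14.0, 2.0],
]

per_column, patterns = missing_summary(rows)
print("Missing per column:", per_column)
for pattern, count in patterns.most_common():
    # 'x' marks a missing cell, '.' an observed one.
    print("".join("x" if is_missing else "." for is_missing in pattern), count)
```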
Descriptive statistics before and after imputation follow.
Finally, the completed dataset is provided, with the initially missing values displayed in bold.
If we compare the imputed data (table above) with the original dataset without missing values (table below), we see that the imputed values are close to the true ones. For example, we get 32.8 instead of 33 for the first observation. For this dataset, EM imputation is therefore more appropriate than a simpler method such as replacing each missing value with the mean of its column.
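One way to make this comparison quantitative is to compute the root mean squared error (RMSE) over the cells that were originally missing, for both the imputed table and a mean-replacement baseline. The sketch below is an assumption-laden illustration: the helper names `rmse_on_missing` and `mean_impute` are hypothetical, and apart from the 32.8 versus 33 pair quoted above, all values are made up.

```python
import math

def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if row[j] is None else row[j] for j in range(n_cols)]
            for row in rows]

def rmse_on_missing(true_rows, imputed_rows, incomplete_rows):
    """RMSE restricted to the cells that are None in `incomplete_rows`."""
    squared = [
        (t - v) ** 2
        for t_row, v_row, m_row in zip(true_rows, imputed_rows, incomplete_rows)
        for t, v, m in zip(t_row, v_row, m_row)
        if m is None
    ]
    return math.sqrt(sum(squared) / len(squared))

# Illustrative two-column table; only the 33 / 32.8 pair comes from the text.
true_rows = [[50.0, 33.0], [46.0, 36.0], [65.0, 28.0], [62.0, 22.0]]
incomplete = [[50.0, None], [46.0, 36.0], [65.0, 28.0], [62.0, 22.0]]
em_result = [[50.0, 32.8], [46.0, 36.0], [65.0, 28.0], [62.0, 22.0]]

print("EM RMSE:  ", rmse_on_missing(true_rows, em_result, incomplete))
print("Mean RMSE:", rmse_on_missing(true_rows, mean_impute(incomplete), incomplete))
```

This check is only possible when the true values are known, as in this tutorial where the missing values were introduced deliberately.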