
Removing duplicates in Excel

2017-10-20
This tutorial shows how to quickly remove duplicate rows in Excel using the XLSTAT software.

Dataset for removing duplicates

An Excel sheet containing both the data and the results is available for download with this tutorial.

The data are fictitious and were created for this tutorial. They represent a sample of sales records from an online shop, including the order ID, the customer ID and the invoice amount.

Goal of this tutorial

Deduping (removing duplicates) is necessary when observations are mistakenly duplicated, or repeated, because of input errors. Here, we want to remove the duplicated rows from the data in order to obtain a table containing only unique sales records.
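The idea behind deduping can be illustrated outside Excel. The following is a minimal Python sketch, not XLSTAT's implementation: it assumes that a duplicate is a row whose values match an earlier row in every column, and that the first occurrence is the one kept. The function name dedupe and the sample records are hypothetical.

```python
from typing import Hashable, Iterable, List, Sequence, Tuple


def dedupe(rows: Iterable[Sequence[Hashable]]) -> List[Tuple[Hashable, ...]]:
    """Return the rows with exact duplicates removed.

    Assumption: two rows are duplicates only if all their columns match.
    The first occurrence of each row is kept and the original order is preserved.
    """
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # compare all columns together
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Hypothetical sales records: (order ID, customer ID, invoice amount)
sales = [
    (1001, "C01", 25.0),
    (1002, "C02", 40.5),
    (1001, "C01", 25.0),  # repeated because of an input error
    (1003, "C01", 12.0),
]
print(dedupe(sales))
# [(1001, 'C01', 25.0), (1002, 'C02', 40.5), (1003, 'C01', 12.0)]
```

Using a set of already-seen rows keeps the check for each row fast, while the separate list preserves the order in which records were entered.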

Setting up a duplicate removal with XLSTAT

1. Once XLSTAT is open, select the Data management command under the Preparing data menu, as shown below.
Preparing Data menu in XLSTAT
2. The Data management dialog box appears.
Data management dialog box in XLSTAT
3. Select columns A, B and C in the Data field, then select the Dedupe method. Because the headers are included in our data selection, we check the Variable labels option.

Click on the OK button. An XLSTAT report will be generated in a new sheet named Dedupe.
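The role of the Variable labels option can be sketched the same way. In this hypothetical Python example, which is an illustration and not XLSTAT's code, the has_labels flag stands in for that option: when it is set, the first row is kept as a header and is not compared with the records. The function also counts how many rows were removed. The function name, the column names and the sample rows are all assumptions made for the example.

```python
from typing import Hashable, List, Sequence, Tuple


def dedupe_table(
    table: Sequence[Sequence[Hashable]], has_labels: bool = True
) -> Tuple[List[Tuple[Hashable, ...]], int]:
    """Remove exact duplicate data rows and report how many were removed.

    has_labels is a hypothetical stand-in for a "first row holds the
    column headers" setting: the header row is kept and never deduped.
    """
    header = [tuple(table[0])] if has_labels and table else []
    body = table[1:] if has_labels else table
    seen = set()
    kept = []
    for row in body:
        key = tuple(row)  # a duplicate must match in every column
        if key not in seen:
            seen.add(key)
            kept.append(key)
    removed = len(body) - len(kept)
    return header + kept, removed


# Hypothetical table with a header row
table = [
    ("Order ID", "Customer ID", "Amount"),
    (1001, "C01", 25.0),
    (1002, "C02", 40.5),
    (1002, "C02", 40.5),
    (1003, "C01", 12.0),
    (1001, "C01", 25.0),
]
deduped, removed = dedupe_table(table)
print(removed)  # 2
```

For readers working in pandas, DataFrame.drop_duplicates() with its default keep="first" behaves the same way for rows that match in every column.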

Results of a duplicate removal

Three duplicated records were detected and removed from the initial data. A comparison between the initial table and the deduped table, generated by XLSTAT, is shown below:
XLSTAT output: deduped table