Removing duplicates in Excel
This tutorial shows how to quickly remove duplicate rows in Excel using the XLSTAT software.
Dataset for removing duplicates
The data are fictitious and were created for this tutorial. They represent a sample of sales records from an online shop. Each record includes the order ID, the customer ID and the invoice amount.
Goal of this tutorial
Deduping (removing duplicates) is necessary when observations are mistakenly duplicated (or repeated), for example because of input errors. Here, we want to remove the duplicated rows from the data in order to obtain a table containing only unique sales records.
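To make the notion of a duplicated row concrete outside of Excel, here is a minimal Python sketch. It assumes that a row counts as a duplicate only when it matches another row in every column. The sample rows and the `find_duplicates` helper are hypothetical and are not the tutorial's dataset or part of XLSTAT.

```python
from collections import Counter

# Hypothetical rows (order ID, customer ID, invoice amount), invented for illustration
records = [
    ("O-1001", "C-01", 25.00),
    ("O-1002", "C-02", 40.50),
    ("O-1001", "C-01", 25.00),  # same values in every column as the first row
    ("O-1003", "C-01", 12.99),  # same customer, different order: not a duplicate
]

def find_duplicates(rows):
    """Return a dict mapping each repeated row to the number of times it occurs."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}

duplicates = find_duplicates(records)
print(duplicates)
```

Comparing whole rows rather than a single column keeps legitimate repeat customers in the table; only rows that are identical across all fields are flagged.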
Setting up a duplicate removal with XLSTAT
- Once XLSTAT is open, select the Data Management command under the Preparing data menu as shown below.
- The Data management dialog box appears.
- Select columns A, B and C in the Data field. Then select the Dedupe method. Headers are included in our data selection, so we check the Variable labels option.
- Click on the OK button. An XLSTAT report will be generated in a new sheet named Dedupe.
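For readers who want to reproduce the same kind of cleanup in a script, the operation configured above can be approximated as follows. This is a sketch under two assumptions that the tutorial does not confirm: rows are compared on all selected columns, and the first occurrence of each row is kept in its original position. The `dedupe` function and the sample rows are hypothetical, not XLSTAT's implementation.

```python
# Hypothetical sample: a header row followed by (order ID, customer ID, invoice amount) rows
header = ("Order ID", "Customer ID", "Invoice amount")
rows = [
    ("O-1001", "C-01", 25.00),
    ("O-1002", "C-02", 40.50),
    ("O-1001", "C-01", 25.00),  # exact repeat of the first row
    ("O-1003", "C-01", 12.99),
    ("O-1002", "C-02", 40.50),  # exact repeat of the second row
]

def dedupe(records):
    """Drop exact duplicate rows, keeping the first occurrence of each in order."""
    # Dict keys are unique and keep insertion order (Python 3.7+),
    # so this removes repeats without reordering the table.
    return list(dict.fromkeys(records))

unique_rows = dedupe(rows)
removed = len(rows) - len(unique_rows)

print(header)
for row in unique_rows:
    print(row)
print(f"{removed} duplicated record(s) removed")
```

The header is kept separate from the data rows, which mirrors the role of the Variable labels option: the first line names the columns and is not itself compared against the records.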
Results of a duplicate removal
Three duplicated records were detected and removed from the initial data. A comparison between the initial table and the deduped table, generated by XLSTAT, is shown below: