Q

Removing duplicate records

My database has several duplicate records, and I want to remove those duplicate records, despite the fact that one of those will remain in the database.

The fact that you have duplicate rows indicates you have no primary or unique key which is also an indication of bad database design. The easiest way to solve this problem is to add a unique key to your table, either temporarily or permanently, so that individual rows can be identified.

So if for example you have a table with the following structure

 CREATE TABLE tblDuplicates ( col1 INT, col2 CHAR(2), col3 MONEY )
And you INSERT the following duplicate rows within your database:
 INSERT tblDuplicates VALUES(1,'AB',123) INSERT tblDuplicates VALUES(1,'AB',123) INSERT tblDuplicates VALUES(1,'AB',123) INSERT tblDuplicates VALUES(2,'CD',456) INSERT tblDuplicates VALUES(2,'CD',456) INSERT tblDuplicates VALUES(3,'EF',789) INSERT tblDuplicates VALUES(4,'GH',912)
When we SELECT from this table we can see more clearly that we have a problem.
 SELECT * FROM tblDuplicates col1 col2 col3 ----------- ---- --------------------- 1 AB 123.0000 1 AB 123.0000 1 AB 123.0000 2 CD 456.0000 2 CD 456.0000 3 EF 789.0000 4 GH 912.0000
However now if we add a column to our table that contains a unique value for every row we will be able to identify each row individually. The easiest way to do this is with an IDENTITY column.
 ALTER TABLE tblDuplicates ADD colUnique INT IDENTITY(1,1)
Now when we SELECT from our table we can see that the new column in each row, including the duplicate rows, has a unique value.
 SELECT * FROM tblDuplicates col1 col2 col3 colUnique ----------- ---- --------------------- ----------- 1 AB 123.0000 1 1 AB 123.0000 2 1 AB 123.0000 3 2 CD 456.0000 4 2 CD 456.0000 5 3 EF 789.0000 6 4 GH 912.0000 7
Now we can identify each row individually we can do something about removing the duplicated rows. The following DELETE statement removes all rows that have columns that match exactly with another row, except the row that has the minimum colUnqiue value (thereby always leaving one row remaining after the delete for each set of duplicated rows).
 DELETE tblDuplicates FROM tblDuplicates td1 WHERE colUnique<>( SELECT MIN(colUnique) FROM tblDuplicates td2 WHERE td1.Col1 = td2.Col1 AND td1.Col2 = td2.Col2 AND td1.Col3 = td2.Col3 )
We can now remove our temporary unique column:
 ALTER TABLE tblDuplicates DROP COLUMN colUnique
And again we SELECT from our table to retrieve all new set of non-duplicated rows.
 SELECT * FROM tblDuplicates col1 col2 col3 ----------- ---- --------------------- 1 AB 123.0000 2 CD 456.0000 3 EF 789.0000 4 GH 912.0000
However as our table doesn't enforce uniqueness this situation is likely to happen again. You best bet is to rework you database design to include a permanent primary key on the table so you are always able to individually identify rows. And always, always make sure you have a good backup before running this type of operation on your tables as it is easy to make a mistake.
This was first published in June 2002

Dig deeper on Microsoft SQL Server Installation

Pro+

Features

Enjoy the benefits of Pro+ membership, learn more and join.

Have a question for an expert?

Please add a title for your question

Get answers from a TechTarget expert on whatever's puzzling you.

You will be able to add details on the next page.

0 comments

Oldest 

Forgot Password?

No problem! Submit your e-mail address below. We'll send you an email containing your password.

Your password has been sent to:

-ADS BY GOOGLE

SearchBusinessAnalytics

SearchDataCenter

SearchDataManagement

SearchAWS

SearchOracle

SearchContentManagement

SearchWindowsServer

Close