Problem solve Get help with specific problems with your technologies, process and projects.

Using DISTINCT in SQL to eliminate duplicate rows

Learn how to use the DISTINCT keyword to eliminate duplicate rows in this excerpt from "SQL Queries for Mere Mortals: A Hands-On Guide to Data Manipulation in SQL, Second Edition." You'll learn how to use DISTINCT on either single or multiple columns when working with SELECT statements in SQL.

Eliminating Duplicate Rows

When working with SELECT statements, you'll inevitably come across result sets with duplicate rows. There is no cause for alarm if you see such a result set. Use the DISTINCT keyword in your SELECT statement, and the result set will be free and clear of all duplicate rows. Figure 4–9 shows the syntax diagram for the DISTINCT keyword.

Figure 4–9 The syntax for the DISTINCT keyword

As the diagram illustrates, DISTINCT is an optional keyword that precedes the list of columns specified in the SELECT clause. The DISTINCT keyword asks your database system to evaluate the values of all the columns as a single unit on a row-by-row basis and eliminate any redundant rows it finds. The remaining unique rows are then returned to the result set. The following example shows what a difference the DISTINCT keyword can make under the appropriate circumstances.

Let's say you're posing the following request to the database.

"Which cities are represented by our bowling league membership?"

The question seems easy enough, so you take it through the translation process.

Translation Select city from the bowlers table
Clean Up Select city from the bowlers table
FROM Bowlers

The problem is that the result set for this SELECT statement shows every occurrence of each city name found in the Bowlers table. For example, if there are 20 people from Bellevue and 7 people from Kent and 14 people from Seattle, the result set displays 20 occurrences of Bellevue, 7 occurrences of Kent, and 14 occurrences of Seattle. Clearly, this redundant information is unnecessary. All you want to see is a single occurrence of each city name found in the Bowlers table. You resolve this problem by using the DISTINCT keyword in the SELECT statement to eliminate the redundant information.

Let's run the request through the translation process once again using the DISTINCT keyword. Note that we now include the word "distinct" in both the Translation step and the Clean Up step.

"Which cities are represented by our bowling league membership?"

Translation Select the distinct city values from the bowlers table
Clean Up Select the distinct city values from the bowlers table
FROM Bowlers

The result set for this SELECT statement displays exactly what you're looking for–a single occurrence of each distinct (or unique) city found in the Bowlers table.

You can use the DISTINCT keyword on multiple columns as well. Let's modify the previous example by requesting both the city and the state from the Bowlers table. Our new SELECT statement looks like this.


This SELECT statement returns a result set that contains unique records and shows definite distinctions between cities with the same name. For example, it shows the distinction between "Portland, ME," "Portland, OR," "Hollywood, CA," and "Hollywood, FL." It's worthwhile to note that most database systems sort the output in the sequence in which you specify the columns, so you'll see these values in the sequence "Hollywood, CA," "Hollywood, FL," "Portland, ME," and "Portland, OR." However, the SQL Standard does not require the result to be sorted in this order. If you want to guarantee the sort sequence, read on to the next section to learn about the ORDER BY clause.

The DISTINCT keyword is a very useful tool under the right circumstances. Use it only when you really want to see unique rows in your result set.

Caution: For database systems that include a graphical interface, you can usually request that the result set of a query be displayed in an up-datable grid of rows and columns. You can type a new value in a column on a row, and the database system updates the value stored in your table. (Your database system actually executes an UPDATE query on your behalf behind the scenes—you can read more about that in Chapter 15, Updating Sets of Data.) However, in all database systems that we studied, when you include the DISTINCT keyword, the resulting set of rows cannot be updated. To be able to update a column in a row, your database system needs to be able to uniquely identify the specific row and column you want to change. When you use DISTINCT, the values you see in each row are the result of evaluating perhaps dozens of duplicate rows. If you try to update one of the columns, your database won't know which specific row to change. Your database system also doesn't know if perhaps you mean to change all the rows with the same duplicate value.


Exchange 2007 Storage SystemsThis chapter excerpt from  SQL Queries for Mere Mortals: A Hands-On Guide to Data Manipulation in SQL (2nd Edition) by John L. Viescas and Michael J. Hernandez, is printed with permission from Addison-Wesley Professional, Copyright 2007.

Click here for the chapter download or purchase the book here.

This was last published in July 2008

Dig Deeper on Microsoft SQL Server Performance Monitoring and Tuning



Find more PRO+ content and other member only offers, here.

Join the conversation


Send me notifications when other members comment.

By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy

Please create a username to comment.

This is a good introduction to the concept of DISTINCT, but it doesn't cover one of the major faults of DISTINCT which is that it can be fairly slow, particularly on large data sets.  This seems applicable for anyone whom might not be reading the book but rather viewing parts of it online, particularly if they are playing with a database that is for more than just learning (E.G. Production).
Distinct is one keyword that could perhaps be used more, but I wonder how many apps have queries that maybe aren't correctly constructed.  For example a query with certain joins may or may not discard parts of the query if key parts have nulls in them.
I tend to use DISTINCT on 1 or 2 table queries. Anything more than 2 can be tricky to get the correct results. If you need to, join or select from a subset with it's own DISTINCT.