Using DISTINCT in SQL to eliminate duplicate rows

Learn how to use the DISTINCT keyword to eliminate duplicate rows in this excerpt from "SQL Queries for Mere Mortals: A Hands-On Guide to Data Manipulation in SQL, Second Edition." You'll learn how to use DISTINCT on either single or multiple columns when working with SELECT statements in SQL.

This Content Component encountered an error

Eliminating Duplicate Rows


When working with SELECT statements, you'll inevitably come across result sets with duplicate rows. There is no cause for alarm if you see such a result set. Use the DISTINCT keyword in your SELECT statement, and the result set will be free and clear of all duplicate rows. Figure 4–9 shows the syntax diagram for the DISTINCT keyword.


Figure 4–9 The syntax for the DISTINCT keyword

As the diagram illustrates, DISTINCT is an optional keyword that precedes the list of columns specified in the SELECT clause. The DISTINCT keyword asks your database system to evaluate the values of all the columns as a single unit on a row-by-row basis and eliminate any redundant rows it finds. The remaining unique rows are then returned to the result set. The following example shows what a difference the DISTINCT keyword can make under the appropriate circumstances.

Let's say you're posing the following request to the database.

"Which cities are represented by our bowling league membership?"

The question seems easy enough, so you take it through the translation process.

Translation Select city from the bowlers table
Clean Up Select city from the bowlers table
SQL SELECT City
FROM Bowlers
 

The problem is that the result set for this SELECT statement shows every occurrence of each city name found in the Bowlers table. For example, if there are 20 people from Bellevue and 7 people from Kent and 14 people from Seattle, the result set displays 20 occurrences of Bellevue, 7 occurrences of Kent, and 14 occurrences of Seattle. Clearly, this redundant information is unnecessary. All you want to see is a single occurrence of each city name found in the Bowlers table. You resolve this problem by using the DISTINCT keyword in the SELECT statement to eliminate the redundant information.

Let's run the request through the translation process once again using the DISTINCT keyword. Note that we now include the word "distinct" in both the Translation step and the Clean Up step.

"Which cities are represented by our bowling league membership?"

Translation Select the distinct city values from the bowlers table
Clean Up Select the distinct city values from the bowlers table
SQL SELECT DISTINCT City
FROM Bowlers
 

The result set for this SELECT statement displays exactly what you're looking for–a single occurrence of each distinct (or unique) city found in the Bowlers table.

You can use the DISTINCT keyword on multiple columns as well. Let's modify the previous example by requesting both the city and the state from the Bowlers table. Our new SELECT statement looks like this.

SELECT DISTINCT City, State FROM Bowlers

This SELECT statement returns a result set that contains unique records and shows definite distinctions between cities with the same name. For example, it shows the distinction between "Portland, ME," "Portland, OR," "Hollywood, CA," and "Hollywood, FL." It's worthwhile to note that most database systems sort the output in the sequence in which you specify the columns, so you'll see these values in the sequence "Hollywood, CA," "Hollywood, FL," "Portland, ME," and "Portland, OR." However, the SQL Standard does not require the result to be sorted in this order. If you want to guarantee the sort sequence, read on to the next section to learn about the ORDER BY clause.

The DISTINCT keyword is a very useful tool under the right circumstances. Use it only when you really want to see unique rows in your result set.

Caution: For database systems that include a graphical interface, you can usually request that the result set of a query be displayed in an up-datable grid of rows and columns. You can type a new value in a column on a row, and the database system updates the value stored in your table. (Your database system actually executes an UPDATE query on your behalf behind the scenes—you can read more about that in Chapter 15, Updating Sets of Data.) However, in all database systems that we studied, when you include the DISTINCT keyword, the resulting set of rows cannot be updated. To be able to update a column in a row, your database system needs to be able to uniquely identify the specific row and column you want to change. When you use DISTINCT, the values you see in each row are the result of evaluating perhaps dozens of duplicate rows. If you try to update one of the columns, your database won't know which specific row to change. Your database system also doesn't know if perhaps you mean to change all the rows with the same duplicate value.

TABLE OF CONTENTS


Exchange 2007 Storage SystemsThis chapter excerpt from  SQL Queries for Mere Mortals: A Hands-On Guide to Data Manipulation in SQL (2nd Edition) by John L. Viescas and Michael J. Hernandez, is printed with permission from Addison-Wesley Professional, Copyright 2007.

Click here for the chapter download or purchase the book here.

This was first published in July 2008

Dig deeper on Microsoft SQL Server Performance Monitoring and Tuning

Pro+

Features

Enjoy the benefits of Pro+ membership, learn more and join.

0 comments

Oldest 

Forgot Password?

No problem! Submit your e-mail address below. We'll send you an email containing your password.

Your password has been sent to:

-ADS BY GOOGLE

SearchBusinessAnalytics

SearchDataCenter

SearchDataManagement

SearchAWS

SearchOracle

SearchContentManagement

SearchWindowsServer

Close