Book Chapter

Using DISTINCT in SQL to eliminate duplicate rows

Eliminating Duplicate Rows

When working with SELECT statements, you'll inevitably come across result sets with duplicate rows. There is no cause for alarm if you see such a result set. Use the DISTINCT keyword in your SELECT statement, and the result set will be free and clear of all duplicate rows. Figure 4–9 shows the syntax diagram for the DISTINCT keyword.


Figure 4–9 The syntax for the DISTINCT keyword

As the diagram illustrates, DISTINCT is an optional keyword that precedes the list of columns specified in the SELECT clause. The DISTINCT keyword asks your database system to evaluate the values of all the columns as a single unit on a row-by-row basis and eliminate any redundant rows it finds. The remaining unique rows are then returned to the result set. The following example shows what a difference the DISTINCT keyword can make under the appropriate circumstances.

Let's say you're posing the following request to the database.

"Which cities are represented by our bowling league membership?"

The question seems easy enough, so you take it through the translation process.

Translation Select city from the bowlers table
Clean Up Select city from the bowlers table
SQL SELECT City

    Requires Free Membership to View


FROM Bowlers

 
You are reading part 3 from "How to use the SELECT statement in SQL," excerpted from SQL Queries for Mere Mortals: A Hands-On Guide to Data Manipulation in SQL (2nd Edition), by John L. Viescas and Michael J. Hernandez, copyright 2007, printed with permission from Addison-Wesley Professional.

The problem is that the result set for this SELECT statement shows every occurrence of each city name found in the Bowlers table. For example, if there are 20 people from Bellevue and 7 people from Kent and 14 people from Seattle, the result set displays 20 occurrences of Bellevue, 7 occurrences of Kent, and 14 occurrences of Seattle. Clearly, this redundant information is unnecessary. All you want to see is a single occurrence of each city name found in the Bowlers table. You resolve this problem by using the DISTINCT keyword in the SELECT statement to eliminate the redundant information.

Let's run the request through the translation process once again using the DISTINCT keyword. Note that we now include the word "distinct" in both the Translation step and the Clean Up step.

"Which cities are represented by our bowling league membership?"

Translation Select the distinct city values from the bowlers table
Clean Up Select the distinct city values from the bowlers table
SQL SELECT DISTINCT City
FROM Bowlers

The result set for this SELECT statement displays exactly what you're looking for–a single occurrence of each distinct (or unique) city found in the Bowlers table.

You can use the DISTINCT keyword on multiple columns as well. Let's modify the previous example by requesting both the city and the state from the Bowlers table. Our new SELECT statement looks like this.

SELECT DISTINCT City, State FROM Bowlers

This SELECT statement returns a result set that contains unique records and shows definite distinctions between cities with the same name. For example, it shows the distinction between "Portland, ME," "Portland, OR," "Hollywood, CA," and "Hollywood, FL." It's worthwhile to note that most database systems sort the output in the sequence in which you specify the columns, so you'll see these values in the sequence "Hollywood, CA," "Hollywood, FL," "Portland, ME," and "Portland, OR." However, the SQL Standard does not require the result to be sorted in this order. If you want to guarantee the sort sequence, read on to the next section to learn about the ORDER BY clause.

The DISTINCT keyword is a very useful tool under the right circumstances. Use it only when you really want to see unique rows in your result set.

 

Caution: For database systems that include a graphical interface, you can usually request that the result set of a query be displayed in an up-datable grid of rows and columns. You can type a new value in a column on a row, and the database system updates the value stored in your table. (Your database system actually executes an UPDATE query on your behalf behind the scenes—you can read more about that in Chapter 15, Updating Sets of Data.) However, in all database systems that we studied, when you include the DISTINCT keyword, the resulting set of rows cannot be updated. To be able to update a column in a row, your database system needs to be able to uniquely identify the specific row and column you want to change. When you use DISTINCT, the values you see in each row are the result of evaluating perhaps dozens of duplicate rows. If you try to update one of the columns, your database won't know which specific row to change. Your database system also doesn't know if perhaps you mean to change all the rows with the same duplicate value.



 

TABLE OF CONTENTS
   Part 1: How to use the SELECT statement in SQL
   Part 2: Translating information requests into SQL SELECT statements
   Part 3: Using DISTINCT in SQL to eliminate duplicate rows
   Part 4: Using the ORDER BY clause of the SELECT query in SQL
   Part 5: SQL SELECT statement and SELECT query samples

 

 

This chapter excerpt from  SQL Queries for Mere Mortals: A Hands-On Guide to Data Manipulation in SQL (2nd Edition) by John L. Viescas and Michael J. Hernandez, is printed with permission from Addison-Wesley Professional, Copyright 2007.

Click here for the chapter download or purchase the book here.

 

This was first published in July 2008

There are Comments. Add yours.

 
TIP: Want to include a code block in your comment? Use <pre> or <code> tags around the desired text. Ex: <code>insert code</code>

REGISTER or login:

Forgot Password?
By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy
Sort by: OldestNewest

Forgot Password?

No problem! Submit your e-mail address below. We'll send you an email containing your password.

Your password has been sent to: