Home > Using DISTINCT in SQL to eliminate duplicate rows
Book Chapter:
EMAIL THIS

Using DISTINCT in SQL to eliminate duplicate rows

30 Jul 2008 | John L. Viescas and Michael J. Hernandez

Expert advice on database administration
Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us    Add to Google

Eliminating Duplicate Rows

When working with SELECT statements, you'll inevitably come across result sets with duplicate rows. There is no cause for alarm if you see such a result set. Use the DISTINCT keyword in your SELECT statement, and the result set will be free and clear of all duplicate rows. Figure 4–9 shows the syntax diagram for the DISTINCT keyword.


Figure 4–9 The syntax for the DISTINCT keyword

As the diagram illustrates, DISTINCT is an optional keyword that precedes the list of columns specified in the SELECT clause. The DISTINCT keyword asks your database system to evaluate the values of all the columns as a single unit on a row-by-row basis and eliminate any redundant rows it finds. The remaining unique rows are then returned to the result set. The following example shows what a difference the DISTINCT keyword can make under the appropriate circumstances.

Let's say you're posing the following request to the database.

"Which cities are represented by our bowling league membership?"

The question seems easy enough, so you take it through the translation process.

Translation Select city from the bowlers table
Clean Up Select city from the bowlers table
SQL SELECT City
FROM Bowlers

You are reading part 3 from "How to use the SELECT statement in SQL," excerpted from SQL Queries for Mere Mortals: A Hands-On Guide to Data Manipulation in SQL (2nd Edition), by John L. Viescas and Michael J. Hernandez, copyright 2007, printed with permission from Addison-Wesley Professional.

The problem is that the result set for this SELECT statement shows every occurrence of each city name found in the Bowlers table. For example, if there are 20 people from Bellevue and 7 people from Kent and 14 people from Seattle, the result set displays 20 occurrences of Bellevue, 7 occurrences of Kent, and 14 occurrences of Seattle. Clearly, this redundant information is unnecessary. All you want to see is a single occurrence of each city name found in the Bowlers table. You resolve this problem by using the DISTINCT keyword in the SELECT statement to eliminate the redundant information.

Let's run the request through the translation process once again using the DISTINCT keyword. Note that we now include the word "distinct" in both the Translation step and the Clean Up step.

"Which cities are represented by our bowling league membership?"
Translation Select the distinct city values from the bowlers table
Clean Up Select the distinct city values from the bowlers table
SQL SELECT DISTINCT City
FROM Bowlers

The result set for this SELECT statement displays exactly what you're looking for–a single occurrence of each distinct (or unique) city found in the Bowlers table.

You can use the DISTINCT keyword on multiple columns as well. Let's modify the previous example by requesting both the city and the state from the Bowlers table. Our new SELECT statement looks like this.

SELECT DISTINCT City, State FROM Bowlers

This SELECT statement returns a result set that contains unique records and shows definite distinctions between cities with the same name. For example, it shows the distinction between "Portland, ME," "Portland, OR," "Hollywood, CA," and "Hollywood, FL." It's worthwhile to note that most database systems sort the output in the sequence in which you specify the columns, so you'll see these values in the sequence "Hollywood, CA," "Hollywood, FL," "Portland, ME," and "Portland, OR." However, the SQL Standard does not require the result to be sorted in this order. If you want to guarantee the sort sequence, read on to the next section to learn about the ORDER BY clause.

The DISTINCT keyword is a very useful tool under the right circumstances. Use it only when you really want to see unique rows in your result set.

Caution: For database systems that include a graphical interface, you can usually request that the result set of a query be displayed in an up-datable grid of rows and columns. You can type a new value in a column on a row, and the database system updates the value stored in your table. (Your database system actually executes an UPDATE query on your behalf behind the scenes—you can read more about that in Chapter 15, Updating Sets of Data.)

However, in all database systems that we studied, when you include the DISTINCT keyword, the resulting set of rows cannot be updated. To be able to update a column in a row, your database system needs to be able to uniquely identify the specific row and column you want to change. When you use DISTINCT, the values you see in each row are the result of evaluating perhaps dozens of duplicate rows. If you try to update one of the columns, your database won't know which specific row to change. Your database system also doesn't know if perhaps you mean to change all the rows with the same duplicate value.


TABLE OF CONTENTS
   Part 1: How to use the SELECT statement in SQL
   Part 2: Translating information requests into SQL SELECT statements
   Part 3: Using DISTINCT in SQL to eliminate duplicate rows
   Part 4: Using the ORDER BY clause of the SELECT query in SQL
   Part 5: SQL SELECT statement and SELECT query samples

Exchange 2007 Storage Systems This chapter excerpt from SQL Queries for Mere Mortals: A Hands-On Guide to Data Manipulation in SQL (2nd Edition) by John L. Viescas and Michael J. Hernandez, is printed with permission from Addison-Wesley Professional, Copyright 2007.

Click here for the chapter download or purchase the book here.



Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us    Add to Google


RELATED CONTENT
SQL/Transact SQL (T-SQL)
Working with sparse columns in SQL Server 2008
Determining the source of full transaction logs in SQL Server
New GROUP BY option provides better data control in SQL Server 2008
Using the OPENROWSET function in SQL Server
Loading data files with SQL Server's BULK INSERT statement
Importing and exporting bulk data with SQL Server's bcp utility
Testing transaction log autogrowth behavior in SQL Server
Securing SQL Server with access control, login monitoring and DDL triggers
Top 10 SQL Server development tips of 2008
The sqlcmd utility in SQL Server
SQL/Transact SQL (T-SQL) Research

Microsoft SQL Server Performance Monitoring and Tuning
Performance implications of transaction log autogrowth in SQL Server
The short course on how SQL Server really works
Determining the source of full transaction logs in SQL Server
Improving SQL Server full-text search performance
New GROUP BY option provides better data control in SQL Server 2008
Microsoft SQL Server 2008 Resource Governor primer
Examining data files when SQL Server tempdb is full
Testing transaction log autogrowth behavior in SQL Server
Meeting business needs with SQL Server full-text search
Using dynamic management views to improve SQL Server index effectiveness

RELATED GLOSSARY TERMS
Terms from Whatis.com − the technology online dictionary
ACID  (SearchSQLServer.com)
commit  (SearchSQLServer.com)
DAO  (SearchSQLServer.com)
fetch  (SearchSQLServer.com)
OLE DB  (SearchSQLServer.com)
query  (SearchSQLServer.com)
SQL  (SearchSQLServer.com)
T-SQL  (SearchSQLServer.com)

RELATED RESOURCES
2020software.com, trial software downloads for accounting software, ERP software, CRM software and business software systems
Search Bitpipe.com for the latest white papers and business webcasts
Whatis.com, the online computer dictionary




Secure SQL - Data Security for Your Database
HomeNewsTopicsITKnowledge ExchangeTipsAsk the ExpertsMultimediaWhite PapersIT Downloads
About Us  |  Contact Us  |  For Advertisers  |  For Business Partners  |  Site Index  |  RSS
SEARCH 
TechTarget provides technology professionals with the information they need to perform their jobs - from developing strategy, to making cost-effective purchase decisions and managing their organizations' technology projects - with its network of technology-specific websites, events and online magazines.

TechTarget Corporate Web Site  |  Media Kits  |  Site Map




All Rights Reserved, Copyright 2005 - 2009, TechTarget | Read our Privacy Policy
  TechTarget - The IT Media ROI Experts