SQL Server 2005 Transact-SQL includes a set of functions that lets you rank the rows in your result set. By including one or more of these SQL Server rank functions in the SELECT clause of your query, you can automatically assign a rank to each row. The way in which the rows are ranked depends on the function you use.
T-SQL currently supports four ranking functions: ROW_NUMBER, RANK, DENSE_RANK, and NTILE. I'll define these rank functions in SQL Server and show you how they work, but let's first look at the test environment I'll be using to demonstrate these functions.
To obtain the data I needed, I used the following code to create the Sales.Quotas table in the SQL Server 2005 AdventureWorks sample database:
--Drop Sales.Quotas table if it exists
IF OBJECT_ID (N'Sales.Quotas', N'U') IS NOT NULL
DROP TABLE Sales.Quotas
--Create Sales.Quotas table (the INTO clause creates the table from the query results)
SELECT e.FirstName, e.LastName, q.SalesQuota AS Quota,
DATENAME(m,q.QuotaDate) AS [Month], YEAR(q.QuotaDate) AS [Year]
INTO Sales.Quotas
FROM Sales.SalesPersonQuotaHistory q
INNER JOIN HumanResources.vEmployee e
ON q.SalesPersonID = e.EmployeeID
WHERE q.SalesQuota BETWEEN 210000 AND 280000
ORDER BY e.LastName, q.QuotaDate
As you can see, I simply pull data from a couple of other tables in the database to create a set of meaningful test data.
Here's the SELECT statement I use to query the new table:
SELECT
ROW_NUMBER() OVER(ORDER BY Quota DESC) AS [RowNumber],
RANK() OVER(ORDER BY Quota DESC) AS [Rank],
DENSE_RANK() OVER(ORDER BY Quota DESC) AS [DenseRank],
NTILE(5) OVER(ORDER BY Quota DESC) AS [NTile],
LastName, Quota, [Month], [Year]
FROM Sales.Quotas
The SELECT statement uses all four ranking functions to rank the rows. I include all the functions in one statement, so you can compare the results returned by each function, as shown in the following result set:
(15 row(s) affected)
As you work through the following sections, refer back to the SELECT statement and the result set as necessary to better understand how each ranking function works.
The ROW_NUMBER function is the most basic of the ranking functions. As you can see in the result set (the RowNumber column), the function numbers each row sequentially, beginning with 1. If you refer back to the query, you'll see that the first element in the SELECT clause is the ROW_NUMBER function. When you use this function, first specify the function name, followed by the empty parentheses. You do not pass any values into the function.
After the ranking function, specify the OVER clause. Inside its parentheses, you supply an ORDER BY clause, which specifies the column (or columns) that determines the ranking order. In this case, I am ranking the values in the Quota column -- in descending order. As a result, the rows in the result set are ranked starting with the highest Quota amount. If you refer again to the result set, you'll see the row with the highest Quota value is ranked 1 and the row with the lowest value is ranked 15. (The result set contains 15 rows.)
That's all there is to using the ROW_NUMBER function. The other ranking functions work in much the same way; only the results are slightly different.
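The pattern described above can be sketched against a tiny made-up data set. The names and quota values below are hypothetical and are not taken from AdventureWorks; the rows are built with UNION ALL so the statement should also run on SQL Server 2005.

```sql
-- Hypothetical sample rows, not the Sales.Quotas data
SELECT
  ROW_NUMBER() OVER(ORDER BY t.Quota DESC) AS [RowNumber],
  t.LastName, t.Quota
FROM (
  SELECT 'Adams' AS LastName, 280000 AS Quota
  UNION ALL SELECT 'Baker', 267000
  UNION ALL SELECT 'Clark', 250000
) AS t
ORDER BY [RowNumber];
-- RowNumber: Adams 1, Baker 2, Clark 3
```

The quota values here are all distinct, so the numbering is unambiguous. When two rows share a value, ROW_NUMBER still gives them different numbers, but which one comes first is not guaranteed unless you add a tie-breaking column to the ORDER BY inside OVER.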
RANK function in SQL Server
The next ranking function in the SELECT list is RANK. Once again, you specify the function name, followed by the OVER clause, which again includes the ORDER BY clause. However, as you can see in the result set (the Rank column), the ranked values are slightly different from what you saw for the ROW_NUMBER function. Yes, the highest Quota value is ranked 1, but, because two rows share the same highest value, they are both ranked 1.
When you use the RANK function, rows that share the same value receive the same rank. Notice, however, that the rank is based on the row's position in the ordered result set, not on the number of distinct values that precede it. For example, the Quota value in the third row is 267,000. That is the second highest Quota value, yet because it falls in the third row, it receives a ranking of 3 rather than 2. The RANK function skips 2 because the second row ties with the first. If the fourth row shared the same value as the third row, it would also be ranked 3. But because its value is lower and it is in the fourth row, it is ranked 4.
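To see the gap in isolation, here is a minimal sketch using four made-up rows in which the top two share a value. The names and values are hypothetical, chosen only to mirror the tie described above.

```sql
-- Hypothetical sample rows: the two highest quotas tie
SELECT
  RANK() OVER(ORDER BY t.Quota DESC) AS [Rank],
  t.LastName, t.Quota
FROM (
  SELECT 'Adams' AS LastName, 280000 AS Quota
  UNION ALL SELECT 'Baker', 280000
  UNION ALL SELECT 'Clark', 267000
  UNION ALL SELECT 'Davis', 250000
) AS t
ORDER BY [Rank], t.LastName;
-- Rank: Adams 1, Baker 1, Clark 3, Davis 4 (rank 2 is skipped)
```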
The DENSE_RANK function takes a different approach. As with the RANK function, the first two rows are assigned a value of 1. However, the DENSE_RANK function numbers the distinct values consecutively, with no gaps, rather than tying the rank to the row number. As a result, the third row is assigned a value of 2 because the Quota column contains the second highest value, the fourth row is assigned a value of 3 because it contains the third highest value, and so on.
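A short sketch that puts RANK and DENSE_RANK side by side makes the difference easier to see. The rows are hypothetical sample data, not the article's result set.

```sql
-- Hypothetical sample rows: the two highest quotas tie
SELECT
  RANK()       OVER(ORDER BY t.Quota DESC) AS [Rank],
  DENSE_RANK() OVER(ORDER BY t.Quota DESC) AS [DenseRank],
  t.LastName, t.Quota
FROM (
  SELECT 'Adams' AS LastName, 280000 AS Quota
  UNION ALL SELECT 'Baker', 280000
  UNION ALL SELECT 'Clark', 267000
  UNION ALL SELECT 'Davis', 250000
) AS t
ORDER BY [Rank], t.LastName;
-- Rank:      1, 1, 3, 4
-- DenseRank: 1, 1, 2, 3
```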
The ROW_NUMBER, RANK, and DENSE_RANK functions are similar in how they return results. The difference is in whether the numbering is sequential and whether it is tied to the row number. The NTILE function, however, is a bit different from these three functions.
If you refer back to the SELECT statement, you can see that when you specify the NTILE function, you pass in an integer as an argument to the function -- unlike the other ranking functions where you pass in no argument. The NTILE function divides the result set into the number of groups specified by this argument. For example, in the SELECT statement, I specify 5, which means the result set will be split into five groups. Because there are 15 rows in the result set, each group will contain three rows. The rows are grouped together based on the value in the Quota column.
As a result, the three rows with the highest Quota values are in the first group and receive a ranking of 1. The three rows with the next highest Quota values are in the second group and receive a ranking of 2, and so on. Because there are only five groups, the highest ranking is 5, which is assigned to the group with the three lowest Quota values. Again, refer back to the result set to better understand how the NTILE function groups data and then ranks each group.
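The example above divides evenly (15 rows into five groups). As a supplementary sketch, the query below uses seven made-up rows and NTILE(3) to show what happens when the rows do not divide evenly: SQL Server makes the earlier groups one row larger than the later ones. The names and values are hypothetical.

```sql
-- Hypothetical sample rows: 7 rows split into 3 groups
SELECT
  NTILE(3) OVER(ORDER BY t.Quota DESC) AS [NTile],
  t.LastName, t.Quota
FROM (
  SELECT 'Adams' AS LastName, 280000 AS Quota
  UNION ALL SELECT 'Baker', 275000
  UNION ALL SELECT 'Clark', 270000
  UNION ALL SELECT 'Davis', 265000
  UNION ALL SELECT 'Evans', 260000
  UNION ALL SELECT 'Frank', 255000
  UNION ALL SELECT 'Green', 250000
) AS t
ORDER BY t.Quota DESC;
-- NTile: 1, 1, 1, 2, 2, 3, 3 (group sizes 3, 2, 2)
```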
As you can see, rank functions in SQL Server are fairly straightforward, and they make ranking your result set a relatively simple process. Each function gives you a slightly different way to rank the result set. For more information about any of these functions, refer to Microsoft SQL Server Books Online.
ABOUT THE AUTHOR
Robert Sheldon is a technical consultant and the author of numerous books, articles, and training material related to Microsoft Windows, various relational database management systems, and business intelligence design and implementation. You can find more information at http://www.rhsheldon.com.
This was first published in July 2008