Simplify queries with SQL Server 2005 common table expressions (CTEs)

Common table expressions (CTEs) are a handy alternative to using derived tables and views in SQL Server 2005 for retrieving data. There's no need to repeat complex code because CTEs separate code into unique units and they're self-referencing within your query. Here's a CTE how-to with examples, including details for one of its most valuable uses – creating a recursive query.

SQL Server 2005 introduced a valuable new Transact-SQL language component – a common table expression (CTE) --

that can be a handy alternative to derived tables and views. By using CTEs, you can create named result sets to reference from within your SELECT, INSERT, UPDATE, and DELETE statements without having to persist any metadata about the result set structure. In this article, I explain how to create CTEs in SQL Server 2005 – including how to create a recursive query using a CTE – and provide several examples that demonstrate how they work. Note that for these examples I use the AdventureWorks sample database that ships with SQL Server 2005.

Creating a basic CTE in SQL Server 2005

You construct CTEs by adding a WITH clause before the SELECT, INSERT, UPDATE, or DELETE statement. The following syntax shows the basic structure of the WITH clause and CTE definition:

[WITH <CTE_definition> [,...n]]
<SELECT, INSERT, UPDATE, or DELETE statement that
calls the CTEs>

<CTE_definition>::=
CTE_name [(column_name [,...n ])]
AS
(
CTE_query
)

As the syntax shows, you can define multiple CTEs within the optional WITH clause. The CTE definition includes the name of the CTE, the CTE column names, the AS keyword and the CTE query enclosed in parentheses. Note that the number of CTE column names must match the number of columns returned by the CTE query. In addition, the column names are optional if the CTE query supplies all column names.

Now that you have a basic idea of the CTE syntax in SQL Server, let's look at an example of a CTE definition to help better understand this syntax. The following example defines a CTE named ProductSold and then references the CTE within a SELECT statement:

WITH ProductSold (ProductID, TotalSold)
AS
(
SELECT ProductID, SUM(OrderQty)
FROM Sales.SalesOrderDetail
GROUP BY ProductID
)
SELECT p.ProductID, p.Name, p.ProductNumber,
ps.TotalSold
FROM Production.Product AS p
INNER JOIN ProductSold AS ps
ON p.ProductID = ps.ProductID

As you can see, a WITH clause precedes the SELECT statement that references the CTE. The first line of the WITH clause includes the name of the CTE (ProductSold) and the names of the two columns within the CTE (ProductID and TotalSold). Next comes the AS keyword, followed by the CTE query in parentheses. In this case, the CTE query returns the total number of individual products sold.

The SELECT statement that follows the WITH clause references the CTE by name when joining the Product table to the CTE, based on the ProductID columns. The statement calls the CTE as it would a table or view. However, unlike a table or view, the CTE is available only to the statement that immediately follows the WITH clause. If you reference the CTE in a subsequent statement -- without redefining the CTE -- you'll receive an error.

One advantage to using common table expressions is that you can reference a CTE multiple times in the calling statement. For example, the following statement defines a CTE named Employees and then calls that CTE twice in the SELECT statement that follows the WITH clause:

WITH Employees (EmpID, MgrID, FName, LName, Email)
AS
(
SELECT e.EmployeeID, e. ManagerID,
c.FirstName, c.LastName, c.EmailAddress
FROM HumanResources.Employee e
INNER JOIN Person.Contact c
ON e.ContactID = c.ContactID
)
SELECT e.FName + ' ' + e.LName AS EmpName,
e.Email AS EmpEmail,
m.FName + ' ' + m.LName AS MgrName,
m.Email AS MgrEmail
FROM Employees e
LEFT OUTER JOIN Employees m
ON e.MgrID = m.EmpID

In this example, the SQL Server CTE query returns a list of employee IDs, names and email addresses, as well as their managers' IDs. The SELECT statement following the WITH clause then joins the CTE with itself to return the managers' names and email addresses. You can achieve the same results by using derived tables (subqueries), but that means repeating the same subquery multiple times and working with code that's more complex.

Creating multiple CTEs in a WITH clause

As you saw in the CTE syntax, you can define multiple CTEs in your WITH clause and then call each of those CTEs as often as necessary in the statement that follows. Take a look at the example below that demonstrates how this works. The following WITH clause includes two CTE definitions:

WITH
Cost (ProductID, AvgCost)
AS
(
SELECT ProductID, AVG(StandardCost)
FROM Production.ProductCostHistory
GROUP BY ProductID
),
Sold (ProductID, AvgSold)
AS
(
SELECT ProductID, AVG(OrderQty)
FROM Sales.SalesOrderDetail
GROUP BY ProductID
)
SELECT p.ProductID, p.Name,
(AvgCost * AvgSold)AS TotalCost
FROM Sold s
INNER JOIN Production.Product p
ON s.ProductID = p.ProductID
INNER JOIN Cost c
ON p.ProductID = c.ProductID

The first CTE definition defines a CTE named Cost, and the second definition defines one named Sold. The Cost CTE returns the average cost of each product based on its cost history. The Sold CTE returns the average number of products sold per order. The SELECT statement after the WITH clause joins these two CTEs with the Product table to return the total cost of each product based on the average number sold and the average cost of the product.

You can easily define multiple CTEs within the WITH clause, but you can take this a step further by defining CTEs that reference the CTEs defined before it. For example, the WITH clause below defines three CTEs (Cost, Sold, and Total):

WITH
Cost (ProductID, AvgCost)
AS
(

SELECT ProductID, AVG(StandardCost)
FROM Production.ProductCostHistory
GROUP BY ProductID
),
Sold (ProductID, AvgSold)
AS
(
SELECT ProductID, AVG(OrderQty)
FROM Sales.SalesOrderDetail
GROUP BY ProductID
),
Total (ProductID, TotalCost)
AS
(
SELECT c.ProductID, (AvgCost * AvgSold)
FROM Cost c
INNER JOIN Sold s
ON c.ProductID = s.ProductID
)
SELECT p.ProductID, p.Name, t.TotalCost
FROM Production.Product p
INNER JOIN Total t
ON p.ProductID = t.ProductID

Notice that the CTE query in the Total CTE definition joins the Cost and Sold CTEs, whose CTE definitions precede Total. You cannot reference an undefined CTE. For example, you cannot reference the Total CTE within the Sold CTE. Now, the SELECT statement that follows the WITH clause needs to join the Product table to the Total CTE only, rather than joining to both the Cost and Sold CTEs.

Creating a recursive common table expression

One of the most valuable features of a CTE in SQL Server is its ability to create a recursive query -- a type of query that repeatedly references itself in order to return subsets of data. A recursive query is most commonly used to return hierarchical data. For instance, the Employee table in the AdventureWorks database includes the manager ID of each employee. The manager ID is actually the employee ID of that manager. As a result, the Employee table contains the entire hierarchical direct reports structure from the CEO on down.

You can define a CTE to retrieve this hierarchical structure by creating a CTE query that uses a UNION ALL, UNION, INTERSECT, or EXCEPT operator to join multiple SELECT statements. The best way to show you how this works is through an example.

In this WITH clause, the CTE query includes two SELECT statements joined by a UNION ALL operator:

WITH Reports (EmpLevel, EmpID, ContactID, MgrID)
AS
(
SELECT 1, EmployeeID, ContactID, ManagerID
FROM HumanResources.Employee
WHERE ManagerID IS NULL
UNION ALL
SELECT r.EmpLevel + 1, e.EmployeeID, e.ContactID,
e.ManagerID
FROM HumanResources.Employee e
INNER JOIN Reports r
ON e.ManagerID = r.EmpID
)
SELECT r.EmpLevel, r.EmpID,
c.FirstName + ' ' + c.LastName ASEmpName,
MgrID
FROM Reports r
INNER JOIN Person.Contact c
ON r.ContactID = c.ContactID
ORDER BY r.EmpLevel, r.MgrID, r.EmpID

The first SELECT statement in the CTE query retrieves only the top-level employee -- the CEO. To select the top-level employee, use a WHERE clause that specifies that the ManagerID value is null.

Tips on tuning SQL Server performance:

 In other words, this is the person who does not directly report to another manager. Note: The first column in the SELECT statement is 1. This is used to designate that the employee returned by this query is at the top level, Level 1.

The second SELECT statement (after the UNION ALL operator) joins the Employee table to the Reports CTE itself based on the manager and employee IDs. By self-referencing the CTE in this way, SQL Server automatically treats this as a recursive query and repeats the query as many times as necessary to return each level of employees. Every time the query runs, the first column adds 1 to the value so that each level is incremented by 1.

The SELECT statement that follows the WITH clause joins the Reports CTE to the Contact table to retrieve the employees' name. The following query results show a sample of the data returned by this statement:

 

 
EmpLevel EmpID EmpName MgrID
1 109 Ken Sanchez NULL
2 6 David Bradley 109
2 12 Terri Duffy 109
2 42 Jean Trenary 109
2 140 Laura Norman 109
2 148 James Hamilton 109
2 273 Brian Welcker 109
3 2 Kevin Brown 6
3 46 Sariya Harnpadoungsataya 6
3 106 Mary Gibson 6
3 119 Jill Williams 6
3 203 Terry Eminhizer 6
3 269 Wanida Benshoof 6
3 271 John Wood 6
3 272 Mary Dempsey 6
3 3 Roberto Tamburello 12
3 66 Janaina Bueno 42
3 102 Dan Bacon 42
3 117 François Ajenstat 42
3 128 Dan Wilson 42
3 149 Ramesh Meyyappan 42
3 150 Stephanie Conroy 42
3 176 Karen Berg 42
3 30 Paula Barreto de Mattos 140
3 71 Wendy Kahn 140
3 103 David Barber 140
3 139 David Liu 140
3 21 Peter Krebs 148
3 44 A. Scott Wright 148
3 200 Hazem Abolrous 148
3 218 Gary Altman 148
3 268 Stephen Jiang 273
3 284 Amy Alberts 273
3 288 Syed Abbas 273
4 4 Rob Walters 3
4 9 Gail Erickson 3
4 11 Jossef Goldberg 3

The results are listed according to the employee's level. Notice that the second through seventh rows show a MgrID value of 109, which is the ID of the top-level employee shown in the first row. The subsequent rows reflect the same hierarchical nature of the data.

Recursive CTEs, like other common table expressions in SQL Server, provide powerful tools for retrieving data. Unlike views, you don't have to persist the metadata. And unlike derived tables, you don't have to repeat code unnecessarily. CTEs help simplify complex code by letting you more easily separate your code into discrete units. And when it comes to recursive queries, CTEs can't be beat. When you first start using CTEs, you might have to play around a little to become comfortable with them, but once you do, you'll never go back.

ABOUT THE AUTHOR
Robert Sheldon is a technical consultant and the author of numerous books, articles, and training material related to Microsoft Windows, various relational database management systems, and business intelligence design and implementation. You can find more information at  http://www.rhsheldon.com.

This was first published in February 2008

Dig deeper on SQL-Transact SQL (T-SQL)

Pro+

Features

Enjoy the benefits of Pro+ membership, learn more and join.

0 comments

Oldest 

Forgot Password?

No problem! Submit your e-mail address below. We'll send you an email containing your password.

Your password has been sent to:

SearchBusinessAnalytics

SearchDataCenter

SearchDataManagement

SearchAWS

SearchOracle

SearchContentManagement

SearchWindowsServer

Close