Navigate SQL Server hierarchies using recursive common table expressions

Modeling a hierarchical chart for your company, such as an employee table, may seem simple -- but the simplest data structures are typically the most difficult to query. Learn how to use common table expressions to get around this problem.

Our world is, by and large, composed of hierarchies. Within both the natural and business worlds, we see different

types of entities related to each other in a hierarchical fashion. Universes are made up of solar systems, which are made up of planets, each with moons. Living things are classified scientifically using the hierarchy of domain, kingdom, phylum, class, order, family, genus and species. And the populated areas of our planet are hierarchically classified using designators such as countries, states, counties, cities and postal codes.

In our business databases, we know different hierarchies. They might involve classes of salable items, narrowing in scope from broad categories to more specific categories, and eventually resolving into SKUs. The most common example within a database context is probably that of employees within an organization. Virtually every firm has some sort of organizational chart and -- matrix and other nouveau management techniques not withstanding -- these charts are hierarchical in nature. Modeling them seems so very simple.

Adjacency lists

This simple technique for modeling hierarchies is known as an adjacency list:


CREATE TABLE Employee
(
 EmployeeID INT NOT NULL 
  PRIMARY KEY,
 ManagerID INT
  REFERENCES Employee (EmployeeID)
)

It's the most straightforward way of representing a tree. Yet, as most database professionals have discovered, these seemingly simplistic data structures can be maddeningly difficult to query. For instance, given the above table, how would one determine if a given employee was a subordinate of a manager several levels up? Or how would a developer write a query to pull back the entire tree, in an order suitable for creation of an organizational chart? The following query does just that, using the Employee table in the SQL Server 2005 Adventure Works sample database:


SELECT
 COALESCE(Level5, Level4, Level3, Level2, Level1) AS EmployeeID,
 CASE COALESCE(Level5, Level4, Level3, Level2, Level1)
  WHEN Level1 THEN 1
  WHEN Level2 THEN 2
  WHEN Level3 THEN 3
  WHEN Level4 THEN 4
  WHEN Level5 THEN 5
 END AS Level
FROM
(
SELECT 
 CEO.EmployeeID AS Level1,
 CASE 
  WHEN CEO.EmployeeID = SeniorManagement.EmployeeID THEN NULL
  ELSE SeniorManagement.EmployeeID
 END AS Level2, 
 CASE
  WHEN SeniorManagement.EmployeeID = MiddleManagement.EmployeeID THEN NULL
  ELSE MiddleManagement.EmployeeID 
 END AS Level3, 
 CASE
  WHEN MiddleManagement.EmployeeID = JuniorManagement.EmployeeID THEN NULL
  ELSE JuniorManagement.EmployeeID 
 END AS Level4,
 CASE
  WHEN JuniorManagement.EmployeeID = PeopleWhoDoWork.EmployeeID THEN NULL
  ELSE PeopleWhoDoWork.EmployeeID 
 END AS Level5
FROM HumanResources.Employee CEO
LEFT JOIN HumanResources.Employee SeniorManagement ON SeniorManagement.ManagerID = CEO.EmployeeID
 OR CEO.EmployeeID = SeniorManagement.EmployeeID
LEFT JOIN HumanResources.Employee MiddleManagement ON (MiddleManagement.ManagerID = SeniorManagement.EmployeeID
 OR SeniorManagement.EmployeeID = MiddleManagement.EmployeeID)
 AND CEO.EmployeeID <> SeniorManagement.EmployeeID
LEFT JOIN HumanResources.Employee JuniorManagement ON (JuniorManagement.ManagerID = MiddleManagement.EmployeeID
 OR MiddleManagement.EmployeeID = JuniorManagement.EmployeeID)
 AND MiddleManagement.EmployeeID <> SeniorManagement.EmployeeID
LEFT JOIN HumanResources.Employee PeopleWhoDoWork ON (PeopleWhoDoWork.ManagerID = JuniorManagement.EmployeeID
 OR JuniorManagement.EmployeeID = PeopleWhoDoWork.EmployeeID)
 AND JuniorManagement.EmployeeID <> MiddleManagement.EmployeeID
WHERE CEO.ManagerID IS NULL
) ManagementTree
ORDER BY
 ManagementTree.Level1,
 ManagementTree.Level2,
 ManagementTree.Level3,
 ManagementTree.Level4,
 ManagementTree.Level5

This query is obviously quite complex -- but it's also nowhere near perfect. The main issue, aside from that of readability, is maintainability. What will happen if senior management decides that a new level of managers-in-training should be added between the junior managers (see the correlation name JuniorManagement) and the lowest level employees (using the correlation name PeopleWhoDoWork)? Other than making working conditions even worse for those on the bottom rungs, this decision will mean that a developer will have to edit this query, adding in yet another join to the Employee table in order to facilitate getting the data for the new level. This is hardly an optimal solution.

Getting around adjacency list limitations

A variety of techniques have been created for getting around these and other limitations of adjacency lists. Itzik Ben-Gan wrote an excellent article on representing hierarchies using materialized paths. Joe Celko popularized a technique called the Nested Sets Model. And more recently, Vadim Tropashko created a technique called the Nested Intervals Model. Although each of these techniques greatly improves upon adjacency lists when it comes to querying, none are as simple or easy to manage when it comes to actually populating and maintaining the data.

The best solution would be one that would allow data modelers to make use of the simplicity of adjacency lists, but ease the pain of querying them. Recursive Common Table Expressions (CTEs) are a new feature in SQL Server 2005 that meets these requirements. A common table expression is a construct very similar to a derived table. The following two queries, the first using a derived table and the second a (non-recursive) CTE, are functionally equivalent:

SELECT AVG(BonusTotal)
FROM
(
 SELECT 
  SUM(Bonus) AS BonusTotal
 FROM Sales.SalesPerson
 GROUP BY TerritoryID
) SalesBonusTotals

WITH SalesBonusTotals
AS
(
 SELECT 
  SUM(Bonus) AS BonusTotal
 FROM Sales.SalesPerson
 GROUP BY TerritoryID
)
SELECT AVG(BonusTotal)
FROM SalesBonusTotals

Both of these queries compute the average total bonus per territory of the sales team in the Adventure Works database. The common table expression in the second query acts just like the derived table in the first, and can be referenced by its correlation name (SalesBonusTotals) throughout the query. In this way, derived tables and CTEs are virtually identical.

However, CTEs have a feature that derived tables do not: The ability to recursively refer to themselves. This recursion is specified by using a UNION ALL within the CTE. The second query in the UNION ALL should refer to the CTE itself, using it as a key to the next level of recursion. This is best shown by example. The following query can be used to find the EmployeeID and ManagerID of the CEO of Adventure Works:

SELECT
 EmployeeID
FROM HumanResources.Employee
WHERE ManagerID IS NULL

We know that any employee without a manager must be the CEO of the firm -- the top level of the hierarchy represented in the Employee table. To find all of the subordinates of this top level employee (every employee in the company), we can the following recursive CTE:

WITH EmployeeCTE 
AS
(
 SELECT
  EmployeeID
 FROM HumanResources.Employee
 WHERE ManagerID IS NULL

 UNION ALL

 SELECT
  E.EmployeeID
 FROM HumanResources.Employee E
 JOIN EmployeeCTE x ON x.EmployeeID = E.ManagerID
)
SELECT EmployeeID
FROM EmployeeCTE

The results of this query aren't too interesting; all it does is display a list of EmployeeIDs from the Employee table. But what is interesting is what it's actually doing. The query first gets the CEO's EmployeeID, then calls the recursive piece, querying for every employee managed by the CEO (x.EmployeeID = E.ManagerID). For each row returned by that recursive call, the same logic is repeated again and again until the recursion is terminated by a lack of rows returned in the recursive step.

This query can be extended. For instance, we might want to know the level of each employee within the hierarchy:

WITH EmployeeCTE 
AS
(
 SELECT
  EmployeeID,
  1 AS Level
 FROM HumanResources.Employee
 WHERE ManagerID IS NULL

 UNION ALL

 SELECT
  E.EmployeeID,
  x.Level + 1 AS Level
 FROM HumanResources.Employee E
 JOIN EmployeeCTE x ON x.EmployeeID = E.ManagerID
)
SELECT 
 EmployeeID,
 Level
FROM EmployeeCTE

In this case, the level of the CEO defaults to 1. Each time the recursive step is called, the level will be incremented for those rows.

At this point, the query is returning almost the same data as the flawed query above, yet it is much more readable and will require no maintenance when management makes an asinine decision. The query will dynamically adapt to new levels in the hierarchy without a problem.

The only issue left to solve is that the order of the rows returned by this query may not be quite right for use in an organizational chart. The rows should be returned in the order as if one were walking the tree from the root (the CEO), and visiting each node (employee). This can be solved by taking a cue from Ben-Gan's article and materializing a path as the CTE works:

WITH EmployeeCTE 
AS
(
 SELECT
  EmployeeID,
  1 AS Level,
  CONVERT(VARBINARY(MAX), EmployeeID) AS thePath
 FROM HumanResources.Employee
 WHERE ManagerID IS NULL

 UNION ALL

 SELECT
  E.EmployeeID,
  x.Level + 1 AS Level,
  x.thePath + CONVERT(VARBINARY(MAX), E.EmployeeID) AS thePath
 FROM HumanResources.Employee E
 JOIN EmployeeCTE x ON x.EmployeeID = E.ManagerID
)
SELECT
 EmployeeID,
 Level
FROM EmployeeCTE
ORDER BY thePath

By converting the EmployeeID into binary format and appending on each recursion, the query forms a trail of visited nodes. Each path representation will begin with 0x0000006D, the binary representation of 109, the Adventure Works CEO's EmployeeID. Following that will be the binary for each EmployeeID down the hierarchy, thereby creating an ordered representation that can be used in the ORDER BY clause to produce the correct results.

Using this technique, any tree or hierarchy represented by an adjacency list can easily be navigated, without using complex techniques such as the Nested Sets or Nested Intervals models. Given that hierarchical data makes up such a large part of our world, this is a very powerful feature for SQL Server 2005!

About the author: Adam Machanic is a senior database engineer for GetConnected Inc. He has several years of experience developing a variety of applications using SQL Server as a data repository and is active on numerous online technical forums. He is a Microsoft Certified Professional and a SQL Server MVP. Machanic, who also serves as our SQL Server 2005 expert, welcomes your questions.


More information from SearchSQLServer.com

  • Tip: Nested queries vs. temporary tables
  • Ask the Experts: Joining a derived table
  • Ask the Experts: Using a derived table instead of a view


  • This was first published in July 2005

    Dig deeper on SQL Server Database Modeling and Design

    Pro+

    Features

    Enjoy the benefits of Pro+ membership, learn more and join.

    0 comments

    Oldest 

    Forgot Password?

    No problem! Submit your e-mail address below. We'll send you an email containing your password.

    Your password has been sent to:

    SearchBusinessAnalytics

    SearchDataCenter

    SearchDataManagement

    SearchAWS

    SearchOracle

    SearchContentManagement

    SearchWindowsServer

    Close