Home > SQL Server Tips > Database Administrator > Navigate hierarchies using recursive common table expressions
SQL Server Tips:
EMAIL THIS
 TIPS & NEWSLETTERS TOPICS 

DATABASE ADMINISTRATOR

Navigate hierarchies using recursive common table expressions


Adam Machanic
07.14.2005
Rating: --- (out of 5)


Expert advice on database administration
Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us    Add to Google


Our world is, by and large, composed of hierarchies. Within both the natural and business worlds, we see different types of entities related to each other in a hierarchical fashion. Universes are made up of solar systems, which are made up of planets, each with moons. Living things are classified scientifically using the hierarchy of domain, kingdom, phylum, class, order, family, genus and species. And the populated areas of our planet are hierarchically classified using designators such as countries, states, counties, cities and postal codes.

In our business databases, we know different hierarchies. They might involve classes of salable items, narrowing in scope from broad categories to more specific categories, and eventually resolving into SKUs. The most common example within a database context is probably that of employees within an organization. Virtually every firm has some sort of organizational chart and -- matrix and other nouveau management techniques not withstanding -- these charts are hierarchical in nature. Modeling them seems so very simple.

Adjacency lists

This simple technique for modeling hierarchies is known as an adjacency list:


CREATE TABLE Employee
(
 EmployeeID INT NOT NULL 
  PRIMARY KEY,
 ManagerID INT
  REFERENCES Employee (EmployeeID)
)

It's the most straightforward way of representing a tree. Yet, as most database professionals have discovered, these seemingly simplistic data structures can be maddeningly difficult to query. For instance, given the above table, how would one determine if a given employee was a subordinate of a manager several levels up? Or how would a developer write a query to pull back the entire tree, in an order suitable for creation of an organizational chart? The following query does just that, using the Employee table in the SQL Server 2005 Adventure Works sample database:


SELECT
 COALESCE(Level5, Level4, Level3, Level2, Level1) AS EmployeeID,
 CASE COALESCE(Level5, Level4, Level3, Level2, Level1)
  WHEN Level1 THEN 1
  WHEN Level2 THEN 2
  WHEN Level3 THEN 3
  WHEN Level4 THEN 4
  WHEN Level5 THEN 5
 END AS Level
FROM
(
SELECT 
 CEO.EmployeeID AS Level1,
 CASE 
  WHEN CEO.EmployeeID = SeniorManagement.EmployeeID THEN NULL
  ELSE SeniorManagement.EmployeeID
 END AS Level2, 
 CASE
  WHEN SeniorManagement.EmployeeID = MiddleManagement.EmployeeID THEN NULL
  ELSE MiddleManagement.EmployeeID 
 END AS Level3, 
 CASE
  WHEN MiddleManagement.EmployeeID = JuniorManagement.EmployeeID THEN NULL
  ELSE JuniorManagement.EmployeeID 
 END AS Level4,
 CASE
  WHEN JuniorManagement.EmployeeID = PeopleWhoDoWork.EmployeeID THEN NULL
  ELSE PeopleWhoDoWork.EmployeeID 
 END AS Level5
FROM HumanResources.Employee CEO
LEFT JOIN HumanResources.Employee SeniorManagement ON SeniorManagement.ManagerID = CEO.EmployeeID
 OR CEO.EmployeeID = SeniorManagement.EmployeeID
LEFT JOIN HumanResources.Employee MiddleManagement ON (MiddleManagement.ManagerID = SeniorManagement.EmployeeID
 OR SeniorManagement.EmployeeID = MiddleManagement.EmployeeID)
 AND CEO.EmployeeID <> SeniorManagement.EmployeeID
LEFT JOIN HumanResources.Employee JuniorManagement ON (JuniorManagement.ManagerID = MiddleManagement.EmployeeID
 OR MiddleManagement.EmployeeID = JuniorManagement.EmployeeID)
 AND MiddleManagement.EmployeeID <> SeniorManagement.EmployeeID
LEFT JOIN HumanResources.Employee PeopleWhoDoWork ON (PeopleWhoDoWork.ManagerID = JuniorManagement.EmployeeID
 OR JuniorManagement.EmployeeID = PeopleWhoDoWork.EmployeeID)
 AND JuniorManagement.EmployeeID <> MiddleManagement.EmployeeID
WHERE CEO.ManagerID IS NULL
) ManagementTree
ORDER BY
 ManagementTree.Level1,
 ManagementTree.Level2,
 ManagementTree.Level3,
 ManagementTree.Level4,
 ManagementTree.Level5

This query is obviously quite complex -- but it's also nowhere near perfect. The main issue, aside from that of readability, is maintainability. What will happen if senior management decides that a new level of managers-in-training should be added between the junior managers (see the correlation name JuniorManagement) and the lowest level employees (using the correlation name PeopleWhoDoWork)? Other than making working conditions even worse for those on the bottom rungs, this decision will mean that a developer will have to edit this query, adding in yet another join to the Employee table in order to facilitate getting the data for the new level. This is hardly an optimal solution.

Getting around adjacency list limitations

A variety of techniques have been created for getting around these and other limitations of adjacency lists. Itzik Ben-Gan wrote an excellent article on representing hierarchies using materialized paths. Joe Celko popularized a technique called the Nested Sets Model. And more recently, Vadim Tropashko created a technique called the Nested Intervals Model. Although each of these techniques greatly improves upon adjacency lists when it comes to querying, none are as simple or easy to manage when it comes to actually populating and maintaining the data.

The best solution would be one that would allow data modelers to make use of the simplicity of adjacency lists, but ease the pain of querying them. Recursive Common Table Expressions (CTEs) are a new feature in SQL Server 2005 that meets these requirements. A common table expression is a construct very similar to a derived table. The following two queries, the first using a derived table and the second a (non-recursive) CTE, are functionally equivalent:

SELECT AVG(BonusTotal)
FROM
(
 SELECT 
  SUM(Bonus) AS BonusTotal
 FROM Sales.SalesPerson
 GROUP BY TerritoryID
) SalesBonusTotals

WITH SalesBonusTotals
AS
(
 SELECT 
  SUM(Bonus) AS BonusTotal
 FROM Sales.SalesPerson
 GROUP BY TerritoryID
)
SELECT AVG(BonusTotal)
FROM SalesBonusTotals

Both of these queries compute the average total bonus per territory of the sales team in the Adventure Works database. The common table expression in the second query acts just like the derived table in the first, and can be referenced by its correlation name (SalesBonusTotals) throughout the query. In this way, derived tables and CTEs are virtually identical.

However, CTEs have a feature that derived tables do not: The ability to recursively refer to themselves. This recursion is specified by using a UNION ALL within the CTE. The second query in the UNION ALL should refer to the CTE itself, using it as a key to the next level of recursion. This is best shown by example. The following query can be used to find the EmployeeID and ManagerID of the CEO of Adventure Works:

SELECT
 EmployeeID
FROM HumanResources.Employee
WHERE ManagerID IS NULL

We know that any employee without a manager must be the CEO of the firm -- the top level of the hierarchy represented in the Employee table. To find all of the subordinates of this top level employee (every employee in the company), we can the following recursive CTE:

WITH EmployeeCTE 
AS
(
 SELECT
  EmployeeID
 FROM HumanResources.Employee
 WHERE ManagerID IS NULL

 UNION ALL

 SELECT
  E.EmployeeID
 FROM HumanResources.Employee E
 JOIN EmployeeCTE x ON x.EmployeeID = E.ManagerID
)
SELECT EmployeeID
FROM EmployeeCTE

The results of this query aren't too interesting; all it does is display a list of EmployeeIDs from the Employee table. But what is interesting is what it's actually doing. The query first gets the CEO's EmployeeID, then calls the recursive piece, querying for every employee managed by the CEO (x.EmployeeID = E.ManagerID). For each row returned by that recursive call, the same logic is repeated again and again until the recursion is terminated by a lack of rows returned in the recursive step.

This query can be extended. For instance, we might want to know the level of each employee within the hierarchy:

WITH EmployeeCTE 
AS
(
 SELECT
  EmployeeID,
  1 AS Level
 FROM HumanResources.Employee
 WHERE ManagerID IS NULL

 UNION ALL

 SELECT
  E.EmployeeID,
  x.Level + 1 AS Level
 FROM HumanResources.Employee E
 JOIN EmployeeCTE x ON x.EmployeeID = E.ManagerID
)
SELECT 
 EmployeeID,
 Level
FROM EmployeeCTE

In this case, the level of the CEO defaults to 1. Each time the recursive step is called, the level will be incremented for those rows.

At this point, the query is returning almost the same data as the flawed query above, yet it is much more readable and will require no maintenance when management makes an asinine decision. The query will dynamically adapt to new levels in the hierarchy without a problem.

The only issue left to solve is that the order of the rows returned by this query may not be quite right for use in an organizational chart. The rows should be returned in the order as if one were walking the tree from the root (the CEO), and visiting each node (employee). This can be solved by taking a cue from Ben-Gan's article and materializing a path as the CTE works:

WITH EmployeeCTE 
AS
(
 SELECT
  EmployeeID,
  1 AS Level,
  CONVERT(VARBINARY(MAX), EmployeeID) AS thePath
 FROM HumanResources.Employee
 WHERE ManagerID IS NULL

 UNION ALL

 SELECT
  E.EmployeeID,
  x.Level + 1 AS Level,
  x.thePath + CONVERT(VARBINARY(MAX), E.EmployeeID) AS thePath
 FROM HumanResources.Employee E
 JOIN EmployeeCTE x ON x.EmployeeID = E.ManagerID
)
SELECT
 EmployeeID,
 Level
FROM EmployeeCTE
ORDER BY thePath

By converting the EmployeeID into binary format and appending on each recursion, the query forms a trail of visited nodes. Each path representation will begin with 0x0000006D, the binary representation of 109, the Adventure Works CEO's EmployeeID. Following that will be the binary for each EmployeeID down the hierarchy, thereby creating an ordered representation that can be used in the ORDER BY clause to produce the correct results.

Using this technique, any tree or hierarchy represented by an adjacency list can easily be navigated, without using complex techniques such as the Nested Sets or Nested Intervals models. Given that hierarchical data makes up such a large part of our world, this is a very powerful feature for SQL Server 2005!

About the author: Adam Machanic is a senior database engineer for GetConnected Inc. He has several years of experience developing a variety of applications using SQL Server as a data repository and is active on numerous online technical forums. He is a Microsoft Certified Professional and a SQL Server MVP. Machanic, who also serves as our SQL Server 2005 expert, welcomes your questions.


More information from SearchSQLServer.com

  • Tip: Nested queries vs. temporary tables
  • Ask the Experts: Joining a derived table
  • Ask the Experts: Using a derived table instead of a view


  • Rate this Tip
    To rate tips, you must be a member of SearchSQLServer.com.
    Register now to start rating these tips. Log in if you are already a member.




    Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us    Add to Google


    RELATED CONTENT
    SQL Server database design and modeling
    SQL Server database design disasters: How it all starts
    Physical data storage in SQL Server 2005 and 2008
    SQL Server 2008 data types: Datetime, string, user-defined and more
    SQL Server and data manipulation in T-SQL
    Enforcing data integrity in a SQL Server database
    Supertype and subtype tables in SQL Server
    SQL Server database design disasters: What not to do
    Check SQL Server database and log file size with this stored procedure
    SQL Server tempdb best practices increase performance
    FAQ: SQL Server databases how-to

    Database Administrator
    SQL Server database design disasters: How it all starts
    SQL Server database design disasters: What not to do
    How to create a SQL Server linked server to DB2
    Virtual database storage for SQL Server: Friend or foe?
    How to restore SQL Server database to transition server during upgrade
    Storage area network (SAN) basics every SQL Server DBA must know
    SQL Server backups using SAN database snapshots
    Sarbanes-Oxley compliance checklist: IT security and SQL audits
    SQL Server 2005 log shipping setup using the wizard
    Track changes to SQL Server 2000 and 2005 with one simple utility

    SQL/Transact SQL (T-SQL)
    Using DATEADD and DATEDIFF to calculate SQL Server datetime values
    Manipulate column names in a SQL Server table
    SQL Server trigger vs. stored procedure to receive data notification
    Physical data storage in SQL Server 2005 and 2008
    SQL Server 2008 data types: Datetime, string, user-defined and more
    SQL Server and data manipulation in T-SQL
    Enforcing data integrity in a SQL Server database
    Supertype and subtype tables in SQL Server
    Using SQL Server datetime functions GETDATE, DATENAME and DATEPART
    Ordering the results of a SQL query
    SQL/Transact SQL (T-SQL) Research

    RELATED GLOSSARY TERMS
    Terms from Whatis.com − the technology online dictionary
    binary tree  (SearchSQLServer.com)
    block  (SearchSQLServer.com)
    data structure  (SearchSQLServer.com)
    DDBMS  (SearchSQLServer.com)
    entity-relationship model  (SearchSQLServer.com)
    initial extent  (SearchSQLServer.com)
    primary key  (SearchSQLServer.com)
    segment  (SearchSQLServer.com)
    tablespace  (SearchSQLServer.com)
    view  (SearchSQLServer.com)

    RELATED RESOURCES
    2020software.com, trial software downloads for accounting software, ERP software, CRM software and business software systems
    Search Bitpipe.com for the latest white papers and business webcasts
    Whatis.com, the online computer dictionary

    DISCLAIMER: Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.

    HomeNewsTopicsITKnowledge ExchangeTipsAsk the ExpertsMultimediaWhite PapersIT Downloads
    About Us  |  Contact Us  |  For Advertisers  |  For Business Partners  |  Site Index  |  RSS
    SEARCH 
    TechTarget provides enterprise IT professionals with the information they need to perform their jobs - from developing strategy, to making cost-effective IT purchase decisions and managing their organizations' IT projects - with its network of technology-specific Web sites, events and magazines.

    TechTarget Corporate Web Site  |  Media Kits  |  Reprints  |  Site Map




    All Rights Reserved, Copyright 2005 - 2008, TechTarget | Read our Privacy Policy
      TechTarget - The IT Media ROI Experts