One of the new Transact-SQL features implemented in SQL Server 2008 is the GROUPING SETS option in the GROUP BY clause of a SELECT statement. Compared with the ROLLUP and CUBE options, GROUPING SETS allows for greater control over how data is grouped into aggregated sets of information.
To understand how GROUPING SETS works, let's first review the
Requires Free Membership to View
SELECT TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE (CountryRegionName = 'United States') AND
(SalesYTD > 3000000)
GROUP BY TerritoryName, LastName
ORDER BY TerritoryName, LastName
The statement retrieves data from the vSalesPerson view and groups it by the TerritoryName and LastName columns. For each group, the statement uses the SUM function to add up the amounts in the SalesYTD column to provide the total sales. The following table shows the query result, and as you can see, a total is provided for each territory/name set.
| TerritoryName | LastName | YTDSales |
| Central | Carson | 3857163.6332 |
| Northeast | Blythe | 4557045.0459 |
| Northwest | Campbell | 3587378.4257 |
| Southwest | Ito | 3018725.4858 |
| Southwest | Mitchell | 5200475.2313 |
Although this information is useful, it does not provide the total sales for all territories or the totals for individual territories. To retrieve this sort of data, you can use the ROLLUP option to calculate the additional amounts, as shown in the following statement:
SELECT TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE (CountryRegionName = 'United States') AND
(SalesYTD > 3000000)
GROUP BY ROLLUP (TerritoryName, LastName)
ORDER BY TerritoryName, LastName
Notice that the ROLLUP option specifies the TerritoryName and LastName columns. Because the query includes both columns, the results will contain a rollup of the sales amount for each territory/name set. In addition, the results will also include the total sales for all territories and the totals for the individual territories, as you can see in the following table.
| TerritoryName | LastName | YTDSales |
| NULL | NULL | 20220787.8219 |
| Central | NULL | 3857163.6332 |
| Central | Carson | 3857163.6332 |
| Northeast | NULL | 4557045.0459 |
| Northeast | Blythe | 4557045.0459 |
| Northwest | NULL | 3587378.4257 |
| Northwest | Campbell | 3587378.4257 |
| Southwest | NULL | 8219200.7171 |
| Southwest | Ito | 3018725.4858 |
| Southwest | Mitchell | 5200475.2313 |
You can include additional aggregations by using the CUBE option rather than ROLLUP, as shown in the following SELECT statement:
SELECT TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE (CountryRegionName = 'United States') AND
(SalesYTD > 3000000)
GROUP BY CUBE (TerritoryName, LastName)
ORDER BY TerritoryName, LastName
The results will now include totals for the individual salesperson, as well as the individual territory (which is shown in the following table). If a salesperson had been associated with multiple territories, those totals would be reflected here. In this case, however, the totals are the same as the totals for the territory/salesperson sets.
| TerritoryName | LastName | YTDSales |
| NULL | NULL | 20220787.8219 |
| NULL | Blythe | 4557045.0459 |
| NULL | Campbell | 3587378.4257 |
| NULL | Carson | 3857163.6332 |
| NULL | Ito | 3018725.4858 |
| NULL | Mitchell | 5200475.2313 |
| Central | NULL | 3857163.6332 |
| Central | Carson | 3857163.6332 |
| Northeast | NULL | 4557045.0459 |
| Northeast | Blythe | 4557045.0459 |
| Northwest | NULL | 3587378.4257 |
| Northwest | Campbell | 3587378.4257 |
| Southwest | NULL | 8219200.7171 |
| Southwest | Ito | 3018725.4858 |
| Southwest | Mitchell | 5200475.2313 |
One last thing to note about the ROLLUP and CUBE options is that the syntax used here was introduced in SQL Server 2008. The original syntax is still supported, but the new syntax adheres to the latest ANSI (or American National Standards Institute) standards.
Now that you have an overview of how the GROUP BY clause aggregated data prior to SQL Server 2008, let's look at the new GROUPING SETS option. The following SELECT statement uses the option to define four distinct sets:
SELECT TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE (CountryRegionName = 'United States') AND
(SalesYTD > 3000000)
GROUP BY GROUPING SETS
((TerritoryName, LastName), TerritoryName, LastName, ())
ORDER BY TerritoryName, LastName
The first set is the combination of the TerritoryName and LastName columns. Notice that they're enclosed in their own set of parentheses: As a result, a sum will be provided for each territory/name set. The next two grouping sets are the same columns listed individually. This means that a total will be provided for each territory and name. The final set is a set of empty parentheses, which indicates that a total will be provided for all sales.
As it turns out, the GROUPING SETS option in this statement returns the same results as the previous CUBE example. This is because I've specified the same set of columns, both individually and together. In other words, for the two columns specified in both statements, all possible combinations are aggregated. The CUBE operator acts as a shorthand for specifying every possible combination in GROUPING SETS.
The advantage of the GROUPING SETS option, however, is that you can specify any combination of columns. The following statement, for example, uses GROUPING SETS to define only three sets:
SELECT TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE (CountryRegionName = 'United States') AND
(SalesYTD > 3000000)
GROUP BY GROUPING SETS
((TerritoryName, LastName), TerritoryName, ())
ORDER BY TerritoryName, LastName
Unlike the preceding example, the GROUPING SETS option does not include a LastName set. Interestingly, the results returned by this statement will be the same as those returned by the ROLLUP example above. In this case, ROLLUP acts as shorthand for this particular combination of sets, in which the second column name in the first set (in this case, LastName) is not included as a standalone set.
However, the GROUPING SETS groupings are not limited to combinations that must match either CUBE or ROLLUP, which is why GROUPING SETS is an important new feature. In the following example, GROUPING SETS includes only the TerritoryName/LastName set and the total (empty parentheses) set:
SELECT TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE (CountryRegionName = 'United States') AND
(SalesYTD > 3000000)
GROUP BY GROUPING SETS ((TerritoryName, LastName), ())
ORDER BY TerritoryName, LastName
Now your results will include only total sales for each territory/name pair and for all sales, as shown in the following table:
| TerritoryName | LastName | YTDSales |
| NULL | NULL | 20220787.8219 |
| Central | Carson | 3857163.6332 |
| Northeast | Blythe | 4557045.0459 |
| Northwest | Campbell | 3587378.4257 |
| Southwest | Ito | 3018725.4858 |
| Southwest | Mitchell | 5200475.2313 |
Except for the query that returns the total for all sales, these results are very similar to a basic GROUP BY clause. If you were to drop the total set from the GROUPING SETS option, as shown in the following statement, your results would be the same as a basic GROUP BY clause:
SELECT TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE (CountryRegionName = 'United States') AND
(SalesYTD > 3000000)
GROUP BY GROUPING SETS ((TerritoryName, LastName))
ORDER BY TerritoryName, LastName
But your goal, of course, is not to try to duplicate a basic GROUP BY clause any more than it is to try to duplicate the ROLLUP or CUBE options. In fact, the power of the GROUPING SETS option becomes more apparent when you're aggregations become more complex. For example, the following SELECT statement uses the GROUPING SETS option to define three groups, one of which includes three columns:
SELECT CountryRegionName, TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE SalesYTD > 3000000
GROUP BY GROUPING SETS ((CountryRegionName, TerritoryName, LastName),
CountryRegionName, TerritoryName)
ORDER BY CountryRegionName, TerritoryName, LastName
The first set is based on the CountryRegionName, TerritoryName, and LastName columns, so the total amount of sales will be provided for each region/territory/name set. The next set of totals is based on the CountryRegionName column only, so totals will be provided for each region. The same is true for territories. The following table shows the results generated by this statement.
| TerritoryName | LastName | YTDSales | |
| NULL | Canada | NULL | 3189356.2465 |
| NULL | Central | NULL | 3857163.6332 |
| NULL | France | NULL | 3827950.238 |
| NULL | Northeast | NULL | 4557045.0459 |
| NULL | Northwest | NULL | 3587378.4257 |
| NULL | Southwest | NULL | 8219200.7171 |
| NULL | United Kingdom | NULL | 5015682.3752 |
| Canada | NULL | NULL | 3189356.2465 |
| Canada | Canada | Saraiva | 3189356.2465 |
| France | NULL | NULL | 3827950.238 |
| France | France | Varkey Chudukatil | 3827950.238 |
| United Kingdom | NULL | NULL | 5015682.3752 |
| United Kingdom | United Kingdom | Pak | 5015682.3752 |
| United States | NULL | NULL | 20220787.8219 |
| United States | Central | Carson | 3857163.6332 |
| United States | Northeast | Blythe | 4557045.0459 |
| United States | Northwest | Campbell | 3587378.4257 |
| United States | Southwest | Ito | 3018725.4858 |
| United States | Southwest | Mitchell | 5200475.2313 |
As you can see, the GROUPING SETS option provides you greater control over how you group data in order to aggregate related values. Prior to SQL Server 2008, you would have had to create a complex set of UNION ALL statements to achieve the same results. Now, you need to create only one statement. You should continue to use the ROLLUP and CUBE options when it makes sense, because the syntax is simpler, but when you need to return specific sets of data that fall outside the range of ROLLUP and CUBE, you'll find the GROUPING SETS option to be both useful and easy to use.
ABOUT THE AUTHOR
Robert Sheldon is a technical consultant and the author of numerous books, articles, and
training material related to Microsoft Windows, various relational database management systems, and
business intelligence design and implementation. You can find more information at http://www.rhsheldon.com.
This was first published in May 2009

Join the conversationComment
Share
Comments
Results
Contribute to the conversation