One of the new Transact-SQL features implemented in SQL Server 2008 is the GROUPING SETS option in the GROUP BY clause of a SELECT statement. Compared with the ROLLUP and CUBE options, GROUPING SETS allows for greater control over how data is grouped into aggregated sets of information.
To understand how GROUPING SETS works, let's first review the GROUP BY clause. As you're no doubt aware, this clause allows you to group data together in order to aggregate specific sets of data associated with those groups. The following SELECT statement, for example, uses a GROUP BY clause to aggregate data from the AdventureWorks2008 database:
SELECT TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE (CountryRegionName = 'United States') AND (SalesYTD > 3000000)
GROUP BY TerritoryName, LastName
ORDER BY TerritoryName, LastName
The statement retrieves data from the vSalesPerson view and groups it by the TerritoryName and LastName columns. For each group, the statement uses the SUM function to add up the amounts in the SalesYTD column to provide the total sales. The following table shows the query result, and as you can see, a total is provided for each territory/name set.
TerritoryName
LastName
YTDSales
Central
Carson
3857163.6332
Northeast
Blythe
4557045.0459
Northwest
Campbell
3587378.4257
Southwest
Ito
3018725.4858
Southwest
Mitchell
5200475.2313
Although this information is useful, it does not provide the total sales for all territories or the totals for individual territories. To retrieve this sort of data, you can use the ROLLUP ...
To continue reading for free, register below or login
To read more you must become a member of SearchSQLServer.com
');
// -->

option to calculate the additional amounts, as shown in the following statement:
SELECT TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE (CountryRegionName = 'United States') AND (SalesYTD > 3000000)
GROUP BY ROLLUP (TerritoryName, LastName)
ORDER BY TerritoryName, LastName
Notice that the ROLLUP option specifies the TerritoryName and LastName columns. Because the query includes both columns, the results will contain a rollup of the sales amount for each territory/name set. In addition, the results will also include the total sales for all territories and the totals for the individual territories, as you can see in the following table.
TerritoryName
LastName
YTDSales
NULL
NULL
20220787.8219
Central
NULL
3857163.6332
Central
Carson
3857163.6332
Northeast
NULL
4557045.0459
Northeast
Blythe
4557045.0459
Northwest
NULL
3587378.4257
Northwest
Campbell
3587378.4257
Southwest
NULL
8219200.7171
Southwest
Ito
3018725.4858
Southwest
Mitchell
5200475.2313
You can include additional aggregations by using the CUBE option rather than ROLLUP, as shown in the following SELECT statement:
SELECT TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE (CountryRegionName = 'United States') AND (SalesYTD > 3000000)
GROUP BY CUBE (TerritoryName, LastName)
ORDER BY TerritoryName, LastName
The results will now include totals for the individual salesperson, as well as the individual territory (which is shown in the following table). If a salesperson had been associated with multiple territories, those totals would be reflected here. In this case, however, the totals are the same as the totals for the territory/salesperson sets.
TerritoryName
LastName
YTDSales
NULL
NULL
20220787.8219
NULL
Blythe
4557045.0459
NULL
Campbell
3587378.4257
NULL
Carson
3857163.6332
NULL
Ito
3018725.4858
NULL
Mitchell
5200475.2313
Central
NULL
3857163.6332
Central
Carson
3857163.6332
Northeast
NULL
4557045.0459
Northeast
Blythe
4557045.0459
Northwest
NULL
3587378.4257
Northwest
Campbell
3587378.4257
Southwest
NULL
8219200.7171
Southwest
Ito
3018725.4858
Southwest
Mitchell
5200475.2313
One last thing to note about the ROLLUP and CUBE options is that the syntax used here was introduced in SQL Server 2008. The original syntax is still supported, but the new syntax adheres to the latest ANSI (or American National Standards Institute) standards.
Now that you have an overview of how the GROUP BY clause aggregated data prior to SQL Server 2008, let's look at the new GROUPING SETS option. The following SELECT statement uses the option to define four distinct sets:
SELECT TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE (CountryRegionName = 'United States') AND (SalesYTD > 3000000)
GROUP BY GROUPING SETS
((TerritoryName, LastName), TerritoryName, LastName, ())
ORDER BY TerritoryName, LastName
The first set is the combination of the TerritoryName and LastName columns. Notice that they're enclosed in their own set of parentheses: As a result, a sum will be provided for each territory/name set. The next two grouping sets are the same columns listed individually. This means that a total will be provided for each territory and name. The final set is a set of empty parentheses, which indicates that a total will be provided for all sales.
As it turns out, the GROUPING SETS option in this statement returns the same results as the previous CUBE example. This is because I've specified the same set of columns, both individually and together. In other words, for the two columns specified in both statements, all possible combinations are aggregated. The CUBE operator acts as a shorthand for specifying every possible combination in GROUPING SETS.
The advantage of the GROUPING SETS option, however, is that you can specify any combination of columns. The following statement, for example, uses GROUPING SETS to define only three sets:
SELECT TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE (CountryRegionName = 'United States') AND (SalesYTD > 3000000)
GROUP BY GROUPING SETS
((TerritoryName, LastName), TerritoryName, ())
ORDER BY TerritoryName, LastName
Unlike the preceding example, the GROUPING SETS option does not include a LastName set. Interestingly, the results returned by this statement will be the same as those returned by the ROLLUP example above. In this case, ROLLUP acts as shorthand for this particular combination of sets, in which the second column name in the first set (in this case, LastName) is not included as a standalone set.
However, the GROUPING SETS groupings are not limited to combinations that must match either CUBE or ROLLUP, which is why GROUPING SETS is an important new feature. In the following example, GROUPING SETS includes only the TerritoryName/LastName set and the total (empty parentheses) set:
SELECT TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE (CountryRegionName = 'United States') AND (SalesYTD > 3000000)
GROUP BY GROUPING SETS ((TerritoryName, LastName), ())
ORDER BY TerritoryName, LastName
Now your results will include only total sales for each territory/name pair and for all sales, as shown in the following table:
TerritoryName
LastName
YTDSales
NULL
NULL
20220787.8219
Central
Carson
3857163.6332
Northeast
Blythe
4557045.0459
Northwest
Campbell
3587378.4257
Southwest
Ito
3018725.4858
Southwest
Mitchell
5200475.2313
Except for the query that returns the total for all sales, these results are very similar to a basic GROUP BY clause. If you were to drop the total set from the GROUPING SETS option, as shown in the following statement, your results would be the same as a basic GROUP BY clause:
SELECT TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE (CountryRegionName = 'United States') AND (SalesYTD > 3000000)
GROUP BY GROUPING SETS ((TerritoryName, LastName))
ORDER BY TerritoryName, LastName
But your goal, of course, is not to try to duplicate a basic GROUP BY clause any more than it is to try to duplicate the ROLLUP or CUBE options. In fact, the power of the GROUPING SETS option becomes more apparent when you're aggregations become more complex. For example, the following SELECT statement uses the GROUPING SETS option to define three groups, one of which includes three columns:
SELECT CountryRegionName, TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE SalesYTD > 3000000
GROUP BY GROUPING SETS ((CountryRegionName, TerritoryName, LastName),
CountryRegionName, TerritoryName)
ORDER BY CountryRegionName, TerritoryName, LastName
The first set is based on the CountryRegionName, TerritoryName, and LastName columns, so the total amount of sales will be provided for each region/territory/name set. The next set of totals is based on the CountryRegionName column only, so totals will be provided for each region. The same is true for territories. The following table shows the results generated by this statement.
TerritoryName
LastName
YTDSales
NULL
Canada
NULL
3189356.2465
NULL
Central
NULL
3857163.6332
NULL
France
NULL
3827950.238
NULL
Northeast
NULL
4557045.0459
NULL
Northwest
NULL
3587378.4257
NULL
Southwest
NULL
8219200.7171
NULL
United Kingdom
NULL
5015682.3752
Canada
NULL
NULL
3189356.2465
Canada
Canada
Saraiva
3189356.2465
France
NULL
NULL
3827950.238
France
France
Varkey Chudukatil
3827950.238
United Kingdom
NULL
NULL
5015682.3752
United Kingdom
United Kingdom
Pak
5015682.3752
United States
NULL
NULL
20220787.8219
United States
Central
Carson
3857163.6332
United States
Northeast
Blythe
4557045.0459
United States
Northwest
Campbell
3587378.4257
United States
Southwest
Ito
3018725.4858
United States
Southwest
Mitchell
5200475.2313
As you can see, the GROUPING SETS option provides you greater control over how you group data in order to aggregate related values. Prior to SQL Server 2008, you would have had to create a complex set of UNION ALL statements to achieve the same results. Now, you need to create only one statement. You should continue to use the ROLLUP and CUBE options when it makes sense, because the syntax is simpler, but when you need to return specific sets of data that fall outside the range of ROLLUP and CUBE, you'll find the GROUPING SETS option to be both useful and easy to use.
ABOUT THE AUTHOR:
[IMAGE]
Robert Sheldon is a technical consultant and the author of numerous books, articles, and training material related to Microsoft Windows, various relational database management systems, and business intelligence design and implementation. His books include Beginning MySQL (part of the Wrox Programmer-to-Programmer series), SQL: A Beginner's Guide (based on the SQL:1999 standard), MCSE Training Kit: Designing Highly Available Web Solutions with Microsoft Windows 2000 Server Technologies, and MCSE Training Kit: Microsoft SQL Server 2000 Database Design and Implementation. You can find more information at http://www.rhsheldon.com.