One of the new Transact-SQL features implemented in SQL Server 2008 is the GROUPING SETS option in the GROUP BY clause of a SELECT statement. Compared with the ROLLUP and CUBE options, GROUPING SETS allows for greater control over how data is grouped into aggregated sets of information.
To understand how GROUPING SETS works, let's first review the GROUP BY clause. As you're no doubt aware, this clause allows you to group data together in order to aggregate specific sets of data associated with those groups. The following SELECT statement, for example, uses a GROUP BY clause to aggregate data from the AdventureWorks2008 database:
SELECT TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE (CountryRegionName = 'United States') AND (SalesYTD > 3000000)
GROUP BY TerritoryName, LastName
ORDER BY TerritoryName, LastName
The statement retrieves data from the vSalesPerson view and groups it by the TerritoryName and LastName columns. For each group, the statement uses the SUM function to add up the amounts in the SalesYTD column to provide the total sales. The following table shows the query result, and as you can see, a total is provided for each territory/name set.
| TerritoryName |
LastName |
YTDSales |
| Central |
Carson |
3857163.6332 |
| Northeast |
Blythe |
4557045.0459 |
| Northwest |
Campbell |
3587378.4257 |
| Southwest |
Ito |
3018725.4858 |
| Southwest |
Mitchell |
5200475.2313 |
Although this information is useful, it does not provide the total sales for all territories or the totals for individual territories. To retrieve this sort of data, you can use the ROLLUP option to calculate the additional amounts, as shown in the following statement:
SELECT TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE (CountryRegionName = 'United States') AND (SalesYTD > 3000000)
GROUP BY ROLLUP (TerritoryName, LastName)
ORDER BY TerritoryName, LastName
Notice that the ROLLUP option specifies the TerritoryName and LastName columns. Because the query includes both columns, the results will contain a rollup of the sales amount for each territory/name set. In addition, the results will also include the total sales for all territories and the totals for the individual territories, as you can see in the following table.
| TerritoryName |
LastName |
YTDSales |
| NULL |
NULL |
20220787.8219 |
| Central |
NULL |
3857163.6332 |
| Central |
Carson |
3857163.6332 |
| Northeast |
NULL |
4557045.0459 |
| Northeast |
Blythe |
4557045.0459 |
| Northwest |
NULL |
3587378.4257 |
| Northwest |
Campbell |
3587378.4257 |
| Southwest |
NULL |
8219200.7171 |
| Southwest |
Ito |
3018725.4858 |
| Southwest |
Mitchell |
5200475.2313 |
You can include additional aggregations by using the CUBE option rather than ROLLUP, as shown in the following SELECT statement:
SELECT TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE (CountryRegionName = 'United States') AND (SalesYTD > 3000000)
GROUP BY CUBE (TerritoryName, LastName)
ORDER BY TerritoryName, LastName
The results will now include totals for the individual salesperson, as well as the individual territory (which is shown in the following table). If a salesperson had been associated with multiple territories, those totals would be reflected here. In this case, however, the totals are the same as the totals for the territory/salesperson sets.
| TerritoryName |
LastName |
YTDSales |
| NULL |
NULL |
20220787.8219 |
| NULL |
Blythe |
4557045.0459 |
| NULL |
Campbell |
3587378.4257 |
| NULL |
Carson |
3857163.6332 |
| NULL |
Ito |
3018725.4858 |
| NULL |
Mitchell |
5200475.2313 |
| Central |
NULL |
3857163.6332 |
| Central |
Carson |
3857163.6332 |
| Northeast |
NULL |
4557045.0459 |
| Northeast |
Blythe |
4557045.0459 |
| Northwest |
NULL |
3587378.4257 |
| Northwest |
Campbell |
3587378.4257 |
| Southwest |
NULL |
8219200.7171 |
| Southwest |
Ito |
3018725.4858 |
| Southwest |
Mitchell |
5200475.2313 |
One last thing to note about the ROLLUP and CUBE options is that the syntax used here was introduced in SQL Server 2008. The original syntax is still supported, but the new syntax adheres to the latest ANSI (or American National Standards Institute) standards.
Now that you have an overview of how the GROUP BY clause aggregated data prior to SQL Server 2008, let's look at the new GROUPING SETS option. The following SELECT statement uses the option to define four distinct sets:
SELECT TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE (CountryRegionName = 'United States') AND (SalesYTD > 3000000)
GROUP BY GROUPING SETS
((TerritoryName, LastName), TerritoryName, LastName, ())
ORDER BY TerritoryName, LastName
The first set is the combination of the TerritoryName and LastName columns. Notice that they're enclosed in their own set of parentheses: As a result, a sum will be provided for each territory/name set. The next two grouping sets are the same columns listed individually. This means that a total will be provided for each territory and name. The final set is a set of empty parentheses, which indicates that a total will be provided for all sales.
As it turns out, the GROUPING SETS option in this statement returns the same results as the previous CUBE example. This is because I've specified the same set of columns, both individually and together. In other words, for the two columns specified in both statements, all possible combinations are aggregated. The CUBE operator acts as a shorthand for specifying every possible combination in GROUPING SETS.
The advantage of the GROUPING SETS option, however, is that you can specify any combination of columns. The following statement, for example, uses GROUPING SETS to define only three sets:
SELECT TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE (CountryRegionName = 'United States') AND (SalesYTD > 3000000)
GROUP BY GROUPING SETS
((TerritoryName, LastName), TerritoryName, ())
ORDER BY TerritoryName, LastName
Unlike the preceding example, the GROUPING SETS option does not include a LastName set. Interestingly, the results returned by this statement will be the same as those returned by the ROLLUP example above. In this case, ROLLUP acts as shorthand for this particular combination of sets, in which the second column name in the first set (in this case, LastName) is not included as a standalone set.
However, the GROUPING SETS groupings are not limited to combinations that must match either CUBE or ROLLUP, which is why GROUPING SETS is an important new feature. In the following example, GROUPING SETS includes only the TerritoryName/LastName set and the total (empty parentheses) set:
SELECT TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE (CountryRegionName = 'United States') AND (SalesYTD > 3000000)
GROUP BY GROUPING SETS ((TerritoryName, LastName), ())
ORDER BY TerritoryName, LastName
Now your results will include only total sales for each territory/name pair and for all sales, as shown in the following table:
| TerritoryName |
LastName |
YTDSales |
| NULL |
NULL |
20220787.8219 |
| Central |
Carson |
3857163.6332 |
| Northeast |
Blythe |
4557045.0459 |
| Northwest |
Campbell |
3587378.4257 |
| Southwest |
Ito |
3018725.4858 |
| Southwest |
Mitchell |
5200475.2313 |
Except for the query that returns the total for all sales, these results are very similar to a basic GROUP BY clause. If you were to drop the total set from the GROUPING SETS option, as shown in the following statement, your results would be the same as a basic GROUP BY clause:
SELECT TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE (CountryRegionName = 'United States') AND (SalesYTD > 3000000)
GROUP BY GROUPING SETS ((TerritoryName, LastName))
ORDER BY TerritoryName, LastName
But your goal, of course, is not to try to duplicate a basic GROUP BY clause any more than it is to try to duplicate the ROLLUP or CUBE options. In fact, the power of the GROUPING SETS option becomes more apparent when you're aggregations become more complex. For example, the following SELECT statement uses the GROUPING SETS option to define three groups, one of which includes three columns:
SELECT CountryRegionName, TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE SalesYTD > 3000000
GROUP BY GROUPING SETS ((CountryRegionName, TerritoryName, LastName),
CountryRegionName, TerritoryName)
ORDER BY CountryRegionName, TerritoryName, LastName
The first set is based on the CountryRegionName, TerritoryName, and LastName columns, so the total amount of sales will be provided for each region/territory/name set. The next set of totals is based on the CountryRegionName column only, so totals will be provided for each region. The same is true for territories. The following table shows the results generated by this statement.
| TerritoryName |
LastName |
YTDSales |
|
| NULL |
Canada |
NULL |
3189356.2465 |
| NULL |
Central |
NULL |
3857163.6332 |
| NULL |
France |
NULL |
3827950.238 |
| NULL |
Northeast |
NULL |
4557045.0459 |
| NULL |
Northwest |
NULL |
3587378.4257 |
| NULL |
Southwest |
NULL |
8219200.7171 |
| NULL |
United Kingdom |
NULL |
5015682.3752 |
| Canada |
NULL |
NULL |
3189356.2465 |
| Canada |
Canada |
Saraiva |
3189356.2465 |
| France |
NULL |
NULL |
3827950.238 |
| France |
France |
Varkey Chudukatil |
3827950.238 |
| United Kingdom |
NULL |
NULL |
5015682.3752 |
| United Kingdom |
United Kingdom |
Pak |
5015682.3752 |
| United States |
NULL |
NULL |
20220787.8219 |
| United States |
Central |
Carson |
3857163.6332 |
| United States |
Northeast |
Blythe |
4557045.0459 |
| United States |
Northwest |
Campbell |
3587378.4257 |
| United States |
Southwest |
Ito |
3018725.4858 |
| United States |
Southwest |
Mitchell |
5200475.2313 |
As you can see, the GROUPING SETS option provides you greater control over how you group data in order to aggregate related values. Prior to SQL Server 2008, you would have had to create a complex set of UNION ALL statements to achieve the same results. Now, you need to create only one statement. You should continue to use the ROLLUP and CUBE options when it makes sense, because the syntax is simpler, but when you need to return specific sets of data that fall outside the range of ROLLUP and CUBE, you'll find the GROUPING SETS option to be both useful and easy to use.
| ABOUT THE AUTHOR: |
|
Robert Sheldon is a technical consultant and the author of numerous books, articles, and training material related to Microsoft Windows, various relational database management systems, and business intelligence design and implementation. His books include Beginning MySQL (part of the Wrox Programmer-to-Programmer series), SQL: A Beginner's Guide (based on the SQL:1999 standard), MCSE Training Kit: Designing Highly Available Web Solutions with Microsoft Windows 2000 Server Technologies, and MCSE Training Kit: Microsoft SQL Server 2000 Database Design and Implementation. You can find more information at http://www.rhsheldon.com.
|
|
 |

|
Rate this Tip
|
To rate tips, you must be a member of SearchSQLServer.com. Register now
to start rating these tips. Log in if you are already a member.
|


');
// -->
DISCLAIMER: Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.
|
 |