New GROUP BY option provides better data control in SQL Server 2008

GROUPING SETS in the GROUP BY clause trumps the ROLLUP and CUBE options by providing a more efficient and specific data aggregator in SQL Server 2008 – and greater data control.

One of the new Transact-SQL features implemented in SQL Server 2008 is the GROUPING SETS option in the GROUP BY

clause of a SELECT statement. Compared with the ROLLUP and CUBE options, GROUPING SETS allows for greater control over how data is grouped into aggregated sets of information.

To understand how GROUPING SETS works, let's first review the GROUP BY clause. As you're no doubt aware, this clause allows you to group data together in order to aggregate specific sets of data associated with those groups. The following SELECT statement, for example, uses a GROUP BY clause to aggregate data from the AdventureWorks2008 database:

SELECT TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE (CountryRegionName = 'United States') AND (SalesYTD > 3000000)
GROUP BY TerritoryName, LastName
ORDER BY TerritoryName, LastName

The statement retrieves data from the vSalesPerson view and groups it by the TerritoryName and LastName columns. For each group, the statement uses the SUM function to add up the amounts in the SalesYTD column to provide the total sales. The following table shows the query result, and as you can see, a total is provided for each territory/name set.

TerritoryName   LastName   YTDSales
Central Carson 3857163.6332  
Northeast Blythe 4557045.0459
Northwest Campbell 3587378.4257
Southwest Ito 3018725.4858
Southwest Mitchell 5200475.2313

Although this information is useful, it does not provide the total sales for all territories or the totals for individual territories. To retrieve this sort of data, you can use the ROLLUP option to calculate the additional amounts, as shown in the following statement:

SELECT TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE (CountryRegionName = 'United States') AND (SalesYTD > 3000000)
GROUP BY ROLLUP (TerritoryName, LastName)
ORDER BY TerritoryName, LastName

Notice that the ROLLUP option specifies the TerritoryName and LastName columns. Because the query includes both columns, the results will contain a rollup of the sales amount for each territory/name set. In addition, the results will also include the total sales for all territories and the totals for the individual territories, as you can see in the following table.

TerritoryName   LastName   YTDSales
NULL NULL 20220787.8219  
Central NULL 3857163.6332
Central Carson 3857163.6332
Northeast NULL 4557045.0459
Northeast Blythe 4557045.0459
Northwest NULL 3587378.4257
Northwest Campbell 3587378.4257
Southwest NULL 8219200.7171
Southwest Ito 3018725.4858
Southwest Mitchell 5200475.2313

You can include additional aggregations by using the CUBE option rather than ROLLUP, as shown in the following SELECT statement:

SELECT TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE (CountryRegionName = 'United States') AND (SalesYTD > 3000000)
GROUP BY CUBE (TerritoryName, LastName)
ORDER BY TerritoryName, LastName

The results will now include totals for the individual salesperson, as well as the individual territory (which is shown in the following table). If a salesperson had been associated with multiple territories, those totals would be reflected here. In this case, however, the totals are the same as the totals for the territory/salesperson sets.

TerritoryName   LastName   YTDSales
NULL NULL 20220787.8219  
NULL Blythe 4557045.0459
NULL Campbell 3587378.4257
NULL Carson 3857163.6332
NULL Ito 3018725.4858
NULL Mitchell 5200475.2313
Central NULL 3857163.6332
Central Carson 3857163.6332
Northeast NULL 4557045.0459
Northeast Blythe 4557045.0459
Northwest NULL 3587378.4257
Northwest Campbell 3587378.4257
Southwest NULL 8219200.7171
Southwest Ito 3018725.4858
Southwest Mitchell 5200475.2313

One last thing to note about the ROLLUP and CUBE options is that the syntax used here was introduced in SQL Server 2008. The original syntax is still supported, but the new syntax adheres to the latest ANSI (or American National Standards Institute) standards.

Now that you have an overview of how the GROUP BY clause aggregated data prior to SQL Server 2008, let's look at the new GROUPING SETS option. The following SELECT statement uses the option to define four distinct sets:

SELECT TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE (CountryRegionName = 'United States') AND (SalesYTD > 3000000)
GROUP BY GROUPING SETS
((TerritoryName, LastName), TerritoryName, LastName, ())
ORDER BY TerritoryName, LastName

The first set is the combination of the TerritoryName and LastName columns. Notice that they're enclosed in their own set of parentheses: As a result, a sum will be provided for each territory/name set. The next two grouping sets are the same columns listed individually. This means that a total will be provided for each territory and name. The final set is a set of empty parentheses, which indicates that a total will be provided for all sales.

As it turns out, the GROUPING SETS option in this statement returns the same results as the previous CUBE example. This is because I've specified the same set of columns, both individually and together. In other words, for the two columns specified in both statements, all possible combinations are aggregated. The CUBE operator acts as a shorthand for specifying every possible combination in GROUPING SETS.

The advantage of the GROUPING SETS option, however, is that you can specify any combination of columns. The following statement, for example, uses GROUPING SETS to define only three sets:

SELECT TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE (CountryRegionName = 'United States') AND (SalesYTD > 3000000)
GROUP BY GROUPING SETS
((TerritoryName, LastName), TerritoryName, ())
ORDER BY TerritoryName, LastName

Unlike the preceding example, the GROUPING SETS option does not include a LastName set. Interestingly, the results returned by this statement will be the same as those returned by the ROLLUP example above. In this case, ROLLUP acts as shorthand for this particular combination of sets, in which the second column name in the first set (in this case, LastName) is not included as a standalone set.

However, the GROUPING SETS groupings are not limited to combinations that must match either CUBE or ROLLUP, which is why GROUPING SETS is an important new feature. In the following example, GROUPING SETS includes only the TerritoryName/LastName set and the total (empty parentheses) set:

SELECT TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE (CountryRegionName = 'United States') AND (SalesYTD > 3000000)
GROUP BY GROUPING SETS ((TerritoryName, LastName), ())
ORDER BY TerritoryName, LastName

Now your results will include only total sales for each territory/name pair and for all sales, as shown in the following table:

TerritoryName   LastName   YTDSales
NULL NULL 20220787.8219  
Central Carson 3857163.6332
Northeast Blythe 4557045.0459
Northwest Campbell 3587378.4257
Southwest Ito 3018725.4858
Southwest Mitchell 5200475.2313

Except for the query that returns the total for all sales, these results are very similar to a basic GROUP BY clause. If you were to drop the total set from the GROUPING SETS option, as shown in the following statement, your results would be the same as a basic GROUP BY clause:

SELECT TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE (CountryRegionName = 'United States') AND (SalesYTD > 3000000)
GROUP BY GROUPING SETS ((TerritoryName, LastName))
ORDER BY TerritoryName, LastName

But your goal, of course, is not to try to duplicate a basic GROUP BY clause any more than it is to try to duplicate the ROLLUP or CUBE options. In fact, the power of the GROUPING SETS option becomes more apparent when you're aggregations become more complex. For example, the following SELECT statement uses the GROUPING SETS option to define three groups, one of which includes three columns:

SELECT CountryRegionName, TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE SalesYTD > 3000000
GROUP BY GROUPING SETS ((CountryRegionName, TerritoryName, LastName),
CountryRegionName, TerritoryName)
ORDER BY CountryRegionName, TerritoryName, LastName

The first set is based on the CountryRegionName, TerritoryName, and LastName columns, so the total amount of sales will be provided for each region/territory/name set. The next set of totals is based on the CountryRegionName column only, so totals will be provided for each region. The same is true for territories. The following table shows the results generated by this statement.

TerritoryName   LastName   YTDSales    
NULL Canada NULL 3189356.2465 
NULL Central NULL 3857163.6332
NULL France NULL 3827950.238
NULL Northeast NULL 4557045.0459
NULL Northwest NULL 3587378.4257
NULL Southwest NULL 8219200.7171
NULL United Kingdom NULL 5015682.3752
Canada NULL NULL 3189356.2465
Canada Canada Saraiva 3189356.2465
France NULL NULL 3827950.238
France France Varkey Chudukatil 3827950.238
United Kingdom NULL NULL 5015682.3752
United Kingdom United Kingdom Pak 5015682.3752
United States NULL NULL 20220787.8219
United States Central Carson 3857163.6332
United States Northeast Blythe 4557045.0459
United States Northwest Campbell 3587378.4257
United States Southwest Ito 3018725.4858
United States Southwest Mitchell 5200475.2313

As you can see, the GROUPING SETS option provides you greater control over how you group data in order to aggregate related values. Prior to SQL Server 2008, you would have had to create a complex set of UNION ALL statements to achieve the same results. Now, you need to create only one statement. You should continue to use the ROLLUP and CUBE options when it makes sense, because the syntax is simpler, but when you need to return specific sets of data that fall outside the range of ROLLUP and CUBE, you'll find the GROUPING SETS option to be both useful and easy to use.

ABOUT THE AUTHOR
Robert Sheldon is a technical consultant and the author of numerous books, articles, and training material related to Microsoft Windows, various relational database management systems, and business intelligence design and implementation. You can find more information at  http://www.rhsheldon.com.

This was first published in May 2009

Dig deeper on Microsoft SQL Server 2008

Pro+

Features

Enjoy the benefits of Pro+ membership, learn more and join.

0 comments

Oldest 

Forgot Password?

No problem! Submit your e-mail address below. We'll send you an email containing your password.

Your password has been sent to:

SearchBusinessAnalytics

SearchDataCenter

SearchDataManagement

SearchAWS

SearchOracle

SearchContentManagement

SearchWindowsServer

Close