Home > SQL Server Tips > > New GROUP BY option provides better data control in SQL Server 2008
SQL Server Tips:
EMAIL THIS
 TIPS & NEWSLETTERS TOPICS 


New GROUP BY option provides better data control in SQL Server 2008


Robert Sheldon, Contributor
05.05.2009
Rating: -4.64- (out of 5)


Expert advice on database administration
Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us    Add to Google


One of the new Transact-SQL features implemented in SQL Server 2008 is the GROUPING SETS option in the GROUP BY clause of a SELECT statement. Compared with the ROLLUP and CUBE options, GROUPING SETS allows for greater control over how data is grouped into aggregated sets of information.

To understand how GROUPING SETS works, let's first review the GROUP BY clause. As you're no doubt aware, this clause allows you to group data together in order to aggregate specific sets of data associated with those groups. The following SELECT statement, for example, uses a GROUP BY clause to aggregate data from the AdventureWorks2008 database:

SELECT TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE (CountryRegionName = 'United States') AND (SalesYTD > 3000000)
GROUP BY TerritoryName, LastName
ORDER BY TerritoryName, LastName

The statement retrieves data from the vSalesPerson view and groups it by the TerritoryName and LastName columns. For each group, the statement uses the SUM function to add up the amounts in the SalesYTD column to provide the total sales. The following table shows the query result, and as you can see, a total is provided for each territory/name set.

TerritoryName   LastName   YTDSales
Central Carson 3857163.6332  

Northeast Blythe 4557045.0459

Northwest Campbell 3587378.4257
Southwest Ito 3018725.4858

Southwest Mitchell 5200475.2313

Although this information is useful, it does not provide the total sales for all territories or the totals for individual territories. To retrieve this sort of data, you can use the ROLLUP ...


Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us    Add to Google



RELATED CONTENT
Microsoft SQL Server Performance Monitoring and Tuning
Using traces in SQL Server Profiler
SQL Server Mailbag: CALs, witnesses and unwanted changes
SQL Server Mailbag: Data restoration and DB property management
Working with IntelliSense in SQL Server 2008 Management Studio
SQL Server Mailbag: Stored procedures, triggers and SSRS reports
Troubleshooting Distributed Transaction Coordinator errors in SQL Server
Clearing the Windows page file and its effect on server performance
Optimizing SQL Server indexes –- even when they're not your indexes
Performance implications of transaction log autogrowth in SQL Server
The short course on how SQL Server really works

SQL/Transact SQL (T-SQL)
Using DELETE and TRUNCATE TABLE statements to delete data in SQL Server
SQL language crash course (just enough to be dangerous)
Working with IntelliSense in SQL Server 2008 Management Studio
SQL Server Mailbag: Stored procedures, triggers and SSRS reports
Working with sparse columns in SQL Server 2008
Determining the source of full transaction logs in SQL Server
Using the OPENROWSET function in SQL Server
Loading data files with SQL Server's BULK INSERT statement
Importing and exporting bulk data with SQL Server's bcp utility
Testing transaction log autogrowth behavior in SQL Server
SQL/Transact SQL (T-SQL) Research

Microsoft SQL Server 2008
Q&A: SQL Server 2008 a better fit for consolidation
End of life comes for SQL Server 2005 SP2, 2008
What's new for installation with SQL Server 2008?
SQL Server Reporting Services 2008 offers faster speeds, new variations
Microsoft SQL Server 2008 Learning Guide
Understanding transparent data encryption in SQL Server 2008
Working with sparse columns in SQL Server 2008
Implementing SQL Server 2008 FILESTREAM functionality
Microsoft renames SQL Server release, adds data services
An overview of SQL Server Report Builder 2.0

RELATED GLOSSARY TERMS
Terms from Whatis.com − the technology online dictionary
contiguity  (SearchSQLServer.com)
contiguous  (SearchSQLServer.com)
drilldown  (SearchSQLServer.com)
hashing  (SearchSQLServer.com)
hybrid online analytical processing  (SearchSQLServer.com)

RELATED RESOURCES
2020software.com, trial software downloads for accounting software, ERP software, CRM software and business software systems
Search Bitpipe.com for the latest white papers and business webcasts
Whatis.com, the online computer dictionary


option to calculate the additional amounts, as shown in the following statement:

SELECT TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE (CountryRegionName = 'United States') AND (SalesYTD > 3000000)
GROUP BY ROLLUP (TerritoryName, LastName)
ORDER BY TerritoryName, LastName

Notice that the ROLLUP option specifies the TerritoryName and LastName columns. Because the query includes both columns, the results will contain a rollup of the sales amount for each territory/name set. In addition, the results will also include the total sales for all territories and the totals for the individual territories, as you can see in the following table.

TerritoryName   LastName   YTDSales
NULL NULL 20220787.8219  
Central NULL 3857163.6332
Central Carson 3857163.6332
Northeast NULL 4557045.0459
Northeast Blythe 4557045.0459
Northwest NULL 3587378.4257
Northwest Campbell 3587378.4257
Southwest NULL 8219200.7171
Southwest Ito 3018725.4858
Southwest Mitchell 5200475.2313

You can include additional aggregations by using the CUBE option rather than ROLLUP, as shown in the following SELECT statement:

SELECT TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE (CountryRegionName = 'United States') AND (SalesYTD > 3000000)
GROUP BY CUBE (TerritoryName, LastName)
ORDER BY TerritoryName, LastName

The results will now include totals for the individual salesperson, as well as the individual territory (which is shown in the following table). If a salesperson had been associated with multiple territories, those totals would be reflected here. In this case, however, the totals are the same as the totals for the territory/salesperson sets.

TerritoryName   LastName   YTDSales
NULL NULL 20220787.8219  
NULL Blythe 4557045.0459
NULL Campbell 3587378.4257
NULL Carson 3857163.6332
NULL Ito 3018725.4858
NULL Mitchell 5200475.2313
Central NULL 3857163.6332
Central Carson 3857163.6332
Northeast NULL 4557045.0459
Northeast Blythe 4557045.0459
Northwest NULL 3587378.4257
Northwest Campbell 3587378.4257
Southwest NULL 8219200.7171
Southwest Ito 3018725.4858
Southwest Mitchell 5200475.2313

One last thing to note about the ROLLUP and CUBE options is that the syntax used here was introduced in SQL Server 2008. The original syntax is still supported, but the new syntax adheres to the latest ANSI (or American National Standards Institute) standards.

Now that you have an overview of how the GROUP BY clause aggregated data prior to SQL Server 2008, let's look at the new GROUPING SETS option. The following SELECT statement uses the option to define four distinct sets:

SELECT TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE (CountryRegionName = 'United States') AND (SalesYTD > 3000000)
GROUP BY GROUPING SETS
((TerritoryName, LastName), TerritoryName, LastName, ())
ORDER BY TerritoryName, LastName

The first set is the combination of the TerritoryName and LastName columns. Notice that they're enclosed in their own set of parentheses: As a result, a sum will be provided for each territory/name set. The next two grouping sets are the same columns listed individually. This means that a total will be provided for each territory and name. The final set is a set of empty parentheses, which indicates that a total will be provided for all sales.

As it turns out, the GROUPING SETS option in this statement returns the same results as the previous CUBE example. This is because I've specified the same set of columns, both individually and together. In other words, for the two columns specified in both statements, all possible combinations are aggregated. The CUBE operator acts as a shorthand for specifying every possible combination in GROUPING SETS.

The advantage of the GROUPING SETS option, however, is that you can specify any combination of columns. The following statement, for example, uses GROUPING SETS to define only three sets:

SELECT TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE (CountryRegionName = 'United States') AND (SalesYTD > 3000000)
GROUP BY GROUPING SETS
((TerritoryName, LastName), TerritoryName, ())
ORDER BY TerritoryName, LastName

Unlike the preceding example, the GROUPING SETS option does not include a LastName set. Interestingly, the results returned by this statement will be the same as those returned by the ROLLUP example above. In this case, ROLLUP acts as shorthand for this particular combination of sets, in which the second column name in the first set (in this case, LastName) is not included as a standalone set.

However, the GROUPING SETS groupings are not limited to combinations that must match either CUBE or ROLLUP, which is why GROUPING SETS is an important new feature. In the following example, GROUPING SETS includes only the TerritoryName/LastName set and the total (empty parentheses) set:

SELECT TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE (CountryRegionName = 'United States') AND (SalesYTD > 3000000)
GROUP BY GROUPING SETS ((TerritoryName, LastName), ())
ORDER BY TerritoryName, LastName

Now your results will include only total sales for each territory/name pair and for all sales, as shown in the following table:

TerritoryName   LastName   YTDSales
NULL NULL 20220787.8219  
Central Carson 3857163.6332

Northeast Blythe 4557045.0459

Northwest Campbell 3587378.4257
Southwest Ito 3018725.4858

Southwest Mitchell 5200475.2313

Except for the query that returns the total for all sales, these results are very similar to a basic GROUP BY clause. If you were to drop the total set from the GROUPING SETS option, as shown in the following statement, your results would be the same as a basic GROUP BY clause:

SELECT TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE (CountryRegionName = 'United States') AND (SalesYTD > 3000000)
GROUP BY GROUPING SETS ((TerritoryName, LastName))
ORDER BY TerritoryName, LastName

But your goal, of course, is not to try to duplicate a basic GROUP BY clause any more than it is to try to duplicate the ROLLUP or CUBE options. In fact, the power of the GROUPING SETS option becomes more apparent when you're aggregations become more complex. For example, the following SELECT statement uses the GROUPING SETS option to define three groups, one of which includes three columns:

SELECT CountryRegionName, TerritoryName, LastName, SUM(SalesYTD) AS YTDSales
FROM Sales.vSalesPerson
WHERE SalesYTD > 3000000
GROUP BY GROUPING SETS ((CountryRegionName, TerritoryName, LastName),
CountryRegionName, TerritoryName)
ORDER BY CountryRegionName, TerritoryName, LastName

The first set is based on the CountryRegionName, TerritoryName, and LastName columns, so the total amount of sales will be provided for each region/territory/name set. The next set of totals is based on the CountryRegionName column only, so totals will be provided for each region. The same is true for territories. The following table shows the results generated by this statement.

TerritoryName   LastName   YTDSales  
NULL Canada NULL 3189356.2465 
NULL Central NULL 3857163.6332
NULL France NULL 3827950.238
NULL Northeast NULL 4557045.0459
NULL Northwest NULL 3587378.4257
NULL Southwest NULL 8219200.7171
NULL United Kingdom NULL 5015682.3752
Canada NULL NULL 3189356.2465
Canada Canada Saraiva 3189356.2465
France NULL NULL 3827950.238
France France Varkey Chudukatil 3827950.238
United Kingdom NULL NULL 5015682.3752
United Kingdom United Kingdom Pak 5015682.3752
United States NULL NULL 20220787.8219
United States Central Carson 3857163.6332
United States Northeast Blythe 4557045.0459
United States Northwest Campbell 3587378.4257
United States Southwest Ito 3018725.4858
United States Southwest Mitchell 5200475.2313

As you can see, the GROUPING SETS option provides you greater control over how you group data in order to aggregate related values. Prior to SQL Server 2008, you would have had to create a complex set of UNION ALL statements to achieve the same results. Now, you need to create only one statement. You should continue to use the ROLLUP and CUBE options when it makes sense, because the syntax is simpler, but when you need to return specific sets of data that fall outside the range of ROLLUP and CUBE, you'll find the GROUPING SETS option to be both useful and easy to use.

ABOUT THE AUTHOR:   

[IMAGE] Robert Sheldon is a technical consultant and the author of numerous books, articles, and training material related to Microsoft Windows, various relational database management systems, and business intelligence design and implementation. His books include Beginning MySQL (part of the Wrox Programmer-to-Programmer series), SQL: A Beginner's Guide (based on the SQL:1999 standard), MCSE Training Kit: Designing Highly Available Web Solutions with Microsoft Windows 2000 Server Technologies, and MCSE Training Kit: Microsoft SQL Server 2000 Database Design and Implementation. You can find more information at http://www.rhsheldon.com.



Rate this Tip
To rate tips, you must be a member of SearchSQLServer.com.
Register now to start rating these tips. Log in if you are already a member.




DISCLAIMER: Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.



SQL Server Development - .NET, C#, T-SQL, Visual Basic
HomeNewsTopicsITKnowledge ExchangeTipsAsk the ExpertsMultimediaWhite PapersIT Downloads
About Us  |  Contact Us  |  For Advertisers  |  For Business Partners  |  Site Index  |  RSS
SEARCH 
TechTarget provides technology professionals with the information they need to perform their jobs - from developing strategy, to making cost-effective purchase decisions and managing their organizations' technology projects - with its network of technology-specific websites, events and online magazines.

TechTarget Corporate Web Site  |  Media Kits  |  Site Map




All Rights Reserved, Copyright 2005 - 2009, TechTarget | Read our Privacy Policy
  TechTarget - The IT Media ROI Experts