Part one reviewed the basics of working with T-SQL aggregate functions in SQL Server 2008. This next section provides examples of how to group a set of data, use checksums to analyze changes, and perform statistical analyses of group values.
Working with grouped data
When using the GROUP BY clause in a T-SQL statement, you can include the ROLLUP, CUBE or GROUPING SETS operator to specify that additional summary data be included in the result set. The summary data is indicated by the null value listed in the GROUP BY column.
This is a special use of the null value. The column specified in a GROUP BY clause, however, can include null values, which are also shown as a null for the column’s value. To distinguish between these two types of null values, SQL Server supports the GROUPING aggregate function. The function indicates whether the column specified in the GROUP BY clause is aggregated.
The GROUPING function returns a value of 1 if the value in the result set is aggregated. Otherwise, the function returns a value of 0. Let’s take a look at an example of how this works. In the following SELECT statement, I’ve grouped data from the Product table based on the Color column:
AVG(ListPrice) AS AvgPrice,
COUNT(*) AS TotalAmount,
GROUPING(Color) AS AggGroup
Color WITH ROLLUP
As you can see, the GROUP BY clause includes the WITH ROLLUP operator. Because Color is the column specified in the GROUP BY clause, that column can be used with the GROUPING function in the SELECT list.
NOTE: The GROUPING function can only be used in a SELECT list, HAVING clause, or ORDER BY clause.
The GROUPING function is used to indicate whether the value listed in the Color column is one of the grouped values or a rolled up value that’s been aggregated. The following table includes the results returned by the SELECT statement:
As you would expect, the AggGroup column shows a value of 0 for the colors themselves (black, blue, grey, etc.). This indicates that the color is not an aggregated value; indeed, it is simply a grouped color. Where the use of GROUPING becomes important is when the Color column shows a value of NULL.
For the first instance of NULL, the AggGroup column returns a value of 0, which indicates that this instance of NULL is not an aggregation, but rather the group of data for which no color has been assigned.
The second instance of NULL in the Color column, however, shows an AggGroup value of 1, which means that this is part of the ROLLUP aggregation and all totals in the group are aggregated. As a result, you know that the average price and count provided for this row represent the totals for the entire table. In other words, this row contains the rolled up values for all the color groups.
Verifying checksums with aggregate functions
SQL Server also provides the CHECKSUM_AGG function, which lets you run a checksum against a group of values. This is handy for detecting changes in a table. Note, however, that checksum can only be used for an integer expression and ignores null values when performing its calculation.
The following SELECT statement includes two instances of CHECKSUM_AGG, one with DISTINCT and one without:
CHECKSUM_AGG(CAST(Quantity AS int)) AS ChecksumAgg,
CHECKSUM_AGG(DISTINCT CAST(Quantity AS int)) AS
The statement retrieves data from the ProductInventory table. The checksums are performed on the Quantity column, but because the column is configured with the smallint data type, I must convert the value to an int data type before running the checksum. The following table shows the results returned by this statement:
As you can see, the checksum for all non-null values in the Quantity column is 262, while the checksum for distinct, non-null values is 78.
Returning statistical data through T-SQL aggregate functions
SQL Server supports four aggregate functions that let you retrieve statistical data about a group of values in a numeric column:
- STDEV -- returns the statistical standard deviation of the values
- STDEVP -- returns the statistical standard deviation for the population of the values
- VAR -- returns the statistical variance of the values
- VARP -- returns the statistical variance for the population of the values
Each function ignores null values and returns a float value. The returned value indicates how much the set of values deviate from the average (the standard deviation is the square root of the variance). Unfortunately, I’m not a statistician, so you’ll need to refer to a good statistics book for more details about how standard deviations and variances are calculated -- and how population fits into those calculations. In the meantime, the following statement demonstrates how to use these functions:
STDEV(ListPrice) AS StdevPrice,
STDEVP(ListPrice) AS StdevpPrice,
VAR(ListPrice) AS VarPrice,
VARP(ListPrice) AS VarpPrice
In this statement, I’ve used each of the four aggregate functions to return statistical data about the ListPrice column in the Product table. The following table shows the results returned by this statement:
That’s all there is to using aggregate functions in SQL Server. As you can see, they can be a useful addition to your T-SQL arsenal when retrieving data, whether or not you use the GROUP BY clause to group that data. For more details on any of these functions check out the applicable topic in SQL Server Books Online. Each topic includes additional examples that demonstrate how specific functions work.
Back to part one: The basics of T-SQL aggregate functions
ABOUT THE AUTHOR
Robert Sheldon is a technical consultant and the author of numerous books, articles, and training material related to Microsoft Windows, various relational database management systems, and business intelligence design and implementation. You can find more information at www.rhsheldon.com.