A recent article discussed the basics of T-SQL string functions with examples of how to truncate, modify and collect data from them. This next two-part series reviews T-SQL aggregate functions, with details on how to find the average and sum of a set of numeric values, as well as the number of values in a group. Part two details how to obtain additional data from a set of values, run a checksum, and access statistical data.
When working with data in a SQL Server database, there may be times when you want to perform a calculation on a set of values. For example, your data might include the amounts of individual retail sales made by your organization. As a result, you may want to find the total sales for a particular year or the average amount of sales for each representative.
Transact-SQL includes aggregate functions in SQL Server 2008 that let you perform these types of calculations. A T-SQL aggregate function, which returns a single value, can be used as an expression in a SELECT list or in a COMPUTE, COMPUTE BY, or HAVING clause. All aggregate functions ignore null values, except for COUNT.
In this article, I will demonstrate how to use the aggregate functions available to SQL Server 2008. The examples created for this article are based on a local instance of SQL Server 2008 and the AdventureWorks2008 sample database. The examples are organized according to functionality, though this organization is meant only as a way to help explain and demonstrate the functions.
The basics of T-SQL aggregate functions
The first T-SQL aggregate functions I’ll start with are AVG and SUM. The AVG function returns the average of a set of numeric values, while the SUM function returns the sum total of such a set. In both cases, null values are ignored.
The following example uses both of these functions to provide summary sales data from the SalesOrderHeader table in the AdventureWorks2008 database:
AVG(SubTotal) AS AvgSales,
SUM(SubTotal) AS TotalSales
In this example, I’ve grouped the values by Territory ID, and then for each group I’ve provided the average and total amount of sales based on the SubTotal column.
As you can see, to use a T-SQL aggregate function I have to specify the function name, followed by the column (in parentheses) that contains the values I want to aggregate. Note that the column can actually be any expression that adheres to the function’s requirements. For details about what expressions you can use with a function, see the applicable topic in SQL Server Books Online.
That’s all there is to using the AVG and SUM aggregate functions. The statement above returns the results shown in the following table:
As the results show, the statement returns the average and total sales for each Territory ID in the SalesOrderHeader table.
Now let’s look at two other aggregate functions: MAX and MIN. The MAX function returns the highest value from a set of values, and the MIN function returns the lowest value. Both functions ignore null values. In the following statement, I’ve modified the previous example to include the MAX and MIN functions:
AVG (SubTotal) AS AvgSales,
SUM(SubTotal) AS TotalSales,
MAX(SubTotal) AS MaxSale,
MIN(SubTotal) AS MinSale
Notice that I once again had to specify the function name, followed by the column name in parentheses. The statement results now show the maximum and minimum sales value for each Territory ID, as shown in the following table:
One aspect of the MAX and MIN functions that makes them different from AVG and SUM is that you can also use MAX and MIN with character and datetime data. When working with character data, the maximum and minimum values are based on the column’s collating sequence. For instance, in the following example I’ve retrieved the first and last names of products listed in the Product table:
MIN(Name) AS FirstProduct,
MAX(Name) AS LastProduct
The name column is configured with the nvarchar data type. When I use the MAX or MIN functions against that column, the maximum and minimum values returned are based on the alphabetical list of names in that column. As a result, the statement returns the first and last product names in that list, as shown in the following table:
If I had used the MIN or MAX function against a datetime column, the value returned would be based on the earliest or latest datetime value stored in the column.
Determining row counts with T-SQL aggregate functions
Another valuable aggregate function supported by SQL Server is COUNT. The COUNT function returns the number of values in a group. Unlike other aggregate functions, you can specify whether null values are included in that count. In addition, you can specify whether to count all values or distinct values. The following SELECT statement shows several ways you can use the COUNT function to determine the number of values in a group:
COUNT(*) AS ProductCount,
COUNT(ProductModelID) AS ModelCount,
COUNT(DISTINCT ProductModelID) AS DistinctCount
As you can see, the statement uses the COUNT function to arrive at three different counts. The first instance of the function uses the format COUNT(*) to return the number of values in the group, including null values. In this case, the group includes the entire Product table. Because there are 504 rows in that table, the function returns a value of 504, as shown below:
The second instance of the COUNT function simply counts the values in the ProductModelID column. The count returned by this instance of the function does not include null values, so the number returned is 295 rather than 504, as shown in the preceding table.
The third instance of the COUNT function includes the DISTINCT keyword, which indicates that only distinct, non-null values should be included in the count. As a result, the function returns a value of only 119.
NOTE: Most aggregate functions support the DISTINCT keyword so you can specify that the aggregation is performed only on a distinct set of values from the specified group. Refer to each function’s topic in SQL Server Books Online for details on how to use DISTINCT with that aggregate function.
In addition to the COUNT aggregate function, SQL Server supports the COUNT_BIG function. The only difference between the two functions is that COUNT returns an int value and COUNT_BIG returns a bigint value.
Part two: Grouped data, checksums and stat retrieval
ABOUT THE AUTHOR
Robert Sheldon is a technical consultant and the author of numerous books, articles, and training material related to Microsoft Windows, various relational database management systems, and business intelligence design and implementation. You can find more information at www.rhsheldon.com.