The nuts and bolts of T-SQL aggregate functions in SQL Server 2008
A recent article discussed the basics
of T-SQL string functions with examples of how to truncate, modify and collect data from
them. This next two-part series reviews T-SQL aggregate functions, with details on how to find the
average and sum of a set of numeric values, as well as the number of values in a group. Part
two details how to obtain additional data from a set of values, run a checksum, and access
statistical data.
When working with data in a SQL Server database, there may be times when you want
Premium Access
Register now for unlimited access to our premium content across our network of over 70 information Technology web sites.
By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States.
Privacy
Dig Deeper
-
People who read this also read...
-
This was first published in May 2010
to perform a
calculation on a set of values. For example, your data might include the amounts of individual
retail sales made by your organization. As a result, you may want to find the total sales for a
particular year or the average amount of sales for each representative.
Transact-SQL includes aggregate functions in SQL Server
2008 that let you perform these types of calculations. A T-SQL aggregate function, which
returns a single value, can be used as an expression in a SELECT list or in a COMPUTE, COMPUTE BY,
or HAVING clause. All aggregate functions ignore null values, except for COUNT.
In this article, I will demonstrate how to use the aggregate functions available to SQL Server
2008. The examples created for this article are based on a local instance of SQL Server 2008 and
the AdventureWorks2008
sample database. The examples are organized according to functionality, though this organization is
meant only as a way to help explain and demonstrate the functions.
The basics of T-SQL aggregate functions
The first T-SQL aggregate functions I’ll start with are AVG and SUM. The AVG
function returns the average of a set of numeric values, while the SUM function returns the sum
total of such a set. In both cases, null values are ignored.
The following example uses both of these functions to provide summary sales data from the
SalesOrderHeader table in the AdventureWorks2008 database:
SELECT
TerritoryID,
AVG(SubTotal) AS
AvgSales,
SUM(SubTotal) AS
TotalSales
FROM
Sales.SalesOrderHeader
GROUP BY
TerritoryID
ORDER BY
TerritoryID
In this example, I’ve grouped the values by Territory ID, and then for each group I’ve provided
the average and total amount of sales based on the SubTotal column.
As you can see, to use a T-SQL aggregate function I have to specify the function name, followed
by the column (in parentheses) that contains the values I want to aggregate. Note that the column
can actually be any expression that adheres to the function’s requirements. For details about what
expressions you can use with a function, see the applicable topic in SQL Server Books Online.
That’s all there is to using the AVG and SUM aggregate functions. The statement above returns
the results shown in the following table:
Table 1
As the results show, the statement returns the average and total sales for each Territory ID in
the SalesOrderHeader table.
Now let’s look at two other aggregate functions: MAX and MIN. The MAX function
returns the highest value from a set of values, and the MIN function returns the lowest value. Both
functions ignore null values. In the following statement, I’ve modified the previous example to
include the MAX and MIN functions:
SELECT
TerritoryID,
AVG (SubTotal) AS
AvgSales,
SUM(SubTotal) AS
TotalSales,
MAX(SubTotal) AS
MaxSale,
MIN(SubTotal) AS MinSale
FROM
Sales.SalesOrderHeader
GROUP BY
TerritoryID
ORDER BY
TerritoryID
Notice that I once again had to specify the function name, followed by the column name in
parentheses. The statement results now show the maximum and minimum sales value for each Territory
ID, as shown in the following table:
Table 2 (click to enlarge)
One aspect of the MAX and MIN functions that makes them different from AVG and SUM is that you
can also use MAX and MIN with character and datetime
data. When working with character data, the maximum and minimum values are based on the
column’s collating sequence. For instance, in the following example I’ve retrieved the first and
last names of products listed in the Product table:
SELECT
MIN(Name) AS FirstProduct,
MAX(Name) AS LastProduct
FROM
Production.Product
The name column is configured with the nvarchar
data type. When I use the MAX or MIN functions against that column, the maximum and minimum
values returned are based on the alphabetical list of names in that column. As a result, the
statement returns the first and last product names in that list, as shown in the following
table:
Table 3
If I had used the MIN or MAX function against a datetime column, the value returned would be
based on the earliest or latest datetime value stored in the column.
Determining row counts with T-SQL aggregate
functions
Another valuable aggregate function supported by SQL Server is COUNT. The COUNT function
returns the number of values in a group. Unlike other aggregate functions, you can specify whether
null values are included in that count. In addition, you can specify whether to count all values or
distinct values. The following SELECT statement shows several ways you can use the COUNT function
to determine the number of values in a group:
SELECT
COUNT(*) AS ProductCount,
COUNT(ProductModelID) AS
ModelCount,
COUNT(DISTINCT ProductModelID) AS
DistinctCount
FROM
Production.Product
As you can see, the statement uses the COUNT function to arrive at three different counts. The
first instance of the function uses the format COUNT(*) to return the number of values in the group, including null values. In
this case, the group includes the entire Product table. Because there are 504 rows in that table,
the function returns a value of 504, as shown below:
Table 4
The second instance of the COUNT function simply counts the values in the ProductModelID column.
The count returned by this instance of the function does not include null values, so the number
returned is 295 rather than 504, as shown in the preceding table.
The third instance of the COUNT function includes the DISTINCT keyword, which indicates that
only distinct, non-null values should be included in the count. As a result, the function returns a
value of only 119.
NOTE: Most aggregate functions support the DISTINCT keyword
so you can specify that the aggregation is performed only on a distinct set of values from the
specified group. Refer to each function’s topic in SQL Server Books Online for details on how to
use DISTINCT with that aggregate function.
In addition to the COUNT aggregate function, SQL Server supports the COUNT_BIG function.
The only difference between the two functions is that COUNT returns an int value and COUNT_BIG
returns a bigint value.
Part two: Grouped
data, checksums and stat retrieval
ABOUT THE AUTHOR
Robert Sheldon is a technical consultant and the author of numerous books,
articles, and training material related to Microsoft Windows, various relational database
management systems, and business intelligence design and implementation. You can find more
information at www.rhsheldon.com.
Disclaimer:
Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.
Join the conversationComment
Share
Comments
Results
Contribute to the conversation