Tip

T-SQL analytic functions in SQL Server can help solve problems quickly

Like other mainstream commercial database systems, SQL Server supports analytic functions in Transact-SQL to depict complex analytical tasks. With the help of these analytic functions, we can

    Requires Free Membership to View

perform common analyses, such as ranking, percentiles, moving averages and cumulative sums that can be expressed concisely in a single SQL statement.

Before analytic functions, complex analytical tasks were performed using self-joins, correlated subqueries, temporary tables or some combination of all three. That was inefficient and consumed a significant amount of system resources. Expressing queries with analytic functions simplifies complex tasks by eliminating programming self-joins and correlated subqueries and using fewer temporary tables.

For more on analytic functions

SQL analytic functions vs. Oracle stored procedures

SQL Server analytic function logs: Why you need to see them

Analytic functions, BI and data mining: What's the difference?

The first batch of Transact-SQL analytic functions came with the release of SQL Server 2005, which included a variety of ranking functions, such as ROW_NUMBER, RANK, DENSE_RANK and NTILE. SQL Server 2012 introduced eight more Transact-SQL analytic functions: PERCENT_RANK, CUME_DIST, PERCENTILE_CONT, PERCENTILE_DISC, LEAD, LAG, FIRST_VALUE and LAST_VALUE.

Ranking analytic functions

Ranking functions return the ranking value for rows within a partition.

There are four ranking functions:

  • Row Number, which assigns a row number to each row in the result set.
  • Rank, which returns the rank value for each row in the result set, but with gaps.
  • Dense Rank, which is the same as the Rank function, but also displays the rank value for each row in the result set and without gaps in the sequence.
  • NTILE, which partitions the ranks into a specific numbered group.

For example, if you have a table with 30 values, you might use NTILE (3) to number the first 10 rows as group 1, the middle 10 rows as group 2 and the last 10 rows as group 3.

For each ranking function, we must specify an OVER clause, which determines the partitioning and ordering of the rowset before the ranking function is applied.

Here is the general syntax of ranking functions:

FUNCTION (Argument1,…[n])

OVER ([PARTITION BY value_expression,…[n])

<<Order_by_clause>>)

An example of this would be if you were to analyze the following query and its results, which would demonstrate the use of ranking functions based on the SalesQuota column.

SELECT [LastName]

      ,[FirstName]

      ,[SalesQuota]

      ,ROW_NUMBER() OVER (ORDER BY [SalesQuota]) AS [Row_Number()]

      ,RANK() OVER (ORDER BY [SalesQuota]) AS [RANK()]

      ,DENSE_RANK() OVER (ORDER BY [SalesQuota]) AS [DENSE_RANK()]

      ,NTILE(10) OVER (ORDER BY [SalesQuota]) AS [NTILE(4)]

FROM [AdventureWorks2012].[Sales].[vSalesPerson]

We can use the optional PARTITION BY clause, which divides the rows based on the value expression. Then the rows are ranked in the order specified. For example, I'll rewrite the above query as follows to partition the result set by the CountryRegionName column.

SELECT [LastName], [FirstName], [SalesQuota], [CountryRegionName]

      ,ROW_NUMBER() OVER (PARTITION BY [CountryRegionName]

                        ORDER BY [SalesQuota]) AS [Row_Number()]

      ,RANK() OVER (PARTITION BY [CountryRegionName]

                        ORDER BY [SalesQuota]) AS [RANK()]

      ,DENSE_RANK() OVER (PARTITION BY [CountryRegionName]

                        ORDER BY [SalesQuota]) AS [DENSE_RANK()]

      ,NTILE(4) OVER (PARTITION BY [CountryRegionName]

                        ORDER BY [SalesQuota]) AS [NTILE(4)]

FROM [AdventureWorks2012].[Sales].[vSalesPerson]

PERCENT_RANK
Use the PERCENT_RANK function to evaluate the relative position of each row in a query result set or partition. SQL Server uses the following formula to calculate the value for the PERCENT_RANK column:

(rank() – 1) / (total rows in a query result set or partition – 1)

For example, the following example uses the PERCENT_RANK function to compute the rank of the salesperson's sales quota within a country as a percentage. The PARTITION BY clause is specified for partitioning of the rows in the result set by country region name, and the ORDER BY clause orders the rows in each partition.

SELECT [LastName], [FirstName], [SalesQuota], [CountryRegionName]

      ,RANK() OVER (PARTITION BY [CountryRegionName]

                        ORDER BY [SalesQuota]) AS [RANK()]

      ,PERCENT_RANK() OVER (PARTITION BY [CountryRegionName]

                        ORDER BY [SalesQuota]) AS [PERCENT_RANK()]

FROM [AdventureWorks2012].[Sales].[vSalesPerson]

WHERE [SalesQuota] IS NOT NULL

CUME_DIST
The CUME_DIST function evaluates cumulative distribution value in a group of values in a given result set or partition. SQL Server uses the following formula to calculate the value for the CUME_DIST column:

(values less than or equal to the current value in the group)

/ (total row in a query result set or partition)

The following example uses the CUME_DIST function to calculate the sales quota percentile for each salesperson's sales quota within a given country. The value returned by the CUME_DIST function represents the percentage of salespeople who have a sales quota less than or equal to that of the current salespeople in the same country.

SELECT [LastName], [FirstName], [SalesQuota], [CountryRegionName]

      ,RANK() OVER (PARTITION BY [CountryRegionName]

                        ORDER BY [SalesQuota]) AS [RANK()]

      ,CUME_DIST() OVER (PARTITION BY [CountryRegionName]

                        ORDER BY [SalesQuota]) AS [CUME_DIST()]

FROM [AdventureWorks2012].[Sales].[vSalesPerson]

WHERE [SalesQuota] IS NOT NULL

PERCENTILE_CONT and PERCENTILE_DESC
The PERCENTILE_CONT function uses a discrete distribution model to calculate a percentile. It accepts the percentile value (that is, the desired CUME_DIST value) and a sort specification to return the value that would fall within that percentile value. The PERCENTILE_DESC function works the same as a PERCENTILE_CONT function: It returns the smallest value whose percentile is greater than or equal to the given percentile.

Here is the general syntax of PERCENTILE_CONT and PERCENTILE_DESC:

FUNCTION (CUME_DIST)

WITHIN GROUP (<<Order_by_clause>>)

OVER ([PARTITION BY value_expression,…[n])

As yet another example, execute the following query, which uses PERCENTILE_CONT and PERCENTILE_DISC to find the median employee salary in each business entity. These functions do not always return the same value because PERCENTILE_CONT estimates the correct value, which may not exist in the data set, while PERCENTILE_DISC always gives an actual value for the set.

SELECT TOP 15

[BusinessEntityID], [Rate]

,PERCENTILE_CONT (0.5) WITHIN GROUP (ORDER BY [Rate])

      OVER (PARTITION BY [BusinessEntityID]) AS [PERCENTILE_CONT (0.5)]

      ,PERCENTILE_DISC (0.5) WITHIN GROUP (ORDER BY [Rate])

      OVER (PARTITION BY [BusinessEntityID]) AS [PERCENTILE_DISC (0.5)]

      ,CUME_DIST() OVER (PARTITION BY [BusinessEntityID]

ORDER BY [Rate]) AS [CUME_DIST()]

FROM [AdventureWorks2012].[HumanResources].[EmployeePayHistory]

LEAD and LAG
The LEAD function provides access to the row that follows the current row. LAG is the opposite of LEAD: It provides access to the previous row instead of the row that follows the current row. Here is the general syntax:

FUNCTION (scalar_expression [ ,offset ] , [ default ])

OVER ([PARTITION BY value_expression,…[n])

<<Order_by_clause>>)

The following example uses the LEAD and LAG functions to compare year-to-date sales among AdventureWorks employees partitioned by sales territory.

SELECT [FirstName]

,[TerritoryGroup]

      ,[SalesYTD] 

      ,LEAD([SalesYTD]) OVER (PARTITION BY [TerritoryGroup]

ORDER BY [SalesYTD]) AS [Next_Lower_SalesYTD (LEAD ())]

      ,LAG([SalesYTD]) OVER (PARTITION BY [TerritoryGroup]

ORDER BY [SalesYTD]) AS [Prev_Higer_SalesYTD (LAG ())]

FROM [AdventureWorks2012].[Sales].[vSalesPerson]

WHERE [TerritoryGroup] IS NOT NULL

FIRST_VALUE and LAST_VALUE
FIRST_VALUE returns the first value of the result set or partition, whereas LAST_VALUE returns the last value of the result set or partition. If the last value in the set is null, the function returns NULL unless you specify IGNORE NULLS.

When not specified, the rows range clause defaults to “RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW”, which sometime returns an unexpected value. This is because the last value in the window is fixed. For proper results, specify row range as either “RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING” or “RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING”.

Here is the syntax:

FUNCTION (scalar_expression [,offset] , [default])

OVER ([PARTITION BY value_expression,…[n])

<<Order_by_clause>> <<Rows_range_clause>>)

The following example uses the FIRST_VALUE and LAST_VALUE functions to return highest and lowest year-to-date figures for each sales territory.

SELECT DISTINCT [TerritoryGroup]

      ,FIRST_VALUE([SalesYTD])

        OVER (PARTITION BY [TerritoryGroup]

            ORDER BY [SalesYTD]

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

AS [Highest_SalesYTD (FIRST_VALUE)]

      ,LAST_VALUE([SalesYTD])

        OVER (PARTITION BY [TerritoryGroup]

            ORDER BY [SalesYTD]

            RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)

AS [Lowest_SalesYTD (FIRST_VALUE)]

FROM [AdventureWorks2012].[Sales].[vSalesPerson]

WHERE [TerritoryGroup] IS NOT NULL

This was first published in September 2012

There are Comments. Add yours.

 
TIP: Want to include a code block in your comment? Use <pre> or <code> tags around the desired text. Ex: <code>insert code</code>

REGISTER or login:

Forgot Password?
By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy
Sort by: OldestNewest

Forgot Password?

No problem! Submit your e-mail address below. We'll send you an email containing your password.

Your password has been sent to:

Disclaimer: Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.