Using T-SQL string functions to convert and report on data

There’s plenty more to learn about string functions beyond the basics. Here you’ll find details on more advanced practices, like performing a SOUNDEX analysis.

In part one, you learned some basic T-SQL string functions and how to use them. This article takes things one step further with examples of how to get the most out of string functions by converting numerical and character data, gathering additional information and performing a SOUNDEX analysis.

Converting string values

T-SQL provides functions used to convert string values, as shown in the following SELECT statement:

   SELECT
     Name + CHAR(9) + STR(MakeFlag) AS Product,
     ASCII(MakeFlag)  AsciiMakeFlag,
     RTRIM(Style) AS Style,
     UNICODE(RTRIM(Style)) AS UnicodeStyle
   FROM
     Production.Product
   WHERE
     ProductID = 771

The statement returns the following results:

Product                     AsciiMakeFlag Style UnicodeStyle
--------------------------- ------------- ----- ------------
Mountain-100 Silver, 38   1 49            U     85

 (1 row(s) affected)

The first function in the statement, CHAR, converts an integer ASCII code (0 through 255) to its character symbol, and only takes one argument -- the ASCII integer. The function is particularly handy for inserting control characters into your strings, such as the tab (9), line feed (10), or carriage return (13).

In the example above, CHAR(9) is used to insert a tab into the Product value. Note that T-SQL also supports the NCHAR function, which converts a Unicode integer to Unicode characters. The next function, STR, converts numerical data to character data. The argument for the STR function is the numeric expression to be converted. In the example above, the MakeFlag bit column is converted to a string so it can be concatenated with the Name column. Remember, if the two columns are concatenated without converting the bit column, an error message will display.

T-SQL also supports the ASCII function, which converts a single character to an ASCII integer. The example uses the ASCII function to convert the MakeFlag value to an integer. Since the column returns a 1, the value is converted to 49.

The UNICODE function is also shown in the example, and is used to convert a character to a Unicode integer. The function takes only one argument, a Unicode character expression. In the above example, the Style value is converted to a Unicode integer. It is important, however, to first use RTRIM to remove the trailing space because the UNICODE function, like the ASCII function, can convert only one character at a time.

Gathering information from string values

There are also functions that provide information about a string value, such as its length or the placement of a substring within the value. The following SELECT statement includes several functions that provide information about a character value:

   SELECT
     LEN(Description) AS LengthDescrip,
     CHARINDEX('tech', Description) AS CharIndexDescrip,
     PATINDEX('%bike%', Description) AS PatIndexDescrip
   FROM
     Production.ProductDescription
   WHERE
     ProductDescriptionID = 321

The functions in this statement are based on the Description column. For this product, the Description column contains this value: Same technology as all of our Road series bikes. Perfect all-around bike for road or racing. The SELECT statement returns the following results:

LengthDescrip CharIndexDescrip PatIndexDescrip
------------- ---------------- ---------------
93 6 43

  (1 row(s) affected)

The LEN function returns the number of characters in the specified character expression. The expression is the function’s only argument. For the Description column, there are 93 characters.

The CHARINDEX function searches a character expression for the starting position of a set of characters and takes three arguments: the characters to search for, the expression to be searched, and the starting position for the search. This last argument is optional. In the example above, the Description column is searched for the first occurrence of tech, which starts at the sixth position of the string value.

The final function in the above statement is PATINDEX, which searches for the first position of a character pattern in a character expression. The function takes two arguments: the pattern that is being searched for and the character expression to be searched. Wildcards can be used to define the pattern and provide flexibility in the types of patterns searched. If you use the % wildcard, however, it must precede and follow the characters to be searched, unless you search on the first or last character. For example, in the statement above, %bike% is specified as the character pattern to search for in the Description value. The results show that the first instance of the pattern appears at character 43.

Performing SOUNDEX analysis on string values

The final set of string functions discussed is related to SOUNDEX, a system for converting alphanumeric values to a four-character code in order to identify similar sounding words. The first character returned by a SOUNDEX function is the same as the target character expression, and the next three characters are numeric codes that evaluate the consonants. (Vowels are ignored unless they are the first letter.)

In the following T-SQL statements, SOUNDEX-related functions are used to analyze two last names:

   DECLARE @Name1 varchar(30)
   SET @Name1 =
     (SELECT LastName FROM Person.Person
     WHERE BusinessEntityID = 341)
   DECLARE @Name2 varchar(30)
   SET @Name2 =
     (SELECT LastName FROM Person.Person
     WHERE BusinessEntityID = 441)
   SELECT
     @Name1 AS Name1,
     @Name2 AS Name2,
     SOUNDEX(@name1)AS SoundexName1,
     SOUNDEX(@Name2) AS SoundexName2,
     DIFFERENCE(@Name1, @Name2) AS SoundexDiff

The last names returned by the SET subquery statements are Allen and Alan, which are inserted into variables, and used in the SELECT statement, returning the following results:

Name1 Name2 SoundexName1 SoundexName2 SoundexDiff
----- ----- ------------ ------------ -----------
Allen Alan A450 A450 4

  (1 row(s) affected)

The first time the SOUNDEX function is used, the last name Allen is analyzed; the second time analyzes Alan. Notice that the function takes only one argument -- the string to be analyzed. In this case, the SOUNDEX analyses of the two spellings return the same results, meaning the names are very similar. This can be handy for identifying someone whose name is spelled differently in two places. The DIFFERENCE function is also demonstrated and takes two arguments, the character expressions to be compared, and returns an integer that rates the difference between the two strings. The integer can range from 0 through 4. A value of 0 indicates little to no similarity between two strings, while a 4 indicates a strong similarity. In the example above, the function returns a 4 because the names are similar.

If you want to see the results for two names that are less alike, run the following statements:

   DECLARE @Name3 varchar(30)
   SET @Name3 =
     (SELECT LastName FROM Person.Person
     WHERE BusinessEntityID = 1829)
   DECLARE @Name4 varchar(30)
   SET @Name4 =
     (SELECT LastName FROM Person.Person
     WHERE BusinessEntityID = 2580)
   SELECT
     @Name3 AS Name3,
     @Name4 AS Name4,
     SOUNDEX(@Name3)AS SoundexName3,
     SOUNDEX(@Name4) AS SoundexName4,
     DIFFERENCE(@Name3, @Name4) AS SoundexDiff

In this case, the SOUNDEX function returns a value of S000 for the last name Su and a value of S620 for the last name Suarez. In addition, the DIFFERENCE function returns a value of 3, as shown in the following results:

Name3 Name4 SoundexName3 SoundexName4 SoundexDiff
----- ------ ------------ ------------ -----------
Su Suarez S000 S620 3

  (1 row(s) affected)

The SOUNDEX function and string functions provide a great deal of flexibility when working with character data. Although these examples are relatively basic, more complex statements using these functions can be created. Be sure to check out SQL Server Books Online for more information.

Back to Part one: T-SQL functions - The ins and outs for SQL Server

ABOUT THE AUTHOR
Robert Sheldon
is a technical consultant and the author of numerous books, articles, and training material related to Microsoft Windows, various relational database management systems, and business intelligence design and implementation. You can find more information at www.rhsheldon.com.

This was first published in April 2010

Dig deeper on SQL-Transact SQL (T-SQL)

Pro+

Features

Enjoy the benefits of Pro+ membership, learn more and join.

0 comments

Oldest 

Forgot Password?

No problem! Submit your e-mail address below. We'll send you an email containing your password.

Your password has been sent to:

-ADS BY GOOGLE

SearchBusinessAnalytics

SearchDataCenter

SearchDataManagement

SearchAWS

SearchOracle

SearchContentManagement

SearchWindowsServer

Close