T-SQL includes several built-in string functions that can be used to manipulate numerical and character data. This two-part series discusses the basics of these functions and provides examples for how to use them, beginning with how to truncate and modify string values. Part two discusses how to convert and gather information from string values, as well as how to perform a SOUNDEX analysis.
String functions in T-SQL let you work with character values such as char, nchar, varchar and nvarchar. For instance, you can use string functions to remove trailing spaces, extract part of a value, or convert numerical data to character data. String functions add flexibility to your T-SQL statements and make it easier to work with character data that is not in the form you need.
In this article, I’ll describe the basic string functions available in SQL Server and provide examples for how to use them. I created the examples on a local instance of SQL Server 2008 and based them on the AdventureWorks2008 sample database. The examples are organized into five categories: truncating data, modifying data, converting data, returning details about data, and performing SOUNDEX analysis on data. This organization is meant to provide a better understanding of each function and the differences between them.
Truncating string values
The following set of functions details how to extract parts of a string value. Most of these functions are included in the following SELECT statement:
LEFT(Name, 12) AS BikeType,
SUBSTRING(Name, 9, 4) AS Model,
RIGHT(Name, 2) AS Size,
RTRIM(Style) AS Style
ProductNumber LIKE 'bk-t%'
As you can see, the SELECT clause includes the functions LEFT, SUBSTRING, RIGHT, and RTRIM. When you run the statement against the AdventureWorks2008 database, you should receive the following results:
|Touring-2000 Blue, 60||Touring-2000||2000||60||U|
|Touring-1000 Yellow, 46||Touring-1000||1000||46||U|
|Touring-1000 Yellow, 50||Touring-1000||1000||50||U|
|Touring-1000 Yellow, 54||Touring-1000||1000||54||U|
|Touring-1000 Yellow, 60||Touring-1000||1000||60||U|
|Touring-3000 Blue, 54||Touring-3000||3000||54||U|
|Touring-3000 Blue, 58||Touring-3000||3000||58||U|
|Touring-3000 Blue, 62||Touring-3000||3000||62||U|
|Touring-3000 Yellow, 44||Touring-3000||3000||44||U|
|Touring-3000 Yellow, 50||Touring-3000||3000||50||U|
|Touring-3000 Yellow, 54||Touring-3000||3000||54||U|
|Touring-3000 Yellow, 58||Touring-3000||3000||58||U|
|Touring-3000 Yellow, 62||Touring-3000||3000||62||U|
|Touring-1000 Blue, 46||Touring-1000||1000||46||U|
|Touring-1000 Blue, 50||Touring-1000||1000||50||U|
|Touring-1000 Blue, 54||Touring-1000||1000||54||U|
|Touring-1000 Blue, 60||Touring-1000||1000||60||U|
|Touring-2000 Blue, 46||Touring-2000||2000||46||U|
|Touring-2000 Blue, 50||Touring-2000||2000||50||U|
|Touring-2000 Blue, 54||Touring-2000||2000||54||U|
|Touring-3000 Blue, 44||Touring-3000||3000||44||U|
|Touring-3000 Blue, 50||Touring-3000||3000||50||U|
(22 row(s) affected)
When looking at the functions individually, LEFT and RIGHT are similar. The LEFT function returns the left part of the character string, and the RIGHT function returns the right part of the string, based on the specified number of characters.
For example, in the SELECT statement above, the LEFT function returns the first 12 characters of the Name column. The function takes two arguments: a character expression and the number of characters. In this case, the character expression is the Name column and the number of characters is 12. As a result, the last part of the Name value is truncated and only the first 12 characters are returned. If you look at the first row of the result set, you can see that the full name value -- Touring-2000 Blue, 60 -- is changed to Touring-2000.
The RIGHT function works in the same way, but it retrieves only the right portion of the value. In the example above, the RIGHT function returns only the last two characters of the Name column. Again, the Name column is specified as the character expression with 2 being the number of characters, which returns a two-character value.
The SUBSTRING function can extract data from anywhere in the string value. The function takes three arguments: a character expression, a start expression that indicates which character will start the extraction, and a length expression that indicates how many characters to extract. In the example above, the character expression is once again the Name column. The second argument (9) means that we should start with the ninth character, and the third argument (4) means that we should retrieve four characters. As the query results show, the Model column contains the ninth through twelfth characters for each row. For instance, in the first row, 2000 has been extracted from the Name value.
The other function in this example is RTRIM, which removes all trailing spaces from the character expression (specified as its one argument). In this case, the RTRIM function is used to remove the trailing space from the Style column. The column is configured with the NCHAR(2) data type, which means all the single-character values will also include a trailing space. In certain applications and extraction processes, trailing spaces may cause problems -- particularly if you’re trying to match values. The RTRIM function, however, eliminates that space.
Note that the RTRIM function also has its counterpart -- the LTRIM function -- which removes the leading spaces from a character expression. Although RTRIM is more commonly used, LTRIM is still useful in certain situations.
Modifying string values
T-SQL also includes a set of functions that modify string values in SQL Server. This is demonstrated in the following statement:
REPLACE(Name, 'Mountain', 'Mtn') AS ReplaceName,
STUFF(Name, 11, 2, 'x01') AS StuffName,
REVERSE(ProductNumber) AS ReverseNum,
STUFF(ProductNumber, 1, 2, REPLICATE(0, 3)) AS
ReplicateNum, ProductNumber + SPACE(4) + LOWER
(ProductLine) AS LowerProdLine
ProductID = 771
This statement uses several functions to modify the values in the Name and ProductNumber columns. The original value of the Name column is Mountain-100 Silver, 38 and the original value of the ProductNumber column is BK-M82S-38. The statement returns the following results:
|Mtn-100 Silver, 38||Mountain-1x01 Silver, 38||83-S28M-KB|
(1 row(s) affected)
The first function used in the statement, REPLACE, replaces all occurrences of a set of characters in a string value with a new set of characters. The function takes three arguments: the character expression, the set of characters to be replaced, and the replacement characters. In the above example, the REPLACE function specifies the Name column as the character expression. The characters to be replaced are Mountain with the new characters being Mtn. As a result, the new value is Mtn-100 Silver, 38.
The next function, STUFF, deletes a specified set of characters based on position, and inserts a new set of characters. The function takes four arguments: the character expression, the starting position of the characters to be deleted, the number of characters to delete, and the characters to be inserted. In the example above, the STUFF function specifies the Name column as the character expression. The characters to be deleted start at the eleventh character (the second argument) and include two characters (the third argument). After those characters are deleted, the new characters (x01) are inserted in their place, resulting in a new value of Mountain-1x01 Silver, 38.
The REVERSE function simply reverses the order of the character expression, which in this case, is ProductNumber. As a result, the new product number is returned as 83-S28M-KB.
The REPLICATE function, which repeats a string value a specified number of times, is also included. The function takes two arguments: the string value and the number of times to repeat it. In the above example, 0 is repeated three times. Notice, however, the REPLICATE function is embedded as the fourth argument in the STUFF function. As a result, the first two characters of the ProductNumber value are replaced with 000.
The SPACE function is similar to the REPLICATE function, as it simply returns a set of repeated spaces. The function takes one argument, which specifies the number of spaces to return. In the above example, four spaces are added to the concatenated value.
The LOWER function is also included in the example, and converts uppercase characters to lowercase. In the above example, the ProductLine value is converted to lowercase. T-SQL also supports the UPPER function, which converts lowercase characters to uppercase.
ABOUT THE AUTHOR
Robert Sheldon is a technical consultant and the author of numerous books, articles, and training material related to Microsoft Windows, various relational database management systems, and business intelligence design and implementation. You can find more information at www.rhsheldon.com.