Problem solve Get help with specific problems with your technologies, process and projects.

Replacing non-alphanumeric characters in strings using T-SQL

Learn two important methods for replacing non-alphanumeric characters in strings using T-SQL with Microsoft SQL Server.

As a follow-up to my previous two articles (Complex grouping of strings in T-SQL and Working with delimited strings...

in T-SQL), here are two important methods for replacing non-alphanumeric characters in strings.

Method #1: Using a table of ASCII code values

This method removes all characters other than [a to z] , [A to Z] and [0 to 9]. In other words, it removes all symbols ("#?&/() and so on. You can do this by creating a table with all bad ASCII code values and cycling through a string, replacing the bad codes with an empty string.

--Create table with all bad codes
DECLARE @mycode INT
CREATE TABLE #badcodes(badcode INT)
SET @mycode=33
WHILE @mycode<=255
BEGIN
 IF (@mycode BETWEEN 33 AND 47) OR (@mycode BETWEEN 58 AND 64)
 OR (@mycode BETWEEN 58 AND 64) OR (@mycode BETWEEN 91 AND 96)
 OR (@mycode BETWEEN 123 AND 255)
 INSERT INTO #BADCODES VALUES(@mycode)
 SET @mycode=@mycode+1
END
GO
 
--strip out the characters
DECLARE @test VARCHAR(100)
SET @test='se+*ar%c&h^da#tab~se'
 
UPDATE #badcodes
SET @test=REPLACE(@test,CHAR(badcode),'')
 
PRINT @test

Method #2: Using specified patterns

For more info, check out SQL Server books online for help on "PATINDEX" and "STUFF".

DECLARE @pos INT
DECLARE @myString VARCHAR(20)
SET @myString='se+*ar%c&h^da#tab~se'
SET @pos = PATINDEX('%[^a-z0-9_]%',@myString)
-- sets @pos to the position in @myString of the first character
-- not (^) equal to a letter (a-z), a digit (0-9) or an underscore
-- or sets @pos to 0 if no such character exists in @myString (in which
-- case you are done)
 
WHILE @pos > 0 
BEGIN 
  SET @myString = STUFF(@myString,@pos,1,'')
  -- overwrites the non-alphanumeric character at position @pos with ''
  SET @pos = PATINDEX('%[^a-z0-9_]%',@myString)
  -- looks for the next non-alphanumeric character
END
 
SELECT @myString

An example: how to replace non-alphanumeric characters from a column

Let's create a test table first:

--////////////////////////////////
Create table tblTest(col1 varchar(20),col2 varchar(20))
insert into tblTest values('one$','00#7')
insert into tblTest values('tw*o','%420')
insert into tblTest values('thre@e','11&1')
 
--select * from tblTest
 
col1                 col2                 
-------------------- -------------------- 
one$                 00#7
tw*o                 %420
thre@e               11&1
 
--////////////////////////////////

Add one new column and insert the values by appending existing columns.

 
--////////////////////////////////
ALTER TABLE tblTest ADD Newcol varchar(40)
update tblTest set Newcol=col1+col2
--select * from tblTest
 
col1                 col2                 Newcol                                   
-------------------- -------------------- ---------------
one$                 00#7                 one$00#7
tw*o                 %420                 tw*o%420
thre@e               11&1                 thre@e11&1
 
--////////////////////////////////
 

Here's the solution. You can use one of the above two methods to remove non-alphanumeric characters from a table column. I have choosen Mehtod #2 to solve this example problem. You have to add row numbers to the table to loop through the table and update.

 
-- Add row number to each row
SELECT IDENTITY(int,1,1) AS ID, *
 INTO #tblAlpha
 FROM tblTest
 
--select * from #tblAlpha
--update new column 
DECLARE @MaxRank  int 
DECLARE @Counter  int 
declare @pos int
Declare @s varchar(100)
 
Set nocount on
SELECT @MaxRank = Max(ID) From #tblAlpha 
SET @Counter = 1
WHILE @Counter <= @MaxRank 
BEGIN 
 select @s=newcol from #tblAlpha where id=@Counter
 set @pos = patindex('%[^a-z0-9_]%',@s)
 while @pos > 0 
 begin
   set @s = stuff(@s,@pos,1,'')
   set @pos = patindex('%[^a-z0-9_]%',@s)
 end
 update #tblAlpha set newcol=@s where id=@Counter
 
SET @Counter = @Counter +1 
END 
 
truncate table tbltest
insert into tbltest select col1,col2,newcol from #tblAlpha
drop table #tblAlpha
 
select * from tbltest

Reader Feedback

Israel P. writes: This took over 24 hrs to run on my machine so it doesn't work for my purposes. I found this code much more effective ( it took only 35 mins to run):

while @@rowcount > 0
        update  user_list_original
        set     fname = replace(fname, substring(fname, patindex('%[^a-zA-Z ]%', fname), 1), '')
        where   patindex('%[^a-zA-Z ]%', fname) <> 0

Is there anything missing by using this? I have a table with millions of records that I need to 'clean'. Any pointers for cleaning data (emails, addresses, etc.)?

For More Information

  • What do you think about this tip? E-mail the Editor at tdichiara@techtarget.com with your feedback.
  • The Best SQL Server Web Links: tips, tutorials, scripts, and more.
  • Have an SQL Server tip to offer your fellow DBA's and developers? The best tips submitted will receive a cool prize--submit your tip today!
  • Ask your technical SQL Server questions--or help out your peers by answering them--in our live discussion forums.
  • Check out our Ask the Experts feature: Our SQL, Database Design, Oracle, SQL Server, DB2, metadata, and data warehousing gurus are waiting to answer your toughest questions.

Dig Deeper on SQL-Transact SQL (T-SQL)

Start the conversation

Send me notifications when other members comment.

Please create a username to comment.

-ADS BY GOOGLE

SearchBusinessAnalytics

SearchDataCenter

SearchDataManagement

SearchAWS

SearchOracle

SearchContentManagement

SearchWindowsServer

Close