Problem solve Get help with specific problems with your technologies, process and projects.

Replacing non-alphanumeric characters in strings using T-SQL

Learn two important methods for replacing non-alphanumeric characters in strings using T-SQL with Microsoft SQL Server.

As a follow-up to my previous two articles (Complex grouping of strings in T-SQL and Working with delimited strings...

in T-SQL), here are two important methods for replacing non-alphanumeric characters in strings.

Method #1: Using a table of ASCII code values

This method removes all characters other than [a to z] , [A to Z] and [0 to 9]. In other words, it removes all symbols ("#?&/() and so on. You can do this by creating a table with all bad ASCII code values and cycling through a string, replacing the bad codes with an empty string.

--Create table with all bad codes
DECLARE @mycode INT
CREATE TABLE #badcodes(badcode INT)
SET @mycode=33
WHILE @mycode<=255
BEGIN
 IF (@mycode BETWEEN 33 AND 47) OR (@mycode BETWEEN 58 AND 64)
 OR (@mycode BETWEEN 58 AND 64) OR (@mycode BETWEEN 91 AND 96)
 OR (@mycode BETWEEN 123 AND 255)
 INSERT INTO #BADCODES VALUES(@mycode)
 SET @mycode=@mycode+1
END
GO
 
--strip out the characters
DECLARE @test VARCHAR(100)
SET @test='se+*ar%c&h^da#tab~se'
 
UPDATE #badcodes
SET @test=REPLACE(@test,CHAR(badcode),'')
 
PRINT @test

Method #2: Using specified patterns

For more info, check out SQL Server books online for help on "PATINDEX" and "STUFF".

DECLARE @pos INT
DECLARE @myString VARCHAR(20)
SET @myString='se+*ar%c&h^da#tab~se'
SET @pos = PATINDEX('%[^a-z0-9_]%',@myString)
-- sets @pos to the position in @myString of the first character
-- not (^) equal to a letter (a-z), a digit (0-9) or an underscore
-- or sets @pos to 0 if no such character exists in @myString (in which
-- case you are done)
 
WHILE @pos > 0 
BEGIN 
  SET @myString = STUFF(@myString,@pos,1,'')
  -- overwrites the non-alphanumeric character at position @pos with ''
  SET @pos = PATINDEX('%[^a-z0-9_]%',@myString)
  -- looks for the next non-alphanumeric character
END
 
SELECT @myString

An example: how to replace non-alphanumeric characters from a column

Let's create a test table first:

--////////////////////////////////
Create table tblTest(col1 varchar(20),col2 varchar(20))
insert into tblTest values('one$','00#7')
insert into tblTest values('tw*o','%420')
insert into tblTest values('thre@e','11&1')
 
--select * from tblTest
 
col1                 col2                 
-------------------- -------------------- 
one$                 00#7
tw*o                 %420
thre@e               11&1
 
--////////////////////////////////

Add one new column and insert the values by appending existing columns.

 
--////////////////////////////////
ALTER TABLE tblTest ADD Newcol varchar(40)
update tblTest set Newcol=col1+col2
--select * from tblTest
 
col1                 col2                 Newcol                                   
-------------------- -------------------- ---------------
one$                 00#7                 one$00#7
tw*o                 %420                 tw*o%420
thre@e               11&1                 thre@e11&1
 
--////////////////////////////////
 

Here's the solution. You can use one of the above two methods to remove non-alphanumeric characters from a table column. I have choosen Mehtod #2 to solve this example problem. You have to add row numbers to the table to loop through the table and update.

 
-- Add row number to each row
SELECT IDENTITY(int,1,1) AS ID, *
 INTO #tblAlpha
 FROM tblTest
 
--select * from #tblAlpha
--update new column 
DECLARE @MaxRank  int 
DECLARE @Counter  int 
declare @pos int
Declare @s varchar(100)
 
Set nocount on
SELECT @MaxRank = Max(ID) From #tblAlpha 
SET @Counter = 1
WHILE @Counter <= @MaxRank 
BEGIN 
 select @s=newcol from #tblAlpha where id=@Counter
 set @pos = patindex('%[^a-z0-9_]%',@s)
 while @pos > 0 
 begin
   set @s = stuff(@s,@pos,1,'')
   set @pos = patindex('%[^a-z0-9_]%',@s)
 end
 update #tblAlpha set newcol=@s where id=@Counter
 
SET @Counter = @Counter +1 
END 
 
truncate table tbltest
insert into tbltest select col1,col2,newcol from #tblAlpha
drop table #tblAlpha
 
select * from tbltest

Reader Feedback

Israel P. writes: This took over 24 hrs to run on my machine so it doesn't work for my purposes. I found this code much more effective ( it took only 35 mins to run):

while @@rowcount > 0
        update  user_list_original
        set     fname = replace(fname, substring(fname, patindex('%[^a-zA-Z ]%', fname), 1), '')
        where   patindex('%[^a-zA-Z ]%', fname) <> 0

Is there anything missing by using this? I have a table with millions of records that I need to 'clean'. Any pointers for cleaning data (emails, addresses, etc.)?

For More Information

  • What do you think about this tip? E-mail the Editor at tdichiara@techtarget.com with your feedback.
  • The Best SQL Server Web Links: tips, tutorials, scripts, and more.
  • Have an SQL Server tip to offer your fellow DBA's and developers? The best tips submitted will receive a cool prize--submit your tip today!
  • Ask your technical SQL Server questions--or help out your peers by answering them--in our live discussion forums.
  • Check out our Ask the Experts feature: Our SQL, Database Design, Oracle, SQL Server, DB2, metadata, and data warehousing gurus are waiting to answer your toughest questions.

This was last published in June 2002

Dig Deeper on SQL-Transact SQL (T-SQL)

Start the conversation

Send me notifications when other members comment.

By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy

Please create a username to comment.

-ADS BY GOOGLE

SearchBusinessAnalytics

SearchDataCenter

SearchDataManagement

SearchAWS

SearchOracle

SearchContentManagement

SearchWindowsServer

Close