As a follow-up to my previous two articles (Complex grouping of strings in T-SQL and Working with delimited strings in T-SQL), here are two important methods for replacing non-alphanumeric characters in strings.
Method #1: Using a table of ASCII code values
This method removes all characters other than [a to z] ,
[A to Z] and [0 to 9]. In other words, it removes all symbols ("#?&/() and so on. You can do this by creating a table with all bad ASCII
code values and cycling through a string, replacing the bad
codes with an empty string.
--Create table with all bad codes
DECLARE @mycode INT
CREATE TABLE #badcodes(badcode INT)
SET @mycode=33
WHILE @mycode<=255
BEGIN
IF (@mycode BETWEEN 33 AND 47) OR (@mycode BETWEEN 58 AND 64)
OR (@mycode BETWEEN 58 AND 64) OR (@mycode BETWEEN 91 AND 96)
OR (@mycode BETWEEN 123 AND 255)
INSERT INTO #BADCODES VALUES(@mycode)
SET @mycode=@mycode+1
END
GO
--strip out the characters
DECLARE @test VARCHAR(100)
SET @test='se+*ar%c&h^da#tab~se'
UPDATE #badcodes
SET @test=REPLACE(@test,CHAR(badcode),'')
PRINT @test
Method #2: Using specified patterns
For more info, check out SQL Server books online for help on "PATINDEX"
and "STUFF".
DECLARE @pos INT
DECLARE @myString VARCHAR(20)
SET @myString='se+*ar%c&h^da#tab~se'
SET @pos = PATINDEX('%[^a-z0-9_]%',@myString)
-- sets @pos to the position in @myString of the first character
-- not (^) equal to a letter (a-z), a digit (0-9) or an underscore
-- or sets @pos to 0 if no such character exists in @myString (in which
-- case you are done)
WHILE @pos > 0
BEGIN
SET @myString = STUFF(@myString,@pos,1,'')
-- overwrites the non-alphanumeric character at position @pos with ''
SET @pos = PATINDEX('%[^a-z0-9_]%',@myString)
-- looks for the next non-alphanumeric character
END
SELECT @myString
An example: how to replace non-alphanumeric characters from a column
Let's create a test table first:
--////////////////////////////////
Create table tblTest(col1 varchar(20),col2 varchar(20))
insert into tblTest values('one$','00#7')
insert into tblTest values('tw*o','%420')
insert into tblTest values('thre@e','11&1')
--select * from tblTest
col1 col2
-------------------- --------------------
one$ 00#7
tw*o %420
thre@e 11&1
--////////////////////////////////
Add one new column and insert the values
by appending existing columns.
--////////////////////////////////
ALTER TABLE tblTest ADD Newcol varchar(40)
update tblTest set Newcol=col1+col2
--select * from tblTest
col1 col2 Newcol
-------------------- -------------------- ---------------
one$ 00#7 one$00#7
tw*o %420 tw*o%420
thre@e 11&1 thre@e11&1
--////////////////////////////////
Here's the solution. You can use one of the above two methods to remove
non-alphanumeric characters from a table column. I have choosen Mehtod #2 to solve this example problem. You have to add row numbers to the table to loop
through the table and update.
-- Add row number to each row
SELECT IDENTITY(int,1,1) AS ID, *
INTO #tblAlpha
FROM tblTest
--select * from #tblAlpha
--update new column
DECLARE @MaxRank int
DECLARE @Counter int
declare @pos int
Declare @s varchar(100)
Set nocount on
SELECT @MaxRank = Max(ID) From #tblAlpha
SET @Counter = 1
WHILE @Counter <= @MaxRank
BEGIN
select @s=newcol from #tblAlpha where id=@Counter
set @pos = patindex('%[^a-z0-9_]%',@s)
while @pos > 0
begin
set @s = stuff(@s,@pos,1,'')
set @pos = patindex('%[^a-z0-9_]%',@s)
end
update #tblAlpha set newcol=@s where id=@Counter
SET @Counter = @Counter +1
END
truncate table tbltest
insert into tbltest select col1,col2,newcol from #tblAlpha
drop table #tblAlpha
select * from tbltest
Reader Feedback
Israel P. writes: This took over 24 hrs to run on my machine so it doesn't work for my purposes. I found this code much more effective ( it took only 35 mins to run):
while @@rowcount > 0
update user_list_original
set fname = replace(fname, substring(fname, patindex('%[^a-zA-Z ]%', fname), 1), '')
where patindex('%[^a-zA-Z ]%', fname) <> 0
Is there anything missing by using this? I have a table with millions of records that I need to 'clean'. Any pointers for cleaning data (emails, addresses, etc.)?
For More Information
- What do you think about this tip? E-mail the Editor at tdichiara@techtarget.com with your feedback.
- The Best SQL Server Web Links: tips, tutorials, scripts, and more.
- Have an SQL Server tip to offer your fellow DBA's and developers? The best tips submitted will receive a cool prize--submit your tip today!
- Ask your technical SQL Server questions--or help out your peers by answering them--in our live discussion forums.
- Check out our Ask the Experts feature: Our SQL, Database Design, Oracle, SQL Server, DB2, metadata, and data warehousing gurus are waiting to answer your toughest questions.