Problem solve Get help with specific problems with your technologies, process and projects.

Take caution when cleansing redundant delimiters

Delimiters aid in human comprehension of data, but you'll most likely want to cleanse data of these characters for storage. Contributor Serdar Yegulalp explains how to take caution doing so: Removal of some delimiters can destroy your data integrity

When formatted data passes from an application to a database, it is typically cleansed of redundant delimiters. The delimiters never change and they are only needed for human comprehension, so you can often do away with them when storing a particular piece of data on the backend. But, in some cases, cleansing delimiters may be easier said than done.

Social security numbers take the format xxx-xx-xxxx, so 100-30-8000 would be valid. When stored in a database. However, that number would most likely convert to a varchar without the delimiters and appear as 117308190.

Phone numbers are a little more complicated. A U.S. phone number with area code can take several different formats: (212) 280-2984, 212-280-2984 and even 212.280.2984 are all recognized as valid phone numbers. If you were only storing U.S. phone numbers, you could do away with the delimiters and simply store 10-digit numbers as varchar: 2122802984.

The situation gets more complicated when you deal with international phone numbers. Consider +46-2-80520. The +46 is the country code (Sweden), the 2 is an area exchange and the rest of the digits are the actual phone number. But what about +33-02-42-22-60-11 (France)? No two countries will have similarly formatted phone numbers, so the delimiters are crucial. If you drop the delimiters, your data is trash. (At the very least, it becomes hard to determine which part of the number is the exchange number versus the actual phone number, but the underlying problem of lost data integrity still stands.)

In the first case, where delimiters are not needed for data, you can use a constraint on the column where the data is stored to ensure that it's clean. (It helps when the front-end application supplies clean data where possible, too.) In the second case, when delimiters are part of the data and not simply there for readability, they need to be preserved rigorously. You can use a constraint to make sure only a certain variety of delimiters are provided (e.g., the digits 0-9, the dash and the plus).

In this example, the CHECK constraint shown here for the column PHONE enforces that a country code delimiter appears at the beginning of the phone number -- a + sign, a two-digit country code and a dash. The rest of the phone number is freeform but, depending on your needs, that could also be policed with constraints.

CHECK ([PHONE] like '+[0-9][0-9]-%')

About the author: Serdar Yegulalp is editor of the Windows Power Users Newsletter. Check it out for the latest advice and musings on the world of Windows network administrators -- and please share your thoughts as well!


More information from SearchSQLServer.com

  • Tip: Custom user-defined string functions by Igor Nikiforov
  • Step-by-Step Guide: Ensuring data integrity in SQL Server
  • RSS: Sign up for our RSS feed to receive expert advice everyday


  • This was last published in September 2005

    Dig Deeper on Microsoft SQL Server Performance Monitoring and Tuning

    Start the conversation

    Send me notifications when other members comment.

    Please create a username to comment.

    -ADS BY GOOGLE

    SearchBusinessAnalytics

    SearchDataCenter

    SearchDataManagement

    SearchAWS

    SearchOracle

    SearchContentManagement

    SearchWindowsServer

    Close