Preserving Unicode data integrity

Mixing Unicode's multiple encoding methods can lead to damaged or improperly stored data. Get steps for preserving your Unicode data in this tip.

Unicode is one of the most broadly-accepted standards for storing data using a mixture of international character sets. It has been available in SQL Server since version SQL Server 7.0. However, if you are not careful in managing how data is sent to SQL Server, mixing Unicode's multiple encoding methods can lead to damaged or improperly stored data.

For instance, most Web applications store Unicode data in the UTF-8 format. UTF-8 stores all 7-bit ASCII characters as is and then uses special control characters to store the rest of the Unicode set. SQL Server, on the other hand, uses UCS-2 or UTF-16 Unicode format, which mimics how the 32-bit Windows kernel itself stores information. (This way data doesn't have be to be converted back and forth to another format and performance is enhanced.) If UTF-16 data in the database is retrieved and misinterpreted as UTF-8, the result is usually gibberish. If this mangled data is reinserted into the database, the data will most likely be ruined.

To avoid such problems you must do two things.

1. Set the codepage to 65001 for any pages that contain data retrieved from the server. This will automatically convert any UCS-2/UTF-16 data to UTF-8 format when the page is rendered and sent to the client.

2. Also set the codepage to 65001 for any data sent from a Web page, since the Web server will also automatically convert UTF-8 data sent to SQL Server into UCS-2.

Note that a database won't explicitly support Unicode data unless the field types in question are also explicitly supported. If you use text instead of ntext, for instance, any Unicode entities sent to the database will instead be translated into ISO-8859-1 encoding, which should not be intermixed with true Unicode entities to avoid data damage.

About the author: Serdar Yegulalp is editor of the Windows Power Users Newsletter. Check it out for the latest advice and musings on the world of Windows network administrators -- and please share your thoughts as well!


More information from SearchSQLServer.com

  • Tip: Performance impacts of joining Unicode and non-Unicode data
  • Ask the Experts: Convert English language database into Japanese
  • Ask the Experts: NVARCHAR vs. VARCHAR


  • This was first published in August 2005

    Dig deeper on SQL Server Backup and Recovery

    Pro+

    Features

    Enjoy the benefits of Pro+ membership, learn more and join.

    0 comments

    Oldest 

    Forgot Password?

    No problem! Submit your e-mail address below. We'll send you an email containing your password.

    Your password has been sent to:

    -ADS BY GOOGLE

    SearchBusinessAnalytics

    SearchDataCenter

    SearchDataManagement

    SearchAWS

    SearchOracle

    SearchContentManagement

    SearchWindowsServer

    Close