Problem solve Get help with specific problems with your technologies, process and projects.

Preserving Unicode data integrity

Mixing Unicode's multiple encoding methods can lead to damaged or improperly stored data. Get steps for preserving your Unicode data in this tip.

Unicode is one of the most broadly-accepted standards for storing data using a mixture of international character sets. It has been available in SQL Server since version SQL Server 7.0. However, if you are not careful in managing how data is sent to SQL Server, mixing Unicode's multiple encoding methods can lead to damaged or improperly stored data.

For instance, most Web applications store Unicode data in the UTF-8 format. UTF-8 stores all 7-bit ASCII characters as is and then uses special control characters to store the rest of the Unicode set. SQL Server, on the other hand, uses UCS-2 or UTF-16 Unicode format, which mimics how the 32-bit Windows kernel itself stores information. (This way data doesn't have be to be converted back and forth to another format and performance is enhanced.) If UTF-16 data in the database is retrieved and misinterpreted as UTF-8, the result is usually gibberish. If this mangled data is reinserted into the database, the data will most likely be ruined.

To avoid such problems you must do two things.

1. Set the codepage to 65001 for any pages that contain data retrieved from the server. This will automatically convert any UCS-2/UTF-16 data to UTF-8 format when the page is rendered and sent to the client.

2. Also set the codepage to 65001 for any data sent from a Web page, since the Web server will also automatically convert UTF-8 data sent to SQL Server into UCS-2.

Note that a database won't explicitly support Unicode data unless the field types in question are also explicitly supported. If you use text instead of ntext, for instance, any Unicode entities sent to the database will instead be translated into ISO-8859-1 encoding, which should not be intermixed with true Unicode entities to avoid data damage.

About the author: Serdar Yegulalp is editor of the Windows Power Users Newsletter. Check it out for the latest advice and musings on the world of Windows network administrators -- and please share your thoughts as well!

More information from

  • Tip: Performance impacts of joining Unicode and non-Unicode data
  • Ask the Experts: Convert English language database into Japanese
  • Ask the Experts: NVARCHAR vs. VARCHAR

  • Dig Deeper on SQL Server Backup and Recovery

    Start the conversation

    Send me notifications when other members comment.

    Please create a username to comment.