Unicode is one of the most broadly-accepted standards for storing data using a mixture of international character sets. It has been available in SQL Server since version SQL Server 7.0. However, if you are not careful in managing how data is sent to SQL Server, mixing Unicode's multiple encoding methods can lead to damaged or improperly stored data.

For instance, most Web applications store Unicode data in the UTF-8 format. UTF-8 stores all 7-bit ASCII characters as is and then uses special control characters to store the rest of the Unicode set. SQL Server, on the other hand, uses UCS-2 or UTF-16 Unicode format, which mimics how the 32-bit Windows kernel itself stores information. (This way data doesn't have be to be converted back and forth to another format and performance is enhanced.) If UTF-16 data in the database is retrieved and misinterpreted as UTF-8, the result is usually gibberish. If this mangled data is reinserted into the database, the data will most likely be ruined.

To avoid such problems you must do two things.

1. Set the codepage to 65001 for any pages that contain data retrieved from the server. This will automatically convert any UCS-2/UTF-16 data to UTF-8 format when the page is rendered and sent to the client.

2. Also set the codepage to 65001 for any data sent from a Web page, since the Web server will also automatically convert UTF-8 data sent to SQL Server into UCS-2.

Note that a database won't explicitly support Unicode data unless

    Requires Free Membership to View

the field types in question are also explicitly supported. If you use text instead of ntext, for instance, any Unicode entities sent to the database will instead be translated into ISO-8859-1 encoding, which should not be intermixed with true Unicode entities to avoid data damage.

About the author: Serdar Yegulalp is editor of the Windows Power Users Newsletter. Check it out for the latest advice and musings on the world of Windows network administrators -- and please share your thoughts as well!

More information from SearchSQLServer.com

  • Tip: Performance impacts of joining Unicode and non-Unicode data
  • Ask the Experts: Convert English language database into Japanese
  • Ask the Experts: NVARCHAR vs. VARCHAR

  • This was first published in August 2005

    There are Comments. Add yours.

    TIP: Want to include a code block in your comment? Use <pre> or <code> tags around the desired text. Ex: <code>insert code</code>

    REGISTER or login:

    Forgot Password?
    By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy
    Sort by: OldestNewest

    Forgot Password?

    No problem! Submit your e-mail address below. We'll send you an email containing your password.

    Your password has been sent to:

    Disclaimer: Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.