The broad range of data types in SQL Server can sometimes throw people through a loop, especially when the data types seem to be highly interchangeable. Two in particular that constantly spark questions are VARCHAR and NVARCHAR: what's the difference between the two, and how important is the difference?
VARCHAR is an abbreviation for variable-length character string. It's a string of text characters that can be as large as the page size for the database table holding the column in question. The size for a table page is 8,196 bytes, and no one row in a table can be more than 8,060 characters. This in turn limits the maximum size of a VARCHAR to 8,000 bytes.
The "N" in NVARCHAR means uNicode. Essentially, NVARCHAR is nothing more than a VARCHAR that supports two-byte characters. The most common use for this sort of thing is to store character data that is a mixture of English and non-English symbols — in my case, English and Japanese.
The key difference between the two data types is how they're stored. VARCHAR is stored as regular 8-bit data. But NVARCHAR strings are stored in the database as UTF-16 — 16 bits or two bytes per character, all the time — and converted to whatever codepage is being used by the database connection on output (typically UTF-8). That said, NVARCHAR strings have the same length restrictions as their VARCHAR cousins — 8,000 bytes. However, since NVARCHARs use two bytes for each character, that means a given NVARCHAR can only hold 4,000 characters (not bytes) maximum. So, the amount of storage needed for NVARCHAR entities is going to be twice whatever you'd allocate for a plain old VARCHAR.
More on SQL Server management
Learn about six essential SQL Server management tools
Get an overview of SQL Server auditing tools
Check out SQL Server in-memory database features
Because of this, some people may not want to use NVARCHAR universally, and may want to fall back on VARCHAR — which takes up less space per row — whenever possible.
Here's an example of how to mix and match the use of the two types. Let's say we have a community website where people log in with a username, but can also set a public "friendly" name to be more easily identified by other users. The login name can be a VARCHAR, which means it must be 8-bit ASCII (and it can be constrained further to conventional alphanumerics with a little more work, typically on the front end). The friendly name can be an NVARCHAR to allow Unicode entities. This way you're allowing support for Unicode, but only in the place where it matters most — both for the users, and where the extra storage space is going to be put to the best possible use.
This was first published in July 2007