Character string data types

There are four different character data types that store character strings: char, varchar, varchar(max), and text. There are also four character data types for storing unicode data: nchar, nvarchar, nvarchar(max) and ntext. This book excerpt on T-SQL constructs explains these data types.

Data Types 

Data types specify the type of information (such as number, string, picture, date) that can be stored in a column or a variable. SQL Server recognizes 28 system-defined data types. Apart from these data types, you can create user-defined data types in T-SQL and in .NET to fulfill specific needs. The following are the categories of system-defined data types:

  • Character strings
  • Unicode character strings
     


    Figure 3-1 Grouping objects using a schema in the Object Browser.

  • Date and time
  • Approximate numeric
  • Exact numeric
  • Integer numbers
  • Monetary
  • Binary
  • Special

NOTE
In some cases, you can use different identifiers to refer to a data type in T-SQL code. For example, the char data type can be referenced as character, and varchar can be referenced as character varying. Some of these synonyms are based on ANSI SQL-92 standard requirements.

Figure 3-1 Grouping objects using a schema in the Object Browser

Character Strings
Character data types store character strings. The four different character types vary in length and storage characteristics:

  • char
  • varchar
  • varchar(max)
  • text

The char data type is used to store strings of fixed size. As noted earlier, the maximum size of this data type is 8,000 characters, which is a significant increase over the 255-character limit in early versions. When a variable or a table column is assigned with a string that is shorter than its nominal size, it is padded with trailing spaces to fill the specified field length.
The varchar data type stores strings of variable size up to 8,000 characters long. When a character value whose length is less than the nominal size is assigned to the column or variable, SQL Server does not add trailing spaces to it, but records it as is. varchar data types occupy two additional bytes in order to record the length of the string.

NOTE
Maintenance of this information requires some additional computation during I/O operation, but that time is usually countered by savings in the space required. A record using such columns occupies less space, and more records fit into a single page. Therefore, SQL Server reads more records when accessing data, and it is more likely that a single page contains the information that the user is looking for.

The text data type is used to store huge amounts of data. One field can store up to 2GB (231 – 1 bytes) of information. Only a 16-byte pointer to this data is stored in the table. Therefore, additional processing overhead is involved with the use of text columns. There are special functions for processing text values.

NOTE
You should avoid using this data type because Microsoft plans to remove it in future versions of SQL Server. You should use the new varchar(max) data type instead.

Varchar(max) looks and operates like the varchar data type, but it is by internal structure and functionality actually much more like the text data type.
It is also designed to store large strings (up to 2GB). The string is stored in special data pages, not in the data page with the rest of the record. Its biggest advantage is that most string operating functions that work with varchar also work with varchar(max); however, there are a few new functions for additional functionality. The following command creates a table with four fields using different character string data types:

Create table Contacts(ContactId char(8),
                      Name varchar(50),
                      Note text,
                      Resume varchar(max))

Character constants are delimited from the rest of the Transact-SQL code with quotes. For example, the following statement inserts contact information:

insert into Contacts (ContactId, Name, Note, Resume)
values ('CO-92-81', 'Tom Jones', 'Tom@trigon.com', 'N/a')

Unicode Character Strings
Microsoft SQL Server 2005 has four character data types for storing Unicode data—using non-ASCII character sets. They are equivalent to the char, varchar, varchar(max), and text data types and are called

  • nchar
  • nvarchar
  • nvarchar(max)
  • ntext

The main difference between these new data types and the older character data types is that the new data types can hold Unicode characters, which occupy 2 bytes per character. Therefore, the maximum string length that they can store is half that of the corresponding older data types (4,000 for nchar and nvarchar).
The following statement creates the same table as the previous example but uses Unicode data types:

Create table Contacts_2(ContactId nchar(8),
                        Name nvarchar(50),
                        Note ntext,
                        Resume nvarchar(max))
go

Unicode character constants are also delimited with quotes but are prefixed with N':

insert into Contacts_2 (ContactId, Name, Note, Resume)
values (N'CO-92-81', N'Tom Jones', N'Tom@trigonblue.com', N'N/a')

This N' prefix might look a little odd, but you will get used to it. Microsoft documentation is full of samples with Unicode constants. It was some time before I discovered the reason Microsoft uses N' as a prefix. It stands for "National." In fact, acceptable alternative identifiers for these data types are

  • National char
  • National char varying
  • National char varying(max)
  • National text

TIP
Typically, it is not a problem if you omit the N' prefix on constants. SQL Server automatically converts the string to its Unicode equivalent. Naturally, it is better to insert it whenever you are dealing with Unicode columns or variables, but it is not a big problem. The CPU will just have to perform a couple of extra cycles to make the conversion.
However, there are cases in which it becomes a problem. When your string constant is part of a query criterion, then the presence of the N' prefix might significantly affect execution of the query. If the column is defined as a non-Unicode string and the criterion is specified with the N' prefix, SQL Server converts every row of the table to compare it with the Unicode constant. As a result, the query performs a table scan instead of using an index.


Basic Transact-SQL Programming Constructs


The previous tip is from "15 tips in 15 minutes: Basic Transact-SQL Programming Constructs," excerpted from Chapter 3, of the book "Microsoft SQL Server 2005 Stored Procedure Programming in T-SQL & .NET" by Dejan Sunderic, courtesy of McGraw-Hill Publishing.

This was first published in February 2007

Dig deeper on SQL-Transact SQL (T-SQL)

Pro+

Features

Enjoy the benefits of Pro+ membership, learn more and join.

0 comments

Oldest 

Forgot Password?

No problem! Submit your e-mail address below. We'll send you an email containing your password.

Your password has been sent to:

SearchBusinessAnalytics

SearchDataCenter

SearchDataManagement

SearchAWS

SearchOracle

SearchContentManagement

SearchWindowsServer

Close