Using MAX data types in SQL Server

Using MAX data types in SQL Server 2005 adds flexibility when adding large amounts of data.

A set of data types in SQL Server 2005 called MAX types can be a big help for companies looking to handle large

amounts of data. MAX types are extensions to the VARCHAR, NVARCHAR and VARBINARY types, which previously could only be sized up to 8000 bytes. The MAX versions can accommodate up to 2 GB of data, just like TEXT and IMAGE -- and are fully compatible with all intrinsic SQL Server string functions. MAX types would have come in handy as MegaWare's DBA team geared up to help the the marketing department launch a new Web site. The Web site was needed to post documents on, and the development team thought that using the SQL Server 2000 database as the document repository would simplify things. Steve, MegaWare's DBA, didn't see any major issues with this; storing the documents in the database instead of the file system would mean a bit more work for the server, but it would also make backup and management a lot easier. And it would be impossible for the database to become out of sync with the file system.

Many of the documents that the marketing department wanted to store were greater than 8000 bytes, so VARCHAR was clearly not the right data type for the job. Instead, the TEXT data type was used to define the column that would hold the data. With its 2 GB-per-instance capacity, TEXT had no problems holding even the largest documents the marketing guys could throw at the database.

Months passed and the marketing team filled the repository with lots of dull copy. But that didn't really concern Steve. The database was humming along happily, and everyone was pleased with the results of the project.

Until, that is, the fateful day when the company's slogan changed. The marketing team decided that MegaWare: It's really cool! sounded better than the old It's MegaWare's Way or the Highway! Since the marketing team had embedded the old slogan in the footer of every single document in the repository, it was now Steve's job to change all of those documents.

"No problem," thought Steve, opening up the SQL Server Query Analyzer tool and executing the following T-SQL batch:

 UPDATE MarketingDocuments SET Document = REPLACE(Document,
'It''s MegaWare''s Way or the Highway!', 'MegaWare: It''s
really cool!)

Steve's relaxed grin quickly disappeared when he saw the resultant error message: "Argument data type text is invalid for argument 1 of replace function."

REPLACE, as it turns out, doesn't work with the TEXT data type. Neither do CHARINDEX or SUBSTRING -- or at least they don't work beyond the eight-thousandth character. Furthermore, developers can forget about manipulating local variables of the TEXT or IMAGE types; virtually no operations are supported. Even simply updating a substring within a document requires the use of somewhat obscure and difficult-to-use functions like READTEXT and WRITETEXT. Not the kinds of functions that a developer or busy DBA wants to have to figure out how to properly use come crunch time.

Luckily for SQL Server developers, the dark clouds are breaking up and blue skies are in the forecast. SQL Server 2005 introduces a new series of data types known as the MAX types. These are extensions to the VARCHAR, NVARCHAR and VARBINARY types, which previously could only be sized up to 8000 bytes. The MAX versions can accommodate up to 2 GB of data, just like TEXT and IMAGE -- and are fully compatible with all intrinsic SQL Server string functions.

Defining a variable of one of the MAX types is as simple as replacing the size in characters (in the case of VARCHAR/NVARCHAR) or bytes (in the case of VARBINARY) with the MAX keyword:

 DECLARE @BigString VARCHAR(MAX) SET @BigString = 'abc'

Although this variable can be freely manipulated and passed to any of the intrinsic string functions, the compatibility is not without caveats. First of all, developers should not expect sized VARCHAR and VARBINARY variables to automatically "upgrade" to MAX versions when they reach the 8000-byte limit. For instance, take the following batch:

 DECLARE @String1 VARCHAR(4001) DECLARE @String2 VARCHAR(4001)
SET @String1 = REPLICATE('1', 4001) SET @String2 = REPLICATE('2', 4001)
SELECT LEN(@String1 + @String2)

4001 + 4001 = 8002, but the limit for sized VARCHARs is 8000. Since neither of these variables is of a MAX type, the result of the LEN function is 8000, not 8002. Simple fixes for this behavior are to declare one or both of the variables as VARCHAR(MAX) or else convert one or both of the variables when they are concatenated. The MAX type will take precedence if concatenated with a sized type, producing a MAX type as a result. So, the result of the following batch is 8002, as expected:

 DECLARE @String1 VARCHAR(4001) DECLARE @String2 VARCHAR(4001)
SET @String1 = REPLICATE('1', 4001) SET @String2 = REPLICATE('2', 4001)
SELECT LEN(CONVERT(VARCHAR(MAX), @String1) + @String2)

It's also important that developers realize that string literals will default to sized rather than MAX types when passed into string functions. The result of the following query, for instance, is surprising:

 SELECT LEN(REPLICATE('1', 8002))

Because the string literal, '1', is treated as a sized VARCHAR instead of a VARCHAR(MAX), the result is 8000 -- but in SQL Server 2005, the REPLICATE function is capable of producing strings as large as 2 GB. To fix this issue, convert the string literal into a VARCHAR(MAX) so that the function will output the same type:

 SELECT LEN(REPLICATE(CONVERT(VARCHAR(MAX), '1'), 8002))

This query now returns the expected result: 8002. Remember to always test very carefully any production code using new features; hidden problems such as the one described above can and doubtless will wreak havoc at the worst possible time.

In addition to variables, the MAX types can be used to define columns of tables:

 CREATE TABLE BigStrings ( BigString VARCHAR(MAX) )

When used for tables, it's important to realize that the MAX types have a slightly different row overflow behavior than the TEXT and IMAGE types. In SQL Server, the maximum row size is 8060 bytes. To get around this restriction and still manage storage of up to 2 GB per instance, data stored using the TEXT and IMAGE types is automatically placed off-row by the storage engine, leaving only a 16-byte pointer in the row. That means row sizes are decreased, which is good for performance. Retrieving the large data is expensive, however, since it is not stored in the same place as the on-row data.

The MAX data types, by default, use a hybrid of the TEXT/IMAGE overflow behavior and the behavior of the normal (sized) VARCHAR/VARBINARY types. If a column's data, plus the data in all of the other columns in the table, has a total size of less than 8060 bytes, the data is stored in-row. If the data exceeds 8060 bytes, the data in the MAX column will be stored off-row. For the BigStrings table, the following row will be stored in the same data pages as other data from the table:

 INSERT BigStrings (BigString) VALUES (REPLICATE('1', 8000))

But the following row will result in an overflow:

 INSERT BigStrings (BigString) VALUES (REPLICATE(CONVERT(VARCHAR
(MAX), '1'), 100000))

You can change the default behavior of the MAX data types on a per-table basis and they'll behave just like the TEXT and IMAGE types. This is accomplished using the "large value types out of row" option of the sp_tableoption stored procedure. To modify the BigStrings table to treat MAX types the same as the TEXT and IMAGE datatypes, use the following T-SQL:

 EXEC sp_tableoption 'BigStrings', 'large value types out of row', '1'

Given how easy it is to define MAX data types, as well as the flexibility they afford, some data architects may be tempted to begin defining tables like the following:

 CREATE TABLE Addresses ( Name VARCHAR(MAX), AddressLine1 VARCHAR(MAX),
AddressLine2 VARCHAR(MAX), City VARCHAR(MAX), State VARCHAR(MAX),
PostalCode VARCHAR(MAX) )

Note to these architects: Don't do that! The data model in an organization should both constrain the data to realistic limits and give user interface designers rough guidelines for field sizes. What kind of UI would be created against a table like that one?

In addition to data integrity and UI implications, there could be performance penalties if architects were to needlessly use these types. Remember that the query optimizer uses column size as one of the many metrics for determining optimal query plans. Given this table, the optimizer would have very few options in that regard.

So, you now know that the MAX data types add a great element of flexibility for handling large data in SQL Server 2005. But what became of Steve, MegaWare's unfortunate DBA? Stuck with SQL Server 2000, he began updating his resume, assuming that his failure to ably update the table would cost him his job. But luckily for him -- and fans of MegaWare's products everywhere -- a Google search quickly located the article Search and replace in a TEXT column, that showed him how to properly do the update. He spends his nights sleeplessly counting the minutes; in just a few months, the TEXT and IMAGE data types will be nothing more than an unpleasant memory.

Adam Machanic is a senior database engineer for GetConnected Inc. He has several years of experience developing a variety of applications using SQL Server as a data repository and is active on numerous online technical forums. He is a Microsoft Certified Professional and a SQL Server MVP. Machanic, who also serves as our SQL Server 2005 expert, welcomes your questions.

 


Do you have comments on this tip? Let us know.

This was first published in June 2005

Dig deeper on Microsoft SQL Server 2005

Pro+

Features

Enjoy the benefits of Pro+ membership, learn more and join.

0 comments

Oldest 

Forgot Password?

No problem! Submit your e-mail address below. We'll send you an email containing your password.

Your password has been sent to:

SearchBusinessAnalytics

SearchDataCenter

SearchDataManagement

SearchAWS

SearchOracle

SearchContentManagement

SearchWindowsServer

Close