Tip

XML data type in SQL Server 2005 vs. VARCHAR (MAX)

As a database administrator, I tend to look closely at performance issues and how to make sure the use of XML does not affect SQL Server performance. In this tip, I'll take you through an example using two tables, one inserted and queried with XML and the other with the VARCHAR (MAX) data type. Take a look at the storage, CPU and I/O results and make the best choice for your SQL Server environment.

Note: The tests used here use only base tables, with no indexes.

If you're interested in a comparison of performance implications using

    Requires Free Membership to View

T-SQL commands and XML AUTO in SQL Server, check out my previous tip.

The XML data type
The XML data type logically differs from the VARCHAR (MAX) in the checks done by SQL Server to make sure the content is indeed valid XML.

Testing environment description For my testing, I will use XML I copied from an *.rdl file (Reporting Services Report), which is 265KB long. I will create two tables with the same structure, except one table uses XML data type and the other uses VARCHAR (MAX) data type for the MyXML field:

create table TryVACRCHARDatatype(
id int identity not null,
MyXML VARCHAR(MAX) null
)

go

create table TryXMLDatatype (
id int identity not null,
MyXML XML null
)
Go

I will insert the XML data the same way to each table:

set statistics io on
Go
declare @XML XML

-- My big XML (for space reasons I am not including all of it here) SET @XML = '<?xml version="1.0" encoding="utf-8"?>
………
'

insert into TryXMLDatatype (MyXML) values (@XML)
go

declare @Varch VARCHAR(MAX)

-- My big XML (for space reasons I am not including all of it here)
SET @Varch = '<?xml version="1.0" encoding="utf-8"?>
………
'

insert into TryXMLDatatype (MyXML) values (@Varch)
Go

Insert XML data into the tables
I monitored the above insert commands in SQL Profiler and with statistics I/O, running them twice.

Statistics I/O results:
Table 'TryXMLDatatype'. Scan count 0, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 18, lob physical reads 0, lob read-ahead reads 0.

Table 'TryVACRCHARDatatype'. Scan count 0, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 90, lob physical reads 0, lob read-ahead reads 0.

Execution plans show the exact same plan for both commands.

Profiler results (two executions of each insert):

Note: The XML insertion takes more CPU but less reads and writes. Also, inserting the XML data type has more duration. I will analyze this behavior in the next section.

Querying the tables
I ran SELECT * FROM in each table and monitored with Profiler and Statistics I/O:

SELECT * FROM TryXMLDatatype
Go
SELECT * FROM TryVACRCHARDatatype
Go

Results:

Note: The values in the fields are different!

Statistics I/O results:

Table 'TryXMLDatatype'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 56, lob physical reads 0, lob read-ahead reads 18. Table 'TryVACRCHARDatatype'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 200, lob physical reads 0, lob read-ahead reads 0.

Less I/O is issued with the XML data type.

Again, execution plans show the exact same plan for both commands.

Why is there such a difference between the reads of the two tables?

Note: The duration of extracting the XML is higher. The duration is subjected to many different factors depending on the activity on the machine. I'll ignore this difference, mainly because no CPU is shown. If CPU activity was involved, it might explain the difference in duration, but this is not the case.

Let's query the length of the columns in each table:

SELECT datalength(MyXML) FROM TryXMLDatatype
Go
SELECT datalength(MyXML) FROM TryVACRCHARDatatype
Go

Surprisingly, the XML field has far fewer characters than the other. Here is the explanation:

XML -- less I/O:

When inserting the XML data type, the "extra" data such as the "                and tabs are removed from the field too. The result is a much more economical storage.

I tried to copy the value from the XML column to the VARCHAR(MAX) column as follows:

truncate table TryVARCHARDatatype
Go
insert into TryVARCHARDatatype (MyXML)
select convert(varchar(max),MyXML) from TryXMLDatatype
Go

The result:

Looks the same, but…

SELECT datalength(MyXML) FROM TryXMLDatatype
Go
SELECT datalength(MyXML) FROM TryVACRCHARDatatype
Go

This is better than before (136,268 instead of 271,210), but still not the same. If I convert the XML to a VARCHAR(MAX) and calculate the length:

SELECT
datalength(convert(varchar(max),MyXML)) FROM TryXMLDatatype
go
SELECT datalength(MyXML)FROM TryVARCHARDatatype
Go

Same length.
This means that the XML datatype is stored in a more efficient way. And now that both tables contain the same data, let's view the Profiler trace of the select * from both tables (three executions):

There are about 20% fewer reads for the XML datatype.

More on improving SQL Server performance:

The statistics I/O:

Table 'TryXMLDatatype'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 56, lob physical reads 0, lob read-ahead reads 18.

Table 'TryVARCHARDatatype'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 100, lob physical reads 0, lob read-ahead reads 0.

Again, it shows a more efficient read operation with the XML datatype.

What about querying the XML converted to VARCHAR(MAX):

SELECT convert(varchar(max),MyXML) FROM TryXMLDatatype
Go
SELECT MyXML FROM TryVARCHARDatatype
Go

Table 'TryXMLDatatype'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 20, lob physical reads 0, lob read-ahead reads 9.

Table 'TryVARCHARDatatype'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 100, lob physical reads 0, lob read-ahead reads 0.

The only metric with a constant behavior is the reads I/O. Furthermore, it is more likely for the XML convertion query to use CPU for the conversion -- that's normal.

Conclusion
The XML data type is meant to store pure XML data, excluding unnecessary headers and trailing characters. This results in a more cost-effective storage by means of I/O, yet there is still the CPU cost for checking that the data is valid XML.

ABOUT THE AUTHOR
Michelle Gutzait works as a senior database consultant for ITERGY International Inc., an IT consulting firm specializing in the design, implementation, security and support of Microsoft products in the enterprise. Gutzait has been involved in IT for 20 years as a developer, business analyst and database consultant. For the last 10 years, she has worked exclusively with SQL Server. Her skills include SQL Server infrastructure design, database design, performance tuning, security, high availability, VLDBs, replication, T-SQL/packages coding, and more.

This was first published in January 2008

There are Comments. Add yours.

 
TIP: Want to include a code block in your comment? Use <pre> or <code> tags around the desired text. Ex: <code>insert code</code>

REGISTER or login:

Forgot Password?
By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy
Sort by: OldestNewest

Forgot Password?

No problem! Submit your e-mail address below. We'll send you an email containing your password.

Your password has been sent to:

Disclaimer: Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.