XML data type in SQL Server 2005 vs. VARCHAR (MAX)

When choosing to use the XML data type or the VARCHAR (MAX) data type, in SQL Server 2005, DBAs and developers should know the potential performance impact of each. Here's a look at two tables, one inserted and queried with XML and the other with the VARCHAR (MAX) data type. Take a look at the CPU and I/O results and make the best data type choice for your SQL Server environment.

This Content Component encountered an error

As a database administrator, I tend to look closely at performance issues and how to make sure the use of XML does not affect SQL Server performance. In this tip, I'll take you through an example using two tables, one inserted and queried with XML and the other with the VARCHAR (MAX) data type. Take a look at the storage, CPU and I/O results and make the best choice for your SQL Server environment.

Note: The tests used here use only base tables, with no indexes.

If you're interested in a comparison of performance implications using T-SQL commands and XML AUTO in SQL Server, check out my previous tip.

The XML data type
The XML data type logically differs from the VARCHAR (MAX) in the checks done by SQL Server to make sure the content is indeed valid XML.

Testing environment description For my testing, I will use XML I copied from an *.rdl file (Reporting Services Report), which is 265KB long. I will create two tables with the same structure, except one table uses XML data type and the other uses VARCHAR (MAX) data type for the MyXML field:

create table TryVACRCHARDatatype(
id int identity not null,
MyXML VARCHAR(MAX) null
)

go

create table TryXMLDatatype (
id int identity not null,
MyXML XML null
)
Go

I will insert the XML data the same way to each table:

set statistics io on
Go
declare @XML XML

-- My big XML (for space reasons I am not including all of it here) SET @XML = '<?xml version="1.0" encoding="utf-8"?>
………
'

insert into TryXMLDatatype (MyXML) values (@XML)
go

declare @Varch VARCHAR(MAX)

-- My big XML (for space reasons I am not including all of it here)
SET @Varch = '<?xml version="1.0" encoding="utf-8"?>
………
'

insert into TryXMLDatatype (MyXML) values (@Varch)
Go

Insert XML data into the tables
I monitored the above insert commands in SQL Profiler and with statistics I/O, running them twice.

Statistics I/O results:
Table 'TryXMLDatatype'. Scan count 0, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 18, lob physical reads 0, lob read-ahead reads 0.

Table 'TryVACRCHARDatatype'. Scan count 0, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 90, lob physical reads 0, lob read-ahead reads 0.

Execution plans show the exact same plan for both commands.

Profiler results (two executions of each insert):

Figure 1

Note: The XML insertion takes more CPU but less reads and writes. Also, inserting the XML data type has more duration. I will analyze this behavior in the next section.

Querying the tables
I ran SELECT * FROM in each table and monitored with Profiler and Statistics I/O:

SELECT * FROM TryXMLDatatype
Go
SELECT * FROM TryVACRCHARDatatype
Go

Results:

Figure 2

Note: The values in the fields are different!

Statistics I/O results:

Table 'TryXMLDatatype'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 56, lob physical reads 0, lob read-ahead reads 18. Table 'TryVACRCHARDatatype'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 200, lob physical reads 0, lob read-ahead reads 0.

Less I/O is issued with the XML data type.

Again, execution plans show the exact same plan for both commands.

Why is there such a difference between the reads of the two tables?

Note: The duration of extracting the XML is higher. The duration is subjected to many different factors depending on the activity on the machine. I'll ignore this difference, mainly because no CPU is shown. If CPU activity was involved, it might explain the difference in duration, but this is not the case.

Let's query the length of the columns in each table:

SELECT datalength(MyXML) FROM TryXMLDatatype
Go
SELECT datalength(MyXML) FROM TryVACRCHARDatatype
Go

Surprisingly, the XML field has far fewer characters than the other. Here is the explanation:

XML -- less I/O:

When inserting the XML data type, the "extra" data such as the "                and tabs are removed from the field too. The result is a much more economical storage.

I tried to copy the value from the XML column to the VARCHAR(MAX) column as follows:

truncate table TryVARCHARDatatype
Go
insert into TryVARCHARDatatype (MyXML)
select convert(varchar(max),MyXML) from TryXMLDatatype
Go

The result:

Looks the same, but…

SELECT datalength(MyXML) FROM TryXMLDatatype
Go
SELECT datalength(MyXML) FROM TryVACRCHARDatatype
Go

This is better than before (136,268 instead of 271,210), but still not the same. If I convert the XML to a VARCHAR(MAX) and calculate the length:

SELECT
datalength(convert(varchar(max),MyXML)) FROM TryXMLDatatype
go
SELECT datalength(MyXML)FROM TryVARCHARDatatype
Go

Same length.
This means that the XML datatype is stored in a more efficient way. And now that both tables contain the same data, let's view the Profiler trace of the select * from both tables (three executions):

There are about 20% fewer reads for the XML datatype.

The statistics I/O:

Table 'TryXMLDatatype'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 56, lob physical reads 0, lob read-ahead reads 18.

Table 'TryVARCHARDatatype'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 100, lob physical reads 0, lob read-ahead reads 0.

Again, it shows a more efficient read operation with the XML datatype.

What about querying the XML converted to VARCHAR(MAX):

SELECT convert(varchar(max),MyXML) FROM TryXMLDatatype
Go
SELECT MyXML FROM TryVARCHARDatatype
Go

Table 'TryXMLDatatype'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 20, lob physical reads 0, lob read-ahead reads 9.

Table 'TryVARCHARDatatype'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 100, lob physical reads 0, lob read-ahead reads 0.

The only metric with a constant behavior is the reads I/O. Furthermore, it is more likely for the XML convertion query to use CPU for the conversion -- that's normal.

Conclusion
The XML data type is meant to store pure XML data, excluding unnecessary headers and trailing characters. This results in a more cost-effective storage by means of I/O, yet there is still the CPU cost for checking that the data is valid XML.

ABOUT THE AUTHOR
Michelle Gutzait works as a senior database consultant for ITERGY International Inc., an IT consulting firm specializing in the design, implementation, security and support of Microsoft products in the enterprise. Gutzait has been involved in IT for 20 years as a developer, business analyst and database consultant. For the last 10 years, she has worked exclusively with SQL Server. Her skills include SQL Server infrastructure design, database design, performance tuning, security, high availability, VLDBs, replication, T-SQL/packages coding, and more.

This was first published in January 2008

Dig deeper on XML in SQL Server

Pro+

Features

Enjoy the benefits of Pro+ membership, learn more and join.

0 comments

Oldest 

Forgot Password?

No problem! Submit your e-mail address below. We'll send you an email containing your password.

Your password has been sent to:

-ADS BY GOOGLE

SearchBusinessAnalytics

SearchDataCenter

SearchDataManagement

SearchAWS

SearchOracle

SearchContentManagement

SearchWindowsServer

Close