Tip

An introduction to XML shredding for SQL Server

Matthew Schroeder, Contributor

XML shredding is the process of consuming XML data and parsing out the content to scalar elements.

Before we get into how XML shredding occurs, let's first take a look at the way XML should be utilized in SQL

    Requires Free Membership to View

Server.

XML – Unleashing the pros and cons

Just because SQL Server supports XML does not mean it's the best approach for all situations. I am reminded of several architects who believed they should transfer data between multiple SQL Servers in XML. Their approach was to take a set of related relational data (roughly corresponding to an object), parse it into XML, select it utilizing a C# service, transfer it to the receiving SQL Server, shred it on the SQL Server side, and store it in the same format as it was on the originating system.

Using this approach, they basically incurred a CPU/RAM overhead to parse the XML on the originating server. This also increased network bandwidth due to the text-based nature of XML and created additional CPU/RAM overhead to shred the XML on the receiving server.

One needs to keep in mind the purpose of XML, which is the transfer of data between disparate systems of differing types in a uniform manner. That being said, XML comes with some drawbacks, such as the addition of tags which increases the size and RAM/CPU to parse or shred. Also, XML is text-based, which means any smaller data type such as an INT will immediately become larger when sent as XML since it will turn into text and have tags added.

In the past, one of the benefits of XML has been to pass in XML data that constitutes a block of data (typically a series of rows), shred the data, and then process and store it in relational format. Prior to SQL Server 2008, without the use of XML, this would have consisted of multiple consecutive calls to a stored procedure (since I know none of you would use embedded T-SQL code). Starting with SQL Server 2008, one can pass in a table-valued parameter. The table-valued parameters do not have the overhead associated with XML and are far less intensive on CPU/RAM resources. They also do not consume as much network bandwidth.

The CPU/RAM load that parsing and shredding XML adds to a SQL Server also relates to scalability issues. If a SQL Server is currently (or expected to be) heavily utilized, then one will have to be careful that the volume of expected shredding will not overload the CPU/RAM capacity. If a SQL Server is heavily utilized and the XML shredding could have a noticeable load, it might be a better idea to distribute the shredding operations to service boxes running custom-written services that will shred the XML and called stored procedures on the SQL Server with the resulting relational data. It is often easier to scale service servers than a SQL Server if the system is under strain.

XML Shredding option summary

1. Perform shredding via another language on a separate service server

Pros

  • Scalability for parsing and shredding of large XML files very quickly, which can then be fed into the SQL Server by T-SQL stored procedures in a relational format.
  • Far more friendly to CPU/RAM resources on the SQL Server.
  • Easier XML manipulation utilizing programming languages like C# and VB.NET which are typically faster when shredding complex files.

Cons

  • Another application layer to develop and manage means increased development costs and time.

2. Perform shredding on SQL Server itself utilizing T-SQL/ XQuery /XPath

Pros

  • T-SQL is native to SQL Server, so the execution is going to be faster than using a CLR solution. The speed compared to a separate server(s) will depend on the XML document, processing methods, network speed, and other factors.
  • Disaster recovery is easy.

Cons

  • The versatility of the current XML functionality is poor compared to the full programming languages.
  • CPU/RAM resources are impacted.
  • Scalability will be an issue if the SQL Server is maxed out.

3. Perform shredding on SQL Server itself utilizing CLR stored procedures

Pros

  • The full versatility of any CLR-compliant language such as C# or VB.NET.
  • Increased ability to process complex XML files more efficiently utilizing readers.

Cons

  • The need to deploy and maintain assemblies on the server, which complicates scalability and disaster recovery issues (though this can be planned for).
  • SQL Server needs to utilize the CLR engine to run the interpreted code. A slight delay may be added to translate the IL code to machine code, but this is dependent on the JIT compilers as to the impact it can be cached.
  • Scalability will be an issue if the SQL Server is maxed out.

XML shredding

XML shredding is accomplished by parsing out elements with XQuery. Keep in mind that XPath is a subset of XQuery. XQuery utilizes XPath to address parts of an XML document. This is supplemented by the "For, Let, Where, Order By, Return" expression, or FLWOR for short. XQuery does not allow for updating or full-text searching capabilities. While both features are expected to be developed in the future, for now XQuery is just a query language.

XML sample

<sample1>
<company>TheRealLife</company>
<company>TheRealLife2</company>
<city>Miami</city>
<year-founded>1998</ year-founded>
<industry>software</industry>
</sample1>

Calling stored procedure sample

declare @m as xml;

set @m = ' <sample1>
<company>TheRealLife</company>
<company>TheRealLife2</company>
<city>Miami</city>
<year-founded>1998</year-founded>
<industry>software</industry>
</sample1>'


exec dbo.XMLShred @d = @m;

Result set1:
TheRealLife TheRealLife2

Result set2:
<company>TheRealLife</company>
<company>
TheRealLife2</company>

Shredding stored procedure sample

create proc dbo.XMLShred (@d as XML)
AS

     select @d.query('
         for $step in /sample1/company
         return string($step)
               '
)

     select @d.query ('/sample1/company')
return 0;

The first select is a classic FLWOR, which will return the contents of a single node or, if multiple nodes are present, the appended results (note result set 1). The second select returns all the matching nodes. In this case, I have multiple company nodes to illustrate the difference. You could also choose to change the line to this select@d.query ('/sample1/company[2]'). This will return the node at the given position.

Considering the limited functionality for complex XML manipulation in T-SQL, you can also choose to implement a CLR stored procedure. This is basically a stored procedure that is implemented in the Microsoft .NET Framework CLR and can be written in CLR-supported languages such as C# or VB.NET. Keep in mind that in this configuration, you will need to be able to deploy assemblies onto the physical SQL Server, which might be a problem depending on how well your production server is locked down. The assemblies will also have to be included in restores, so be careful to plan for their impact in relation to disaster recovery, scalability, and other scenarios.

When you are making a decision about how to shred XML files, be mindful of the potential impact. Since we have now outlined the pros, cons, and considerations of each XML shredding option, you should be able to pick the best approach for any given architecture.

Matthew Schroeder is a CIO Consultant working with chief executives. He is a visionary IT executive with 15+ years of experience managing development/operations teams, project management, vendor management, optimizing ROI, and IT/business alignment. He can be reached at cyberstrike@aggressivecoding.com.

 

This was first published in February 2010

There are Comments. Add yours.

 
TIP: Want to include a code block in your comment? Use <pre> or <code> tags around the desired text. Ex: <code>insert code</code>

REGISTER or login:

Forgot Password?
By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy
Sort by: OldestNewest

Forgot Password?

No problem! Submit your e-mail address below. We'll send you an email containing your password.

Your password has been sent to:

Disclaimer: Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.