XML shredding is the process of consuming XML data and parsing out the content to scalar elements.
Before we get into how XML shredding occurs, let's first take a look at the way XML should be utilized in SQL
XML – Unleashing the pros and cons
Just because SQL Server supports XML does not mean it's the best approach for all situations. I am reminded of several architects who believed they should transfer data between multiple SQL Servers in XML. Their approach was to take a set of related relational data (roughly corresponding to an object), parse it into XML, select it utilizing a C# service, transfer it to the receiving SQL Server, shred it on the SQL Server side, and store it in the same format as it was on the originating system.
Using this approach, they basically incurred a CPU/RAM overhead to parse the XML on the originating server. This also increased network bandwidth due to the text-based nature of XML and created additional CPU/RAM overhead to shred the XML on the receiving server.
One needs to keep in mind the purpose of XML, which is the transfer of data between disparate systems of differing types in a uniform manner. That being said, XML comes with some drawbacks, such as the addition of tags which increases the size and RAM/CPU to parse or shred. Also, XML is text-based, which means any smaller data type such as an INT will immediately become larger when sent as XML since it will turn into text and have tags added.
In the past, one of the benefits of XML has been to pass in XML data that constitutes a block of data (typically a series of rows), shred the data, and then process and store it in relational format. Prior to SQL Server 2008, without the use of XML, this would have consisted of multiple consecutive calls to a stored procedure (since I know none of you would use embedded T-SQL code). Starting with SQL Server 2008, one can pass in a table-valued parameter. The table-valued parameters do not have the overhead associated with XML and are far less intensive on CPU/RAM resources. They also do not consume as much network bandwidth.
The CPU/RAM load that parsing and shredding XML adds to a SQL Server also relates to scalability issues. If a SQL Server is currently (or expected to be) heavily utilized, then one will have to be careful that the volume of expected shredding will not overload the CPU/RAM capacity. If a SQL Server is heavily utilized and the XML shredding could have a noticeable load, it might be a better idea to distribute the shredding operations to service boxes running custom-written services that will shred the XML and called stored procedures on the SQL Server with the resulting relational data. It is often easier to scale service servers than a SQL Server if the system is under strain.
XML Shredding option summary
1. Perform shredding via another language on a separate service server
- Scalability for parsing and shredding of large XML files very quickly, which can then be fed into the SQL Server by T-SQL stored procedures in a relational format.
- Far more friendly to CPU/RAM resources on the SQL Server.
- Easier XML manipulation utilizing programming languages like C# and VB.NET which are typically faster when shredding complex files.
- Another application layer to develop and manage means increased development costs and time.
2. Perform shredding on SQL Server itself utilizing T-SQL/ XQuery /XPath
- T-SQL is native to SQL Server, so the execution is going to be faster than using a CLR solution. The speed compared to a separate server(s) will depend on the XML document, processing methods, network speed, and other factors.
- Disaster recovery is easy.
- The versatility of the current XML functionality is poor compared to the full programming languages.
- CPU/RAM resources are impacted.
- Scalability will be an issue if the SQL Server is maxed out.
3. Perform shredding on SQL Server itself utilizing CLR stored procedures
- The full versatility of any CLR-compliant language such as C# or VB.NET.
- Increased ability to process complex XML files more efficiently utilizing readers.
- The need to deploy and maintain assemblies on the server, which complicates scalability and disaster recovery issues (though this can be planned for).
- SQL Server needs to utilize the CLR engine to run the interpreted code. A slight delay may be added to translate the IL code to machine code, but this is dependent on the JIT compilers as to the impact it can be cached.
- Scalability will be an issue if the SQL Server is maxed out.
XML shredding is accomplished by parsing out elements with XQuery. Keep in mind that XPath is a subset of XQuery. XQuery utilizes XPath to address parts of an XML document. This is supplemented by the "For, Let, Where, Order By, Return" expression, or FLWOR for short. XQuery does not allow for updating or full-text searching capabilities. While both features are expected to be developed in the future, for now XQuery is just a query language.
Calling stored procedure sampledeclare @m as xml;
set @m = ' <sample1>
exec dbo.XMLShred @d = @m;
Shredding stored procedure samplecreate proc dbo.XMLShred (@d as XML)
for $step in /sample1/company
select @d.query ('/sample1/company')
The first select is a classic FLWOR, which will return the contents of a single node or, if multiple nodes are present, the appended results (note result set 1). The second select returns all the matching nodes. In this case, I have multiple company nodes to illustrate the difference. You could also choose to change the line to this email@example.com ('/sample1/company'). This will return the node at the given position.
Considering the limited functionality for complex XML manipulation in T-SQL, you can also choose to implement a CLR stored procedure. This is basically a stored procedure that is implemented in the Microsoft .NET Framework CLR and can be written in CLR-supported languages such as C# or VB.NET. Keep in mind that in this configuration, you will need to be able to deploy assemblies onto the physical SQL Server, which might be a problem depending on how well your production server is locked down. The assemblies will also have to be included in restores, so be careful to plan for their impact in relation to disaster recovery, scalability, and other scenarios.
When you are making a decision about how to shred XML files, be mindful of the potential impact. Since we have now outlined the pros, cons, and considerations of each XML shredding option, you should be able to pick the best approach for any given architecture.Matthew Schroeder is a CIO Consultant working with chief executives. He is a visionary IT executive with 15+ years of experience managing development/operations teams, project management, vendor management, optimizing ROI, and IT/business alignment. He can be reached at firstname.lastname@example.org.
This was first published in February 2010