This is an excerpt from the Addison-Wesley book Guru's Guide to SQL Server Architecture and Internals by Ken Henderson.
This chapter will get into how the SQLXML technologies are designed and how they fit together from an architectural standpoint. As with the rest of the book, my intent here is to get beyond the "how to" and into the "why" behind how SQL Server's technologies work.
With the popularity and ubiquity of XML, it's no surprise that SQL Server has extensive support for working with it. Like most modern DBMSs, SQL Server regularly needs to work with and store data that may have originated in XML. Without this built-in support, getting XML to and from SQL Server would require the application developer to translate XML data before sending it to SQL Server and again after receiving it back. Obviously, this could quickly become very tedious given the pervasiveness of the language.
SQL Server is an XML-enabled DBMS. This means that it can read and write XML data. It can return data from databases in XML format, and it can read and update data stored in XML documents.
Table 18.1 SQL Server's XML Features
|FOR XML||An extension to the SELECT command that allows result sets to be returned as XML|
|OPENXML||Allows reading and writing of data in XML documents|
|XPath queries||Allows SQL Server databases to be queried using XPath syntax|
|Schemas||Supports XSD and XDR mapping schemas and XPath queries against them|
|SOAP support||Allows clients to access SQL Server's functionality as a Web service|
|Updategrams||XML templates through which data modifications can be applied to a database|
|Managed classes||Classes that expose the functionality of SQLXML inside the .NET Framework|
|XML Bulk Load||A high-speed facility for loading XML data into a SQL Server database|
As Table 18.1 illustrates, out of the box, SQL Server's XML features can be broken down into eight general categories. We'll explore each of these in this chapter and discuss how they work and how they interoperate.
SQL Server uses Microsoft's XML parser, MSXML, to load XML data, so we'll begin our discussion there. There are two basic ways to parse XML data using MSXML: using the Document Object Model (DOM) or using the Simple API for XML (SAX). Both DOM and SAX are W3C standards. The DOM method involves parsing the XML document and loading it into a tree structure in memory. The entire document is materialized and stored in memory when processed this way. An XML document parsed via DOM is known as a DOM document (or just "DOM" for short). XML parsers provide a variety of ways to manipulate DOM documents. Listing 18.1 shows a short Visual Basic app that demonstrates parsing an XML document via DOM and querying it for a particular node set. (You can find the source code to this app in the CH18msxmltest subfolder on the CD accompanying this book.)
Private Sub Command1_Click() Dim bstrDoc As String bstrDoc = "
" & _ "" Dim xmlDoc As New DOMDocument30 If Len(Text1.Text) = 0 Then Text1.Text = bstrDoc End If If Not xmlDoc.loadXML(Text1.Text) Then MsgBox "Error loading document" Else Dim oNodes As IXMLDOMNodeList Dim oNode As IXMLDOMNode If Len(Text2.Text) = 0 Then Text2.Text = "//Song" End If Set oNodes = xmlDoc.selectNodes(Text2.Text) For Each oNode In oNodes If Not (oNode Is Nothing) Then sName = oNode.nodeName sData = oNode.xml MsgBox "Node <" + sName + ">:" _ + vbNewLine + vbTab + sData + vbNewLine End If Next Set xmlDoc = Nothing End If End Sub One More Day" & _ " Hard Habit to Break" & _ " Forever" & _ " Boys of Summer" & _ " Cherish" & _ " Dance" & _ " I Will Always Love You" & _ "
We begin by instantiating a DOMDocument object, then call its loadXML method to parse the XML document and load it into the DOM tree. We call its selectNodes method to query it via XPath. The selectNodes method returns a node list object, which we then iterate through using For Each. In this case, we display each node name followed by its contents via VB's Msg- Box function. We're able to access and manipulate the document as though it were an object because that's exactly what it is—parsing an XML document via DOM turns the document into a memory object that you can then work with just as you would any other object.
SAX, by contrast, is an event-driven API. You process an XML document via SAX by configuring your application to respond to SAX events. As the SAX processor reads through an XML document, it raises events each time it encounters something the calling application should know about, such as an element starting or ending, an attribute starting or ending, and so on. It passes the relevant data about the event to the application's handler for the event. The application can then decide what to do in response—it could store the event data in some type of tree structure, as is the case with DOM processing; it could ignore the event; it could search the event data for something in particular; or it could take some other action. Once the event is handled, the SAX processor continues reading the document. At no point does it persist the document in memory as DOM does. It's really just a parsing mechanism to which an application can attach its own functionality. In fact, SAX is the underlying parsing mechanism for MSXML's DOM processor. Microsoft's DOM implementation sets up SAX event handlers that simply store the data handed to them by the SAX engine in a DOM tree.
As you've probably surmised by now, SAX consumes far less memory than DOM does. That said, it's also much more trouble to set up and use. By persisting documents in memory, the DOM API makes working with XML documents as easy as working with any other kind of object. SQL Server uses MSXML and the DOM to process documents you load via sp_xml_preparedocument. It restricts the virtual memory MSXML can use for DOM processing to one-eighth of the physical memory on the machine or 500MB, whichever is less. In actual practice, it's highly unlikely that MSXML would be able to access 500MB of virtual memory, even on a machine with 4GB of physical memory. The reason for this is that, by default, SQL Server reserves most of the user mode address space for use by its buffer pool. You'll recall that we talked about the MemToLeave space in Chapter 11 and noted that the non–thread stack portion defaults to 256MB on SQL Server 2000. This means that, by default, MSXML won't be able to use more than 256MB of memory—and probably considerably less given that other things are also allocated from this region—regardless of the amount of physical memory on the machine.
The reason MSXML is limited to no more than 500MB of virtual memory use regardless of the amount of memory on the machine is that SQL Server calls the GlobalMemoryStatus Win32 API function to determine the amount of available physical memory. GlobalMemoryStatus populates a MEMORYSTATUS structure with information about the status of memory use on the machine. On machines with more than 4GB of physical memory, GlobalMemoryStatus can return incorrect information, so Windows returns a -1 to indicate an overflow. The Win32 API function GlobalMemoryStatusEx exists to address this shortcoming, but SQLXML does not call it. You can see this for yourself by working through the following exercise.