Home > SQL Server Tips > Database Management and Administration > Working with sparse columns in SQL Server 2008
SQL Server Tips:
EMAIL THIS
 TIPS & NEWSLETTERS TOPICS 

DATABASE MANAGEMENT AND ADMINISTRATION

Working with sparse columns in SQL Server 2008


Robert Sheldon, Contributor
06.02.2009
Rating: -4.86- (out of 5)


Expert advice on database administration
Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us    Add to Google


SQL Server 2008 introduces the concept of the sparse column, which is a type of column that optimizes storage for null values. When a column contains a substantial number of null values, defining the column as sparse can save a significant amount of disk space. In fact, a null value in a sparse column takes up no space at all.

There are trade-offs, however, in using sparse columns because more space is needed for non-null values. In a column configured as sparse, the non-null value requires an additional 4 bytes of storage. For example, a DATETIME value in a non-sparse column requires 8 bytes of storage. In a sparse column, however, that value requires 12 bytes. For this reason, Microsoft recommends that you use sparse columns only when the space saved is at least 20 to 40%.

If you decide to implement sparse columns, keep in mind that there are a number of restrictions. For example, the column must be nullable and cannot be configured with the ROWGUIDCOL or IDENTITY properties, cannot include a default, and cannot be bound to a rule. In addition, you cannot define a column as sparse if it is configured with certain data types, such as TEXT, IMAGE, or TIMESTAMP.

You can find a complete list of restrictions on sparse columns via Microsoft's SQL Server 2008 Books Online.

Implementing sparse columns

Configuring a column as sparse is simply a matter of including the SPARSE keyword in the column definition. For instance, in the following T-SQL script, I created the Products table in the AdventureWorks 2008 sample database and then populated the table with data from the Production.Product table:

USE AdventureWorks2008
GO
IF EXISTS (SELECT table_name
FROM information_schema.tables
WHERE table_name = 'Products')
DROP TABLE Products
GO
CREATE TABLE Products
(
   ProductID INT NOT NULL PRIMARY KEY,
   ProductName NVARCHAR(50) NOT NULL,
   ProductColor NVARCHAR(15) SPARSE NULL,
   ProductWeight DECIMAL(8,2) SPARSE NULL,
   SellEndDate DATETIME SPARSE NULL
)
GO
INSERT INTO Products
(ProductID, ProductName, ProductColor, ProductWeight, SellEndDate)
SELECT ProductID, Name, Color, Weight, SellEndDate
FROM Production.Product
GO

As you can see in the table definition, I've defined the ProductColor, ProductWeight and SellEndDate columns as sparse. As a result, null values will be stored more efficiently in these columns (note that the source data contains a significant number of null values).

You can query sparse columns as you would any type of column, or you can return all values by using the basic SELECT * in your SELECT clause:

SELECT * FROM Products

If you run the query, you'll find a complete data set, with many of the values null. Remember - the difference is in the storage, not in the data itself.

Implementing column sets

SQL Server supports another feature that is related to the sparse column — the column set. A column set is similar to a calculated column except that you can update the data.

A table can include only one column set and must abide by a number of restrictions. For example, you cannot add a column set to a table that is already defined with sparse columns, and you cannot modify a column set after it has been defined. However, you can add sparse columns to a table definition that is already defined with a column set. Once again, you can find a complete list of guidelines to follow when working with column sets from SQL Server 2008 Books Online.

To create a column set, include an XML column definition in your CREATE TABLE statement as shown in the following T-SQL script:

IF EXISTS (SELECT table_name
FROM information_schema.tables
WHERE table_name = 'Products')
DROP TABLE Products
GO
CREATE TABLE Products
(
   ProductID INT NOT NULL PRIMARY KEY,
   ProductName NVARCHAR(50) NOT NULL,
   ProductColor NVARCHAR(15) SPARSE NULL,
   ProductWeight DECIMAL(8,2) SPARSE NULL,
   SellEndDate DATETIME SPARSE NULL,
   ProductInfo XML COLUMN_SET FOR ALL_SPARSE_COLUMNS
)
GO
INSERT INTO Products
(ProductID, ProductName, ProductColor, ProductWeight, SellEndDate)
SELECT ProductID, Name, Color, Weight, SellEndDate
FROM Production.Product
GO

The table definition shown here is similar to the preceding example except that it now includes the ProductInfo column. Notice that the column is defined with the XML data type and includes the keywords COLUMN_SET FOR ALL_SPARSE_COLUMNS.

This is all you need to do to define a column set on a table. When you retrieve data from that column, the query engine automatically returns a structured data set that contains all non-null values in the sparse columns. For example, if you run the SELECT statement above, you'll receive the following results (only first 10 rows are shown):

ProductID  ProductName ProductInfo
1
Adjustable Race NULL
2
Bearing Ball NULL
3
BB Ball Bearing NULL
4
Headset Ball Bearings NULL
316
Blade NULL
317
LL Crankarm <ProductColor>Black</ProductColor>
318
ML Crankarm <ProductColor>Black</ProductColor>
319
HL Crankarm <ProductColor>Black</ProductColor>
320
Chainring Bolts <ProductColor>Silver</ProductColor> 
321
Chainring Nut <ProductColor>Silver</ProductColor>

As you can see, the ProductInfo column contains the product data, rendered as XML. If all values in the sparse columns of a given row are null, then only a null valued is returned.

Notice that the result set does not include the individual sparse columns. When you use SELECT * to retrieve data from a table configured with a column set, only the column set and non-sparse columns are returned. However, if you specify the name of the sparse columns in your SELECT clause, the columns will be returned. For instance, the following SELECT statement specifies the ProductColor sparse column:

SELECT ProductID, ProductName, ProductColor
FROM Products
WHERE SellEndDate IS NOT NULL

The following table shows the first 10 rows returned by this statement:

ProductID  ProductName ProductColor 
709 Mountain Bike Socks, M White
710 Mountain Bike Socks, L White
725 LL Road Frame - Red, 44 Red
726 Road Frame - Red, 48 Red
727 Road Frame - Red, 52 Red
728 Road Frame - Red, 58 Red
729 Road Frame - Red, 60 Red
730 Road Frame - Red, 62 Red
731 Road Frame - Red, 44 Red
732 Road Frame - Red, 48 Red

As mentioned above, when you retrieve a column set, all non-null sparse column data is returned. For instance, if the ProductColor, ProductWeight and SellEndDate columns each contain values, your results for the column set would be similar to the following:

<ProductColor>Red</ProductColor>
<ProductWeight>2.26</ProductWeight>
<SellEndDate>2002-06-30T00:00:00</SellEndDate>

Notice that the results include an element for each sparse column.

Implementing filtered indexes

Another new feature in SQL Server 2008 that is related to sparse columns is the filtered index. A filtered index is a nonclustered index that is defined on a subset of data. For example, suppose that you want to improve performance on queries that retrieve data from the ProductName column in the Products table. You can create a filtered index that excludes the null values, as shown in the following example:

CREATE NONCLUSTERED INDEX ixProductName
ON Products (ProductName)
WHERE SellEndDate IS NOT NULL

The main difference between a basic nonclustered index and a filtered index is that you specify a WHERE clause in your filtered index definition. In this case, the index is created only on those values that are not null. All other values are included in the index.

Filtered indexes, when used with sparse columns, can help to improve performance and reduce the amount of necessary storage, assuming that the ratio of null values to non-null values justifies the overhead placed on non-null values. Together with column sets, sparse columns and filtered indexes might prove to be beneficial additions to your deployments, but you should carefully consider their use before widely implementing them. An overabundance of sparse columns and filtered indexes might point to a database that needs to be better designed. Still, the combination of sparse columns, column sets and filtered indexes could prove to be valuable tools in your SQL Server arsenal when used wisely.

ABOUT THE AUTHOR:   
Robert Sheldon is a technical consultant and the author of numerous books, articles, and training material related to Microsoft Windows, various relational database management systems, and business intelligence design and implementation. His books include Beginning MySQL (part of the Wrox Programmer-to-Programmer series), SQL: A Beginner's Guide (based on the SQL:1999 standard), MCSE Training Kit: Designing Highly Available Web Solutions with Microsoft Windows 2000 Server Technologies, and MCSE Training Kit: Microsoft SQL Server 2000 Database Design and Implementation. You can find more information at http://www.rhsheldon.com.

Rate this Tip
To rate tips, you must be a member of SearchSQLServer.com.
Register now to start rating these tips. Log in if you are already a member.




Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us    Add to Google



RELATED CONTENT
Microsoft SQL Server 2008
Programming report generation with SQL Server Reporting Services 2008
Q&A: SQL Server 2008 a better fit for consolidation
End of life comes for SQL Server 2005 SP2, 2008
What's new for installation with SQL Server 2008?
SQL Server Reporting Services 2008 offers faster speeds, new variations
Microsoft SQL Server 2008 Learning Guide
Understanding transparent data encryption in SQL Server 2008
Implementing SQL Server 2008 FILESTREAM functionality
Microsoft renames SQL Server release, adds data services
New GROUP BY option provides better data control in SQL Server 2008

SQL/Transact SQL (T-SQL)
Combining result sets from multiple SQL Server queries
Using DELETE and TRUNCATE TABLE statements to delete data in SQL Server
SQL language crash course (just enough to be dangerous)
Working with IntelliSense in SQL Server 2008 Management Studio
SQL Server Mailbag: Stored procedures, triggers and SSRS reports
Determining the source of full transaction logs in SQL Server
New GROUP BY option provides better data control in SQL Server 2008
Using the OPENROWSET function in SQL Server
Loading data files with SQL Server's BULK INSERT statement
Importing and exporting bulk data with SQL Server's bcp utility
SQL/Transact SQL (T-SQL) Research

Database Management and Administration
Top 5 DBA tasks that are a waste of time (and might be hurting your SQL Server)
Password cracking tools for SQL Server
Using traces in SQL Server Profiler
Meet compliance requirements with improved database security practices
Hardening the network and OS for SQL Server security
Securing the server and database in SQL Server
How SQL Server 2008 components impact SharePoint implementations
Troubleshooting Distributed Transaction Coordinator errors in SQL Server
Achieving high availability and disaster recovery with SharePoint databases
Clearing the Windows page file and its effect on server performance

RELATED GLOSSARY TERMS
Terms from Whatis.com − the technology online dictionary
ACID  (SearchSQLServer.com)
Collaboration Data Objects  (SearchSQLServer.com)
commit  (SearchSQLServer.com)
container  (SearchSQLServer.com)
DAO  (SearchSQLServer.com)
fetch  (SearchSQLServer.com)
OLE DB  (SearchSQLServer.com)
query  (SearchSQLServer.com)
SQL  (SearchSQLServer.com)
T-SQL  (SearchSQLServer.com)

RELATED RESOURCES
2020software.com, trial software downloads for accounting software, ERP software, CRM software and business software systems
Search Bitpipe.com for the latest white papers and business webcasts
Whatis.com, the online computer dictionary

DISCLAIMER: Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.



SQL Server Development - .NET, C#, T-SQL, Visual Basic
HomeNewsTopicsITKnowledge ExchangeTipsAsk the ExpertsMultimediaWhite PapersIT Downloads
About Us  |  Contact Us  |  For Advertisers  |  For Business Partners  |  Site Index  |  RSS
SEARCH 
TechTarget provides technology professionals with the information they need to perform their jobs - from developing strategy, to making cost-effective purchase decisions and managing their organizations' technology projects - with its network of technology-specific websites, events and online magazines.

TechTarget Corporate Web Site  |  Media Kits  |  Site Map




All Rights Reserved, Copyright 2005 - 2009, TechTarget | Read our Privacy Policy
  TechTarget - The IT Media ROI Experts