Problem solve Get help with specific problems with your technologies, process and projects.

Working with sparse columns in SQL Server 2008

Sparse columns can provide significant storage benefits to those working with SQL Server 2008. Learn the proper way to implement sparse columns, as well as column sets and filtered indexes.

SQL Server 2008 introduces the concept of the sparse column, which is a type of column that optimizes storage for null values. When a column contains a substantial number of null values, defining the column as sparse can save a significant amount of disk space. In fact, a null value in a sparse column takes up no space at all.

There are trade-offs, however, in using sparse columns because more space is needed for non-null values. In a column configured as sparse, the non-null value requires an additional 4 bytes of storage. For example, a DATETIME value in a non-sparse column requires 8 bytes of storage. In a sparse column, however, that value requires 12 bytes. For this reason, Microsoft recommends that you use sparse columns only when the space saved is at least 20 to 40%.

If you decide to implement sparse columns, keep in mind that there are a number of restrictions. For example, the column must be nullable and cannot be configured with the ROWGUIDCOL or IDENTITY properties, cannot include a default, and cannot be bound to a rule. In addition, you cannot define a column as sparse if it is configured with certain data types, such as TEXT, IMAGE, or TIMESTAMP.

You can find a complete list of restrictions on sparse columns via Microsoft's SQL Server 2008 Books Online.

Implementing sparse columns

Configuring a column as sparse is simply a matter of including the SPARSE keyword in the column definition. For instance, in the following T-SQL script, I created the Products table in the AdventureWorks 2008 sample database and then populated the table with data from the Production.Product table:

USE AdventureWorks2008
GO
IF EXISTS (SELECT table_name
FROM information_schema.tables
WHERE table_name = 'Products')
DROP TABLE Products
GO
CREATE TABLE Products
(
   ProductID INT NOT NULL PRIMARY KEY,
   ProductName NVARCHAR(50) NOT NULL,
   ProductColor NVARCHAR(15) SPARSE NULL,
   ProductWeight DECIMAL(8,2) SPARSE NULL,
   SellEndDate DATETIME SPARSE NULL
)
GO
INSERT INTO Products
(ProductID, ProductName, ProductColor, ProductWeight, SellEndDate)
SELECT ProductID, Name, Color, Weight, SellEndDate
FROM Production.Product
GO

As you can see in the table definition, I've defined the ProductColor, ProductWeight and SellEndDate columns as sparse. As a result, null values will be stored more efficiently in these columns (note that the source data contains a significant number of null values).

You can query sparse columns as you would any type of column, or you can return all values by using the basic SELECT * in your SELECT clause:

SELECT * FROM Products

If you run the query, you'll find a complete data set, with many of the values null. Remember - the difference is in the storage, not in the data itself.

Implementing column sets

SQL Server supports another feature that is related to the sparse column — the column set. A column set is similar to a calculated column except that you can update the data.

A table can include only one column set and must abide by a number of restrictions. For example, you cannot add a column set to a table that is already defined with sparse columns, and you cannot modify a column set after it has been defined. However, you can add sparse columns to a table definition that is already defined with a column set. Once again, you can find a complete list of guidelines to follow when working with column sets from SQL Server 2008 Books Online.

To create a column set, include an XML column definition in your CREATE TABLE statement as shown in the following T-SQL script:

IF EXISTS (SELECT table_name
FROM information_schema.tables
WHERE table_name = 'Products')
DROP TABLE Products
GO
CREATE TABLE Products
(
   ProductID INT NOT NULL PRIMARY KEY,
   ProductName NVARCHAR(50) NOT NULL,
   ProductColor NVARCHAR(15) SPARSE NULL,
   ProductWeight DECIMAL(8,2) SPARSE NULL,
   SellEndDate DATETIME SPARSE NULL,
   ProductInfo XML COLUMN_SET FOR ALL_SPARSE_COLUMNS
)
GO
INSERT INTO Products
(ProductID, ProductName, ProductColor, ProductWeight, SellEndDate)
SELECT ProductID, Name, Color, Weight, SellEndDate
FROM Production.Product
GO

The table definition shown here is similar to the preceding example except that it now includes the ProductInfo column. Notice that the column is defined with the XML data type and includes the keywords COLUMN_SET FOR ALL_SPARSE_COLUMNS.

This is all you need to do to define a column set on a table. When you retrieve data from that column, the query engine automatically returns a structured data set that contains all non-null values in the sparse columns. For example, if you run the SELECT statement above, you'll receive the following results (only first 10 rows are shown):

ProductID  ProductName ProductInfo
1
Adjustable Race NULL
2
Bearing Ball NULL
3
BB Ball Bearing NULL
4
Headset Ball Bearings NULL
316
Blade NULL
317
LL Crankarm <ProductColor>Black</ProductColor>
318
ML Crankarm <ProductColor>Black</ProductColor>
319
HL Crankarm <ProductColor>Black</ProductColor>
320
Chainring Bolts <ProductColor>Silver</ProductColor> 
321
Chainring Nut <ProductColor>Silver</ProductColor>

As you can see, the ProductInfo column contains the product data, rendered as XML. If all values in the sparse columns of a given row are null, then only a null valued is returned.

Notice that the result set does not include the individual sparse columns. When you use SELECT * to retrieve data from a table configured with a column set, only the column set and non-sparse columns are returned. However, if you specify the name of the sparse columns in your SELECT clause, the columns will be returned. For instance, the following SELECT statement specifies the ProductColor sparse column:

SELECT ProductID, ProductName, ProductColor
FROM Products
WHERE SellEndDate IS NOT NULL

The following table shows the first 10 rows returned by this statement:

ProductID  ProductName ProductColor 
709 Mountain Bike Socks, M White
710 Mountain Bike Socks, L White
725 LL Road Frame - Red, 44 Red
726 Road Frame - Red, 48 Red
727 Road Frame - Red, 52 Red
728 Road Frame - Red, 58 Red
729 Road Frame - Red, 60 Red
730 Road Frame - Red, 62 Red
731 Road Frame - Red, 44 Red
732 Road Frame - Red, 48 Red

As mentioned above, when you retrieve a column set, all non-null sparse column data is returned. For instance, if the ProductColor, ProductWeight and SellEndDate columns each contain values, your results for the column set would be similar to the following:

<ProductColor>Red</ProductColor>
<ProductWeight>2.26</ProductWeight>
<SellEndDate>2002-06-30T00:00:00</SellEndDate>

Notice that the results include an element for each sparse column.

Implementing filtered indexes

Another new feature in SQL Server 2008 that is related to sparse columns is the filtered index. A filtered index is a nonclustered index that is defined on a subset of data. For example, suppose that you want to improve performance on queries that retrieve data from the ProductName column in the Products table. You can create a filtered index that excludes the null values, as shown in the following example:

CREATE NONCLUSTERED INDEX ixProductName
ON Products (ProductName)
WHERE SellEndDate IS NOT NULL

The main difference between a basic nonclustered index and a filtered index is that you specify a WHERE clause in your filtered index definition. In this case, the index is created only on those values that are not null. All other values are included in the index.

Filtered indexes, when used with sparse columns, can help to improve performance and reduce the amount of necessary storage, assuming that the ratio of null values to non-null values justifies the overhead placed on non-null values. Together with column sets, sparse columns and filtered indexes might prove to be beneficial additions to your deployments, but you should carefully consider their use before widely implementing them. An overabundance of sparse columns and filtered indexes might point to a database that needs to be better designed. Still, the combination of sparse columns, column sets and filtered indexes could prove to be valuable tools in your SQL Server arsenal when used wisely.

ABOUT THE AUTHOR:
Robert Sheldon is a technical consultant and the author of numerous books, articles, and training material related to Microsoft Windows, various relational database management systems, and business intelligence design and implementation. You can find more information at  http://www.rhsheldon.com.

This was last published in June 2009

Dig Deeper on Microsoft SQL Server Administration

Start the conversation

Send me notifications when other members comment.

Please create a username to comment.

-ADS BY GOOGLE

SearchBusinessAnalytics

SearchDataCenter

SearchDataManagement

SearchAWS

SearchOracle

SearchContentManagement

SearchWindowsServer

Close