Data in the fast lane: Column-store indexes in SQL Server Denali
A few months ago, I gave you an introduction
to column-store indexes in SQL Server Denali, code name for the upcoming version of the
database. Designed to improve query performance for data warehouse-sized tables, the column-store
component was not available in the first
community technology preview (CTP), released last November. It is included in the latest
preview, CTP3, so let's take a deeper look at this exciting feature.
The architecture is wildly different from the traditional “row-store” format, in which indexed
data
Premium Access
Register now for unlimited access to our premium content across our network of over 70 information Technology web sites.
By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States.
Privacy
Dig Deeper
-
People who read this also read...
-
This was first published in September 2011
is stored together on the same page. In a column store, the data from each index column is
grouped and stored separately, and the column data can be compressed. Also, when you run a query
against a column-store index,
SQL Server has to read only
the columns used in the query. The results: smaller disk
I/O and a smaller
memory footprint, so queries can run several times to several hundred times faster.
I performed comparisons between row-store indexes and column-store indexes. My test table had 85
million rows. I used several combinations of columns and queries and my findings were in line with
what Microsoft writes in SQL Server Books Online:
- Column-store indexes were always faster than row store indexes. SQL Server Books Online
mentions that in some specific cases the opposite could be true, but in most cases you can expect
better performance.
- Just as expected, column-store indexes excelled in queries that used aggregation and
filtering.
- The query optimizer recognized that the column-store index works faster and chooses it over the
row-store index.
- The difference in execution time varied between several times faster to several hundred times
faster. You get the best results from columns with low selectivity, meaning the variance in values
is not very high. The reason for this is that the data is compressed -- the more repeating values
you have in a column, the better compression ratio you get. In terms of data types, you get the
best ratios on character and numeric columns.
- Building a column-store index took about the same time it took to build a row-store index. In
CTP2, a private preview circulated among Microsoft MVPs, it took much longer to build a
column-store index. The performance seems to have improved in CTP3, at least in the databases I
have been working with.
The biggest difference in speed was on a query that returns an aggregation of two columns, each
with low variance in values; the result set had 850 rows. Using a traditional, row-store index, the
query took roughly four minutes; with column-store indexes on these two columns, the execution time
dropped to 1.2 seconds. That’s about 200 times faster, an impressive speed considering the query
was processing data from 85 million rows. Plus, my Denali server is a virtual machine that does not
have optimum disk I/O throughput.
Column-store indexes: Not all sweetness and light
Unfortunately, column-store indexes in SQL Server Denali do have some limitations. You can only
have one column-store index per table. If you use table partitioning,
you must include the partitioning column in your column-store index. The following data types
cannot be included: binary, varbinary, text, ntext, image, varchar(max), nvarchar(max),
uniqueidentifier, rowversion, timestamp, decimal and numeric data types with precision greater than
18 digits, common language runtime and xml.
But the biggest restriction is this: Once you create a column-store index, the table becomes
read-only, so any data modifications against the table will fail. This is a showstopper for any
tables that are updated frequently.
There are workarounds, however. The most obvious is updating the table only during predefined
times. Delete or disable the index, and then rebuild it after the table is updated.
Or you can use partitioning. Populate a new partition with inserts and switch it into the table.
To update existing rows, switch out a partition, drop the column-store index, update the data,
re-create the index and then switch it into the table.
The last option is to split data from a table into static data and updateable data. Keep the
static data in a table with a column-store index and store any new data or data that needs to be
modified in a table with regular indexes. Then, write your queries using the UNION ALL
statement to select from both tables. That way, you’ll get part of the benefit: The
column-store index on the static table will speed up the query, while the data from the regular
table will come back at normal speed.
Of course, these approaches are not very practical; they are suitable mainly for data warehouse
tables or infrequently used online transaction processing tables. However, SQL Server Books
Online warns against creating column-store indexes to make tables read-only, because future
releases may change. This implies that in the future, column-store indexes won’t have this
limitation.
Column-store indexes in SQL Server Denali are very useful in certain environments, especially in
data warehouses. The speed you get over row-store index queries means you could run certain queries
in real time, instead of creating aggregate tables with static data and periodically refreshing
them. Check out this new technology and start thinking about where it might be useful in your
organization.
ABOUT THE AUTHOR
Roman Rehak is principal database architect at MyWebGrocer in Colchester, Vt. He specializes in SQL Server development,
database performance tuning, ADO.NET and writing database tools. He contributes to Visual
Studio Magazine, SQL Server Magazine and other publications and presents at user groups
and conferences in the U.S. and Canada.
Disclaimer:
Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.
Join the conversationComment
Share
Comments
Results
Contribute to the conversation