A few months ago, I gave you an introduction to column-store indexes in SQL Server Denali, code name for the upcoming...
version of the database. Designed to improve query performance for data warehouse-sized tables, the column-store component was not available in the first community technology preview (CTP), released last November. It is included in the latest preview, CTP3, so let's take a deeper look at this exciting feature.
The architecture is wildly different from the traditional “row-store” format, in which indexed data is stored together on the same page. In a column store, the data from each index column is grouped and stored separately, and the column data can be compressed. Also, when you run a query against a column-store index, SQL Server has to read only the columns used in the query. The results: smaller disk I/O and a smaller memory footprint, so queries can run several times to several hundred times faster.
I performed comparisons between row-store indexes and column-store indexes. My test table had 85 million rows. I used several combinations of columns and queries and my findings were in line with what Microsoft writes in SQL Server Books Online:
- Column-store indexes were always faster than row store indexes. SQL Server Books Online mentions that in some specific cases the opposite could be true, but in most cases you can expect better performance.
- Just as expected, column-store indexes excelled in queries that used aggregation and filtering.
- The query optimizer recognized that the column-store index works faster and chooses it over the row-store index.
- The difference in execution time varied between several times faster to several hundred times faster. You get the best results from columns with low selectivity, meaning the variance in values is not very high. The reason for this is that the data is compressed -- the more repeating values you have in a column, the better compression ratio you get. In terms of data types, you get the best ratios on character and numeric columns.
- Building a column-store index took about the same time it took to build a row-store index. In CTP2, a private preview circulated among Microsoft MVPs, it took much longer to build a column-store index. The performance seems to have improved in CTP3, at least in the databases I have been working with.
The biggest difference in speed was on a query that returns an aggregation of two columns, each with low variance in values; the result set had 850 rows. Using a traditional, row-store index, the query took roughly four minutes; with column-store indexes on these two columns, the execution time dropped to 1.2 seconds. That’s about 200 times faster, an impressive speed considering the query was processing data from 85 million rows. Plus, my Denali server is a virtual machine that does not have optimum disk I/O throughput.
Column-store indexes: Not all sweetness and light
Unfortunately, column-store indexes in SQL Server Denali do have some limitations. You can only have one column-store index per table. If you use table partitioning, you must include the partitioning column in your column-store index. The following data types cannot be included: binary, varbinary, text, ntext, image, varchar(max), nvarchar(max), uniqueidentifier, rowversion, timestamp, decimal and numeric data types with precision greater than 18 digits, common language runtime and xml.
But the biggest restriction is this: Once you create a column-store index, the table becomes read-only, so any data modifications against the table will fail. This is a showstopper for any tables that are updated frequently.
There are workarounds, however. The most obvious is updating the table only during predefined times. Delete or disable the index, and then rebuild it after the table is updated.
Or you can use partitioning. Populate a new partition with inserts and switch it into the table. To update existing rows, switch out a partition, drop the column-store index, update the data, re-create the index and then switch it into the table.
The last option is to split data from a table into static data and updateable data. Keep the static data in a table with a column-store index and store any new data or data that needs to be modified in a table with regular indexes. Then, write your queries using the UNION ALL statement to select from both tables. That way, you’ll get part of the benefit: The column-store index on the static table will speed up the query, while the data from the regular table will come back at normal speed.
Of course, these approaches are not very practical; they are suitable mainly for data warehouse tables or infrequently used online transaction processing tables. However, SQL Server Books Online warns against creating column-store indexes to make tables read-only, because future releases may change. This implies that in the future, column-store indexes won’t have this limitation.
Column-store indexes in SQL Server Denali are very useful in certain environments, especially in data warehouses. The speed you get over row-store index queries means you could run certain queries in real time, instead of creating aggregate tables with static data and periodically refreshing them. Check out this new technology and start thinking about where it might be useful in your organization.
ABOUT THE AUTHOR
Roman Rehak is principal database architect at MyWebGrocer in Colchester, Vt. He specializes in SQL Server development, database performance tuning, ADO.NET and writing database tools. He contributes to Visual Studio Magazine, SQL Server Magazine and other publications and presents at user groups and conferences in the U.S. and Canada.