BACKGROUND IMAGE: iSTOCK/GETTY IMAGES
With the release of SQL Server 2012, Microsoft introduced a new type of nonclustered index called xVelocity columnstore index in the SQL Server database engine. It is based on VertiPaq in-memory data compression technology. The xVelocity memory optimized columnstore indexes are one of the most significant performance and scalability enhancements in SQL Server 2012. The xVelocity columnstore indexes speed up the processing time for common data warehousing queries -- particularly in situations where very large quantities of data have to be aggregated and accessed quickly. According to Microsoft, under certain conditions, the xVelocity columnstore index provides queries with speed improvements of 4X, 10X or even 100X.
Before designing, implementing or managing a columnstore index, it is beneficial to understand how these indexes work and how the data is stored in a columnstore index.
Basics: xVelocity Columnstore Indexes
Unlike traditional tables (heaps) and indexes (B-trees), where data is stored and grouped in a row-based fashion, xVelocity columnstore indexes are nonclustered indexes that group and store data column-wise instead of row-wise and then join the columns to complete the index. The columnstore index stores each column in a separate set of disk pages, rather than storing multiple rows per page, which is the traditional storage format. For example, consider a table containing student data, as illustrated in Table 1:
In traditional indexes (B-trees), the data of a student table is organized in one or more data pages, as illustrated in Figure 1:
In a columnstore index, student table data is stored column-wise in a separate set of disk pages, as illustrated in Figure 2:
How does an xVelocity columnstore index improve query performance?
xVelocity columnstore indexes are based on VertiPaq in-memory data compression technology, which usually improves the performance of database warehouse queries. The VertiPaq in-memory data compression allows a large amount of data to be compressed in-memory; this results in the query requiring less I/O because the amount of data transferred from disk to memory is significantly reduced.
For wide tables, such as those commonly found in data warehouses, columnstore indexes come in handy when a user executes a query using the columnstore index, as. SQL Server only extracts the columns that are needed to satisfy the query. As a result, less data is read from a disk. For example, consider the dbo.Contacts table with the following nonclustered index:
CREATE TABLE [dbo].[Contacts](
[BusinessEntityID] [int] NOT NULL,
[PersonType] [nchar](2) NOT NULL,
[NameStyle] [bit] NOT NULL,
[FirstName] [nvarchar](50) NOT NULL,
[MiddleName] [nvarchar](50) NULL,
[LastName] [nvarchar](50) NOT NULL,
[EmailPromotion] [int] NOT NULL,
[AdditionalContactInfo] [xml] (CONTENT[Person].[AdditionalContactInfoSchemaCollection]) NULL,
[ModifiedDate] [datetime] NOT NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
CREATE NONCLUSTERED INDEX [IXNC_ContactsIndex] ON [dbo].[Contacts]
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
Now click new query window, and then execute the following query:
SET STATISTICS IO ON;
SELECT [FirstName], [LastName], [ModifiedDate]
WHERE [Modified] >= ‘01/06/2004’
As you can see above, I've set the option STATISTICS IO to ON. This option is used to display the information regarding the amount of disk activity generated by Transact-SQL statements (see Figure 3):
Next, execute the following script to drop the existing nonclustered index and replace it with the columnstore index.
DROP NONCLUSTERED INDEX [IXNC_ContactsIndex] ON [dbo].[Contacts]
CREATE NONCLUSTERED COLUMNSTORE INDEX [IXNC_Columnstore]
WITH (DROP_EXISTING = OFF) ON [PRIMARY]
The directive COLUMNSTORE instructs SQL Server that this will be a columnstore index. Alternatively, we can use Object Explorer in SQL Server Management Studio to create this columnstore index. To do that, expand table and right click Indexes. Then choose a New Index and then Nonclustered columnstore index.
After creating the above columnstore index, execute the above query again and compare the disk activity with the previous run (see Figure 4):
As you can see, compared to a regular nonclustered index, the columnstore index on dbo.Contacts table significantly improves the query disk I/O.
Columnstore Index Restrictions and Limitations
For more on features in SQL Server 2012
Little-known features that will make your job easier
In-memory and xVelocity -- what to know before you integrate
- The columnstore index feature is only available in SQL Server 2012 Enterprise edition.
- Columnstore indexes cannot be combined with following features: Change data capture, change tracking, page and row compression and replication.
- A columnstore index cannot have more than 1024 columns.
- They cannot be set up as a clustered or unique index.
- They cannot be created over a View or Indexed View.
- Columnstore indexes cannot include sparse columns and they cannot act as primary or foreign keys for a table.
Columnstore indexes cannot include columns with following data types: binary, varbinary, ntext, text, image, varchar(max), nvarchar(max), uniqueidentifier, FILESTREAM, xml, rowversion (and timestamp), sql_variant, decimal (and numeric) with precision greater than 18 digits, datetimeoffset with scale greater than 2 and CLR types (hierarchyid and spatial types).