Home > SQL Server Tips > SQL Server Management > SQL Server clustered index design for performance
SQL Server Tips:
EMAIL THIS
 TIPS & NEWSLETTERS TOPICS 

SQL SERVER MANAGEMENT

SQL Server clustered index design for performance


Matthew Schroeder
01.07.2008
Rating: -4.30- (out of 5)


Expert advice on database administration
Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us    Add to Google


Clustered indexes in SQL Server are a critical consideration in the overall architecture of the database. They are often overlooked, misunderstood or, if the database is small, considered unimportant.

This article points out the importance of clustered indexes for overall system performance and maintenance as your database grows. I will briefly cover how SQL Server clustered indexes are stored on disk, why they should always increase over time and why it is best that clustered indexes be static. I'll also touch on many-to-many tables, why they are used and how clustered indexes make these tables more efficient.

Finally, it is absolutely critical that we touch on the new SQL Server 2005 partitioned table concept and examine how partitioned tables affect clustered indexes. This will help you make the right decisions from the very start.

Clustered indexes are created by default to match the primary key, which is defined on tables in SQL Server. However, you can create a clustered index on any column and then define a primary key on a separate column or columns. At this point, the primary key would be created as a unique non-clustered index. Typically, a clustered index will match the primary key, but not necessarily, so be careful. Given the variety of situations that can arise, I'll be discussing the clustered indexes themselves, and for now ignore whether you choose to make them primary keys.

Clustered indexes actually hold the row data for SQL Server, so wherever your clustered indexes are stored is also where your data is stored. The clustered indexes are organized into ranges of data. For example, values 1 to 10 may be stored in one range and 90 to 110 in another range. Since clustered indexes are stored as ranges, if you need to do a search on a range for an audit log, it would be more efficient for the clustered index to be based on the date column that would be used to return the date ranges. Non-clustered indexes work better for


Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us    Add to Google


RELATED CONTENT
Microsoft SQL Server Performance Monitoring and Tuning
Performance implications of transaction log autogrowth in SQL Server
The short course on how SQL Server really works
Determining the source of full transaction logs in SQL Server
Improving SQL Server full-text search performance
New GROUP BY option provides better data control in SQL Server 2008
Microsoft SQL Server 2008 Resource Governor primer
Examining data files when SQL Server tempdb is full
Testing transaction log autogrowth behavior in SQL Server
Meeting business needs with SQL Server full-text search
Using dynamic management views to improve SQL Server index effectiveness

SQL Server Database Modeling and Design
Managing the development lifecycle with Visual Studio Team System 2008
A first look at Visual Studio Team System 2008 Database Edition
Testing transaction log autogrowth behavior in SQL Server
Top 10 SQL Server Tips of 2008
Tutorial: SQL Server indexing tips to improve performance
Tutorial: Learn SQL Server basics from A-Z
SQL Server database design disasters: How it all starts
Can you shrink your SQL Server database to death?
Physical data storage in SQL Server 2005 and 2008
SQL Server 2008 data types: Datetime, string, user-defined and more

SQL Server Management
A first look at Microsoft SQL Server 2008 R2
Maintaining high availability of SQL Server virtual machines
Creating fault-tolerant SQL Server installations
Using Microsoft Hyper-V for SQL Server consolidation
Scaling up vs. scaling out with SQL Server 2008
Migrating to SQL Server 2008 and leveraging new features
Testing a SQL Server environment before an upgrade
Does upgrading to SQL Server 2008 fit your business?
Meeting business needs with SQL Server full-text search
Using dynamic management views to improve SQL Server index effectiveness

RELATED GLOSSARY TERMS
Terms from Whatis.com − the technology online dictionary
contiguity  (SearchSQLServer.com)
contiguous  (SearchSQLServer.com)
drilldown  (SearchSQLServer.com)
hashing  (SearchSQLServer.com)
hybrid online analytical processing  (SearchSQLServer.com)

RELATED RESOURCES
2020software.com, trial software downloads for accounting software, ERP software, CRM software and business software systems
Search Bitpipe.com for the latest white papers and business webcasts
Whatis.com, the online computer dictionary


specific value searches, e.g. "date = DateValue," rather than range searches, e.g. "date between date1 and date2."

Ever-increasing values for clustered indexes

Clustered indexes should be based on columns whose values constantly increase over time. In my prior example on using the date column from an audit log, the date values for an audit log would be constantly increasing and older dates would not be inserted into the table. This would be an "ever-increasing" column. Another good example of an ever-increasing value is an identity column, since, by design, it constantly increases.

Why am I spending so much time discussing ever-increasing values for clustered indexes? The most important attributes of clustered indexes is that they are ever-increasing and static in nature. The reason ever-increasing is so important has to do with the range architecture I outlined earlier. If the values are not ever-increasing, then SQL Server has to allocate space within existing ranges for those records rather than placing them in new ranges at the end of the index.

If the values are not ever-increasing, then once the ranges fill up and a value comes in that fits within a filled up index range, SQL Server will make room in an index by doing a page split. Internally, SQL Server takes the filled up page and splits it into two separate pages that have substantially more room at that point but take significantly more resources to process. You can prepare for this eventuality by setting a fill factor of 70% or so, which gives you 30% free space for incoming values.

The problem with this approach is that you continually have to "reindex" the clustered index so it maintains a free space percentage of 30%. Reindexing the clustered index will also cause heavy I/O load since it has to move the actual data itself and any non-clustered indexes have to be rebuilt, adding greatly to maintenance time.

If the clustered index is ever-increasing, you will not have to rebuild the clustered index; you can set a 100% fill factor on the clustered index, and at that point you will only need to reindex the less-intensive, non-clustered indexes as time progresses, resulting in more up time.

Ever-increasing values will only add entries to the end of the index and build new ranges when necessary. Logical fragmentation will not exist since the new values are continually added to the end of the index and the fill factor will be 100%. The higher the fill factor, the more rows are stored on each page. Higher fill factors require less I/O, RAM and CPU for queries. The smaller the data types you pick for the clustered index, the faster the joins/queries will be. Also, since each non-clustered index requires it to contain the clustered index key, then the smaller the clustered index key and the smaller the non-clustered indexes will be.

The best data types for clustered indexes are generally pretty narrow. Referring to data type size, it's typically a smallint, int, bigint or datetime. When datetime values are used as the clustering index, they are the only column and are normally ever-increasing date values that are often queried as range data. Generally, you should avoid compound (multiple columns) clustered indexes except in the following situations: many-to-many tables and SQL Server 2005 partitioned tables that have the partitioning column included as part of the clustered index to allow for index alignment.

Many-to-many tables and clustered indexes

Many-to-many tables are used for their extremely fast join capabilities and their ability to allow for quick re-association of records, from one owning record to another. Consider the following structure:

Customer
[TABLE]

CustomerOrder
[TABLE]

Orders
[TABLE]

The clustered indexes in this structure would be CustomerID, OrderID. The compound would be CustomerID/OrderID. Here are the benefits with this structure:

One negative result of this approach is the fragmentation that occurs on the CustomerOrder table. However, that should not be a big issue, since the table is relatively narrow, has only two columns with narrow data types and only one clustered index. The elimination of the non-clustered indexes, which would be needed on the Orders table if it contained CustomerID, more than makes up for this cost.

Clustered indexes and partitioned tables in SQL Server 2005

Partitioned tables in SQL Server 2005 are tables that appear to be a single table on the surface, but behind the scenes ¬-- at the storage subsystem level -- they are actually multiple partitions that can be spread across many filegroups. The table partitions are spread across various filegroups based on the values in a single column. Partitioning tables in this manner causes several side effects. I will just cover the basics here, to give you some understanding of the factors involved. I recommend that you study partitioned tables before attempting to implement them.

You can create a clustered index in this environment based on only one column. But, if that one column is not the column the table is partitioned on, then the clustered index is said to be non-aligned. If a clustered index is non-aligned, then any snapping in/out (or merging) of partitions will require you to drop the clustered index along with the non-clustered indexes and rebuild them from scratch. This is necessary because SQL Server cannot tell what portions of the clustered/non-clustered indexes belong to which table partitions. Needless to say, this will certainly cause system downtime.

The clustered index on a partitioned table should always contain the regular clustering column, which is ever-increasing and static, as well as the column that is used for partitioning the table. If the clustered index includes the column used for partitioning the table, then SQL Server knows what portion of the clustered/non-clustered indexes belong to which partition. Once a clustered index contains the column that the table is partitioned on, then the clustered index is "aligned." Partitions can then be snapped in/out (and merged) without rebuilding the clustered/non-clustered indexes, causing no downtime for the system. Inserts/updates/deletes will also work faster, because those operations only have to consider the indexes that reside on their particular partition.

Summary

SQL Server clustered indexes are an important part of database architecture and I hope you've learned enough from this article to know why you need to carefully plan for them from the very start. It is vital for the future health of your database that clustered indexes be narrow, static and ever-increasing. Clustered indexes can help you achieve faster join times and faster IUD operations and minimize blocking as the system becomes busy.

Finally, we covered how partitioned tables in SQL Server 2005 affect your choices for the clustered index, what it means to "align" the clustered index with the partitions, and why clustered indexes have to be aligned in order for the partitioned table concept to work as intended. Keep watching for tips on non-clustered indexes (part two) coming in February and optimal index maintenance (part three) in March.

[TABLE]


Rate this Tip
To rate tips, you must be a member of SearchSQLServer.com.
Register now to start rating these tips. Log in if you are already a member.




DISCLAIMER: Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.



SQL Server Development - .NET, C#, T-SQL, Visual Basic
HomeNewsTopicsITKnowledge ExchangeTipsAsk the ExpertsMultimediaWhite PapersIT Downloads
About Us  |  Contact Us  |  For Advertisers  |  For Business Partners  |  Site Index  |  RSS
SEARCH 
TechTarget provides technology professionals with the information they need to perform their jobs - from developing strategy, to making cost-effective purchase decisions and managing their organizations' technology projects - with its network of technology-specific websites, events and online magazines.

TechTarget Corporate Web Site  |  Media Kits  |  Site Map




All Rights Reserved, Copyright 2005 - 2009, TechTarget | Read our Privacy Policy
  TechTarget - The IT Media ROI Experts