When do I use a clustered index vs. a non-clustered index? This is somewhat of a tough question to answer, and frankly, the one I'm going to give is the age old standard database administrator's answer; "It depends." There are numerous factors into the when and where of index placement. The good news is that there are only two options per-sé. The bad news is that there are fairly esoteric gains and limitations to both.
I'm not going to go into great detail here about how indexes function at their lowest level. Whole books have been written on that subject, and there are folks who solely specialize in the placement of indexes. Just know this; according to the "powers that be" a clustered index is better than no index at all. The biggest difference between a clustered index and a non-clustered index is that when you use a clustered index, the section of the table that comprises the index has its data pages linked differently from those data pages comprising a non-clustered index.
In SQL Server 2005 you'll probably hear the term "heap or B-Tree." What's being referred to here is that a table with non clustered index is generally referred to as a heap. A "B-Tree" or "Balanced Tree" is the general structure that clustered indexes take. They're kind of like a telephone book. We know that in SQL Server 2005, there are 8K data pages. There are also eight different kinds of them too. Index data pages have pointers to smaller subsets of data, which have pointers to even smaller subsets of data, etc. For instance, when you open the phone book, what do you see in the upper left or right hand corners of the opposing pages? You see the range that the page covers. That's how B-Trees function.
So what's the difference here? When a clustered index is applied, the data at the "leaf level" contains the actual data pages where you'll find the data being searched on. In a non-clustered index, the data pages at the leaf level merely have pointers to more data pages containing the actual data being searched upon. This being the case, the data pages at the leaf level in a clustered index are only ordered one way, and "in order."
For instance, if you have a column with an IDENTITY constraint applied to it as well as a clustered index, the numbers comprising the IDENTITY constraint will always be in order. Yet, that is not to say always contiguous, as you can DELETE rows. But they will always be in order. This makes for very fast searching, especially when using this scenario for something like order or invoice ID's.
The trade-offs involved with using clustered or non-clustered indexes are a world of give and take. One of the most important things that I was ever taught is that since clustered indexes keep all data within them at the leaf level of the B-Tree, any modifications to the data require a rearranging of the data pages. This means, if you add a clustered index to a table that is heavily inserted, updated, or deleted, you will probably need to rebuild or de-fragment the index more often than when a non-clustered index is used. This is due to all of the data page movement that occurs. Once again, the gain you receive is faster reads of the data, due to the orderly fashion in which the data is laid out. The other major difference is that you only get one clustered index per table. You can apply 249 non-clustered indexes.
Read more about SQL Server indexes here
Keep in mind that although you only get one clustered index per table, it does not solely need to be comprised of a single column. You are more than welcome to apply that index to multiple columns creating a covering index. Think about the search criteria on the table: What is being queried? If you're querying multiple columns at the same time, a covering index may be your answer. Another fine compromise is evaluating what is being queried, and then applying a clustered index to the column most commonly used in WHERE clause(s). Then apply a covering non-clustered index to the remaining columns in the SELECT statement(s).
Lastly there is also the option to use an Indexed view. This is essentially Microsoft's implementation of "materialized views" as seen in other platforms. Be aware - when you apply an index to a view, you are now creating a new database object, whereas the result sets coming from a non-indexed view are only durable as long as the session is open, and are completely virtual. An Indexed view is not.
The "Best Practices"
Test, test, test. There are numerous tools at your disposal in SQL Server 2005 to help you make the best choices. One of them is the "Display Estimated Execution Plan." You can find the Display Estimated Execution Plan in the Query Menu at the top of the SQL Server Management Studio, by pressing CTRL + L or on the toolbar above the query pane in SSMS.
By implementing this tool the SQL Server is telling you what it thinks it might do. Don't fight it. Its instincts are usually right. Display the estimated execution plan for a query prior to applying an index, and then apply an index to see what improvements in performance are made with your choice of index.
You can also implement the use of the Actual Execution plan as well. This too is found in the Query menu, by pressing CTRL + M, or by a button on the toolbar.
Another excellent tool to help in your choices of tools to implement is the Database Engine Tuning Advisor found under the Tools menu. Not only can this tool help with choosing proper index implementation, it can also help you with the decision to implement one of the most sought after new features of SQL Server 2005: Partitioning. But keep in mind that partitioning is only available for use in production systems in SQL Server 2005 Enterprise Edition, and there is an additional cost associated with that version of the product as compared to Standard Edition.
One last best practice to consider when looking to improve the performance of indexes, is whether or not to move an index structure to a different storage array. If you currently have all of your database objects in a single or even multiple file groups on the same array, substantial increases in performance have been obtained by migrating index structures to RAID 1 arrays. I personally have used this technique with great success.
In short, there are no hard and fast rules here. If you display the actual execution plan on a table that has just had an index applied to it and the query plan still shows a table scan is being executed, you have some choices:
- You can keep the index on the query and hope the statistics improve over time.
- You can remove the index
- You can spend more time working on the query itself.
Do not be discouraged to try new things in order to increase the quality of your database performance. Unfortunately the business has a habit of convincing themselves that databases are relatively static, and sometimes they don't quite understand "What happened? It worked great last week???!!!???" The more data you put into a database, it changes. The more functionality you add to a database, it changes. The more you take out, it changes. With these changes come the need to reevaluate indexing structures, and performance tuning and optimization opportunities.
Some excellent resources for learning the deeper components of indexes and page structures are Kalen Delaney's "Inside SQL Server 2005: The Storage Engine" on Microsoft Press. I'm also a huge fan of Itzik Ben-Gan, and he has two new books out in the same series called "Inside SQL Server 2005: T-SQL Querying" and "Inside SQL Server 2005: T-SQL Programming." These are also both on Microsoft Press, and both of these folks go into much greater detail on this subject.
ABOUT THE AUTHOR:
Laurence Schwarz, for the last 10 years, has been involved with Microsoft SQL Server, in the roles of Database Administrator, Database Engineer and Production Support Analyst.
Copyright 2006 TechTarget