Maintaining SQL Server indexes is an uncommon practice. If a query stops using indexes, oftentimes a new non-clustered index is created that simply holds a different combination of columns or the same columns. Rarely does anyone analyze in detail why SQL Server is ignoring the existing indexes.
Let's take a look at how clustered and non-clustered indexes are selected and why the query optimizer might choose a table scan instead of a non-clustered index. In this tip, you'll learn how page splits, fragmented indexes, table partitions and statistics updates affect the use of indexes. Ultimately, you'll find out how to maintain SQL Server indexes so that the query optimizer uses them and can search them quickly.
Clustered indexes are by far the easiest to understand in the area of index selection. A clustered index is essentially a key that identifies each row uniquely. Even if you define a clustered index without declaring it unique, SQL Server still makes it unique behind the scenes by adding a 4-byte "uniqueifier" to rows whose key values are duplicates. The uniqueifier increases the width of the clustered index key, which increases maintenance time and slows searches. Because the clustered index key identifies each row, it is involved in nearly every query against the table, either directly or as the row locator that non-clustered indexes use to find the full row.
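The following minimal Python model makes the uniqueifier behavior concrete. It is not SQL Server's storage code. It assumes that only a row whose key value duplicates an earlier row gets a nonzero uniqueifier, and it ignores per-row storage overhead. The function names and sample keys are hypothetical.

```python
from collections import defaultdict

UNIQUEIFIER_BYTES = 4  # size of the hidden uniqueifier value

def assign_uniqueifiers(keys):
    """Pair each clustered key value with a uniqueifier.

    Simplified model: the first row with a given key gets no uniqueifier
    (shown as 0); each later duplicate gets the next integer, so every
    (key, uniqueifier) pair is unique.
    """
    next_value = defaultdict(int)
    pairs = []
    for key in keys:
        pairs.append((key, next_value[key]))
        next_value[key] += 1
    return pairs

def uniqueifier_overhead(keys):
    """Extra key bytes in this model: only duplicate rows carry the 4 bytes."""
    return sum(UNIQUEIFIER_BYTES for _, u in assign_uniqueifiers(keys) if u > 0)

keys = ["Smith", "Jones", "Smith", "Smith", "Lee"]
print(assign_uniqueifiers(keys))
# [('Smith', 0), ('Jones', 0), ('Smith', 1), ('Smith', 2), ('Lee', 0)]
print(uniqueifier_overhead(keys))  # 8
```

The more duplicates a non-unique clustered key has, the more rows carry the extra width.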
When we start talking about non-clustered indexes, things get confusing. Queries can ignore non-clustered indexes for the following reasons:
- High fragmentation – If an index is fragmented over 40%, the optimizer will probably ignore the index because it's more costly to search a fragmented index than to perform a table scan.
- Uniqueness – If the optimizer determines that a non-clustered index is not very unique (selective), it may decide that a table scan is faster than using the non-clustered index. For example, if a query references a bit column (where bit = 1) and the statistics on the column say that 75% of the rows are 1, the optimizer will probably decide that a table scan returns the results faster than seeking on the non-clustered index and then looking up each matching row.
- Outdated statistics – If the statistics on a column are out of date, SQL Server can misjudge the benefit of a non-clustered index. Automatic statistics updates add overhead of their own. They run only after a threshold of rows has changed and usually sample the data, so over time the statistics can still drift out of sync with the real distribution of the rows. Occasionally it's a good idea to run sp_updatestats or UPDATE STATISTICS.
- Function usage – SQL Server generally cannot seek on an index when the indexed column is wrapped in a function in the criteria. If you're referencing a column that has a non-clustered index, but you're using a function such as convert(varchar, Col1_Year) = 2004, then SQL Server cannot seek on the index on Col1_Year.
- Wrong columns – If a non-clustered index is defined on (col1, col2, col3) and your query has a where clause such as "where col2 = 'somevalue'", that index won't be used for a seek. SQL Server can seek on a non-clustered index only if the where clause references the first column in the index. A where clause such as "where col3 = 'someval'" would not use the index, but a where clause like "where col1 = 'someval'" or "where col1 = 'someval' and col3 = 'someval2'" would pick up the index.
The index would not use col3 for its seek, because col3 does not immediately follow col1 in the index definition (col2 sits between them). If you want a seek on col3 in situations such as this, it is best to define two separate non-clustered indexes, one on col1 and the other on col3.
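The rule for which columns can drive a seek can be sketched as a small Python function. This is a simplified model. It handles only equality predicates and ignores range predicates, included columns and index intersection. `seek_columns` is a hypothetical helper, not part of any SQL Server API.

```python
def seek_columns(index_columns, equality_columns):
    """Return the leading index columns that can be used for a seek.

    Simplified model: columns are consumed left to right while each one has
    an equality predicate; the first column without one ends the prefix.
    Later columns can still be checked as residual filters, but not seeked on.
    """
    prefix = []
    for col in index_columns:
        if col not in equality_columns:
            break
        prefix.append(col)
    return prefix

index = ["col1", "col2", "col3"]
print(seek_columns(index, {"col2"}))          # []
print(seek_columns(index, {"col3"}))          # []
print(seek_columns(index, {"col1"}))          # ['col1']
print(seek_columns(index, {"col1", "col3"}))  # ['col1']  (col2 is missing)
print(seek_columns(index, {"col1", "col2"}))  # ['col1', 'col2']
```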
To store data, SQL Server uses 8 KB pages. The fill factor is the percentage of each index page that is filled with data when the index is created or rebuilt; the higher the fill factor, the fuller each 8 KB page is. A higher fill factor means fewer pages are required, resulting in less IO/CPU/RAM usage. At this point, you might want to set all your indexes to a 100% fill factor. However, here is the gotcha: once the pages fill up and a new value arrives that belongs within a full page's range, SQL Server makes room by doing a "page split." In essence, SQL Server takes the full page and splits it into two pages, each about half full, which leaves substantially more room. You can account for this issue by setting a fill factor of 70% or so, which leaves 30% free space for incoming values. The problem with this approach is that the fill factor is applied only when the index is created or rebuilt, so you have to "re-index" periodically to restore the 30% free space.
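A toy Python simulation shows why a 100% fill factor invites page splits when new values land inside existing ranges. It rests on these assumptions:

- A page holds 10 rows instead of 8 KB of data.
- A full page splits 50/50.
- The key values are made up for illustration.

It is a sketch of the idea, not of SQL Server's internals.

```python
import bisect

PAGE_CAPACITY = 10  # rows per page; a stand-in for an 8 KB page

def build_pages(keys, fill_factor):
    """Load sorted keys into pages, filling each page to fill_factor percent."""
    per_page = max(1, PAGE_CAPACITY * fill_factor // 100)
    keys = sorted(keys)
    return [keys[i:i + per_page] for i in range(0, len(keys), per_page)]

def insert(pages, key):
    """Insert key into the page covering its range; return True if a split occurred."""
    firsts = [page[0] for page in pages]
    i = max(0, bisect.bisect_right(firsts, key) - 1)
    page = pages[i]
    split = len(page) == PAGE_CAPACITY
    if split:
        # Page split: move the upper half of the rows to a new page.
        half = len(page) // 2
        pages.insert(i + 1, page[half:])
        del page[half:]
        if key >= pages[i + 1][0]:
            page = pages[i + 1]
    bisect.insort(page, key)
    return split

def count_splits(fill_factor, new_keys):
    """Build an index of keys 0, 10, ..., 990 at fill_factor, then insert new_keys."""
    pages = build_pages(range(0, 1000, 10), fill_factor)
    return sum(insert(pages, key) for key in new_keys)

# Keys such as 5, 35, 65 fall between existing keys, inside existing pages.
new_keys = [5 + 30 * i for i in range(34)]
print(count_splits(100, new_keys))  # 10
print(count_splits(70, new_keys))   # 0
```

In this model, each of the ten full pages splits once. The 70% build absorbs the same inserts into its free space, at the price of more pages to read (15 instead of 10).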
Clustered index maintenance
Clustered indexes that are static or "ever-increasing" should have a fill factor of 100%. Since the values are always increasing, new pages are simply added to the end of the index. Virtually no fragmentation occurs, as long as existing rows are not later updated in ways that make them larger. For a more detailed explanation, see part 1 of this series, SQL Server clustered index design for performance. Indexes in this category rarely need to be re-indexed, because they don't fragment.
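The difference between ever-increasing and random keys can be seen with a similar toy model in Python. It assumes two behaviors:

- An insert beyond the end of a full last page simply starts a new page.
- Any other full page is split 50/50.

The page size and key values are hypothetical.

```python
import bisect
import random

PAGE_CAPACITY = 10  # rows per page in this toy model

def insert(pages, key):
    """Insert key; return True only for a split in the middle of the index.

    Simplified model: a key beyond the end of a full last page starts a new
    page at the end (the full page is left as is); any other full page is
    split 50/50.
    """
    if not pages or (key > pages[-1][-1] and len(pages[-1]) == PAGE_CAPACITY):
        pages.append([key])
        return False
    firsts = [page[0] for page in pages]
    i = max(0, bisect.bisect_right(firsts, key) - 1)
    page = pages[i]
    split = len(page) == PAGE_CAPACITY
    if split:
        half = PAGE_CAPACITY // 2
        pages.insert(i + 1, page[half:])
        del page[half:]
        if key >= pages[i + 1][0]:
            page = pages[i + 1]
    bisect.insort(page, key)
    return split

def load(keys):
    """Insert keys one at a time; return (mid-index splits, page count)."""
    pages = []
    splits = sum(insert(pages, key) for key in keys)
    return splits, len(pages)

ever_increasing = list(range(1000))
shuffled = random.Random(42).sample(ever_increasing, len(ever_increasing))
print(load(ever_increasing))  # (0, 100)
print(load(shuffled))         # expect mid-index splits and more than 100 pages
```

With ever-increasing keys, every page ends up full and no page is split in the middle of the index. The same values inserted in random order cause mid-index splits and leave pages partly empty.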
Clustered indexes that are neither static nor ever-increasing will experience fragmentation and page splits as data rows move around within the data pages. Indexes in this category have to be re-indexed to keep fragmentation low and allow queries to use the index efficiently.
When you re-index these clustered indexes, you have to decide what the fill factor should be. Normally this is 70% to 80%, giving you 20% to 30% empty space for new records coming into the page. The optimal settings for your environment will depend on how often records shift around, how many records are inserted and how often re-indexing occurs. The goal is to set a fill factor low enough so that by the time you reach your next maintenance cycle, the pages are around 95% full, but not yet splitting, which happens when they hit the 100% limit.
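The "about 95% full at the next maintenance cycle" goal can be turned into a rough calculation. This Python sketch assumes two things:

- You can estimate how much each page grows between cycles.
- That growth is spread evenly across pages, which real workloads rarely do exactly.

`plan_fill_factor` and its parameters are hypothetical.

```python
import math

def plan_fill_factor(expected_growth, target_full=95):
    """Return a fill factor (percent) so pages reach about target_full% by the next cycle.

    expected_growth is an estimate of how many rows each page gains between
    maintenance cycles, as a fraction of the rows it holds right after the
    rebuild (0.25 means 25% more rows). Assumes growth is spread evenly.
    """
    fill_factor = math.floor(target_full / (1 + expected_growth))
    return max(1, min(100, fill_factor))  # fill factor must stay within 1-100

for growth in (0.20, 0.25, 0.35):
    ff = plan_fill_factor(growth)
    print(f"growth {growth:.0%}: fill factor {ff}%, "
          f"about {ff * (1 + growth):.0f}% full at next cycle")
```

For example, 25% growth per cycle gives a 76% fill factor, inside the usual 70% to 80% range. For static data, zero growth with target_full=100 gives 100%.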
Non-clustered index maintenance
Non-clustered indexes will always have data shifting around the pages. It's not quite as big an issue as it is with clustered indexes: with a clustered index the actual row data shifts, whereas with a non-clustered index only the index keys and row locators shift. That said, the same fill factor rules apply to non-clustered indexes. Again, the goal is to set a fill factor low enough so that by the time you reach your next maintenance cycle, the pages are only around 95% full.
Non-clustered indexes will always fragment, so you must monitor and maintain them regularly to keep fragmentation under control.
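Monitoring usually means tracking logical fragmentation. DBCC SHOWCONTIG reports it as "Logical Scan Fragmentation," and sys.dm_db_index_physical_stats reports it as avg_fragmentation_in_percent. The Python function below is a simplified model of that measure. A leaf page counts as out of order when the page its next-page pointer leads to is not the next physical page belonging to the index. The page numbers are made up.

```python
def logical_fragmentation(page_ids_in_key_order):
    """Percentage of out-of-order leaf pages (simplified model).

    page_ids_in_key_order lists physical page numbers in the order the
    next-page pointers visit them. A page is out of order when the page that
    follows it in key order is not the next page, by physical page number,
    that belongs to the same index.
    """
    pages = list(page_ids_in_key_order)
    if len(pages) < 2:
        return 0.0
    physical = sorted(pages)
    next_physical = dict(zip(physical, physical[1:]))
    out_of_order = sum(
        1 for current, following in zip(pages, pages[1:])
        if next_physical.get(current) != following
    )
    return 100.0 * out_of_order / len(pages)

print(logical_fragmentation([10, 11, 12, 13, 14, 15]))  # 0.0
print(logical_fragmentation([10, 11, 14, 12, 13, 15]))  # 50.0
```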
Partitioned table index considerations
Partitioned tables allow data to be segregated into different partitions, based on the values in a column. Many tables are partitioned by date range. Let's say your order table is partitioned by year. Assuming the clustered index is aligned (see part 1 of this series), you could re-index the non-clustered indexes for, say, the year 2000 at a 100% fill factor, since that data, technically, won't be shifting around. In this scenario, the year 2008 partition may have a 70% fill factor on its non-clustered indexes to allow for data shifts. The year 2000 partition, by contrast, has no shifts and can be re-indexed at 100% fill factor to optimize index seeks.
The same concept applies to clustered indexes that are neither static nor ever-increasing. A clustered index with shifting data might use a 70% fill factor for the year 2008 partition and a 100% fill factor for the year 2000 partition.
SQL Server statistics
Statistics are maintained on columns and indexes, and they help SQL Server determine how "unique" (selective) a value is. For example, if statistics say a value will match approximately 80% of the rows, SQL Server will most likely do a table scan instead of using the index. If statistics say a value will match only a small fraction of the rows, the query optimizer will opt for a seek to minimize database impact.
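The scan-versus-seek decision is cost-based rather than a fixed percentage. The following Python sketch uses a deliberately crude, hypothetical cost model. A scan costs one read per data page. A seek with lookups costs a few index pages plus one data page per matching row, with no caching. The point is only to show why the break-even point tends to be a small fraction of the table.

```python
import math

def choose_access_path(total_rows, rows_per_page, selectivity, seek_overhead_pages=3):
    """Pick 'scan' or 'seek' with a toy cost model (not SQL Server's real one).

    Assumptions: a table scan reads every data page once; a non-clustered
    seek reads seek_overhead_pages index pages, then one data page per
    matching row (a key lookup), with no caching benefit.
    """
    scan_cost = math.ceil(total_rows / rows_per_page)
    matching_rows = selectivity * total_rows
    seek_cost = seek_overhead_pages + matching_rows
    return "seek" if seek_cost < scan_cost else "scan"

for selectivity in (0.80, 0.75, 0.10, 0.005):
    print(selectivity, choose_access_path(100_000, 100, selectivity))
```

Under these assumptions, the break-even point for a 100,000-row table with 100 rows per page is just under 1% of the rows. SQL Server's actual cost model, row widths and caching move that point. For a query that needs lookups, though, it generally sits far below the 75% and 80% cases described above.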
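SQL Server statistics can be maintained automatically, or you can update them manually. Rebuilding an index refreshes the statistics on that index, but not the column statistics on the table, so I recommend that after re-indexing, you manually run sp_updatestats or the T-SQL UPDATE STATISTICS command. For a compound index, the statistics histogram covers only the first column. For the remaining columns, SQL Server keeps only an average density figure for each left-based prefix of columns, so the "uniqueness" of individual values in those columns cannot be determined from the index statistics.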
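The difference between the histogram and the density information can be illustrated in Python. This is a simplified model. SQL Server's real histogram has at most 200 steps and stores range counts, and both structures are built from the data (or a sample of it) when statistics are updated. The function names and sample rows are hypothetical.

```python
from collections import Counter

def density_vector(rows, columns):
    """Density (1 / number of distinct values) for each left-based prefix of columns."""
    densities = {}
    for n in range(1, len(columns) + 1):
        distinct = {row[:n] for row in rows}
        densities[tuple(columns[:n])] = 1 / len(distinct)
    return densities

def leading_column_histogram(rows):
    """Row count per value of the first column only, standing in for the histogram."""
    return Counter(row[0] for row in rows)

rows = [("A", 1), ("A", 2), ("B", 1), ("B", 1), ("C", 3)]  # (col1, col2)
print(density_vector(rows, ["col1", "col2"]))
# {('col1',): 0.3333333333333333, ('col1', 'col2'): 0.25}
print(leading_column_histogram(rows))  # Counter({'A': 2, 'B': 2, 'C': 1})
```

The density for (col1, col2) reflects distinct combinations, but nothing in the output describes how the values of col2 alone are distributed.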
Index maintenance is critical to ensure that queries continue to benefit from index use and to reduce IO/RAM/CPU usage, which in turn reduces blocking.
Run your queries with the option "show execution plan" turned on. If the query is not using your index, then check the following:
- Run DBCC SHOWCONTIG ('tablename'), or on SQL Server 2005 and later query sys.dm_db_index_physical_stats, to see if the table is fragmented.
- Check your "where clause" to see if it references the first column in the index.
- Ensure that your "where clause" does not wrap the first column of the index in a function.
- Update the statistics just in case they are out of date. If the table is fragmented, then run this step after re-indexing.
- Make sure the criteria you are using are unique (selective) enough that SQL Server will see a benefit in using the index to search the data.
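Here is a Python sketch that models an index as a sorted list of keys, to show why the function check in the list above matters. The column values are hypothetical, and str() stands in for a conversion such as convert(varchar, ...). The index is ordered by the column's values, not by the function's output. A binary search (seek) is therefore valid only on the bare column, and the function-wrapped predicate has to read every entry.

```python
import bisect

def seek_equal(index_keys, value):
    """Equality seek on the bare column: binary search is valid because
    index_keys is sorted by the column value."""
    lo = bisect.bisect_left(index_keys, value)
    hi = bisect.bisect_right(index_keys, value)
    return index_keys[lo:hi]

def scan_with_function(index_keys, func, value):
    """func(column) = value: the index order says nothing about func's
    order, so every entry must be read and tested."""
    matches = [key for key in index_keys if func(key) == value]
    return matches, len(index_keys)  # (matches, entries examined)

def order_preserved(index_keys, func):
    """True if applying func keeps the entries in index order."""
    transformed = [func(key) for key in index_keys]
    return transformed == sorted(transformed)

index_keys = sorted([7, 42, 100, 2004, 2004, 15000])  # hypothetical column values
print(seek_equal(index_keys, 2004))                  # [2004, 2004]
print(scan_with_function(index_keys, str, "2004"))   # ([2004, 2004], 6)
print(order_preserved(index_keys, str))              # False: "100" sorts before "7"
```

Rewriting the predicate so that the bare column is compared with a constant (for example, Col1_Year = 2004) lets the seek path apply.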
SQL Server clustered and non-clustered index design
- Part 1: SQL Server clustered index design for performance
- Part 2: Designing SQL Server non-clustered indexes
- Part 3: How to maintain SQL Server indexes
ABOUT THE AUTHOR
Matthew Schroeder is a senior software engineer who works on SQL Server database systems ranging in size from 2 GB to 3+ TB, with between 2k and 40+k trans/sec. He specializes in OLTP/OLAP DBMS systems as well as highly scalable processing systems written in .NET. Matthew is a Microsoft certified MCITP, Database Developer, has a master's degree in computer science and more than 12 years of experience in SQL Server/Oracle.