When it comes to raw database performance, one of the biggest performance gains can be achieved through beneficial indexes. The indexes improve access to the data so you don't have to scan an entire table, a costly endeavor in terms of CPU, IO and memory usage. Over time, indexes can become fragmented, yielding lower then optimal SQL Server performance, longer transaction times, blocking, locking and lower throughput.
The extent of index fragmentation depends on table fragmentation, which can be caused by page splits. A page split occurs when an insert or update statement is issued, yielding data from one page that exceeds the total capacity for the page. An overflow to another page is necessary to support the data storage. When the page split occurs, SQL Server leaves the first half of the data on the original page and splits the remainder of the data to another page. This is operation is costly in terms of time and concurrency. The page splitting can ultimately be caused by the index fill factor and by the pad_index being high (i.e., closer to 100) and compacting the pages too tightly.
Resolving index fragmentation is both an art and a science, but it is also a long-term initiative to ensure high performance as the application environment changes. In the tips below, we will explore these items:
| TABLE OF CONTENTS
Understand the data
Know how to create indexes
Identify fragmented indexes
Rebuild fragmented indexes
Configure your database
First and foremost it is imperative to understand the data that is being stored and how the data will be accessed. For new applications, speak with users and application developers about their application flow and data usage requirements -- data access (SELECT) and data manipulation (INSERT, UPDATE and DELETE). For an existing system, issue an sp_depends, which will provide the database objects related to a table. If any of the T-SQL is embedded in the front end, middle tier or ad-hoc code, speak with the application developers to gain insight into their needs. Also speak to users to understand the application issues they experience. In either case, leverage tools or scripts to better understand the business flow and potential for beneficial indexes, as well as to gain validation from users and developers.
Once the data is understood, it is time to design or redesign the indexes. In SQL Server 2000 Books Online, there is a good article about index design that I encourage you to review. As a first rule of thumb, indexes should be considered for columns used in JOIN conditions as well as WHERE, ORDER BY and GROUP BY clauses. Read the article for additional recommendations.
A CRUD chart (Create, Read, Update, Delete) can also help you gain insight into your indexing needs based on data access. Although building a CRUD chart can be a daunting task (most people don't even know how to begin one), creating a chart for key tables may help with overall indexing; since most systems have performance problems related to a subset of the tables, focusing on these will improve the performance with the least amount of effort. The following is a sample CRUD chart for a single table (Customer) by stored procedure\ad-hoc and operation (JOIN, WHERE, ORDER BY, etc.):
This chart provides a scientific understanding of indexes on a per-table basis. From this data, you need to balance index needs based on transaction types, which is more of an art form. Indexes typically benefit SELECT commands, but they can hinder INSERT, UPDATE or DELETE statements because SQL Server has to maintain the index during these operations, yielding additional overhead.
Another critical piece of data is the frequency (daily, weekly, monthly, quarterly, yearly, never) of index rebuilds. The more frequently indexes are rebuilt, the more aggressive the index approach can be. If a system needs to run 24/7 for the next three years to support a single function, the indexes must be designed differently than a database supporting multiple functions with indexes rebuilt on a weekly basis. Nevertheless, systems are not stagnate and they will change over time based on shifting business needs. It pays to plan for these items or to make corrections that will be proactive for the future.
Check out another resource index and database design: SQL Server's DaVinci code for data modeling
SQL Server has two types of indexes: clustered and nonclustered. I would refer to the two previous for detailed information from SQL Server 2000 Books Online, because this is a topic unto itself. In a nutshell, a table can have no indexes, a clustered index, a clustered index and a nonclustered index or just nonclustered indexes. A single clustered index can be created for each table and the data is physically sorted based on this column. For a table without a clustered index, the data is stored based on the order of the data entry.
The two primary configurations for a CREATE INDEX that pertain to index fragmentation are:
If you are not sure of the fragmentation level over a specific interval (i.e., day, week, month, quarter, semi-annual, annual, etc.), a good starting point for the fill factor configuration is 80%. Monitor the fragmentation to determine the ideal configuration on a per-table\clustered index basis; the amount of storage required can be dramatically different for configurations with a 5% to 10% difference. The lower the fill factor (i.e., 40% to 50%), the more storage is needed and the more pages an index will have to scan or seek to fulfill the query request. With a high fill factor (90% to 100%), less storage is needed and fewer pages are scanned, but costly page splits can occur causing performance degradation and index fragmentation. The recommendation? Find the ideal fill factor to avoid page splits and fragmentation while not having to support an excessively large database.
The key SQL Server command to identify index fragmentation is DBCC SHOWCONTIG. The following is sample code to determine the index fragmentation for the Authors table in the Pubs database:
USE Pubs GO DBCC SHOWCONTIG ('Authors') GO
To determine the index fragmentation for the clustered index on the Authors table in the Pubs database, issue the following command:
USE Pubs GO DBCC SHOWCONTIG ('Authors',1) GO
As a point of reference, the 1 value indicates the clustered index. A value of 2 to 255 indicates a specific nonclustered index.
One rule of thumb for determining whether or not to rebuild an index is when the scan density is less than 90%.
Getting back to understanding your data from tip 1: If you notice that particular tables consistently have low scan density, consider lowering the fill factor and pad_index in 5% to 10% increments to improve the fragmentation.
With the rate of data growth yielding larger databases and businesses requiring more uptime, every second counts during the maintenance window. Leverage your backup server or restore a recent backup of the production database to a development/test server and then issue DBCC commands. That will give you insight into the database fragmentation at a point in time, so the maintenance window can be focused on rebuilding indexes. Once you have this data, you can issue one of the commands listed in tip 4 to rebuild the fragmented indexes. Then reissue DBCC SHOWCONTIG to validate that the index fragmentation has been corrected. This will validate your scripts and provide a rough estimate on the time needed for the production system.
Index maintenance is the key to ensuring that indexes are optimally configured. The "Index rebuild options" table outlines typical ways to maintain your indexes.
Related to index fragmentation is disk-level fragmentation, a non-contiguous file system based on file deletions and reorganization of the logical disk file system, which can yield poor performance for I/O intensive processes. The following recommendations minimize the amount of disk-level fragmentation:
Index fragmentation is critical to a high-performance database. Identifying the beneficial indexes and maintaining them over time will ensure high performance throughout the life of the application. Good luck!
About the author: Jeremy Kadlec is the principal database engineer at Edgewood Solutions, a technology services company delivering professional services and product solutions for Microsoft SQL Server. He has authored numerous articles and delivers frequent presentations at regional SQL Server users groups and nationally at SQL PASS. Kadlec is the SearchSQLServer.com Performance Tuning expert. Ask him a question here.
More information from SearchSQLServer.com