Designing SQL Server non-clustered indexes for query optimization


Matthew Schroeder
02.13.2008

Non-clustered indexes are bookmarks that give SQL Server shortcuts to the data you're searching for. They are important because they let a query focus on a specific subset of the data instead of scanning the entire table. We'll address this critical topic by covering the basics: how clustered indexes interact with non-clustered indexes, how to pick fields, when to use compound indexes and how statistics influence non-clustered indexes.

The basics of non-clustered indexes in SQL Server

A non-clustered index consists of the chosen fields and the clustered index value. If the clustered index is not defined as unique, SQL Server adds a hidden uniqueness value (a "uniquifier") to duplicate clustered index values so that every row can still be identified. Always define your clustered indexes as unique -- if they are in fact unique -- because it results in smaller clustered and non-clustered indexes. If your unique clustered index consists of an int and you create a non-clustered index on a year column (defined as smallint), then your non-clustered index will contain an int and a smallint for every row in the table. The index grows with the size of the data types chosen. So the smaller the data types in the clustered and non-clustered index keys, the smaller the resulting index will be and the less work it will take to maintain.
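The int-plus-smallint case above can be sketched in T-SQL. The table, column and index names here are hypothetical, chosen only for illustration, and the size query relies on a dynamic management view that exists in SQL Server 2005 and later.

```sql
-- Hypothetical table; names and data types are illustrative assumptions.
CREATE TABLE dbo.Sales
(
    SaleID   int      NOT NULL,  -- 4-byte clustered key
    SaleYear smallint NOT NULL,  -- 2-byte indexed column
    Amt      money    NOT NULL
);

-- Declaring the clustered index UNIQUE avoids the hidden uniqueness value.
CREATE UNIQUE CLUSTERED INDEX CIX_Sales_SaleID ON dbo.Sales (SaleID);

-- Every row in this index stores SaleYear plus the SaleID clustered key.
CREATE NONCLUSTERED INDEX IX_Sales_SaleYear ON dbo.Sales (SaleYear);

-- Compare the space used by each index (pages are 8 KB each).
SELECT i.name AS index_name,
       SUM(ps.used_page_count) * 8 AS used_kb
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
WHERE ps.object_id = OBJECT_ID('dbo.Sales')
GROUP BY i.name;
```

Running the size query before and after widening a key column is one way to see how the chosen data types affect index size on your own data.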

Choosing fields for non-clustered indexes

The first rule is to never include the clustered index key fields in the non-clustered index. Those fields are already stored in every non-clustered index, so they are always available to queries and adding them again gains nothing. The only time it makes sense to include any clustered index key in a non-clustered index is when the clustered index is a compound index and the query is referencing the second, third or higher field in the compound index.

Assume you have the following table:

Columns: ID (identity, unique clustered index), DateFrom, DateTo, Amt, DateInserted, Description
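For readers who want to follow along, the table could be created roughly as follows. The article only names the columns, so the data types, nullability and the index name are assumptions.

```sql
-- Sketch of the example table; data types and nullability are assumptions.
CREATE TABLE dbo.tbl
(
    ID            int IDENTITY(1,1) NOT NULL,
    DateFrom      datetime          NOT NULL,
    DateTo        datetime          NOT NULL,
    Amt           money             NOT NULL,
    DateInserted  datetime          NOT NULL,
    [Description] varchar(200)      NULL
);

-- ID is the unique clustered index described in the article.
CREATE UNIQUE CLUSTERED INDEX CIX_tbl_ID ON dbo.tbl (ID);
```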

Now assume you always run queries such as:

Example 1:

Select *
From tbl [t]
where t.DateFrom = '12/12/2006' and
t.DateTo = '12/31/2006' and
t.DateInserted = '12/01/2006'

At this point it makes sense to have a single compound non-clustered index defined on DateFrom, DateTo and DateInserted, since one index then matches all three search conditions and narrows the result to the fewest rows.
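A minimal sketch of that compound index follows. It assumes the example table exists as dbo.tbl with datetime columns; the index name is hypothetical.

```sql
-- Assumes dbo.tbl exists with the columns listed earlier.
-- One compound index that matches all three predicates of Example 1.
CREATE NONCLUSTERED INDEX IX_tbl_DateFrom_DateTo_DateInserted
    ON dbo.tbl (DateFrom, DateTo, DateInserted);

-- Example 1 rewritten with unseparated YYYYMMDD literals.
SELECT *
FROM dbo.tbl AS t
WHERE t.DateFrom     = '20061212'
  AND t.DateTo       = '20061231'
  AND t.DateInserted = '20061201';
```

The YYYYMMDD form is used here because datetime literals such as '12/12/2006' are interpreted according to the session's language and date format settings.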

Now assume you run multiple queries such as:

Example 2:

Select *
From tbl [t]
where t.datefrom = '12/12/2006' and
t.DateInserted = '12/01/2006'

Select *
From tbl [t]
where t.datefrom = '12/12/2006'

Select *
From tbl [t]
where t.DateTo = '12/31/2006'

Select *
From tbl [t]
where t.DateInserted = '12/01/2006'

Select *
From tbl [t]
where t.DateTo = '12/31/2006' and
t.DateInserted = '12/01/2006'

Select *
From tbl [t]
where t.id = 5 and t.DateTo = '12/31/2006'
and t.DateInserted = '12/01/2006'

Many people, at this point, would be tempted to create the following non-clustered indexes:

  1. DateFrom

  2. DateTo

  3. DateInserted

  4. DateTo and DateInserted

  5. DateFrom and DateInserted

  6. ID, DateTo and DateInserted

You probably expect the index size to increase dramatically at this point, since you are storing DateFrom in two separate locations, DateTo in three locations and DateInserted in four locations. On top of this, you've stored the clustered index key in seven locations (the clustered index itself plus the six non-clustered indexes). This approach increases I/O for insert, update and delete operations (also known as IUD operations). A change to a record must be written first to the clustered index data row. Then every non-clustered index that contains an affected column has to be updated as well.
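One way to weigh that write cost against the read benefit is to compare how often each index is read with how often it is maintained. The sketch below assumes SQL Server 2005 or later and that the example table exists as dbo.tbl.

```sql
-- Assumes dbo.tbl exists. Counters are cumulative since the last
-- SQL Server restart, so judge them over a representative period.
SELECT i.name AS index_name,
       us.user_seeks,    -- reads: index seeks
       us.user_scans,    -- reads: index scans
       us.user_lookups,  -- reads: lookups into the clustered index
       us.user_updates   -- writes: IUD operations that maintained the index
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
  ON us.object_id   = i.object_id
 AND us.index_id    = i.index_id
 AND us.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID('dbo.tbl');
```

An index whose user_updates count is high while its seeks and scans stay near zero is a candidate for removal, though that call depends on your workload.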

You should routinely ask yourself these questions:

  • Is the cost of additional I/O for IUD operations and maintenance worth the improved query time?
  • Will the additional I/O and increased maintenance time outweigh any performance boost I get on the queries?
  • What will give me the most unique results with the least possible overhead?

In this case, the best solution would be three non-clustered indexes as follows:

    1. DateFrom

    2. DateTo

    3. DateInserted

    Each field in this scenario is stored only once, except for the clustered index key, which is stored in all three non-clustered indexes. As a result, the total index size is much smaller and requires less I/O and less maintenance. SQL Server will query each of the relevant non-clustered indexes, depending on the criteria chosen, and then hash the results together. While this is not as efficient as the compound index in Example 1, it is much more efficient than maintaining the six separate non-clustered indexes listed above. Real-world queries more often resemble Example 2 than Example 1.
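The three single-column indexes might be created as shown below. This is a sketch that assumes the example table exists as dbo.tbl; the index names are hypothetical. Whether the optimizer actually combines two of these indexes for a given query depends on the statistics and the data, so it is worth checking the I/O output or the execution plan.

```sql
-- Assumes dbo.tbl exists with the columns listed earlier.
CREATE NONCLUSTERED INDEX IX_tbl_DateFrom     ON dbo.tbl (DateFrom);
CREATE NONCLUSTERED INDEX IX_tbl_DateTo       ON dbo.tbl (DateTo);
CREATE NONCLUSTERED INDEX IX_tbl_DateInserted ON dbo.tbl (DateInserted);

-- Report logical reads for one of the Example 2 queries.
SET STATISTICS IO ON;

SELECT *
FROM dbo.tbl AS t
WHERE t.DateTo       = '20061231'
  AND t.DateInserted = '20061201';

SET STATISTICS IO OFF;
```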

    SQL Server statistics

    Statistics tell SQL Server how many rows are likely to match a given value. They give SQL Server an idea of how "unique" a value is, information it then uses to determine whether to use an index. By default, SQL Server automatically updates statistics once it estimates that approximately 20% of the records have changed. In SQL Server 2000, this is done synchronously: the statement that triggers the update is delayed while the rows are sampled. In SQL Server 2005, you can have it sample either synchronously or asynchronously, in which case the triggering statement continues with the existing statistics while the update runs in the background. The latter approach is better and will cause less blocking because locks will be released sooner. I recommend turning off the database setting "Auto Update Statistics," because it can increase your server load at the worst times. Instead of letting SQL Server automatically keep statistics up to date, create a job that runs the UPDATE STATISTICS command during your slowest period. You can pick your own sampling ratio depending on how accurate you want the statistics to be.
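The settings discussed above map to the following T-SQL. The database name MyDatabase and the 25 percent sampling ratio are placeholders, not recommendations, and the example table is assumed to exist as dbo.tbl.

```sql
-- Option 1 (SQL Server 2005 and later): keep automatic updates,
-- but run them asynchronously. This has no effect if
-- AUTO_UPDATE_STATISTICS is OFF.
ALTER DATABASE MyDatabase SET AUTO_UPDATE_STATISTICS_ASYNC ON;

-- Option 2: turn automatic updates off and refresh statistics
-- from a scheduled job during a quiet period.
ALTER DATABASE MyDatabase SET AUTO_UPDATE_STATISTICS OFF;

-- Statement for the scheduled job: sample a share of the rows ...
UPDATE STATISTICS dbo.tbl WITH SAMPLE 25 PERCENT;

-- ... or read every row for the most accurate statistics.
-- UPDATE STATISTICS dbo.tbl WITH FULLSCAN;
```

If you choose Option 2, the job has to run reliably; with automatic updates off, statistics only change when that job runs.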

    The statistics histogram is only kept on the first column of any non-clustered index (only density information is stored for the remaining columns). What does this mean for compound non-clustered indexes? It means SQL Server relies mainly on the first field to determine whether an index should be used. Even if the second field in the compound index matches 50% of the rows, that field is still part of the index that must be read to return the results (see Example 3). If the compound index were instead split into two non-clustered indexes, SQL Server might choose to use index 1 but not index 2, because the statistics on index 2 may show that it will not benefit the query (see Example 4).

    Example 3

    Assume you have a compound, non-clustered index defined on DateFrom and Amt.

    The statistics histogram would only be kept on the DateFrom field within the index, and SQL Server would have to seek (or scan) across both DateFrom and Amt. Since SQL Server has to traverse more data, the query will be slower.
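To see what SQL Server keeps for such a compound index, you can inspect its statistics directly. The sketch assumes the example table exists as dbo.tbl; the index name is hypothetical.

```sql
-- Assumes dbo.tbl exists with the columns listed earlier.
CREATE NONCLUSTERED INDEX IX_tbl_DateFrom_Amt
    ON dbo.tbl (DateFrom, Amt);

-- The histogram steps describe the leading column (DateFrom) only.
DBCC SHOW_STATISTICS ('dbo.tbl', IX_tbl_DateFrom_Amt) WITH HISTOGRAM;

-- The density vector lists one row per column prefix:
-- DateFrom, then DateFrom + Amt, and so on.
DBCC SHOW_STATISTICS ('dbo.tbl', IX_tbl_DateFrom_Amt) WITH DENSITY_VECTOR;
```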

    Example 4

    Assume you have two non-clustered indexes: The first is defined on DateFrom and the second is defined on Amt.

    Statistics would be kept on both fields because they are separate indexes. SQL Server will examine the statistics on DateFrom and decide to use that index. It will then examine the Amt column and may decide -- based on the statistics -- that the index is not unique enough and should be ignored. At this point, SQL Server would only need to traverse the DateFrom field, rather than both DateFrom and Amt, resulting in a faster query.
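Example 4 can be tried out along these lines. The index names and the Amt value are hypothetical, the example table is assumed to exist as dbo.tbl, and which index the optimizer picks will depend on your data.

```sql
-- Assumes dbo.tbl exists with the columns listed earlier.
CREATE NONCLUSTERED INDEX IX_Ex4_DateFrom ON dbo.tbl (DateFrom);
CREATE NONCLUSTERED INDEX IX_Ex4_Amt      ON dbo.tbl (Amt);

-- Each index now has its own histogram on its own column.
DBCC SHOW_STATISTICS ('dbo.tbl', IX_Ex4_DateFrom) WITH HISTOGRAM;
DBCC SHOW_STATISTICS ('dbo.tbl', IX_Ex4_Amt)      WITH HISTOGRAM;

-- A query filtering on both columns; compare the logical reads
-- with the compound-index version from Example 3.
SET STATISTICS IO ON;

SELECT *
FROM dbo.tbl AS t
WHERE t.DateFrom = '20061212'
  AND t.Amt      = 100;

SET STATISTICS IO OFF;
```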

    By using non-clustered indexes in SQL Server, you'll be able to focus queries on a data subset. Use the guidelines described in this tip to determine if it's best to create multiple non-clustered indexes or a compound non-clustered index. Also keep in mind the role of statistics and how they impact non-clustered indexes: Statistics affect the choice between using multiple non-clustered indexes and a compound non-clustered index in SQL Server.


    SQL Server clustered and non-clustered index design series

     Part 1: SQL Server clustered index design for performance
     Part 2: Designing SQL Server non-clustered indexes
     Part 3: How to maintain SQL Server indexes

    ABOUT THE AUTHOR:   
    Matthew Schroeder is a senior software engineer working on SQL Server database systems, ranging in size from 2 GB to 3+ TB, with between 2k and 40+k trans/sec. Matt currently works for the gaming vendor, IGT, providing services to gaming companies. He also works as an independent consultant, specializing in SQL Server, Oracle and .NET for industries such as gaming, automotive, e-commerce, entertainment, banking and non-profit. Matt specializes in OLTP/OLAP DBMS systems as well as highly scalable processing systems written in .NET. He is a Microsoft certified MCITP, Database Developer, has a master's degree in Computer Science and more than 12 years of experience in SQL Server/Oracle. He can be reached at cyberstrike@aggressivecoding.com.


    All Rights Reserved, Copyright 2005 - 2008, TechTarget | Read our Privacy Policy