Home > SQL Server Tips > Microsoft SQL Server > Update SQL Server table statistics for performance kick
SQL Server Tips:
EMAIL THIS
 TIPS & NEWSLETTERS TOPICS 

MICROSOFT SQL SERVER

Update SQL Server table statistics for performance kick


Denny Cherry
10.24.2007
Rating: -4.50- (out of 5)


Expert advice on database administration
Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us    Add to Google


Statistics are a very important and much unappreciated part of Microsoft SQL Server. In basic terms, statistics are metadata about the data within a table and they're created and updated on a column-by-column basis, not on the entire table. The database engine employs statistics when generating execution plans that are used to access data stored within the database. For that reason, having accurate and up-to-date statistics is crucial.

In contrast, if you have invalid or out-of-date statistics, it can lead the SQL Server engine to take the wrong path to the data, and taking the wrong path means that an index scan is made when an index seek would have been appropriate or a seek is performed against the wrong index. Even worse, it would perform a table scan instead of any index operation at all.

Auto create statistics and auto update statistics

Each database has two options related to statistics: auto create statistics and auto update statistics. Typically, both should be left enabled for databases. Statistics are automatically updated when certain thresholds within SQL Server are met, i.e., the number of rows in the table increases or decreases by 10% of the number of rows the statistic was based on. Once the threshold is reached, SQL Server will automatically update the statistic(s) defined in the table.

If you have a table that stays at approximately the same number of rows, but data is added and removed on a regular basis, your statistics will become out of date rather quickly. Also, if you have a very large table, it may take quite a while to reach the 10% metric -- longer than you want in some cases. Manually updating the statistics may be necessary in that case.

Manually updating statistics

There are two ways to manually update statistics on a table. One is via the sp_updatestats system stored procedure. The second is via the UPDATE STATISTICS command. While the sp_updatestats procedure is an easier way to program the command, the UPDATE STATISTICS command is much more flexible and much more powerful. By default, statistics are compiled based on a sampling of the data in the table. The sampling amount varies depending on the size of the table, but it can be as little as just a few percent to as much as 100% of the table. Typically, the default sampling ratio is enough to generate good execution plans. But, there are exceptions to this.

For example, my colleagues and I once had a large table with approximately 12 million records in the table. The number of records in the table was fairly static. However,
More on SQL Server table statistics:
  • Expert Answer: When to update index statistics 
  • Quick steps: 12 indexing tips in 12 minutes
  • Guide: SQL Server 2005 performance tuning tools A-Z
  • approximately 1.2 million records were deleted each morning and throughout the day a new 1.2 million records were added. The primary key for this table was an identity column with a data type of INT. The other indexed column also had a data type of INT. The bulk of the table's queries were done against the secondary index. We noticed that after we deleted the data and did a normal UPDATE STATISTICS command, the statistics would be useful for about three to four hours. At that point, SQL Server would start doing index scans instead of index seeks. The massive number of queries would send SQL Server's CPU to 100% utilization.

    We decided to start updating the statistics using the FULLSCAN flag. This gave SQL Server a more realistic profile of the table, and we were able to run the UPDATE STATISTICS command only once per day instead of having to run it throughout the day when using the default sampling rate.

    How to update statistics in SQL Server

    As I mentioned earlier, there are two ways to update statistics. There's the sp_updatestats procedure that accepts a single input parameter called @resample. The @resample parameter triggers the procedure to get a fresh sample of records from the table, instead of using the prior sample of records. Even if you use the @resample parameter the ratio of records sampled to the total number of records in the table remains the same. In order to change this ratio, you need to use the UPDATE STATISTICS command with the SAMPLE parameter. The sp_updatestats procedure has the same effect as running the UPDATE STATISTICS command using the ALL keyword.

    The exception is that the sp_updatestats procedure checks the internal system tables and looks to see if the table needs its statistics updated. If the table does not, then the table is skipped. In addition to updating the statistics on user tables, the procedure also checks system tables for stale statistics and the statistics are updated as needed.

    use Northwind
    go
    exec sp_updatestats
    go

    The prefered method for updating SQL Server statistics is the UPDATE STATISTICS command. This command allows much greater flexibility when programming, however it does require more initial programming. When you fine-tune UPDATE STATISTICS to perform within your environment, you will find that it is time well spent.

    When fine-tuning the UPDATE STATISTICS command, you can choose from several options. They are the FULLSCAN, SAMPLE (PERCENT, ROWS), RESAMPLE, ALL | COLUMNS | INDEX, and NORECOMPUTE options. The first two are fairly self-explanitory. The FULLSCAN performs a full scan of the index or column in question and uses all values in the column when generating the statistic. The SAMPLE option allows you to recommend the percentage of rows or the number of rows to use as the sample when generating the statistic.

    You'll notice that it says recommend in the previous sentence. While SQL Server will attempt to use your recommended setting, if it finds that there are not enough data pages being processed for the statistic, it will automatically increase the number of pages to approximetly 1,000 data pages. If the SAMPLE size is set to zero (either ROWS or PERCENT), an empty statistic is created.

    The RESAMPLE switch has the same effect as the sp_updatestats procedure. It tells the UPDATE STATISTICS command to resample the records for the statistic. If you have a table that gets new records often, it is important to include the RESAMPLE flag when forcing statistics to be updated. This ensures that your new records are available for sampling. The ALL | COLUMNS | INDEX option specifies which statistics to update. In some cases, only your column statistics need updating, while in other cases only your index statistics need updating. In some cases, both will need updating.

    When using the UPDATE STATISTICS command, you must specify the table or view where statistics need updating. You also have the option to specify the index or statistic name to update. If no index or statistic is named, all statistics are updated (unless restricted with the ALL | COLUMNS | INDEX option).

    use Northwind
    go
    UPDATE STATISTICS TableName WITH SAMPLE 25 PERCENT, ALL
    Go

    False SQL Server statistics

    You may find that you need to test your execution plans on large tables, but you do not have large tables against which to test them. Microsoft includes a
    Visit the SQL Server IT Knowledge Exchange:
  • Denny Cherry's blog
  • Questions from your peers
  • couple of "undocummented" options for the UPDATE STATISTICS command that you can use to force SQL Server to generate statistics that make SQL Server think there is more data in the table than there actually is. These options are ROWCOUNT and PAGECOUNT. As they suggest, these options allow you to trick SQL Server into thinking that there is a specified number of records in the table or a specific number of pages being scanned for the statistic.

    Undocumented options should never be used in a production environment. They should be used in testing only, so you can check what your execution plans look like once the tables are fully populated. Attempting to use these statistics in a production environment can lead to unexpected and unwanted results.

    use Northwind
    go
    UPDATE STATISTICS TableName WITH ROWCOUNT=100000
    Go

    In the example above, I take a table with only a few records in it and generate statistics for a table with 100,000 records. Keep in mind that these options are not supported and will produce strange results from time to time.

    Viewing SQL Server statistic results

    The DBCC SHOW_STATISTICS command will show you the actual data from the statistic, which can be useful when troubleshooting performance problems. When we create our sample table:

    CREATE TABLE TableName
    (C1 int primary key)

    And we load the table with sample records:

    insert into TableName
    select object_id
    from sys.all_objects

    We then update the statistics on this table (you may need to change the name of the statistic):

    UPDATE STATISTICS TableName

    When we view the statistic with DBCC SHOW_STATISTICS:

    DBCC SHOW_STATISTICS(TableName,
    PK__TableName__0EA330E9)

    We get three result sets back:

  • The first shows basic information about the statistic:
  • Name Updated Rows Rows Sampled Steps Density Average Key Length String Index
    PK__TableName__117F9D94 Aug 20 2007 8:28PM 1813 1813 177 1 4 NO

  • The second result set gives us the scan density and average data length of the records for each column in the statistic:
  • All Density Average Length Columns
    0.000551572 4 C1

  • The third result set is the most important. It tells us the ranges of the data and the number of rows per range. (The table truncated to a few sample rows for brevity).
  • RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS
    -385 11 1 11 1
    -232 11 1 11 1
    -208 23 1 23 1
    -196 11 1 11 1

    The RANGE_HI_KEY shows the higest value for that key range. The RANGE_ROWS shows the number of records between the RANGE_HI_KEY value and the new RANGE_HI_KEY value. The EQ_ROWS is the estimated number of rows matching the RANGE_HI_KEY value. The DISTINCT_RANGE_ROWS value is the estimated number of distinct values between the current RANGE_HI_KEY and the next RANGE_HI_KEY. The AVG_RANGE_ROWS is the estimated number of duplicate values between the current RANGE_HI_KEY and the next RANGE_HI_KEY.

    If you have wide ranges with few records, you may need to adjust the scan percentage of your index.

    As you may have gathered from this tip, statistic updating and tuning can be a very complex task. When done correctly, it can provide that extra bit of performance that the system may not have provided under the default statistic configuration. These techniques allow you to boost SQL Server performance and get a return on every dollar invested in your SQL Server infrastructure.


    ABOUT THE AUTHOR:   
    Denny Cherry has over a decade of experience managing SQL Server, including MySpace.com's over 175 million user installation, one of the largest in the world. Denny's areas of technical expertise include system architecture, performance tuning, replication and troubleshooting. Denny uses these skills on a regular basis in his current role as a Sr. Database Administrator and Architect at Awareness Technologies. Denny is a longtime member of PASS and Quest Software's Association of SQL Server Experts and has written numerous technical articles on SQL Server management.
    Copyright 2007 TechTarget


    Rate this Tip
    To rate tips, you must be a member of SearchSQLServer.com.
    Register now to start rating these tips. Log in if you are already a member.




    Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us    Add to Google


    RELATED CONTENT
    SQL Server performance and tuning
    Virtual database storage for SQL Server: Friend or foe?
    Tutorial: SQL Server 2005 Analysis Services
    Tutorial: Migrating to SANs from local SQL Server disk storage
    SQL Server memory configurations for procedure cache and buffer cache
    Using the OUTPUT clause for practical SQL Server applications
    Check SQL Server database and log file size with this stored procedure
    SQL Server PerfMon counters for access methods and buffer manager
    Find size of SQL Server tables and other objects with stored procedure
    Monitor SQL Server disk I/O with PerfMon counters
    SQL Server tempdb best practices increase performance

    SQL Server overview
    How to use rank function in SQL Server 2005
    SQL Server memory configurations for procedure cache and buffer cache
    Storage area network (SAN) basics every SQL Server DBA must know
    Open SSIS packages without validation using these SQL properties
    SQL Server PerfMon counters for access methods and buffer manager
    SQL and SQL Server Tutorial and Reference Guide
    Monitor SQL Server disk I/O with PerfMon counters
    Track changes to SQL Server 2000 and 2005 with one simple utility
    Tips for scheduling and testing SQL Server backups
    SQL Server tempdb best practices increase performance
    SQL Server overview Research

    Microsoft SQL Server
    Upgrade live applications to SQL Server 2005 for high availability
    How to use rank function in SQL Server 2005
    SQL Server high availability when upgrading to SQL Server 2005
    Secure SQL Server from SQL injection attacks
    Create a computed column in SQL Server using XML data
    SQL Server memory configurations for procedure cache and buffer cache
    How insiders hack SQL databases with free tools and a little luck
    Upgrade Active/Active cluster to SQL Server 2005 and Windows 2003
    Using the OUTPUT clause for practical SQL Server applications
    Tips for moving from SQL Server local disk storage to SANs

    RELATED GLOSSARY TERMS
    Terms from Whatis.com − the technology online dictionary
    contiguity  (SearchSQLServer.com)
    contiguous  (SearchSQLServer.com)
    drilldown  (SearchSQLServer.com)
    hashing  (SearchSQLServer.com)
    hybrid online analytical processing  (SearchSQLServer.com)

    RELATED RESOURCES
    2020software.com, trial software downloads for accounting software, ERP software, CRM software and business software systems
    Search Bitpipe.com for the latest white papers and business webcasts
    Whatis.com, the online computer dictionary

    DISCLAIMER: Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.

    HomeNewsTopicsITKnowledge ExchangeTipsAsk the ExpertsMultimediaWhite PapersIT Downloads
    About Us  |  Contact Us  |  For Advertisers  |  For Business Partners  |  Site Index  |  RSS
    SEARCH 
    TechTarget provides enterprise IT professionals with the information they need to perform their jobs - from developing strategy, to making cost-effective IT purchase decisions and managing their organizations' IT projects - with its network of technology-specific Web sites, events and magazines.

    TechTarget Corporate Web Site  |  Media Kits  |  Reprints  |  Site Map




    All Rights Reserved, Copyright 2005 - 2008, TechTarget | Read our Privacy Policy
      TechTarget - The IT Media ROI Experts