Home > SQL Server Tips > > Segregating data for optimum parallelization
SQL Server Tips:
EMAIL THIS
 TIPS & NEWSLETTERS TOPICS 


Segregating data for optimum parallelization


By Serdar Yegulalp, Contributor
10.04.2006
Rating: -4.67- (out of 5)


Expert advice on database administration
Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us    Add to Google


When it comes to SQL Server, most people think of parallelization in terms of making use of multiple processors or running queries side by side. In this article, however, I'll tell you how to arrange your data to get the most out of parallelization and how to segregate and partition data so it can be accessed as fast as possible.

Splitting data across physical files and filegroups

The most common way to parallelize access to things in a database is to segregate tables into multiple physical files. If you have a table that is being written to and read from much more aggressively than most other tables, it makes sense to put it into its own file so it can expand independently. One approach is just to place each table in its own physical file and I've seen some people follow this rule with frightening zeal. But if you have dozens or even hundreds of tables this can quickly become counterproductive, especially if some of them have much lower traffic than others.

To further parallelize access, files and groups can be split across multiple physical storage devices. In fact, this is the best step to take if you can manage it. Keep in mind that clustered indexes always live in the same filegroup as their attendant table, so they can't be split off. You can, however, split BLOB data, such as columns with ntext or IMAGE datatypes, into its own physical file. That way, you'll have it live on a separate storage device for better parallelism.

Segregating criteria

First, ask yourself this key question: What criteria is the best to use to split up the data and parallelize it? In the time I've been creating and maintaining databases, I've used two basic criteria to figure out how to split things up and group things together:

  1. Frequency of access. How often something is read or accessed is a factor. If you have a number of tables that tend to be read together, you can keep them in the same physical file. If the tables are updated often, though, it might make more sense to break them out into their own files.
  2. Frequency of updates. Tables that get written too frequently should be given their own separate physical files whenever it's practical, and they should have an appropriately large auto-expansion factor. This way, the table can expand or shrink on its own independently of everything else.
I also hinted above that the type of data is also a good segregation criterion, so I usually place text or BLOB data in their own physical files to allow more direct access.

Splitting data across databases

Data can of course also be split across more than one database. The most common use for setting up multiple databases that contain different parts of the same data set is when you're archiving historical data. Anytime you rotate part of a data set out of active use due to aging, put that data in its own database to be accessed separately.

One of the biggest reasons for this segregation is because of backup and restore. If something goes wrong with the historical database and you have reliable backups for it (and you'd better, or Murphy'll get you!), it's a lot easier to restore it independently without breaking symmetry with the rest of your data. Also, if you already have that data in its own database and your system design gracefully takes into account the possibility that such things can go offline, you can recover without having to take everything offline. Better to temporarily pull the plug on your archives than to force everyone to come back later.

Partitioning

Partitioning in SQL Server involves taking data normally located in one table and splitting it across multiple tables in some logical fashion. One obvious example of this (the one cited in SQL Server Books Online, in fact) is to split date-sorted data into 12 subtables, each for one month of the year. A downside of using this kind of partitioning, however, is that you often need to create views to rejoin the data using the UNION operator, which can be a performance killer.

SQL Server 2005 does allow tables to be partitioned in a few native ways that can further parallelize access. The above example -- splitting by months -- would be accomplished using a RANGE FUNCTION. You could split each month into a separate partition and then store it in a separate physical file. Another advantage to partitioning is that indexes (which can also be partitioned like this) can be rebuilt a lot more efficiently, since you can do them piecemeal instead of all at once. For more information on the ins and outs of partitioning in 2005, refer to my article Partition indexes for improved SQL Server 2005 performance.

Conclusion

The biggest reasons for segmenting data for parallelization are of course to improve performance and manageability. The latter is something that usually demonstrates itself over time, but the performance factor can be proven fairly quickly by gathering live performance statistics and comparing them to how things behaved before. Do this with live data whenever possible so you can gauge the degree of improvement you're getting after rearranging your data. The comparisons will indicate whether future similar optimizations will be worth the effort.

About the author: Serdar Yegulalp is editor of the Windows Power Users Newsletter. Check it out for the latest advice and musings on the world of Windows network administrators -- and please share your thoughts as well!

More information on SearchSQLServer.com

  • Tip: Partition indexes for improved SQL Server 2005 performance
  • Tip: Use cube partitions to improve Analysis Services performance
  • Guide: Learning Guide: SQL Server performance tuning A to Z

    Rate this Tip
    To rate tips, you must be a member of SearchSQLServer.com.
    Register now to start rating these tips. Log in if you are already a member.




    Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us    Add to Google



    RELATED CONTENT
    Microsoft SQL Server Performance Monitoring and Tuning
    Using traces in SQL Server Profiler
    SQL Server Mailbag: CALs, witnesses and unwanted changes
    SQL Server Mailbag: Data restoration and DB property management
    Working with IntelliSense in SQL Server 2008 Management Studio
    SQL Server Mailbag: Stored procedures, triggers and SSRS reports
    Troubleshooting Distributed Transaction Coordinator errors in SQL Server
    Clearing the Windows page file and its effect on server performance
    Optimizing SQL Server indexes –- even when they're not your indexes
    Performance implications of transaction log autogrowth in SQL Server
    The short course on how SQL Server really works

    SQL Server Database Modeling and Design
    Optimizing SQL Server indexes –- even when they're not your indexes
    Top tips and tricks for SQL Server database development
    Managing the development lifecycle with Visual Studio Team System 2008
    A first look at Visual Studio Team System 2008 Database Edition
    Testing transaction log autogrowth behavior in SQL Server
    Top 10 SQL Server Tips of 2008
    Tutorial: SQL Server indexing tips to improve performance
    Tutorial: Learn SQL Server basics from A-Z
    SQL Server database design disasters: How it all starts
    Can you shrink your SQL Server database to death?

    RELATED GLOSSARY TERMS
    Terms from Whatis.com − the technology online dictionary
    contiguity  (SearchSQLServer.com)
    contiguous  (SearchSQLServer.com)
    drilldown  (SearchSQLServer.com)
    hashing  (SearchSQLServer.com)
    hybrid online analytical processing  (SearchSQLServer.com)

    RELATED RESOURCES
    2020software.com, trial software downloads for accounting software, ERP software, CRM software and business software systems
    Search Bitpipe.com for the latest white papers and business webcasts
    Whatis.com, the online computer dictionary

    DISCLAIMER: Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.



  • SQL Server Development - .NET, C#, T-SQL, Visual Basic
    HomeNewsTopicsITKnowledge ExchangeTipsAsk the ExpertsMultimediaWhite PapersIT Downloads
    About Us  |  Contact Us  |  For Advertisers  |  For Business Partners  |  Site Index  |  RSS
    SEARCH 
    TechTarget provides technology professionals with the information they need to perform their jobs - from developing strategy, to making cost-effective purchase decisions and managing their organizations' technology projects - with its network of technology-specific websites, events and online magazines.

    TechTarget Corporate Web Site  |  Media Kits  |  Site Map




    All Rights Reserved, Copyright 2005 - 2009, TechTarget | Read our Privacy Policy
      TechTarget - The IT Media ROI Experts