SQL Server has long given database administrators the option of splitting up databases into different files and filegroups. When Microsoft released SQL Server 2005, it greatly expanded the options for using filegroups by introducing partitioning. Additionally, we now have the ability to do online database restores with the SQL Server 2005 engine. So with all these options available, how should you go about optimizing your filegroups for backup and restore? Let's take a look at filegroups and how to set up a back and restore strategy when they are in use.
How do files and filegroups work in SQL Server 2005? Each database is made up of filegroups. You can have several filegroups that make up your database allowing you to separate data. You could chose to separate mostly read tables from mostly write tables, or separate tables from their non-clustered indexes. You can also use table partitioning to separate data.
There are literally hundreds of uses for filegroups. Filegroups are made up of one or more physical files on the disk. Why would you have multiple files in a filegroup? Although there are many reasons, such as a full hard drive, to have multiple files; the important core item to understand is that databases are made up of filegroups and filegroups are made up of files.
How you use filegroups will depend largely on the database in question. Filegroups can be used strictly for recoverability reasons or you can use them to aid in database performance. Sometimes you end up with filegroups or multiple files because of poor capacity planning or unforeseen growth. Whatever reason you have them, be aware of how they affect your ability to backup and restore your database.
Filegroup backups in SQL Server
When backing up a database, one option is to back up a single filegroup instead of the entire database. This can be especially useful when it comes to large databases. Depending on the hardware, a large database, say around 500 GB, could take several hours to back up. In fact, I have seen a system take four to five hours to back up a database of that size. Backing up takes resources, and it may not be desirable to have a full backup job run every night for five hours.
There are several solutions to this problem. I have seen full backups scheduled weekly with a series of transaction log and differential backups performed throughout the week. This can work, but you would still require a single long window to do a full backup once a week.
More on backup and restores in SQL Server:
- Why restore a full SQL database vs. a failed filegroup?
- FAQ: Top 5 SQL Server backup and recovery problems
What if instead you break up your database into seven filegroups, all of similar size? In that case, they would all be around 72 GB, and you would back up one filegroup per night. This would reduce the long, full backup to seven shorter filegroup backups, and in the course of one week, you will have backed up the entire database. I've worked with a number of databases that contain a large amount of data, a sizable portion of which is read-only.
Due to compliancy requirements, such as Sarbanes-Oxley, a large financial database may be 600 GB or 700 GB but often is mostly historical data going back seven years or more. If you have such a database and only 20% of the data changes regularly, you may be able to gain some efficiency by using filegroups. Place the tables that change regularly in your primary filegroup and place the historical or archive tables in an archive filegroup. Now you can back up the primary filegroup daily and maybe only back up the archive filegroup weekly or even monthly.
Filegroup restores in SQL Server
Filegroup restores offer some additional complexities, especially in SQL Server 2005. You can restore a file or filegroup to a database if a single file becomes corrupt. This allows for more flexibility when working with larger databases. Assuming that you have files on different drives, a single drive failure will not necessarily require an entire database to be restored. This can save valuable time when you're recovering from a failure.
Large databases can take a long time to restore -- just as backups can -- but having multiple filegroups allows you to cut down the time it takes. Additionally, SQL Server 2005 introduced online restores. The catch here is that the database will come online one filegroup at a time. In other words, you restore the primary filegroup first and users can be allowed to access data in that filegroup while the restore continues in other filegroups.
As each filegroup restore completes, the data in that filegroup is also made available to the end users. This does require some careful planning on the part of the administrator. You need to make sure that critical data is restored first and archived, and less frequently accessed data is restored later. To do this properly, you must have an excellent understanding of your database and how it is used.
What I am about to say may make some DBAs cringe, but you must spend time talking with the developers of any applications that access the database. You need to know what data is crucial and what data can come online at a later time. Once you fully understand that, you can develop a filegroup strategy that will work well with online backups.
Planning for filegroups
When planning for filegroups, it is just as much about recoverability as it is about performance. Be sure to take into account the need to back up and restore your database. Don't get into a situation where you are using filegroups for the wrong reasons, and be sure to avoid a filegroup structure that would hinder your ability to back up or restore. Another common mistake is using too many filegroups. If you separate data in the wrong way, performance can actually be degraded. So remember to take into account performance when you are planning a restore.
Avoid working in a bubble, and that goes double when working with databases. Too many different processes and applications can touch a database. Even other databases can be dependant on your database. A plan that only optimizes for backup and restore without taking performance into account is not a good plan, and vice versa. Just be aware of the big picture before you get to the implementation phase, and take the extra time up front to prevent disastrous problems later. I am not suggesting you run out and carve up all your databases into multiple files and filegroups, but there is certainly a place and a time for their use.
ABOUT THE AUTHOR:
Eric Johnson (MCSE, MCITP: Database Administrator, MCSD) is a co-founder of Consortio Services and the primary Database Technologies Consultant. He has delivered numerous SQL Server training classes and webcasts as well as presentations at national technology conferences. He has presented at TechMentor on SQL Server 2005 Replication, Reporting Services, and Integration Services. Eric is active in the local SQL Server Community, serving as the president of the Colorado Springs SQL Server Users Group.