When most people think of disaster recovery, their initial thoughts are about recovering from a complete disaster like Hurricane Katrina. In reality, this type of disaster is rare and often hits specific areas prone to earthquakes, hurricanes, tornadoes and other types of natural phenomena. Although it is good to be prepared for a complete disaster, the likelihood of needing this type of recovery is rare.
More often, you'll have to handle accidental data deletions, hardware failures and other application failures. While you may not classify these occurrences as catastrophes, they could easily cause serious system failures; preparing for them is as important, if not much more important, than preparing for a complete disaster. The exception, of course, is if your servers happen to be in area where natural disasters are likely to occur.
The following checklist will help you prepare for disaster recovery.
One key component to any process that needs to be done in an emergency is to have proper documentation as to what needs to occur and when. In a crisis, people usually don't think methodically and things are often done on the fly. Having a script or checklist of what needs to occur will help you stay calm in the event of a system failure. In addition to having a script to follow in a recovery event, a checklist also gives team members the information they might need for the recovery.
One pitfall is that documentation becomes outdated very quickly. This is where you should try to keep things simple, but still include enough information to perform the recovery. I have seen situations where the documentation is very thorough, but not tested. Keep in mind that documentation is just one component of your recovery process.
As I mentioned above, documentation is the starting point for your recovery. You also need to spend a fair amount of time using your documentation/checklist to practice a recovery. In some situations it will be next to impossible to carry out a full system failover, but if you never test it, you'll never know if it will work. Things can be done on a smaller scale to simulate the procedures you will follow. But, keep in mind, if you are not testing the real thing you will probably have holes in the process.
Another benefit of testing is that you will be calmer when must do a recovery. If you have already taken the steps, you will know what to expect and how long things take.
I strongly believe in scripting out as much as possible. It is much easier to apply a script to automate the recovery than it is to go through each step along the way during the process. As you are preparing your recovery checklist, identify the things that can be automated via scripts. Take the time to write them down and document their use. This will save a lot more time and confusion later when you actually need to use them.
In addition, SQL Server gives you the ability to script out just about every object that exists within the DBMS, so you should use these tools and periodically script out your objects. These things include logins, stored procedures, table definitions, triggers, DTS packages and jobs. You may never need to use all the scripts, but having them may be a lifesaver.
After recovery is complete you need some way to close the loop. There should be some type of sign-off process that signals recovery was successful. This may include several people, depending on the type of recovery. This again should be part of your checklist. You want to ensure that the recovery was successful, the data is intact and there is no chance of further data loss prior to getting the user base back on the system.
Have a list of people to be contacted in case of a failure. This should include people who need to do the recovery as well as those who must sign off after recovery is complete.You should have both primary and secondary contacts for each role on the list and at least two ways to contact each person. If you need an answer to move forward on your recovery plan, make sure you have the proper people lined up to make the decisions quickly.
Back up your database
What would a SQL Server disaster recovery plan be without database backups? Plenty of other tips talk about selecting a recovery model and backup plan. Backups are a key component and more can be found in the following resources:
Backing up SQL Server
Guide: How to properly back up a SQL Server
- Worst practices for SQL Server backup and recovery
Having the necessary hardware on hand for a failover is also very important. You may have a one-to-one failover ratio for servers, but that is not very practical for for many companies, let alone for every single server. You need to identify which hardware will be used for a failover and make sure it will be available when needed. Also, servers might fall into different classifications -- small, medium and large -- so you may need to have several failover boxes on hand.
It is also important to have the necessary media on hand in case you need to do a complete system rebuild. Software versions are always changing, so you can't assume that you will always have the exact version in your media cabinet. Keeping servers standardized will make this much simpler, but having the correct versions should be a requirement. Create a media kit that includes all of the necessary software versions and patches. You can make multiple copies; one for on-site use and another to be stored with off-site tape storage.
In addition to documenting and testing your recovery process, having a good change-control process is very important. Just think about all of the effort you put into documenting your failover process and testing it to make sure everything works without a hitch, only to have someone make a modification that invalidates all that you did. The way to combat this is by not letting it happen. Once you get a server into your recovery umbrella, make sure it is under a change-control process, so when changes are introduced, your recovery process will be modified to reflect whatever changes are necessary.
Recovery priority list
Another key piece of documentation is a priority list that designates which servers need to be recovered and the order of recovery. In most cases, it is probably not likely that all of your servers will go down, but if it did happen, it would be next to impossible to recover them all at once. Having a priority list will help you concentrate on critical servers first and leave the others for later. When preparing this list, keep in mind which servers depend on other servers. It's no use recovering a server if it can't be used until another is recovered first.
It's impossible to be prepared for every possible failure that could occur, but being more prepared versus not in the least prepared is definitely a better place to be. Take time to think about what you would do if there was a failure and begin by putting together a checklist of steps that you would follow.
SQL Server Checklists offer you step-by-step advice for administering, tuning and managing
your SQL Servers.
E-mail the editor to suggest additional checklist topics.
ABOUT THE AUTHOR
Greg Robidoux is the president and founder of Edgewood Solutions LLC, a technology services company delivering professional services and product solutions for Microsoft SQL Server. He has authored numerous articles and has delivered presentations at regional SQL Server users' groups and national SQL Server events. Robidoux, who also serves as the SearchSQLServer.com Backup and Recovery expert, welcomes your questions.
More information from SearchSQLServer.com
- Checklist: Backing up SQL Server
- Step-by-Step Guide: How to properly back up a SQL Server
- Tip: Worst practices for SQL Server backup and recovery
This was first published in December 2005