Checklist: How to maintain an effective SQL Server DR strategy

05 Dec 2005 | Greg Robidoux, Edgewood Solutions

When most people think of disaster recovery, they first think of recovering from a complete disaster like Hurricane Katrina. In reality, disasters of this type are rare and tend to strike areas prone to earthquakes, hurricanes, tornadoes and other natural phenomena. It is good to be prepared for a complete disaster, but you are unlikely to need this type of recovery.

More often, you'll have to handle accidental data deletions, hardware failures and other application failures. While you may not classify these occurrences as catastrophes, they can easily cause serious system failures; preparing for them is at least as important as preparing for a complete disaster. The exception, of course, is if your servers happen to be in an area where natural disasters are likely to occur.

The following checklist will help you prepare for disaster recovery.
Document
One key component of any process that must be carried out in an emergency is proper documentation of what needs to occur and when. In a crisis, people usually don't think methodically, and things are often done on the fly. Having a script or checklist of what needs to occur will help you stay calm in the event of a system failure. In addition to giving you a script to follow in a recovery event, a checklist also gives team members the information they might need for the recovery.
One pitfall is that documentation becomes outdated very quickly. This is where you should try to keep things simple, but still include enough information to perform the recovery.
I have seen situations where the documentation is very thorough, but not tested. Keep in mind that documentation is just one component of your recovery process.
Practice
As I mentioned above, documentation is the starting point for your recovery. You also need to spend a fair amount of time using your documentation/checklist to practice a recovery.
In some situations it will be next to impossible to carry out a full system failover, but if you never test it, you'll never know whether it will work. You can simulate the procedures on a smaller scale, but keep in mind that if you are not testing the real thing, you will probably have holes in the process.
Another benefit of testing is that you will be calmer when you must do a recovery. If you have already taken the steps, you will know what to expect and how long things take.
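One way to learn how long things take is to time each step during a rehearsal. The following is only a minimal sketch in Python; the `rehearse` helper and the step names are hypothetical, and the placeholder actions stand in for whatever your own checklist actually calls.

```python
import time
from typing import Callable, List, Tuple

# Hypothetical helper: a step is a (name, action) pair taken from your checklist.
Step = Tuple[str, Callable[[], None]]


def rehearse(steps: List[Step]) -> List[Tuple[str, float]]:
    """Run each rehearsal step in order and record its duration in seconds."""
    timings = []
    for name, action in steps:
        start = time.perf_counter()
        action()  # an exception here stops the rehearsal at the failing step
        timings.append((name, time.perf_counter() - start))
    return timings


# Placeholder actions; real ones would run restore scripts, smoke tests and so on.
demo_steps = [
    ("restore full backup", lambda: None),
    ("apply log backups", lambda: None),
    ("run smoke test", lambda: None),
]

for step_name, seconds in rehearse(demo_steps):
    print(f"{step_name}: {seconds:.2f}s")
```

Keeping the timings from each rehearsal alongside the checklist gives you a realistic estimate to quote when a real recovery is under way.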
Script
I strongly believe in scripting out as much as possible. It is much easier to run a script that automates the recovery than to work through each step by hand during the process. As you prepare your recovery checklist, identify the things that can be automated via scripts. Take the time to write those scripts and document their use. This will save a lot of time and confusion later when you actually need to use them.
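As an illustration of automating one recovery step, the sketch below generates a T-SQL restore sequence from a list of backup files. It is an assumption-laden example, not a prescribed method: the function names, database name and file paths are hypothetical, and it assumes a simple chain of one full backup followed by log backups, restored with the standard `RESTORE DATABASE` and `RESTORE LOG` statements.

```python
from typing import List


def quote_name(name: str) -> str:
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def quote_path(path: str) -> str:
    """Quote a file path as a T-SQL Unicode string literal."""
    return "N'" + path.replace("'", "''") + "'"


def build_restore_script(database: str, full_backup: str, log_backups: List[str]) -> str:
    """Build a restore sequence: full backup, then each log backup, then recovery."""
    db = quote_name(database)
    lines = [
        f"RESTORE DATABASE {db} FROM DISK = {quote_path(full_backup)} WITH NORECOVERY;"
    ]
    for log_file in log_backups:
        # Logs must be applied in the order they were taken.
        lines.append(
            f"RESTORE LOG {db} FROM DISK = {quote_path(log_file)} WITH NORECOVERY;"
        )
    # The final statement brings the database online.
    lines.append(f"RESTORE DATABASE {db} WITH RECOVERY;")
    return "\n".join(lines)


# Hypothetical database name and paths, for illustration only.
print(build_restore_script(
    "Sales",
    r"D:\backup\sales_full.bak",
    [r"D:\backup\sales_log_1.trn", r"D:\backup\sales_log_2.trn"],
))
```

Generating the script rather than running it directly lets you review it, store it with your documentation and execute it through whatever tool you normally use.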
In addition, SQL Server gives you the ability to script out just about every object that exists within the DBMS, so you should use these tools and periodically script out your objects.
These things include logins, stored procedures, table definitions, triggers, DTS packages and jobs. You may never need to use all the scripts, but having them may be a lifesaver.
Sign off
After recovery is complete you need some way to close the loop. There should be some type of sign-off process that signals recovery was successful. This may include several
people, depending on the type of recovery. This again should be part of your checklist. You want to ensure that the recovery was successful, the data is intact and there is no chance
of further data loss prior to getting the user base back on the system.
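The sign-off step can also be tracked in a small structure so that nothing is skipped before users return. This is a minimal sketch, assuming the three checks named above; the `RecoverySignOff` class and the approver names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RecoverySignOff:
    """Tracks which required checks have been approved, and by whom."""
    required_checks: List[str]
    approvals: Dict[str, str] = field(default_factory=dict)  # check -> approver

    def approve(self, check: str, approver: str) -> None:
        if check not in self.required_checks:
            raise ValueError(f"unknown check: {check}")
        self.approvals[check] = approver

    def outstanding(self) -> List[str]:
        """Checks that still need a sign-off, in checklist order."""
        return [c for c in self.required_checks if c not in self.approvals]

    def is_complete(self) -> bool:
        return not self.outstanding()


sign_off = RecoverySignOff([
    "recovery was successful",
    "data is intact",
    "no chance of further data loss",
])
sign_off.approve("data is intact", "application owner")  # hypothetical approver
print(sign_off.outstanding())
```

Users go back on the system only once `is_complete()` returns true.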