There are several components to your disaster recovery plan. Enabled people, processes and technology are areas that rely upon one another. I'll discuss the level of importance all three have on a successful DR plan.
The reason for creating a DR plan is to ensure that your systems will continue to run as needed in case of any failure, whether large or small. In most cases, disaster recovery scenarios never get exercised. They act more like an insurance policy minimizing data loss and/or downtime to the smallest amount possible in the case of a failure.
Before you begin putting together a realistic DR plan, you need to think about two things: budget and need. Just about anything is possible if you have enough money, but your budget is one area that often causes you to implement a less than ideal disaster recovery solution. For the second element, need, ask yourself if your disaster recovery plan is the proper plan for meeting your business goals. In many cases, disaster recovery plans are over-engineered because of a budget surplus or the strategists do not understand the true business need. Or, the plan is under-engineered because of a lack of dollars or, again, the assessment of the real business need wasn't thorough enough.
Before you begin selecting and implementing a plan, answer these two questions:
Once you answer these questions, you can begin to formulate the type of solution you need as well as the costs. Oftentimes when speaking with clients the first answer is we cannot afford any data loss and we do not want any downtime. When you discuss the dollars needed in order to meet their criteria, the client's tune usually changes and so does the plan. Next, the real analysis begins.
So let's take a look at the three areas -- people, process and technology -- to see what needs to be put in place in order to have an effective DR plan within your budget.
The people aspect is the most important part in the beginning stages of your disaster recovery plan. The reason for this is the amount of effort that needs to go into planning, assessing, writing processes, selecting the technology and testing it. The level of people resources to monitor will be determined by the amount of money you put toward your technology component.
The amount of money you put toward technology will determine the level of people resources to monitor, failover and failback should go in the opposite direction once the initial stages have been fully researched and a process selected. From the failover and failback aspect, and depending on what method you use, your need for people resources will be either high if you use a low-tech solution or low if you employ a more high-tech solution. The primary reason for this is the level of automation and monitoring that is usually part of the solution. Unfortunately, with a high-tech solution, often you'll need more experienced people to implement it. But, in all respects, a failover situation is not something to be taken lightly and, therefore, a high level of expertise needs to exist regardless of the solution.
Keep in mind that a less automated approach means more people time is spent on the failover process and much more time is spent on the failback process. Also, depending on the complexity of the failover, such as a multi-database or multi-server failure, you may face an increased need for more resources. One person can only do so much, and if you consider that scenario when implementing your plan, it will pay off.
There are several procedures and processes you should address and document in your DR plan. Writing it all down is a very time-consuming people component, but it's a very valuable one in a time of crisis. Once you document the procedures, this task is not complete. As your environment changes, you must continue to update the procedures.
Here are some items to include in your process:
- Plan and assess your needs and budget.
- Create an escalation list of staff members to contact at critical points of failure.
- Create a priority list of your servers and possibly to the database level in priority rank, so you know which needs to address first in the case of a multi-component failure.
- Establish SLAs that provide realistic guidelines with your user community if a system goes down. For instance, how much time will it take to recover based on your needs and budget? You should do that across all of your servers because if there is a widespread problem, multiple groups will probably be involved.
- Develop roles and responsibilities regarding who is responsible for what aspect, so when a crisis does arise, there should be no arguing over who is responsible for handling any particular issue.
- Create an audit/change log of all servers so you can go back and see what items have been updated at a server and database level.
- Break down failover procedures based on your technology solution into the following groups:
- One database
- Multiple databases
- Entire instance
- Entire server
- Grouped application servers -- applications that work as a team (Web servers, app servers, database servers, etc.)
- Entire datacenter
- Break down failback procedures based on your technology solution, which are based on the groups above.
- Assess testing procedures: how often, what is involved and what actually constitutes a valid test.
- Schedule a DR review process, which may be once a quarter or once a year at which time you should assess your plan to ensure it still meets your overall business needs.
After the cost of people who will plan, implement and monitor your failover solution, the technology factor is the most expensive component. Just how expensive it is depends on the solution you select for failover. There are several ways of handling failover, from the simple to complex and from inexpensive to very expensive.
The following table lists the different options that are available:
|Replication||Medium||Medium||Medium with manual processing||Slow with manual processing|
|Continuous Data Protection||Medium||Medium||Medium||Slow|
|Backup and Restore||Low||Low||Slow||Slow|
|Database Mirroring||Low||Low||Fast, but only at the database level||Fast, but only at the database level|
Before you can have an effective disaster recovery plan, assess your need and the budgetary dollars you can allocate toward your failover solution. Based on those needs, select the appropriate technology solution and begin to wrap your processes around the selected technology. Be sure your staff is properly trained on the technology so implementation is done correctly. They must understand the failover process and also understand how to failback to your primary servers when needed. Take the time to understand the true business need and then develop your strategy to meet the need.
About the author: Greg Robidoux is the president and founder of Edgewood Solutions LLC, a technology services company delivering professional services and product solutions for Microsoft SQL Server. He has authored numerous articles and has delivered presentations at regional SQL Server users' groups and national SQL Server events. Robidoux, who also serves as the SearchSQLServer.com Backup and Recovery expert, welcomes your questions.
More on SearchSQLServer.com