As a DBA, your primary responsibility is to ensure that the database is online and the data is intact. In addition to making sure the database is available, you are responsible for having a safety net in place to contend with a system failure. What type of failover solution are you relying on, and is it the best fit for your environment? Failover options can range from a simple backup and restore process to a highly sophisticated clustering model.
The problem with failover methods is that their primary purpose is to recover all of your data back to the very latest moment. This is definitely what you want if you need to fail over your entire database to another server or if you need to recover your entire database. But what about those mistakes that wipe out an entire table or, worse yet, multiple tables?
Although all of these methods allow you to have a redundant data set, they have one common flaw when dealing with lost data. Losing data usually happens when someone makes a mistake and deletes data from a table or set of tables, or issues a truncate statement on a table. By the time you find out the data is missing, a bunch of other transactions have occurred, which makes the recovery even more difficult.
Let's take a look at some of the failover options to see what they provide for recovering lost data:
Backup and restore
One simple approach to data recovery is using the backup and restore method. You can restore your most recent backups (including transaction log backups) to a new database and then use T-SQL statements to retrieve the lost data and repopulate the database where the data was lost. By using the point-in-time recovery feature you can restore your database to the point in time right before the delete or truncate occurred. One thing to note is that your database must be in the full or bulk-logged recovery model in order to take transaction log backups at all. Under the bulk-logged model, a log backup that contains bulk-logged operations can only be restored in full, so you cannot stop at a point in time inside that particular backup.
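As a rough illustration of the restore-and-repopulate approach described above, the following sketch restores a copy of the database to just before the mistake and copies the missing rows back. The database names (SalesDB, SalesDB_Recover), file paths, logical file names, timestamp and column names are all hypothetical; only tblCustomers comes from the example later in this article. Treat it as a starting point under those assumptions, not a finished recovery script.

```sql
-- Restore the most recent full backup as a separate database.
-- NORECOVERY leaves it able to accept transaction log restores.
RESTORE DATABASE SalesDB_Recover
FROM DISK = N'D:\Backups\SalesDB_full.bak'                        -- hypothetical path
WITH MOVE N'SalesDB_Data' TO N'D:\Recover\SalesDB_Recover.mdf',   -- hypothetical logical names
     MOVE N'SalesDB_Log'  TO N'D:\Recover\SalesDB_Recover.ldf',
     NORECOVERY;

-- Restore any earlier log backups WITH NORECOVERY first, in sequence.
-- For the log backup that covers the mistake, stop just before it happened.
RESTORE LOG SalesDB_Recover
FROM DISK = N'D:\Backups\SalesDB_log_1.trn'                       -- hypothetical path
WITH STOPAT = N'2024-01-15T14:29:00',                             -- hypothetical time
     RECOVERY;

-- Copy back only the rows that are missing from production.
INSERT INTO SalesDB.dbo.tblCustomers (CustomerID, CustomerName)
SELECT r.CustomerID, r.CustomerName
FROM SalesDB_Recover.dbo.tblCustomers AS r
WHERE NOT EXISTS (SELECT 1
                  FROM SalesDB.dbo.tblCustomers AS p
                  WHERE p.CustomerID = r.CustomerID);
```

If the key column is an identity column, the insert would also need SET IDENTITY_INSERT turned on for the target table while the rows are copied.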
Log shipping
This method is similar to the backup and restore method, except the backups and restores happen automatically. For log shipping to help with lost data, you need to delay the restores on the secondary server. If the mistake is caught before the transaction log backup that contains the erroneous transaction has been restored, you can do a point-in-time recovery on the secondary. Then just follow the same methods as above to find the missing data and restore the data to your production server using T-SQL commands.
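One way to introduce such a delay, assuming you are using the built-in log shipping jobs available in SQL Server 2005 and later, is the @restore_delay setting on the secondary database. The database name and the four-hour value below are hypothetical; pick a delay that matches how quickly mistakes are usually noticed in your environment.

```sql
-- Run on the secondary server.
-- Assumes log shipping is already configured for this database.
EXEC master.dbo.sp_change_log_shipping_secondary_database
    @secondary_database = N'SalesDB',   -- hypothetical database name
    @restore_delay = 240;               -- minutes to wait before restoring a log backup
```

The trade-off is that the secondary is always at least that far behind, which matters if you also rely on it for failover.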
Clustering
There are two types of clustering: hardware and software. The primary purpose for using clustering is to have a fast failover if your primary server fails. In hardware clustering, the failover server uses the same data set as the primary server, so you are protected against a hardware failure, not a data failure. In software clustering there are usually two complete sets of data and, again, in order for the failover to be as quick as possible, both data sets are kept identical. In both of these solutions, when data is deleted or truncated on the primary server, the change is applied immediately and the data is ultimately lost on both your primary and secondary servers. This is not a good method for recovering lost data.
Continuous data protection (CDP)
A couple of vendors now have tools that take on the guise of continuous data protection or CDP. The concept with these tools is that as data is written to the transaction log the data is then queued up for a fast point-in-time recovery. Since this is a secondary copy of the database and you have the ability to go back to a particular point in time, this option allows you to recover the lost data. Therefore, if you know when the data was deleted or truncated, you could restore a secondary copy of your database to the point in time prior to the statement being issued. Again, this method is similar to the point-in-time recovery options above, but the recovery time could be much faster.
Tools that allow you to do this include:
Replication
Although you would use replication primarily to keep multiple data sets in sync, you can also use it as a failover option. In most cases, replication is set up to replicate transactions almost immediately. So, in our case -- where data was deleted on one server -- this transaction would be replicated almost immediately on the secondary server and the data would be lost there as well. One option is to delay how quickly transactions are replicated, so your secondary server is not as current. This would be great if you caught the erroneous transaction right away and you were able to pull the data off the secondary server to repopulate your primary server. But, in reality, if this were a failover server you would want the data to be as current as possible. Likewise, if this were a production server that received replicated data you would almost certainly want this secondary server to be as current as possible. Replication is probably not a good option for data recovery.
Recovery tools
Another option is to use recovery tools that read the SQL Server transaction log and let you undo transactions and recover deleted tables -- even truncated tables in some cases. Since all transactions are recorded in the transaction log, these tools can read through it and generate T-SQL statements that reverse the erroneous statement.
Some of these recovery tools are:
Of all the options listed, these tools offer the greatest flexibility and really fit the niche of recovering lost data. They offer the ability to find the erroneous transaction and create an undo script to reverse it. The one downside to undoing one or more transactions is the effect it will have on other transactions that have taken place after the erroneous transaction. By just undoing one transaction, you may cause other issues. You really need to understand how your database works in order to avoid causing more issues.
Transactions
When data is deleted, truncated or accidentally updated across the board, it is usually during a maintenance task. One simple way to limit the damage from this kind of mistake is to always use a transaction when updating data. A simple BEGIN TRAN followed by the statement is a place to start. If everything was updated correctly, a COMMIT could be issued; or, if things did not go as planned, a ROLLBACK could be issued. Here is a very simple example:
BEGIN TRAN
DELETE FROM tblCustomers
At this point, you would make sure it really did what you wanted and then you would issue a COMMIT to save the changes or a ROLLBACK to undo the changes.
Although mistakes like this are not always the cause of lost data problems, using a simple BEGIN TRAN and a COMMIT or ROLLBACK when performing maintenance tasks could eliminate a lot of headaches.
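The manual check can also be built into the script itself. The sketch below commits only when the number of affected rows matches what you expected and rolls back otherwise. The WHERE clause, the CustomerID column and the expected count are hypothetical values for illustration.

```sql
DECLARE @expected int;
SET @expected = 1;            -- hypothetical: how many rows you intend to remove

BEGIN TRAN;

DELETE FROM tblCustomers
WHERE CustomerID = 42;        -- hypothetical filter; a missing WHERE is the classic mistake

-- @@ROWCOUNT holds the row count of the statement immediately before it.
IF @@ROWCOUNT = @expected
    COMMIT TRAN;              -- the delete touched exactly what was intended
ELSE
    ROLLBACK TRAN;            -- anything else is undone
```

Keep in mind that an open transaction holds locks on the affected rows, so the check should be quick to avoid blocking other users.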
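Referential integrity
Another good way to avoid mistakes, such as losing entire tables, is to ensure that your database is using referential integrity. Unless the affected table is at the end of the relationship chain, meaning no other table references it, the foreign key constraints would not allow these types of errors to occur: a delete fails while related rows still exist, and SQL Server will not truncate a table that is referenced by a foreign key. Also, make sure you are not using cascading updates or deletes, which could ripple through your entire database.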
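A minimal sketch of such a constraint is shown below. The tblOrders table and the CustomerID column are hypothetical; the point is the explicit NO ACTION rules, which make SQL Server reject the change rather than cascade it.

```sql
-- Hypothetical child table tblOrders referencing tblCustomers.
ALTER TABLE dbo.tblOrders
ADD CONSTRAINT FK_tblOrders_tblCustomers
    FOREIGN KEY (CustomerID)
    REFERENCES dbo.tblCustomers (CustomerID)
    ON DELETE NO ACTION       -- reject deletes of customers that still have orders
    ON UPDATE NO ACTION;      -- reject key changes instead of cascading them
```

With this in place, an unqualified DELETE FROM tblCustomers fails as a whole if any customer still has orders, and TRUNCATE TABLE tblCustomers is refused outright.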
Security and permissions
One of the best ways to combat the possibility of lost data ever occurring is to make sure you grant permissions on an as-needed basis. I have seen many environments where several people have sysadmin rights to the server or db_owner rights to the database. Such environments are just waiting for mistakes to happen. Review your server and database permissions and make sure only the people that really need access have access.
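As a starting point for that review, the following queries list who currently holds the two roles mentioned above. They assume SQL Server 2005 or later, where these catalog views are available; the second query reports on whichever database it is run in.

```sql
-- Logins that are members of the sysadmin server role.
SELECT m.name AS login_name, m.type_desc
FROM sys.server_role_members AS rm
JOIN sys.server_principals AS r ON r.principal_id = rm.role_principal_id
JOIN sys.server_principals AS m ON m.principal_id = rm.member_principal_id
WHERE r.name = N'sysadmin';

-- Users that are members of db_owner in the current database.
SELECT m.name AS user_name, m.type_desc
FROM sys.database_role_members AS rm
JOIN sys.database_principals AS r ON r.principal_id = rm.role_principal_id
JOIN sys.database_principals AS m ON m.principal_id = rm.member_principal_id
WHERE r.name = N'db_owner';
```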
Summary
As you can see from the examples above, there is no perfect way to recover lost data. But there are tools available, which I've listed above. Ideally, the mistake should be caught immediately. The longer it takes to notice the problem, the harder the data is to recover. As time keeps ticking, more and more transactions are being issued against your database and you have more data integrity issues to deal with. The best way to avoid these situations in the first place is to ensure you have the proper security model set up, use referential integrity, use transactions during maintenance tasks and make sure your application is following sound development procedures.
About the author: Greg Robidoux is the president and founder of Edgewood Solutions LLC, a technology services company delivering professional services and product solutions for Microsoft SQL Server. He has authored numerous articles and has delivered presentations at regional SQL Server users' groups and national SQL Server events. Robidoux, who also serves as the SearchSQLServer.com Backup and Recovery expert, welcomes your questions.