There is one task I hope you never face: May you never have to recreate data at different points in time in order to unfold a suspicious act or uncover lost or stolen data. Most databases store data at a core data level, which reveals only the most recent state of the data to the end user and the database administrator. This issue of only having the latest version of the data leaves you unable to identify the whereabouts of particular data at different points in time during the data lifecycle.
As a DBA and a consultant, I see a lot of databases that store only a current snapshot of data and do not store each historical row of data as it changes through the life of the data. In most cases, this is great for the database, because every iteration of every transaction would require that the size of your database be 100 to 1,000 times the size of the current database. And it is because of this need to keep the database at a manageable level that the historical rows of data are often not stored and therefore not easy to recreate.
The financial industry has taken the opposite approach. Instead of only storing the most recent state of the data, it stores each transaction as it occurs and has reversing entries to undo the change. This approach means that the data is written, but it's never changed. Any historical point in time can easily be displayed without the need for other options to recreate the data. In its pure sense, tracking financial data changes is not as intensive as other data you may store in your database. That said, you should investigate which historical changes you need to track and trace versus which type only requires you to have the most recent version.
There are tools in the marketplace, such as Lumigent Technologies' AuditDB and Idera's SQL compliance manager, that allow you to capture every iteration of every single change that's ever occurred on your database. It takes an extremely large space to store the data, and only with tools such as the ones mentioned above would you be able to trace the various states of your data as it changes over time -- unless you modified your application to store each historical
row of data. Of course, there are other options, such as using triggers to capture each data change, but, again, your storage requirements become quite large, as does the overhead that is placed on your server when using triggers.
Without using a tool or modifying your application to capture each historical row of data, you're left with the painful and sloppy alternative of attempting to recreate the data. In years past, I was tasked with recreating healthcare records from the past several years in order to uncover some suspicious activities that occurred. At that time, the tools mentioned above did not exist and attempting to use triggers, along with the additional storage requirements, was not an option.
The process of recreating each historical view of a particular data set began with the retrieval of the archived backup tapes. To our dismay, we were notified that only one tape per month was archived for long-term storage and therefore the best we could do was create a snapshot one month at a time. As we began to restore the tapes, we were again unpleasantly surprised that some of the tapes were not readable. The database size at the time was only about 10 GB, but the need to restore over and over again and also capture the data we needed required us to restore in place since these were the days of 9 GB drives and an abundant amount of storage did not exist. Today, 10 GB is minuscule. Databases are now in the 100 GB to 500 GB range. So, even though much larger drives exist, the overall problem still remains.
I know the task of recreating historical data is not something that occurs all that often, but I also know I have been faced with this challenge on a couple of occasions. As a DBA, it is your responsibility to protect the data and to assist your company in reproducing as much data as possible. In order to understand the true requirements and the importance of the data, you must ask questions that will help determine the requirements. Based on what you learn, put measures in place to ensure you can recreate what is needed.
Once again, here are the three options to take into consideration so you know what is possible and what is not possible:
- Third-party tools, such as Lumigent's AuditDB or Idera's SQL compliance manager
- The use of triggers or other application modification
- The backup and restore method
Depending on the option you select, you'll need to understand what is possible and what is not possible. By using a third-party tool, you have the ability to recreate every change that has ever occurred. These tools have built in processes to minimize the performance hit on the server and database, and they let you selectively capture the important data. Using triggers or other application changes is another good option, but if you have a very busy system, you'll see a very big performance hit if you implement this measure.
The last approach of using backup and restore needs to be investigated in order for you to understand long-term backup storage. Find out how long backups are stored, what types of backups are stored and also the likelihood that you can restore all of the tapes. Even if there are only full backups once a day, you still have the potential for missing any changes that were made and then unmade during a particular day. In the case I was involved in, there was a lot of activity that may have occurred from month to month that could never be recreated.
The days of recreating computer records based on a paper trail no longer exist. More and more information is captured online only. Without the proper measures in place, data can be lost forever without anyone ever knowing what happened. As a DBA, you need to understand your role for keeping the systems online and as a protector of the data and potentially the entire company.
ABOUT THE AUTHOR
Greg Robidoux is the president and founder of Edgewood Solutions LLC, a technology services company delivering professional services and product solutions for Microsoft SQL Server. He has authored numerous articles and has delivered presentations at regional SQL Server users' groups and national SQL Server events. Robidoux, who also serves as the SearchSQLServer.com Backup and Recovery expert, welcomes your questions.
Copyright 2006 TechTarget