VistaPrint, a company with site databases containing more than 2 terabytes of data, employed a third-party tool to log ship the data from its Bermuda SQL Servers to standby SQL Servers in Massachusetts. The log shipping portion of the plan was a success; however, backing up the log shipped databases posed a problem.
SQL Server would not allow backups to be made of custom log shipped databases because they were in an unrecovered state. One senior DBA devised a plan to trick SQL Server into thinking it owns the database files.
Perhaps the most frustrating environments to work in are those in which improvement and automation take a back seat to constant fire fighting. Having the time and ability to make process enhancements and to build useful tools, automation and monitoring is near and dear to my heart. I take great satisfaction in delivering time-saving, leading-edge -- and sometimes cutting-edge -- solutions drawing on my decade of experience with SQL Server. I enjoy building automation and tools that everyone else finds cool and helpful. The most obvious benefit, of course, is that they often deliver extreme ROI for the company.
Custom log shipping to standby SQL Servers
Recently, I was able to do just that when VistaPrint needed a better solution for its off-site database backup conundrum. Because VistaPrint's production SQL Server database servers are located in Bermuda and the backup servers are located in Lexington, Mass., the company was spending in excess of $25,000 a month on bandwidth, plus employee time, to manually copy the backup files to Lexington each week. Those copies ensured VistaPrint had full database backups off-island in the event of a catastrophic failure at the Bermuda data center.
VistaPrint's site databases, on average, process 26,000 orders per day. There are three main production SQL Servers (SQL 2000 and SQL 2005), not including high availability, disaster recovery, monitoring and replication distribution servers. Current hardware is AMD-based HP DL585 (eight cores) with 32 gigabytes of RAM on an EMC Clariion CX380.
I had already designed and implemented custom log shipping from Bermuda to Lexington based on Quest Software Inc.'s LiteSpeed backup compression technology. The process copies VistaPrint's compressed transaction logs from the Bermuda SQL Servers to the standby SQL Servers in Lexington, where they are loaded, keeping the Lexington servers within 10 minutes of production data. Custom SQL Server monitoring, alerts and email status reports keep everything running smoothly, including email/paging when source backups or destination loads exceed acceptable latency.
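The load side of any custom log shipping scheme comes down to applying each shipped transaction log WITH NORECOVERY so the restore chain stays intact. As a rough illustration using native T-SQL generated from Python (the database name and path are hypothetical, and VistaPrint's actual jobs use LiteSpeed's own restore procedures rather than this native syntax):

```python
def build_restore_log(db_name: str, log_backup_path: str) -> str:
    """Build a native RESTORE LOG statement that applies one shipped
    transaction log while leaving the database unrecovered, so the
    next log in the chain can still be loaded."""
    return (
        f"RESTORE LOG [{db_name}] "
        f"FROM DISK = N'{log_backup_path}' "
        f"WITH NORECOVERY;"
    )

# Hypothetical example: load one shipped log on the Lexington standby.
stmt = build_restore_log("Orders", r"D:\LogShip\Orders_0410.trn")
```

As long as every load ends WITH NORECOVERY, the standby stays ready to accept the next log -- which is also exactly why the databases cannot be backed up directly.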
We already had the databases in Lexington in an almost real-time state. All we had to do was back up those databases in Lexington and we would have our off-island backups. But another problem arose: Microsoft SQL Server does not allow backups of log shipped databases because they are in an unrecovered state. And if we recovered the databases to make the backups, we could not switch them back to unrecovered at the same LSN, so we could no longer continue log shipping from Bermuda -- we would have had to reinitialize log shipping from scratch with full database restores.
Tricking SQL Server into making full backups
I decided to test a process that would allow using this log shipped environment to create full backups. I first created an unrecovered database by restoring some small backups using the WITH NORECOVERY option. I then created a database shell with matching logical file names as the unrecovered database. Next, I stopped SQL Server and copied the unrecovered database files and overwrote the shell database files. When I restarted SQL Server, voila! The unrecovered database was still there unrecovered and the shell database was now an exact recovered copy of the unrecovered database. The shell database was tricked into thinking it owned those copied database files.
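The essence of the trick is a file-level copy performed while SQL Server is stopped, so that on restart the shell database comes back owning byte-identical copies of the unrecovered database's files. A toy simulation in Python, with plain text files standing in for the .mdf/.ldf files (all names here are made up for illustration):

```python
import shutil
import tempfile
from pathlib import Path

# Toy simulation of the swap: plain text files stand in for the
# .mdf/.ldf files, and the copy happens "while SQL Server is stopped."
root = Path(tempfile.mkdtemp())

logship_mdf = root / "Orders_logship.mdf"  # unrecovered, log shipped DB
shell_mdf = root / "Orders_shell.mdf"      # recovered shell DB
logship_mdf.write_text("pages current as of the last restored log")
shell_mdf.write_text("empty shell pages")

# Overwrite the shell's files with copies of the unrecovered DB's files.
shutil.copy2(logship_mdf, shell_mdf)

# On restart, the shell database owns an exact copy of the data, while
# the unrecovered database's own files are untouched and log shipping
# can simply resume.
```

The key design point is that nothing ever touches the unrecovered database's files in place; they are only read, so the log shipping chain is never broken.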
Now I could make a backup of the recovered database and have my full backup without sacrificing the unrecovered log shipping database. And, because this was saving us the step of copying backups over the WAN, I nicknamed the process the "Backup Copy Saver" (BCS).
I began building the BCS tool/process using several SQL Server instances, Robocopy and complex SQL scripts. I scheduled jobs -- a fully automated process -- to copy the core data files from the log shipped environments to dedicated SQL Servers (separate environments for SQL Server 2000 and SQL Server 2005).
A primary goal was ease of maintenance, since I know all too well what it's like to troubleshoot and fix something complex that you either didn't write or haven't looked at in a year. I also wanted plenty of monitoring and alerting, especially considering this process disables log shipping jobs and stops the SQL services during the file copies. I installed a dedicated SQL 2005 server (called the "BCS Sentry") to maintain and run the code, monitor the process and contain all the components necessary to run the BCS process.
While the BCS process executes, DB101 and SQLSentry constantly track every aspect of the process and alert DBAs to any errors. TechWiki pages on our intranet display the progress and a multitude of stats about the process. The entire process is fully automated, monitored and logged, and it eliminates the manual backup file copies and the expensive bandwidth they consumed.
I decided to add automation (following the backup step) that does a test restore using the new backup files and then runs a full DBCC CheckDB to check for data inconsistencies, since those carry over into full backups. We now perform weekly test restores and DBCCs on all of our production database backups without impacting other environments. And since the full databases now reside in a recovered state on a separate SQL Server, they can be used for query review and other tasks without slowing down other SQL Server instances.
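Conceptually, the verification step is just a restore of the fresh backup over the destination copy followed by an integrity check. A minimal sketch of generating that T-SQL (the database name, path and options shown are assumptions for illustration, not the article's actual scripts):

```python
def build_verify_script(db_name: str, backup_path: str) -> str:
    """Generate the restore-and-check script: restore the fresh backup
    over the destination copy, then integrity-check it with DBCC."""
    return "\n".join([
        f"RESTORE DATABASE [{db_name}]",
        f"    FROM DISK = N'{backup_path}'",
        "    WITH REPLACE, RECOVERY;",
        f"DBCC CHECKDB ([{db_name}]) WITH NO_INFOMSGS;",
    ])

script = build_verify_script("Orders", r"E:\Backups\Orders_FULL.bak")
```

Running the check on the separate destination server is what makes weekly DBCCs affordable: the production instances never pay the I/O cost.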
Primary steps for creating full SQL Server backups in a log shipped environment
- Confirm current log shipping latency is within acceptable limits
- Get date of last transaction log loaded on each database to append to the backup filenames (since data in the backup is current as of that transaction log and not the date/time on the backup file)
- Compare database logical files on source to destination since they must match
- Disable log shipping jobs and wait for their completion
- Confirm data is up to date in each database using select scripts against frequently updated tables -- proof that we are backing up current databases
- Stop SQL services for both the log shipping server and destination SQL Server
- Delete destination DB files
- Copy DB files from log shipping to destination locations
- Restart all stopped SQL services
- Enable log shipping jobs
- Confirm log shipping is running again
- Confirm all DBs are recovered and usable on the destination server
- Confirm again that the data is within acceptable latency using select scripts against destination database tables -- again proving the data is current in the databases we are using for the backups -- and back up databases on destination
- Restore all the backups just created (overwriting the databases on the destination server)
- Execute DBCC CheckDB against all the databases and report any errors
- Delete the older backups from previous runs only if the backup, restore and DBCC CheckDB succeeded
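The second step above matters because the backup file's timestamp reflects when the backup ran, not how fresh its data is. A small sketch of stamping the last-loaded log's date into the backup filename (the naming convention here is hypothetical):

```python
from datetime import datetime

def stamp_backup_name(db_name: str, last_log_loaded: datetime) -> str:
    """Append the datetime of the last transaction log loaded, since the
    data is current only as of that log, not the backup file's time."""
    return f"{db_name}_FULL_{last_log_loaded:%Y%m%d_%H%M}.bak"

# Hypothetical example: last log loaded at 4:30 a.m. on June 1.
name = stamp_backup_name("Orders", datetime(2008, 6, 1, 4, 30))
# -> "Orders_FULL_20080601_0430.bak"
```

Anyone restoring from the file can then see at a glance the point in time the backup actually represents.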
Another key component built into my process of creating SQL Server backup copies in a log shipped environment is restartability. I was dealing with very long duration steps -- backups, restores, DBCCs -- for many large databases, so I needed BCS to restart at any point in the process and pick up where it left off. Occasionally, a network hiccup or a mistake in a restore script can cause a file copy or restore to fail, and we could not afford to have the process start over from the beginning if it failed 13 hours into execution. Every step, and each individual database within that step, is monitored and tracked so that BCS can be restarted without duplicating successfully completed work.
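The restartability idea can be sketched as a persistent ledger of completed (step, database) pairs that a restarted run consults before doing any work. This is a sketch only: the real BCS presumably keeps this state in SQL Server tables, and every name below is an assumption, not the article's code.

```python
import json
import tempfile
from pathlib import Path

class RunTracker:
    """Persist completed (step, database) pairs so a restarted run can
    skip work that already succeeded. Illustrative sketch; a JSON file
    stands in for whatever tracking tables the real process uses."""

    def __init__(self, state_file: Path):
        self.state_file = state_file
        if state_file.exists():
            self.done = {tuple(p) for p in json.loads(state_file.read_text())}
        else:
            self.done = set()

    def is_done(self, step: str, db: str) -> bool:
        return (step, db) in self.done

    def mark_done(self, step: str, db: str) -> None:
        self.done.add((step, db))
        self.state_file.write_text(json.dumps(sorted(self.done)))

# A long run records its progress, fails, and is restarted: the new run
# sees the backup of Orders already finished and skips straight past it.
state = Path(tempfile.mkdtemp()) / "bcs_state.json"
RunTracker(state).mark_done("backup", "Orders")
restarted = RunTracker(state)
```

Because completion is recorded per database within each step, a failure 13 hours in costs only the unfinished databases, not the whole run.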
While these processes have obviously made things much easier for the company and those who manage them, there were also significant savings we could point to. The process saves the company $25,000 per month in bandwidth, plus the time associated with manual administration of the file copies -- more than $300,000 in one year alone. Needless to say, it was met with enthusiasm and praise by the management team.
About the author:
Jason A. Gerrish is senior production database administrator at VistaPrint.