One of the toughest things about managing a big database is the backup window: The bigger a database gets, the longer SQL Server backups take. Ironically, larger databases like data warehouses are also the ones that want the smallest nightly maintenance windows because they're desperate for more time to load the data.
Storage area network (SAN) database snapshots give DBAs the ability to bypass the backup window altogether and perform backups inside the SAN instead of doing it at the operating system level. SAN vendors sell snapshot software that integrates with their storage networks. The software coordinates with SQL Server to temporarily stop writing to disk, then tells the SAN to make a point-in-time copy of the server's drives. This snapshot process only takes a few seconds regardless of the drive size.
As beneficial and mysterious as this feature is, I'm not going to focus on how it works. Instead, I'm going to talk about a few of the ways it doesn't work -- or needs to be worked around.
SAN snapshots can slow down writes
Even though a SAN database snapshot takes place nearly instantly, it has long-term effects. There are different ways of handling changes after the snapshot, and they have different performance impacts. Some methods involve a performance hit when writes are made, because they shuffle data around between the original copy and the snapshot. Some methods end up with fragmented data, and the SAN has to do some behind-the-scenes work to clean up the snapshot mess.
There are pros and cons to each method, but the important part is to get performance statistics both with and without snapshots. Do benchmark testing on the SAN, then take a snapshot, then do more benchmark testing. Watch how performance changes with multiple snapshots, preferably using the same number of snapshots that you have planned for the eventual production environment.
Plan filegroup & file layout carefully
SAN snapshot technology isn't as mature as SQL Server, and some of SQL's newer features outpace the development of SAN snapshot software.
For example, some vendors can only snapshot one drive at a time – not a terribly useful feature if the database server has its data and logs on two separate drives. Some SAN vendors use the same physical hard drives for all arrays, sharing spindles for better performance, and they point out that DBAs might as well put everything on the same drive anyway.
Other vendors can snapshot multiple drives at a time, but only if those drives are all on the same SAN controller. This will technically accomplish the job, but it means that the database server's drives can't be load balanced across both SAN controllers. In the case of data warehouses, this is a particularly thorny limitation: I just got done with a data warehouse implementation on a major SAN brand, and we learned about this limitation with the SAN at the last minute. At that point, we'd already purchased controllers, and if we restricted the drives to only one of the two controllers, we couldn't get enough throughput for the data warehouse. We had to make a choice between snapshots and throughput, something we hadn't planned for.
Another limitation is that not all snapshot software can handle one filegroup with multiple files on different drives. This is a common data warehouse configuration, especially with SQL Server 2005's partitioning. As the database moves to faster and faster storage, the DBA can revisit the partitioning scheme and perhaps remove the partitioning or change it to be more snapshot-friendly – but these changes take planning and forethought.
The solution is to involve the SAN vendor from the very start. Ask to talk to other snapshot customers with similarly sized databases. Talk to the company's database administrators and ask how their file structures are set up.
Database snapshots and replication: Know how they relate
Database administrators that plan to use SAN-to-SAN replication for disaster recovery need to plan their snapshot timing to match their replication strategy. Some SAN vendors replicate each snapshot: When a snapshot is made, the SAN immediately starts getting the offsite disaster recovery SAN synchronized with that same snapshot.
For data warehouses, consider doing a snapshot immediately after the nightly ETL load finishes. The offsite DR SAN will be caught up to that point, so if the primary site goes down during the business day, users can access the DR SAN without any disruptions. If, on the other hand, you made the snapshot before the nightly ETL load, the offsite DR SAN would be a day behind, and the nightly loads would have to be rerun at the DR site.
SQL Server snapshots don't replace transaction logs
Snapshots are a great substitute for full backups, but they don't replace transaction logs because they don't track individual transactions inside the database. They are a point-in-time copy, but it is only one point in time – as opposed to transaction logs, which can be restored to any point in time.
To illustrate, let's listen in on one of my recent phone calls:
Me: "Fred, I told you not to call me at this number."
Fred: "I know, but one of my developers just deleted the entire customer table."
Me: "Was this the guy I told you not to hire because he thought 'select statement' meant only drinking the expensive kind of Budweiser?"
Fred: "Yeah, yeah, yeah, but he's my best friend's son and he wasn't doing anything this summer. Anyway, it's 4 p.m. and I need that table online before I go to the golf – I mean, the meeting."
Me: "OK, but I have to restore from the noon SAN snapshot because we only take them once a day. Did you make any customer changes between noon and when he deleted the table?"
Fred: "How the heck should I know?"
SAN database point-in-time recovery would not be as useful here as a good set of transaction log backups. With transaction backups, we could restore to a specific point in time, like one minute before the developer deleted the table. The best backup strategy uses SAN snapshots in conjunction with transaction log backups, or includes another way of stepping back to one point in time.
Snapshots do solve problems
Just because I only discussed the drawbacks doesn't mean snapshots aren't a great tool in most DBA toolsheds. Point-in-time copies are a phenomenal solution to a lot of problems, but they come with their own little surprises. Design the database's snapshot backup strategy well in advance, and it'll pay off with a smooth implementation.
About the author
Brent Ozar is a SQL Server DBA based in Houston with a leading global financial services firm. Brent has a decade of broad IT experience in systems administration and project management before moving into database administration. In his current role, he specializes in database performance tuning, SANs and data warehousing. Brent conducts training sessions, has written several technical articles and blogs at BrentOzar.com. He is also a member of Quest's Association of SQL Server Experts.