In a previous article, SQL Server 2005: When and how to use Database Snapshots, we took a look at the new database snapshot feature, which is available only in the SQL Server 2005 Enterprise and Developer editions.
How it works
The database snapshot creates a database shell of your existing database. Then, whenever data pages are changed in the original database, the original versions of those pages are written to sparse files. When the snapshot is queried, any data that has not changed is retrieved from the original database files and the original version of any changed data is retrieved from the sparse files.
To illustrate this, take a look at the picture below. The illustration shows us that 90 percent of the data has not been changed, so 90 percent of the data is still retrieved from the original database files when a query is issued. The remaining 10 percent, which has changed since the snapshot was created, is retrieved from the sparse file.
Source: SQL Server 2005 Books Online
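The read path described above can be sketched in a few lines of Python. This is only a toy model of the idea, not SQL Server's implementation: the function name read_page, the two dictionaries and the ten-page database are all made up for illustration.

```python
def read_page(page_no, source_pages, sparse_pages):
    """Return (data, origin) for a read issued against the snapshot.

    source_pages: page number -> current page in the original database files
    sparse_pages: page number -> preserved original page in the sparse file
    """
    # A page that changed after the snapshot was created has its original
    # image in the sparse file, so the snapshot must read it from there.
    if page_no in sparse_pages:
        return sparse_pages[page_no], "sparse"
    # Everything else is still read from the original database files.
    return source_pages[page_no], "source"


# Hypothetical ten-page database in which only page 3 changed after the
# snapshot was taken.
source_pages = {n: f"current-{n}" for n in range(10)}
sparse_pages = {3: "original-3"}

origins = [read_page(n, source_pages, sparse_pages)[1] for n in range(10)]
source_share = origins.count("source") / len(origins)
print(f"{source_share:.0%} of pages are read from the original database files")
```

With one changed page out of ten, the split matches the 90/10 example in the illustration.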
Sparse files and the database snapshot
When a snapshot is created, the initial creation is very quick, because only a shell is created to hold the data that changes. Over time, as data pages are changed, their original pages (not the changed pages) are written to the sparse files. The more the data changes in your primary database, the more data has to be written to the sparse files. Maintaining a snapshot alongside your primary database therefore requires more disk space and increases disk I/O on your server.
Sparse files are written in 64KB chunks. Each of these increments can hold eight 8KB data pages. The first time any data is changed on a page in your primary database, the page is copied to the sparse file and then the changes are made to your primary data file. Once a page is written to the sparse file it no longer needs to be written out again, since the sparse file already preserves the entire contents of the page as it looked when the snapshot was created.
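To make the copy-on-first-write behavior concrete, here is a minimal Python sketch. The class ToySnapshot and its methods are hypothetical names invented for this example; the only figures taken from the text are the 8KB page size and the 64KB growth increment. The size calculation is a lower bound that assumes nothing about where pages land inside the sparse file.

```python
PAGE_KB = 8                           # size of one data page
CHUNK_KB = 64                         # sparse files grow in 64KB increments
PAGES_PER_CHUNK = CHUNK_KB // PAGE_KB # eight pages fit in one increment


class ToySnapshot:
    """Toy copy-on-first-write model (illustrative, not SQL Server internals)."""

    def __init__(self, primary):
        self.primary = primary  # live list of page contents in the primary database
        self.sparse = {}        # page number -> page image at snapshot time

    def write_to_primary(self, page_no, new_data):
        # Preserve the original page only the first time it is changed.
        if page_no not in self.sparse:
            self.sparse[page_no] = self.primary[page_no]
        # Then apply the change to the primary data file.
        self.primary[page_no] = new_data

    def read(self, page_no):
        # The snapshot always returns the page as it was at creation time.
        return self.sparse.get(page_no, self.primary[page_no])

    def min_sparse_kb(self):
        # Smallest multiple of 64KB that can hold the preserved pages.
        chunks = -(-len(self.sparse) // PAGES_PER_CHUNK)  # ceiling division
        return chunks * CHUNK_KB


primary = [f"v0-page{n}" for n in range(16)]
snap = ToySnapshot(primary)
snap.write_to_primary(2, "v1-page2")
snap.write_to_primary(2, "v2-page2")  # second change: nothing new is copied
```

After both writes the primary holds the latest version of page 2, while the snapshot still returns the original version and the sparse file contains a single preserved page.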
To minimize disk contention, it would be wise to create the sparse files on a different drive and/or array from your primary database. The reason for this is two-fold.
- First, when the snapshot is created, no data is written to the sparse file, so all data access from the snapshot is actually hitting the primary data files. As data changes over time, you can spread some of the I/O load: unchanged data is retrieved from the primary database files and changed data is retrieved from the sparse files, which would reside on different arrays and/or disks.
- Second, depending on the volatility of your database and the amount of changes, you can reduce I/O bottlenecks by separating the reads from your primary database files from the writes to the sparse files.
Using the database snapshot
The big thing to keep in mind here is that you are still accessing the primary database to fulfill your query requests. When the initial snapshot is created, only an empty shell is created and all data requests are fulfilled by the primary database files. As time goes on and changes are made, data requests are split between the original database files and the sparse files. So even though it looks like there is a separate database, most of the underlying data still comes from the primary database.
Based on this, you need to make sure not to issue queries that are outside the normal scope of your everyday activities. Let's say you create a snapshot and grant read rights to everyone who writes reports. When these report queries are executed, they are still going to hit the primary database files, so you need to ensure that any new activities do not degrade the primary database's normal workload.
Also, you should compare the amount of data that is in the sparse files with the total amount that could potentially be written to them. Basically, when the snapshot is created, the size of the primary database is the maximum potential size of the snapshot's sparse files. If the amount of data in your sparse files reaches half the database size or more, a better approach may be to create an entire copy of the database instead of using snapshots.
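The rule of thumb above is simple arithmetic, and a short Python sketch shows one way to apply it. The function names and the sample sizes are hypothetical; the 0.5 threshold is the "half the database size" guideline from the text, and the two byte counts are assumed to have been collected separately from your server.

```python
def snapshot_fill_ratio(sparse_bytes_on_disk, source_db_bytes):
    """Fraction of the snapshot's potential size that is already in use."""
    if source_db_bytes <= 0:
        raise ValueError("source database size must be positive")
    return sparse_bytes_on_disk / source_db_bytes


def prefer_full_copy(sparse_bytes_on_disk, source_db_bytes, threshold=0.5):
    """True when the sparse files hold at least `threshold` of the database size."""
    return snapshot_fill_ratio(sparse_bytes_on_disk, source_db_bytes) >= threshold


GB = 1024 ** 3
# Made-up example: a 40GB database whose snapshot has accumulated 22GB.
ratio = snapshot_fill_ratio(22 * GB, 40 * GB)
print(f"sparse files hold {ratio:.0%} of the database size")
print("consider a full copy:", prefer_full_copy(22 * GB, 40 * GB))
```

The threshold is kept as a parameter because half the database size is a guideline rather than a hard limit.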
Overall, I think the introduction of database snapshots is a great new feature. I still wish this was available in all editions of SQL Server 2005 and not just the Enterprise and Developer editions. One area that was not discussed was the use of snapshots with database mirroring. Because the mirror is offline, you cannot access its data directly; creating a snapshot of the mirror makes that data readable, which gives you the best of both worlds. Take the time to understand how snapshots can work in your environment, and make sure you monitor the impact of maintaining the snapshot as well as of data access against the snapshot.
ABOUT THE AUTHOR
Greg Robidoux is the president and founder of Edgewood Solutions LLC, a technology services company delivering professional services and product solutions for Microsoft SQL Server. He has authored numerous articles and has delivered presentations at regional SQL Server users' groups and national SQL Server events. Robidoux, who also serves as the SearchSQLServer.com Backup and Recovery expert, welcomes your questions.
Copyright 2007 TechTarget
This was first published in March 2007