In my previous article, "SQL Azure migration: To the cloud and back again," I discussed the options for moving data between local SQL Server instances and SQL Azure. In this article, we will look at more complex data exchange scenarios, including data synchronization and refreshing data in SQL Azure while maintaining availability.
Implementing data synchronization typically requires some up-front analysis to determine the best process and most suitable tools and technologies for the job. Among other things, you need to consider the number of tables to synchronize, required refresh frequency (this could differ greatly among tables in the same database), application uptime requirements and size of the tables. In general, the larger the tables are and the higher the required uptime, the more work is required on your part to implement data synchronization so that it doesn’t interfere with the applications using the database.
One of the easiest approaches to data synchronization is to create staging tables in the destination database and load them with data from the source database. In SQL Azure, do this using SQL Server Integration Services or the bcp utility, as discussed in the previous article. Once the data is in staging tables, run a series of T-SQL statements to compare the data between staging and “master” tables and get them in sync. Here is a simple sequence I’ve been using successfully on many projects:
- Use DELETE FROM command to join the staging and the master table and delete all rows from the master table that have no match in the staging table.
- Use UPDATE FROM command to join the staging table and the master table and update the records in the master table.
- Use INSERT command and insert into the master tables the rows that exist only in the staging table.
If you are using SQL Server 2008 or newer, utilize the MERGE statement to combine the second and third part into a single command. The MERGE statement is nicknamed UPSERT because it combines the ability to insert new rows and update existing rows in a single statement. So, it lends itself nicely for data synchronization.
Using the technique I described works well mainly for small to medium-sized tables because the table will be temporarily locked and therefore unavailable while these updates are taking place. But in my experience, it is a minor disruption that can be mitigated by synchronizing during low database usage times or by breaking up the updates into batches. The disadvantage here is that you have to implement custom code for each table.
I’ve also had success with a slightly modified approach, which takes away the need to implement a set of scripts for each table. After you load the staging tables, execute the sp_rename stored procedure and swap the names of the master and the staging table. This can be done very quickly, even on many tables. Run each table swap inside of a TRY/CATCH block and roll back the transaction if the swap does not succeed.
Another technique many companies use to refresh data and at the same time keep it available is to simply maintain two copies of the database. One database is used by applications and the other one is used by the load process to truncate and reload tables. Once the load is done, you can rename both databases and swap them so that the one with the fresh data becomes the current database. SQL Azure initially didn’t support renaming of databases, but that feature works now. As an alternative to renaming, store a connection string pointing to a SQL Azure database in a local database and have your data load process modify the connection string and point to the refreshed database, that is, if the data load completed successfully.
Another option is to use the Microsoft Sync Framework, a platform for synchronizing databases, files, folders and other items. It allows you to programmatically synchronize database via ADO.NET providers and as of the 2.1 version, you can use the framework to synchronize between SQL Server and SQL Azure. Describing all the features and capabilities of the Sync Framework is beyond the scope of this article. For more information visit the Microsoft Sync Framework Developer Center. One of the advantages of the framework is that once you get up to speed with the basics, you can write applications that give you full control over SQL Azure data synchronization. Among other things, you will be able to utilize its features, such as custom conflict detection and resolution, and change tracking. Both of those features come in handy if you need to implement bidirectional data synchronization.
Microsoft developers used the Sync Framework to develop and release an application called Sync Framework Power Pack for SQL Azure. You can download and install this application, but first install the Microsoft Sync Framework SDK. The application runs as a wizard. After you specify the local and the SQL Azure database, select the tables you want to synchronize. You can also specify how you want to handle the situation in which the same row is updated in both databases. Figure 1 illustrates how you can choose whether the local database or SQL Azure database wins the resolution.
In the last step of the wizard, specify whether it should create a 1 GB or a 10 GB database in SQL Azure. The tool creates the specified database in SQL Azure and sets up both databases with objects needed for synchronization. It will create INSERT/UPDATE/DELETE triggers on each synchronized table. Also, for each table it will create another table with the “_tracking” suffix. It also creates a couple of database configuration tables called scope_config and scope_info. As data gets modified in either database, the triggers update the tracking tables with the details that will be used by the Sync Framework when it’s time to synchronize.
The wizard also creates a SQL Agent job that kicks off the sync executable passing the appropriate parameters. All you need to do is schedule the job to synchronize as often as needed. The tool is not super fast, but it works fairly well and, in many cases, it can handle your synchronization requirements. The biggest drawback is that when you run the wizard, it insists on creating a new SQL Azure database, and it fails if you specify an existing database. So, if you ever want to modify what tables should be synchronized, drop the Azure database and start over.
ABOUT THE AUTHOR
Roman is principal database architect at MyWebGrocer in Colchester, Vt. He specializes in SQL Server development, database performance tuning, ADO.NET and writing database tools. He regularly contributes SQL Server articles to Visual Studio Magazine, SQL Server Magazine and other technical publications and presents at user groups and conferences in the U.S. and Canada.