Evolution of Windows Azure SQL Database
As SQL Azure gets more popular and widely used, users will have to develop reliable processes for migrating data to the cloud or bringing it back, either to local servers or a data warehouse on-premises. In more complex scenarios, some companies need to synchronize the data between local and remote Azure databases.
In this article, the first in a two-part series on SQL Azure migration and synchronization, I will examine several options for moving data. The second will focus on more complex scenarios in which ongoing data synchronization is needed.
For one-directional data movement, use one of the following technologies -- the SQL Server Import and Export Wizard, the bcp utility, SQL Server Integration Services (SSIS), or SQL Azure Migration Wizard, a community-developed tool. Let’s discuss each in detail.
SQL Server Import and Export Wizard. In general, this utility works well for a one-time data migration, or for an occasional data refresh if you don’t mind running it manually. The interface is simple -- you run the wizard, select the tables you want to migrate, set their destinations and perhaps tweak the column mappings. You can run it from SQL Server Management Studio and connect to SQL Azure, as long as you are using the SQL Server 2008 R2 client tools. Connecting is a little tricky, because SQL Azure will not appear as an option for the data source or destination. Instead, select the “.NET Framework Data Provider for SQL Server” option, then fill in the properties dialog with the SQL Azure server name, username and password, as shown in the dialog box in Figure 1.
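As a rough sketch, the properties you supply in that dialog correspond to a standard ADO.NET connection string along these lines (the server, database and login names here are placeholders, not values from the article):

```
Data Source=yourserver.database.windows.net;Initial Catalog=YourDatabase;
User ID=youruser@yourserver;Password=yourpassword;Encrypt=True
```

Note the user@server form of the login, which SQL Azure expects when you connect with SQL authentication.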
If your data is sensitive, set the Encrypt option to “True” so that your data goes over the Internet encrypted. Also be aware that the wizard may fail because, by default, it scripts tables without indexes, and SQL Azure requires every table to have a clustered index. If the wizard creates a new table, it doesn’t create any indexes, and the data inserts fail. You therefore have to either create the destination tables with a clustered index first, or click “Edit Mappings” for each table in the wizard and manually modify the “CREATE TABLE” script to add a primary key as well.
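As a hedged sketch of the first workaround, you could pre-create the destination table with a clustered primary key from the command line before running the wizard. The server, database, login and table names below are hypothetical placeholders:

```shell
# Pre-create the destination table in SQL Azure with a clustered primary key,
# since SQL Azure rejects data inserts into tables without a clustered index.
# All names and credentials here are placeholders.
sqlcmd -S yourserver.database.windows.net -U youruser@yourserver -P yourpassword -d YourDatabase \
  -Q "CREATE TABLE dbo.Customers (CustomerID int NOT NULL, Name nvarchar(100), CONSTRAINT PK_Customers PRIMARY KEY CLUSTERED (CustomerID));"
```

With the table already in place, the wizard only has to move rows, and its generated “CREATE TABLE” script never comes into play.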
Aside from this, my experiences with the wizard and SQL Azure haven’t been very good. Small tables migrated OK, but I was getting timeouts on larger tables. You get very little control over the handling of failures and you cannot set the batch size. For that reason, I recommend using SSIS over the wizard, especially if you have large tables and need more control over the migration process.
SQL Server Integration Services. Using SSIS with SQL Azure is pretty straightforward, as long as you configure your connection in the way I described above. You also need the R2 version of SSIS to connect to SQL Azure. There are several differences from working with an on-premises SQL Server back end. Data transfers are much slower because you are sending data over the Internet, and because disk I/O in Azure often doesn’t measure up to that of high-end database servers. You should encrypt the data, but that slows down transfers as well.
As with the wizard, I experienced frequent timeouts with data uploads. Keep in mind that your package might fail if there is a connectivity blip. It therefore makes sense to design packages so that, when restarted, they resume at the point of failure instead of rerunning every table migration.
One way of doing that is to implement a logging table that keeps track of which tables have been uploaded. SSIS is the best tool for the job if you need to implement workflow logic, apply transformations or load data from flat files. If you use SSIS, make sure the ADO.NET destination in your Data Flow task is configured with the “Use Bulk Insert when possible” option. That enables bulk load capabilities, and in my experience it made data transfers run about four times faster. Also consider changing the default Batch Size to 1,000 or so.
If you lose the connection during a data upload, you will not have to start over, as you would with the default batch size of 0 (a single commit). With the data committed to the server in batches of 1,000 rows, you may be able to resume the transfer without starting over, as long as you can begin sending data from the point where the package failed.
The bcp utility. Another option for uploading or downloading data is the bcp utility. There is a learning curve associated with this command-line tool, but if you are comfortable with it, there is a compelling reason to use it -- in general, bcp is the fastest way to load data. In most cases, it outperforms Data Transformation Services and SSIS. Otherwise, using bcp with SQL Azure works the same as it does against local servers.
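To sketch the round trip under stated assumptions -- the server names, database, table and credentials below are all placeholders -- you might export a table from a local server in native format and then load it into SQL Azure, using a batch size so a failed upload commits partial progress:

```shell
# Export a table from the local server in native format (-n);
# -T uses Windows authentication against the local instance.
bcp YourDatabase.dbo.Orders out orders.dat -n -S localhost -T

# Import into SQL Azure with SQL authentication (note the user@server login form).
# -b 1000 commits rows in batches of 1,000, so a dropped connection loses only
# the current batch; -E preserves identity column values from the source.
bcp YourDatabase.dbo.Orders in orders.dat -n -S yourserver.database.windows.net \
  -U youruser@yourserver -P yourpassword -b 1000 -E
```

The -b switch plays the same role as the SSIS batch size discussed above: committed batches survive a connection failure, which matters far more over the Internet than on a LAN.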
SQL Azure Migration Wizard. This tool (SQLAzureMW for short) is an open source utility that can help with your SQL Azure migration. It works really well, and I found it to be much more reliable and flexible than the wizard built into SQL Server Management Studio. You can get it from the CodePlex website, including the source code. The wizard supports the migration of many types of database objects, as you can see in Figure 2.
Once you select the objects you want to migrate, SQLAzureMW scripts them out and modifies the scripts behind the scenes to make the syntax compatible with SQL Azure. It then uses the bcp utility to generate a DAT file for each table, which contains the data in binary format, as in Figure 3.
Once SQLAzureMW connects to the SQL Azure server, it recreates the objects from generated scripts. Finally, it runs the bcp utility to upload data to the cloud, as seen in Figure 4.
SQLAzureMW provides a user-friendly interface and a lot of options for migrating your data and other objects. Keep in mind, though, that since it generates a data file for each table, you need to make sure you have sufficient disk space. You might still be better off using SSIS for very large tables, or when you need its workflow capabilities.
ABOUT THE AUTHOR
Roman is principal database architect at MyWebGrocer in Colchester, Vt. He specializes in SQL Server development, database performance tuning, ADO.NET and writing database tools. He regularly contributes SQL Server articles to Visual Studio Magazine, SQL Server Magazine and other technical publications and presents at user groups and conferences in the U.S. and Canada.