In the early days of Window Azure SQL Database -- formerly called SQL Azure -- scaling out your databases was a tedious effort that impacted the connecting applications as well as the databases themselves. Since those days, Microsoft has added a new service that does the scaling for you: SQL Database Federations.
The Federations service is a managed sharding technology available to SQL Database. Sharding, also called federating, refers to the process of breaking large tables into smaller ones and distributing them across databases on multiple hardware devices, removing the size and performance limitations of a single system. By using SQL Database Federations, you can create a federation that splits your large tables across numerous SQL Databases, while hiding the complexity of the underlying structure from your connecting applications. The Federations service is not without limitations, but before we go into those, let's look at how a federation is implemented in SQL Database.
Understanding Windows Azure SQL Database Federations
A federation is a collection of SQL Databases that provides the mechanisms necessary to distribute tables across those databases. To set up a federation, you start by defining a federation object within a regular Windows Azure SQL Database. When you create the object, you must specify a federation name, a distribution key and its data type. The distribution key maps to a specific column in each federated table and determines how data is distributed across the federation. The database that hosts the federation object is considered the root database.
You can create one or more federations in the root database, but the database itself does not contain any of the actual federated data. That data is distributed among the federation members (or shards), each of which is its own SQL Database. The Federations service automatically creates the first SQL Database federation member when you create a federation. That means, whenever you add a federation, you're always starting with two databases: the root and the first member. You can then split the member to create additional members.
Once you've added the federation, you define the tables that should be federated, that is, those tables whose data should be split among the federation members. Each federated table must include a column that maps to the distribution key, and that column (the federated column) must be configured with the exact same data type as the distribution key. The Federations service limits those types to INT, BIGINT, UNIQUEIDENTIFIER and VARBINARY (up to 900 bytes).
To better illustrate how these pieces fit together, let's look at an example. Figure 1 shows a federation named BooksFed. The distribution key defined on the federation is book_id, which is configured with the INT data type. The federation includes three members, each a Windows Azure SQL Database. In addition, three tables participate in the federation: Books, Sales and Marketing. Each table includes the BookID column, which maps to the book_id distribution key. As a result, the data in the table will be distributed based on the values in the BookID columns.
By default, a federation's initial member receives all rows within a federated table. However, when you add members to the federation, you define how the data should split according to the distribution key. In our example, the first member (the one on the top) contains all rows whose BookID value is below 10,000, and the last member (the one on the bottom) contains all the rows whose BookID value is 20,000 or above. The second member contains all the rows in between. This means that a row in any of the federated tables with a BookID value of 5000 will be stored in the first member, and a row with a BookID value of 25000 will be stored in the third member.
Applications connect to the federation via the root database, and the Federations service directs the queries to the appropriate member. As a result, applications do not have to be concerned with the complexities of the underlying connections.
You can also store nonfederated tables -- called reference tables -- in the member database. These tables contain data that does not need to be federated. U.S. postal codes or state names might be good candidates for reference tables. But because reference tables are not automatically replicated across members, you'll have to set up a system to do so if you want that data available to all member databases. Also, be aware that Windows Azure SQL Database does not support cross-database joins.
Limitations of Windows Azure SQL Database Federations
If you've designed your tables with federation in mind, federating them in SQL Database should be a relatively straightforward process. But if your tables don't have this advantage, federating your databases might require significant changes to the database and application in order to work within the limitations of the Federations service. For example, federated members do not support the IDENTITY property. Not only would you have to redefine those columns, but you might also need to find another way to ensure unique values in your columns across the federation members. Additionally, members don't support the TIMESTAMP and ROWVERSION data types, meaning you'd have to eliminate those as well if they're in your database. Your federation column must be configured with the exact data type as your distribution key, so there might be some refactoring required on that end as well.
One of the greatest limitations with SQL Database is the inability to join tables across databases, a consideration particularly important when federating your tables. For example, you cannot enforce referential integrity or perform lookups if they must cross database boundaries. In addition, reference tables do not support foreign key relationships with federated tables, and federated tables require that all unique and clustered indexes contain the federation column. As a result, you cannot ensure a column is unique across a federation unless it contains the distribution key.
You have to take several other management and implementation issues into account if you want to federate your tables. For instance, if you need to change the schema across the federation members, you must update each member. The implication of this is that it's possible for different members to have different schemas. Also, your initial data load into federated tables can be a tedious and time-consuming process, as can scaling back if you need to conserve resources, a process that becomes even more complex if your schemas are out of sync.
Despite these limitations, you might still find that federations are a good fit for your organization, especially if you have the ability to design your databases with federations in mind. The Federations service lets you split your members in real time, without taking that application offline, and it does so quickly and easily. As a result, you can scale out as needed and take advantage of greater physical resources, while reducing I/O bottlenecks and database throttling. Plus, being able to scale out and scale in as needed helps lower your total cost of ownership.
In the end, you might decide that going the SQL Database Federations route is not for you. You might even find that your databases cannot be retrofitted for the federated environment. But if you can accommodate the necessary requirements, SQL Server Federations could prove a valuable resource for scaling out your cloud-based databases.
This was first published in November 2012