SQL Azure, the cloud-based version of SQL Server, comes with a number of constraints, many of which are put in place to keep users from accessing more services than they need. One of the main SQL Azure limitations is on the size and number of databases you can create in any given account.
But those limits can be ameliorated or worked around, depending on the applications you’re creating and your needs as a database user. What’s ironic is that the best, most global solution for the problem isn’t even fully baked yet.
Basic SQL Azure limitations
SQL Azure offers two kinds of databases, Web Edition and Business Edition. The former can scale up to 5 GB; the latter, 50 GB. The names reflect their possible uses: A Web Edition database is fine for blogs or other small-scale storage, while Business Edition databases are good for starter e-commerce.
The other major limit imposed on each of these databases is on how many of each kind you can have per Azure account: Microsoft allows up to 150 Web Edition and 15 Business Edition databases. This partially alleviates the problem of running out of space in any one database, since you can continue to add databases as needed.
While the size limits on each kind of database are pretty much set in stone, the maximum number of databases per account isn’t. Those limits are “soft” -- they’re set as default constraints when you first open up an Azure account. Microsoft operates under the assumption that most people will only need a certain amount of storage space, and so they set those limits fairly high for starters. If you need to blow past those limits, Microsoft can raise them for you -- but they’ll only do it if you ask. It doesn’t happen automatically when you run out of databases.
One thing you can do without explicitly asking for it is change a given database’s edition. You can switch a given database from Web Edition to Business Edition (or back) by simply using the ALTER DATABASE command in Transact-SQL.
Right now the major differences between the two are the amount of space that can be provisioned to each type and the billing options. Though it’s possible that in the future Microsoft may build more features into Business Edition that are not available in Web Edition. If you’ve been holding back from upgrading a database to Business Edition because of the possibility that you’ll be overcharged for capacity you won’t use later on, it’s useful to know you can scale up and then back down again if needed.
Data aging and archiving
The simplest and least sophisticated approach to dealing with the space-related SQL Azure limitations is to devise a plan for archiving data if you don’t already have one. There are two basic ways to handle this:
- Move older or less frequently accessed data to another database via a daily cleanup process. Since, as described above, the number of databases available to you in Azure is not a hard limit, this approach is pretty effective. If the vast majority of your database traffic revolves around recent activity -- for instance, if people only have a historical interest in stuff that happened months ago on your website -- then you can archive by date. Likewise, you could devise a strategy in which the less often something is accessed, the more likely it is to be automatically shunted to another database that is optimized for heavy reads.
- Move older data offline entirely or simply delete it. This is only viable if a) you make the lifetime of your data clear to your users, and b) they don’t object to having things archived offline or overwritten after a certain period of time. This approach works best for data that is inherently ephemeral; it’s probably not suited for data that is of continued interest.
Using sharding against SQL Azure limitations
A second method of dealing with space-related SQL Azure limitations, and one that goes hand in hand with adding databases, is a technique called sharding. This is a way of partitioning data horizontally across databases, allowing data to be scaled across any number of databases. Sharding requires that you modify both the way your data is stored and the way your applications retrieve data, so it’s not something you can simply apply as a bandage to an existing database to make it scale.
A full overview of sharding would be far too big for this space, but a few key things about implementing sharding in Azure are worth mentioning here.
Think twice about your data and application design. There are many high-level data design considerations you need to make before using sharding in SQL Azure. An example: The app needs to assume that any one piece of data (an “atomic unit” of data, in sharding parlance) can live in any shard and has no fixed location. Also, since transactions are not supported between SQL Azure databases, transactions should be confined to within sharded objects and not used by the application at large. Finally, joins across shards are not possible (at least, not yet), so any data that will be joined must be kept in the same shard.
No native support yet. Any sharding you create in the current version of SQL Azure has to be done entirely in your own applications. SQL Azure itself doesn’t have native support for sharding yet. That said, native sharding is planned for SQL Azure via SQL Azure Federations. A program to test out Federations is set to go online from May 1 through Aug. 30, 2011, so if you’re preparing to build an application that uses sharding and want to get an early look at how it’ll work and provide feedback, sign up.
Rebalancing of data happens offline. When you add or remove shards, data may need to be moved between them for the sake of performance and behavioral consistency. Right now, any such changes will have to happen offline because of the lack of native support for sharding.
Most of the short-term problems with SQL Azure storage can be worked around by simply ramping up the number of databases or being judicious about what you keep and for now long. If you’re thinking of using sharding as a solution to storage-related SQL Azure limitations, it should be approached as a way to transcend limitations you might face with most database solutions -- a solution that will require that much more work to implement. It might well be worth the wait for Federations, which will provide consistent mechanisms for sharding, before attempting to implement such a thing by hand in your own SQL Azure apps.