Stretch Database, often called StretchDB, is a new feature coming in SQL Server 2016 that allows a database to...
exist partially on-premises and partially in the cloud as an Azure SQL Database. An enterprise can use StretchDB to augment an on-premises database with a cloud environment, keeping heavily used data in the on-premises instance of SQL Server while StretchDB automatically moves infrequently used data to Azure, where it can be transparently accessed when needed.
What reason is there to store some data locally and some in the cloud? Enterprise-class storage can be very expensive, so it makes sense to have that high-performance storage array serve up data that is being used often and to put data that is seldom accessed in less expensive storage (Figure 1). Perhaps the first step toward the cloud that an organization takes is placing backups in the cloud or perhaps even using the cloud as part of a disaster recovery strategy with an AlwaysOn Availability Group Replica.
StretchDB is an answer to the dilemma of whether or not to delete data
Some organizations would love to delete old data that is rarely (if ever) used, but there may be regulatory and business requirements preventing that from happening. Some organizations archive old data in a different format or application that makes it more difficult to access the old cold data. Often the business owner does not want to archive because of business or even regulatory requirements regarding the amount of historical data to keep. The default action for this group is to keep everything forever. The SAN administrators are always looking for ways to increase capacity and performance. Their default action seems to encourage deletion or archival of data. The database team is often in the middle of this tug of war.
StretchDB is an answer to this situation. Currently, an organization can decide which tables are to be stretched or to have the "remote_data_archive" setting on; the setting is at the table level, not the partition level. Two current scenarios are targeted for the first iteration of StretchDB (Figure 2). The first scenario is where there is already an archive table within the database that you could stretch. The second one is where a single table with mixed hot and cold data is defined by the organization based on a specific date or perhaps a column (e.g., an "Archived" flag).
This isn't just a way to put bits in the cloud as a pure storage play. If that were the case, then remotely storing the data would not pay big benefits without having the resources to return the data being requested. With StretchDB, compute resources via an Azure SQL database are filtering the data that is requested in the cloud.
No application changes required for StretchDB
At the time of publication, a single Azure database has a limit of 500GB. This size limit will most likely increase over time, although there will continue to be a limit. But StretchDB can take advantage of sharding to get past the size limits. Microsoft has indicated that it will make scaling the Azure back end transparent to users, allowing them to stretch as much data as they need to. As a table is being stretched, the Resource Governor also can be deployed to minimize the impact of data movement and to throttle the flow. This can be very advantageous in parts of the world where networking costs have peak and non-peak rates and availability.
StretchDB includes the following functionality:
- Cold data always online
- Significantly lower storage TCO
- Low cost compute
- Access cold data with existing applications
- Easier performance and index maintenance
- Faster backup/restore
- Automatically managed and protected cold data
When a new version of SQL Server is released, there are usually some new features that are relatively easy to implement and require no application changes. This is one of those features. SQL Server figures out if the data being accessed is local or remote and takes the appropriate action. As with any feature, there is some setup, but the steps involved are minimal. Work has also been done on the backup and restore scenarios to minimize data transfer and storage for backups. Maintenance scripts that a DBA uses do not need to be modified. Queries do not need to be modified because the optimizer is smart enough to have a different plan when accessing cold data in the cloud.
At this time, StretchDB is still being developed. The goal is to minimize limitations, but often the first iteration of a new feature will have basic limitations such as data types within the stretched table that may not be supported.
Companies can manage storage costs more effectively
Remember that the remote data has been defined as cold. If the data needs to be deleted or changed once it is stretched to Azure, it requires an administrative function. This affects the backups and indexes as well when data is modified or removed. Obviously, because the data is remote, there will be added latency when accessing cold data. The query processor is smart enough to understand whether the data is local, remote or a combination of the two.
The StretchDB enables organizations to more effectively manage storage costs by keeping hot data local and cold data in the cloud. Customers can define which tables contain cold data either by its entirety or by a specific date data element or even a flag (such as an "Archived" column). StretchDB functionality is transparent to users and developers and no code changes are required to take advantage of this feature. Even the scripts and maintenance tasks do not need to be changed because SQL Server is smart enough to know which data is local and which data is remote. StretchDB is a great example of a hybrid approach that leverages local and remote resources within a single database.
About the author
Rick Heiges is a SQL Server MVP and principal solutions architect for DB Best Technologies LLC. He works with customers educating them about the Microsoft Data Platform to solve business problems and to maximize the value of their data. Rick is very involved in the SQL Server Community via PASS where he spent nine years on the board of directors. You can read his blog on sqlblog.com and follow him on Twitter @heigesr2.
Microsoft's program managers discuss SQL Server 2016 and StretchDB
Experts share their views on the SQL Server 2016 preview and its best new features