BACKGROUND IMAGE: iSTOCK/GETTY IMAGES
Microsoft is bringing massively parallel processing (MPP) to the cloud with Azure SQL Data Warehouse. MPP solves large data problems by distributing the data across computing resources so that each node can work on its portion of the data in parallel with all the others. Prior to the Azure SQL Data Warehouse, currently in beta, MPP had been restricted to on-premises technology. Beta user and principal consultant for Denny Cherry and Associates, Denny Cherry said, "Massive parallelism in the cloud is totally new." He added that MPP is going to be the most important feature in Azure SQL Data Warehouse. However, while participating in the ongoing limited public preview, Cherry has seen firsthand what he calls the "growing pains" of the new SQL Azure technology.
"It's definitely a product that is transitioning into something people can use," said Cherry. Microsoft is releasing updates every week as it works to bring MPP to the cloud. However, right now, Cherry still has to deal with beta testing problems like having the data warehouse restart into a new version while he is trying to work.
The Azure SQL Data Warehouse is a petabyte-scale columnar data warehouse service that is distributed, fully managed and able to shrink or grow depending on need. It is designed to work with the upcoming SQL Server 2016. Since the Azure SQL Data Warehouse is based in the parallel processing architecture of the Analytics Platform System, it is capable of integrating with the rest of the Microsoft ecosystem. Azure SQL Data Warehouse is currently in beta testing and Microsoft has not announced a release date. However, on June 24, Microsoft opened Azure SQL Data Warehouse for a limited public preview.
The preview is currently open to a small number of customers, but as new customers sign up to try Azure SQL Data Warehouse, the preview will scale up and accept more people. As a member of the Microsoft Early Access Program, Cherry has worked with Azure SQL Data Warehouse for more than two months. He is already helping two retail clients create proof-of-concept data warehouses in Azure SQL Data Warehouse.
Cherry predicted that retailers will be particularly interested in Azure SQL Data Warehouse because they are already familiar with large-scale data warehousing and have been doing business intelligence and data analysis longer than most other industries. According to Cherry, looking for long-term patterns in customer behavior is integral to success in the retail industry. Because of the ubiquity of business intelligence and big data analytics in the retail industry, companies in that sector are always looking for ways to do these things faster and cheaper.
Denny Cherryprincipal consultant at Denny Cherry and Associates Consulting
Those who are participating in the public preview have gotten firsthand experience with a number of new features beyond the massively parallel processing architecture. Microsoft claims that Azure SQL Data Warehouse as the first enterprise-class cloud data warehouse to be able to grow, shrink or pause in matter of seconds. This works because, for Azure SQL Data Warehouse, data and compute scale independently. The system uses Data Warehouse Units (DWU) as the base unit to measure the scale of nodes in use. Thus, when faster results are needed, you can pay for more DWUs and, when you need less compute power, you can decrease the number of DWUs you use. Since you pay by the DWU, this could be a major cost saver.
Furthermore, any time you don't need to actively run queries or use compute power, you can pause your nodes and remove all DWUs allocated to Azure SQL Data Warehouse while your data storage remains intact. Since the DWUs are no longer allocated to the data warehouse when you are paused, you don't need to pay for them. Because of this scalability, Cherry said, "The nice thing about [Azure SQL Data Warehouse] is that it can be used by a company of any size." He added that all you have to do to adjust the DWU is slide a bar in the Azure SQL Data Warehouse Web portal.
Columnstore indexes are another key part of getting fast query performance. Data warehouse queries are especially suited to a columnstore indexes because they often need to scan entire tables and columnstore indexes allow this to happen more quickly. Furthermore, columnstore indexes get 5x compression over row-oriented storage and up to 10x query performance improvement.
Because Azure SQL Data Warehouse is based on SQL Server's relational database engine, it includes stored procedures, user-defined functions, table partitioning, indexes and collations. This includes Transact-SQL (T-SQL), so users who are already comfortable using T-SQL can apply their knowledge to Azure SQL Data Warehouse.
For instance, Azure SQL Data Warehouse includes PolyBase, a technology that integrates SQL MPP architecture with Hadoop. It can query and import non-relational data, such as that which is held in Azure Blob storage or Hadoop's File System. Also, it uses can standard SQL Server T-SQL commands rather than one of the languages designed for use with Hadoop. Because PolyBase uses external tables to access non-relational data, the table definitions are stored in Azure SQL Data Warehouse and the data is stored on Hadoop or Azure Blob storage.
Find out how Azure SQL Data Warehouse may threaten the Amazon cloud
Learn about the new features available in SQL Server 2016
Read about the higher-performance Gen2 version of Azure SQL Data Warehouse