A conference crawling with SQL database experts might not seem the place for any serious discussions about “big data,” the buzzword du jour describing the massive sets of information companies are grappling with. But the Professional Association for SQL Server’s PASS Summit 2011 last month became just that: Microsoft announced that the distributed computing framework Hadoop would be hooked into the upcoming SQL Server 2012, Azure and Windows Server, and that Redmond would partner with Hadoop developer Hortonworks. So there you have it: Microsoft big data functionality is on its way.
In this month’s edition of “SQL in Five,” Microsoft database platform specialist Mark Kromer chats about Microsoft’s big-data ambitions and how the company plans to play in an ever-expanding market. He also discusses other newsmaking features from the PASS Summit -- new self-service business intelligence tools and improvements to SQL Azure, Microsoft’s cloud database service.
There are a lot of vendors, particularly big ones like IBM, EMC Greenplum and now Oracle, vying in the big data space. How will what Microsoft offers stack up against the competition? Why now?
Mark Kromer: Microsoft is very aggressively taking new big data products and features to market that position Microsoft IT shops and Apache Hadoop users in a very positive way. These new offerings span different Microsoft product groups including SQL Server, Windows Server HPC [High Performance Computing] and Windows Azure. In the SQL Server world, there are downloads available now that connect Hadoop to SQL Server PDW [Parallel Data Warehouse] to enable the powerful MPP [massively parallel processing] data warehouse and analytical power of PDW against your unstructured big data in Hadoop. These are some of our first capabilities available to the market specifically for the purpose of enabling big data solutions using Microsoft technologies.
The SQL Server connector and the PDW connector both use Sqoop (SQL to Hadoop) as a two-way interface into the Microsoft database engines. What is particularly unique about Microsoft’s efforts in big data is that beyond the data platform (SQL Server), Microsoft is also investing big in an alliance with Hortonworks on an Apache Hadoop implementation in the cloud with the Windows Azure platform and with Windows Server. Also, Microsoft is enabling big data implementations using Linq to HPC based on the Microsoft Windows Server High Performance Clusters. This provides Microsoft customers with a complete platform and many different options when architecting big data solutions that need to perform fast analytics on vast amounts of data, including unstructured data.
What will Hortonworks bring to Microsoft’s big data endeavor and how will customers benefit from the relationship?
Kromer: This was part of the SQL PASS Summit announcements that I referenced earlier, which create a partnership between Microsoft and Hortonworks to bring Apache Hadoop-based distributions to Windows Server and to Azure. These announcements came out of the PASS Summit mainly because SQL Server is the primary product, tool and engineering team for the Microsoft data platform. This gives Microsoft the unique position of being able to provide a complete solution for big data requirements including the distribution, MapReduce and analytics.
Microsoft released the beta versions of connectors to Hadoop for SQL Server and Parallel Data Warehouse late this summer and will now include these capabilities for SQL Server 2012. What makes the upcoming version an ideal vehicle for delivering big data functionality?
Kromer: If you consider an end-to-end big data solution, you would want to include the scaled-out distributed file system, such as HDFS [Hadoop Distributed File System], then the mapping of your data, that is, MapReduce, along with providing end-user analytics that provide that IT value back to your business. With Microsoft, you will be able to use an on-premises Hadoop implementation or Hadoop in Azure and then store the analytics and mined data in SQL Server or SQL Server PDW. In SQL Server 2012, you can then provide end-user analytics in a visually rich, ad hoc, self-service environment using PowerView [formerly Project Crescent], leveraging SQL Server 2012’s column-based storage and BISM [Business Intelligence Semantic Model] in-memory analytics for super-fast reporting against big data.
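For readers unfamiliar with the MapReduce step Kromer mentions, here is a minimal single-process sketch of the pattern in Python. It is not Hadoop code and does not use any Microsoft connector; the function names and the sample log lines are hypothetical and chosen only for illustration. In a real cluster, the map and reduce phases would run in parallel across many machines reading from HDFS.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every input record."""
    for record in records:
        for word in record.lower().split():
            yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(records):
    """Run the three phases in sequence on an in-memory list of records."""
    return reduce_phase(shuffle(map_phase(records)))


# Hypothetical unstructured input, standing in for files stored in HDFS.
log_lines = [
    "error disk full",
    "warning disk slow",
    "error network down",
]
counts = word_count(log_lines)
print(counts)
```

The small aggregated result, rather than the raw input, is the kind of "analytics and mined data" that would then be stored in a relational database for reporting.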
Data Explorer, designed to help organizations discover, enrich and share data, was also unveiled at PASS Summit, and PowerView was given touchscreen capabilities and demoed. There are clear advantages to the end user, but what role does the SQL Server professional play in organizations employing these tools?
Kromer: These tools are extensions of the concepts that the Microsoft SQL Server product team started in SQL Server 2008 R2 and what has been ostensibly marketed as “self-service BI.” A tool like PowerView is targeted at end users to perform self-guided data analysis and exploration using in-memory analytics. Essentially, they’ve taken PowerPivot and applied steroids and Silverlight. SQL Server DBAs [database administrators] and developers will need to ensure that they become familiar with the new BISM data modeling techniques, which are an alternative way of creating analytical models in SQL Server Analysis Services. The UDM [Unified Dimensional Model] modeling that exists today in SQL Server will still be there in SQL Server 2012. But these new capabilities are geared much more toward business analysts, Excel and Microsoft Office users and make extensive use of SharePoint as the delivery vehicle for reporting.
There were a slew of additions to SQL Azure announced at the conference -- an increase of database capacity to 150 GB, the availability of SQL Azure Federations and SQL Azure Data Sync among them. How do these features fit into Microsoft’s overall cloud strategy?
Kromer: These were all features that have been long awaited and often requested from many of the SQL Azure users that I work with on a daily basis. A common use case that I run across that drives customers to use SQL Azure over traditional on-premises SQL Server, besides cost advantages, is the ability to now scale out SQL Server using SQL Azure with SQL Federations. Personally, I have been very excited to see this capability grow and evolve. Being able to partition across databases (sharding) in the cloud is a SQL Azure-specific implementation that is fully supported in Transact-SQL. As we talked about earlier, this fits well within Microsoft’s public cloud strategy of enabling highly scalable and affordable data-based applications based on the Microsoft platform.
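The sharding idea Kromer describes, partitioning rows across several databases by a key, can be illustrated with a short Python sketch of range-based routing. This is a generic illustration under stated assumptions, not the SQL Azure Federations API: the class name, the database names and the split points are all hypothetical, and the boundary convention (a key equal to a split point goes to the higher range) is a choice made for this example.

```python
from bisect import bisect_right


class RangeShardMap:
    """Route a numeric key to one of several databases by key range.

    Hypothetical helper for illustration only.
    """

    def __init__(self, split_points, members):
        # N split points divide the key space into N + 1 ranges.
        if len(members) != len(split_points) + 1:
            raise ValueError("need exactly one more member than split points")
        if list(split_points) != sorted(split_points):
            raise ValueError("split points must be in ascending order")
        self.split_points = list(split_points)
        self.members = list(members)

    def member_for(self, key):
        # bisect_right counts how many split points are <= key, which is
        # the index of the range (and therefore the member) holding the key.
        return self.members[bisect_right(self.split_points, key)]


# Hypothetical layout: customer IDs split at 1000 and 2000 across three databases.
shard_map = RangeShardMap([1000, 2000], ["orders_0", "orders_1", "orders_2"])

print(shard_map.member_for(42))
print(shard_map.member_for(1000))
print(shard_map.member_for(5000))
```

An application would consult a map like this before opening a connection, so each query touches only the database that holds the relevant key range.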
Mark Kromer has over 16 years of experience in IT and software engineering and is well known in the business intelligence (BI), data warehouse and database communities. He is the Microsoft data platform technology specialist for the mid-Atlantic region.