Structuring a big data strategy
In this edition of SQL in Five, Mark Kromer, national platform architect at Microsoft, says database administrators need to start familiarizing themselves with SQL Server big data technologies.
What do you think should be at the top of every SQL Server database administrator's (DBA) to-do list in 2013?
Mark Kromer: Learn all that you can about SQL Server big data technologies and techniques. That means learning about SQL Server Parallel Data Warehouse (PDW) v2 with Hadoop. Processing enormous amounts of unstructured data from sources like social networking and Internet searches, or streaming data from network equipment and other non-traditional data warehouse sources, will become the norm.
Therefore, DBAs will have to learn how to shard data for distributed processing, both in RDBMSs [relational database management systems] like SQL Server and in distributed NoSQL databases like Cassandra and HBase, as these become standard tools and technologies. NoSQL databases with schema-less, quick-append functionality, sharded across distributed nodes, are a very compelling architecture for collecting fast-arriving, complex data sets like clickstream and log data.
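As an illustration of the sharding idea Kromer describes, here is a minimal Python sketch, assuming a simple hash-based scheme. The names `shard_for` and `append_events`, the `user_id` key and the sample events are hypothetical, and the sketch is not how SQL Server PDW, Cassandra or HBase actually distribute data; it only shows schema-less records being appended to one of several shards chosen by a stable hash of a key.

```python
import hashlib
from collections import defaultdict


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard number using a stable hash (illustrative only)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then reduce it
    # to the range 0..num_shards-1.
    return int.from_bytes(digest[:8], "big") % num_shards


def append_events(events, num_shards):
    """Append each event to the shard selected by its (hypothetical) user_id."""
    shards = defaultdict(list)
    for event in events:
        shards[shard_for(event["user_id"], num_shards)].append(event)
    return shards


# Schema-less clickstream records: events need not share the same fields.
events = [
    {"user_id": "u1", "url": "/home"},
    {"user_id": "u2", "url": "/cart", "referrer": "search"},
    {"user_id": "u1", "url": "/checkout"},
]

shards = append_events(events, num_shards=4)
for shard_id, rows in sorted(shards.items()):
    print(shard_id, rows)
```

Because the hash is stable, every event for a given key lands on the same shard, so each node can process its own slice of the data independently.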
The SQL Server 2012 Parallel Data Warehouse was available to be ordered at the beginning of this month. What are the types of companies and the types of use cases that fit PDW?
Kromer: Some examples that I've seen include enterprise data warehouses and SQL Server big data scenarios such as telecom call data record (CDR) data warehouses and customer analytics. Another is Web analytics, where you need to provide advanced analytics across enormous data sets that require a tuned appliance architecture. PDW's distributed data model and processing engine are a perfect fit for providing insights into massive data sets that were very difficult to manage with SQL Server.
Microsoft has released a preview of Data Explorer for Excel. Where do SQL Server professionals fit into this concept of self-service business intelligence?
Kromer: Data Explorer expands on the concept of giving business users more self-service capabilities in the Microsoft ecosystem, capabilities that reside in Excel instead of in Visual Studio. SQL Server pros can expect business users and analysts who currently benefit from PowerPivot to start using Data Explorer to mash up data sets, and then to look for a database to store those new data sets.
Microsoft has also announced the beta of the Hortonworks Data Platform for Windows. How can SQL Server pros start to educate themselves about Hortonworks and big data?
Kromer: I like Big Data University and the Microsoft pages for HDInsight - Hadoop on Windows and the Cloud on Azure. I should also note that with the success of cloud providers like Amazon, Microsoft sees a bright future for big data processing in the cloud with Hadoop on Azure. The ability to process complex data sets with a distributed file system that you can create and tear down quickly and cheaply is appealing. The alternative is to stand up local data center racks of Hadoop HDFS nodes yourself and process your data files.
The first PASS Business Analytics Conference is taking place in Chicago next month. How is the role of the SQL Server pro changing with regard to business analytics?
Kromer: Big data is very much a driver in terms of moving all SQL Server pros into business analytics. The real business value behind big data is providing insights that were previously not possible. Learning SQL Server big data techniques and providing roles in your organization like data scientists and statisticians [with] access to data that you used to simply archive or discard is where your company will see ROI from big data system investments. Without business analytics, big data and large SQL Server data warehouses just cost money and time without providing business benefits. Learning analytics for classic business intelligence scenarios was always something that SQL Server pros had to contend with in the past. But now that new and more complex data sets can be leveraged with SQL Server big data infrastructures, analytics becomes even more critical, as savvy analysts and data scientists will use your database systems to discover new patterns and new data insights.