News Stay informed about the latest enterprise technology news and product updates.

Q&A: Scaling out with Parallel Data Warehouse for SQL Server 2008 R2

With the official release of R2’s Parallel Data Warehouse edition coming soon, Microsoft’s Mark Kromer breaks down the management and architecture considerations to be aware of.

In the first edition of our SQL in Five series, Microsoft data platform technology specialist Mark Kromer answered five questions on SQL Server business intelligence in the cloud.

In this next segment, Mark discusses what to expect from the yet-to-be-released Parallel Data Warehouse (PDW), Microsoft’s new scale-out edition for SQL Server 2008 R2. Read on for details on how the new architecture could affect DBAs and what organizations should consider before purchasing a PDW license.

Both new editions of SQL Server 2008 R2 – Datacenter and Parallel Data Warehouse – are designed to improve the scalability of SQL Server. How do these two editions differ from each other?

Mark Kromer Mark Kromer

Mark Kromer: These are actually two very different offerings from Microsoft’s SQL Server product line. The Datacenter edition is the most scalable option for the classic SQL Server scale-up model where you would like the option of adding mounds of RAM to your Windows Server, tons of CPUs and unlimited virtualized SQL Servers. Datacenter is the edition of SQL Server for that scenario to take advantage of greater than 8 CPUs, maximum server OS RAM and unlimited virtual servers. Select this edition if you are responsible for scaling SQL Server OLTP high-volume workloads or data warehouses on large servers in your data center.

If you are scaling out SQL Server for data warehouses over 50 TBs, Parallel Data Warehouse (PDW) will be a very good fit because it will ship as an appliance-based rack solution with preconfigured racks of SQL Server. PDW will ship configured for parallel query distribution across multiple SQL Servers with redundant hardware and storage configured to maximize SQL Server speed and efficiency for very large data warehouse workloads.

Parallel Data Warehouse is based on a Massively Parallel Processing (MPP) architecture. Can you explain how this works?

Kromer: Parallel Data Warehouse is based on the architecture that came from the DatAllegro team that Microsoft acquired in 2008. Their technology has been moved forward onto the Microsoft stack and is an MPP-based architecture.

With PDW, you will be able to scale out SQL Server in a much easier way than you can today. The product will be delivered as a turn-key application-based solution and is tuned for data warehouse workloads, where queries will automatically be parallelized across the servers in the rack, providing high availability as a built-in feature. The architecture is a hub-and-spoke design where SQL Server can scale out and support over 100 TBs in a data warehouse workload. PDW is a complete solution that includes redundant server nodes for high availability and automatic query and table hashing across distributed servers in a scale-out model.

The hub-and-spoke design includes a centralized enterprise data warehouse (EDW) with a set of loosely coupled data marts. Part of the technology incorporated into PDW includes a parallel database copy capability that enables rapid data integration between spokes and the SQL Server Parallel Data Warehouse hub, making it easier to build hub-and-spoke solutions that integrate data marts with the enterprise data warehouse.

What kinds of challenges does a move to this type of architecture and edition of SQL Server pose for organizations?

If you have in-house experience on DatAllegro or other MPP database products, then you will likely experience a shorter learning curve.


Kromer: For traditional SQL Server DBA shops that are more accustomed to scale-up architectures and don’t utilize a lot of parallel distributed techniques in SQL Server today, there will be some differences in the way they interact with, administer and monitor Parallel Data Warehouse. There are new commands and services in PDW to become familiar with such as the Data Movement Service (DMS), the brains that tie all the distributed SQL Server nodes together, and the product ships as an appliance with commodity hardware.

This is a different methodology than with other SQL Server editions and may be different than the way current SQL Server teams are used to interacting with the database. If you have in-house experience on DatAllegro or other MPP database products, then you will likely experience a shorter learning curve.

How do some of the management processes change in a Parallel Data Warehouse environment that database folks might not be used to?

Kromer: Parallel Data Warehouse is a distributed SQL Server environment, so to make that management easy on DBAs, PDW will include a control rack of management servers for centralized management of the entire PDW solution. New Parallel Data Warehouse and SQL Server Dynamic Management Views (DMV) will be surfaced and there will be a Web-based admin console. In addition, you can expect to see DMVs for status on hardware and software components. There are also alerts that can be configured for enterprise monitoring tools and for alerting in the administrator tool.

You can expect to make use of the command-line tools for automation and some of the new T-SQL syntax that will be made available for this SQL Server edition around distributed and remote tables.

A Parallel Data Warehouse license is quite a bit pricier than the Enterprise or Standard editions of R2. How would you describe the type of organization that would most likely see the greatest ROI from implementing PDW?

Kromer: What Microsoft has done with SQL Server 2008 R2 is to release distinct editions -- like Datacenter and PDW -- to fit specific customer requirements, common workloads and different needs.

The first thing you should do is look at the types of solutions that you will be supporting with SQL Server. For data warehouses over 50 TBs, Parallel Data Warehouse is going to be the most scalable and best performing option. You should consider the overall total cost of acquisition, total cost of ownership and three-to-five year ROI before making a decision.

With Parallel Data Warehouse, the time and costs involved in architecting a SQL Server data warehouse architecture, high availability and storage scheme are already baked into the cost of PDW and delivered to you in an appliance format with pre-tuned and optimized sequential IO. In most cases, PDW will be a good fit for organizations that support very large data warehouses and need to account for anticipated growth..

Editor’s note: For more from Mark, check out his blog at MSSQLDUDE.

Mark Kromer has over 16 years experience in IT and software engineering and is a leader in the business intelligence, data warehouse and database communities. Currently, he is the Microsoft data platform technology specialist for the mid-Atlantic region, having previously been the Microsoft Services senior product manager for BI solutions and the Oracle principal product manager for project performance analytics.

Dig Deeper on SQL Server Business Intelligence Strategies

Start the conversation

Send me notifications when other members comment.

Please create a username to comment.