Problem solve Get help with specific problems with your technologies, process and projects.

Making sense of Parallel Data Warehouse for SQL Server 2008 R2

SQL Server 2008 R2 will be released in two new editions, one of which has been dubbed Parallel Data Warehouse. But should organizations care?

Those who attended the PASS 2009 summit likely saw the announcement about the new Parallel Data Warehouse features...

for Microsoft SQL Server 2008 R2 (previously codenamed Madison).

If you weren't at the summit, however, and you have a large data warehouse that needs more horsepower than a single server can provide, then the Parallel Data Warehouse edition of R2 is something that you might want to look into.

What does it mean?

What Parallel Data Warehouse does is allow you to scale your data warehouse over multiple physical SQL Servers.

The way this works is that all connections are made to a controller server. You query this controller server and it sends the requests to the child nodes which then process the request and send the results back to the controller node. It then merges the results as needed and sends the results back to the client as you would normally expect.

A basic diagram of how the system would be physically laid out is shown in Figure 1. The client connects to the controller node, which then processes the request and figures out which nodes the data is on. It then sends the requests to the worker nodes to do the heavy lifting.

Figure 1 (click to enlarge)

When working with this configuration, all queries will take a few seconds between when you run the query and when the worker nodes begin working. During this time the controller node is doing its work. Once the controller node has completed its work you will see all the CPUs on the worker nodes start processing the request.

In tests, overall query runtime can be reduced from hours to minutes because of the increased amount of CPUs, memory and disk capacity that can be made available to the query.

What does it cost?

Like all good things, there is a price tag associated with this solution. In this case that price tag is money, and lots of it. Designing a solution using the Parallel Data Warehouse feature has some major hardware requirements to go along with serious software prices.

More on SQL Server 2008 R2

A first look at R2's new features

Q&A: BI in SQL Server 2008 R2

On the hardware side, you need a controller node and at least four worker nodes, all of which must be at least quad-chip servers. You also need to have a very fast storage array such as an IBM or EMC SAN. Then there's the software licensing that is required. Each node requires SQL Server 2008 R2 Parallel Data Warehouse edition licenses which come in at $57,498 per CPU (list).

Now you can't just go out and buy Parallel Data Warehouse edition and get it up and running yourself. You have to purchase the software along with the hardware from an authorized vendor to get an authorized package. They will also refer you to an authorized consultant who can implement the solution for you.

But what you get for your money is a very, very scalable solution that allows you to run some complex queries which you'd probably never be able to run in the past because they would simply take too long to complete and would bring the data warehouse to its knees.

Why should I care?

Obviously the question becomes why you should invest in this solution. If you aren't a large company then you probably shouldn't, as Parallel Data Warehouse is simply too large and costly to make financial sense for you to deploy. If you are a larger company, however, this R2 solution will allow you to get more information from your data.

SQL Server is a data repository, and it takes processing power (and sometimes lots of it) to turn that data into information which the business unit can then use to drive the company forward. This solution will help businesses do this by allowing them to query the data in ways that simply weren't feasible before.

When you combine this new processing power with other features like PowerPivot for Excel (which is also being released with SQL Server 2008 R2), SQL Server Reporting Services and Microsoft SharePoint, you'll get some massive power for your end users. That power can enable them to solve company problems which the business unit may not even be able to see today because the information isn't available from the data.

Denny Cherry has over a decade of experience managing SQL Server, including MySpace.com's over 175-million-user installation, one of the largest in the world. Denny's areas of expertise include system architecture, performance tuning, replication and troubleshooting. He currently holds several Microsoft certifications related to SQL Server and is a Microsoft MVP.

Check out his blog: SQL Server with Mr. Denny.

This was last published in January 2010

Dig Deeper on Microsoft SQL Server 2008 R2

Start the conversation

Send me notifications when other members comment.

Please create a username to comment.

-ADS BY GOOGLE

SearchBusinessAnalytics

SearchDataCenter

SearchDataManagement

SearchAWS

SearchOracle

SearchContentManagement

SearchWindowsServer

Close