BACKGROUND IMAGE: iSTOCK/GETTY IMAGES
SQL Server 2012 Fast Track Data Warehouse (FTDW) defines a reference architecture for building high-performance SQL Server data warehouses on pre-certified hardware from Microsoft partners, such as HP, Dell, IBM and Cisco. The FTDW program represents a collaboration between Microsoft and these companies that has resulted in a comprehensive, step-by-step guide that defines a resource-balanced configuration model for implementing a symmetric multiprocessor (SMP)-based SQL Server database system.
At the heart of the program lies the publication Fast Track Data Warehouse Reference Guide for SQL Server 2012, available for free on Microsoft's FTDW website. The paper outlines a component architecture and methodology for building data warehousing platforms that balance SQL Server data processing with component hardware throughput. The FTDW methodology defines the minimum system architecture necessary to implement and maintain a baseline level of performance for various data warehouse workloads.
A balanced approach to data warehousing
An FTDW data warehouse platform is made up of a number of hardware and software components. The hardware must at least include a server, storage system and the network components to connect the two. The software must include the operating system and database management system -- in this case, Windows Server and SQL Server -- along with the software necessary to facilitate communication among the components.
The reference guide offers a basic methodology for deploying an FTDW with a balanced hardware and software configuration that supports a data warehouse workload. Each component is optimized to work in conjunction with the other components to achieve the greatest throughput. The goal is to reach an efficient balance between hardware resources and SQL Server processing.
The hardware configuration is defined by a specified workload and core set of configuration, validation and database best practices. To this end, FTDW provides specific benchmarks based on real concurrent workloads, along with detailed and validated hardware specifications, with an eye toward a component architecture that balances database capabilities and hardware resources. The reference guide offers a practical framework for reaching an effective balance among the key components of the data warehouse platform.
The data warehouse workload
A database workload can vary significantly from one application to the next. For example, a database that supports online transaction processing (OLTP) generally requires small lookup and data modification requests. The database is usually heavily indexed to support the low-latency retrieval of relatively few rows. However, such workloads can require significant disk head movement and generate random I/O scan patterns.
Data warehouse operations, on the other hand, often result in large, scan-intensive requests, in which case the system can benefit from sequential disk scans and their potentially higher throughputs. The FTDW methodology takes into account the specific needs of the data warehouse workload and emphasizes a balanced component architecture that aims for consistently high I/O scan rates, rather than focusing on operations per second, as is typical when evaluating OLTP databases.
Because FTDW aligns database files and configurations with efficient disk scans (as opposed to seeks), the number of disks needed to process a given workload is reduced. The FTDW methodology prioritizes disk scan performance over other considerations when optimizing hardware and software components. At the same time, it emphasizes the use of clustered indexes and range partitioning to support efficient, scan-based disk I/O, while minimizing the use of nonclustered indexes, which can degrade performance in a data warehouse workload.
The data warehouse configuration
The FTDW guidelines provide details about the hardware and software configurations needed to achieve a balanced component architecture. The hardware side of this equation is based on dedicated storage configurations such as a switched storage-area network or direct serial-attached SCSI. Independent, dedicated storage enclosures and their processors help to maximize disk I/O throughput. However, another consideration is the amount of server memory used, which should be based on benchmark results. The goal is to balance the maximum logical throughput against CPU utilization.
The application configuration outlined in FTDW is concerned primarily with the SQL Server setup. The guidelines address such issues as which startup options to enable when launching SQL Server, the amount of memory to allocate to SQL Server and how to set up Resource Governor. The guidelines also provide detailed specifications about configuring the three storage and file system layers that make up a typical database stack: the physical disk array, the operating system volume assignment and the actual database files.
Data warehouse best practices
The FTDW documentation outlines a number of the best practices for optimizing a system to handle data warehouse workloads, specifically with regard to planning the data architecture, managing data fragmentation and loading data.
Data architecture best practices, for example, cover heap and clustered table structures, table partitions, page compression, and indexes. The guidelines pay particular attention to columnstore indexes, addressing such issues as managing memory and distinguishing between normalized data models and dimensional ones.
In addition, the FTDW documentation offers a number of best practices related to file system fragmentation, focusing specifically on page allocation and extent management. However, it also addresses index fragmentation and filegroup setup. With regard to loading data, the best practices cover incremental loads and data migration for both heap and clustered tables.
The FTDW program
The FTDW program offers extensive guidelines for building a data warehouse. In addition to what we've covered here, the documentation includes details about the processes used to design and qualify the FTDW reference architecture, with a focus on baseline hardware and database validation. In addition, the appendices include information about the FTDW System Sizing tool, as well as details about workload testing and validating a user-defined FTDW platform. Plus, the FTDW website provides links to detailed hardware reference-architecture specifications for each participating hardware vendor.
If you're planning to implement a SQL Server 2012 data warehouse any time soon, the place to start is with FTDW. Even if you don't follow their guidelines to the letter, you'll find it to be a useful resource for better understanding how to plan an effective data warehouse platform.
About the author
Robert Sheldon is a technical consultant and the author of numerous books, articles, and training material related to Microsoft Windows, various relational database management systems, and business intelligence design and implementation.