One of the easiest ways to improve the lifetime performance of a SQL Server database is proper setup of the physical
and logical drives. While it's an easy technique, proper disk subsystem configuration is often overlooked or handled by a member of the IT staff other than a SQL Server DBA.
All too often the disk subsystem, including the disk array, is configured based only on storage capacity, with no question of drive performance. Let's go beyond storage capacity requirements and design for drive performance.
Storage specific terms that everyone should know
Before you sit down with a storage administrator or engineer to map out your plans for disk configuration, here are basic preparation steps to take. Start by getting familiar with the terms below and communicate your requirements much easier.
- RAID – Redundant Array of Inexpensive Disks, also known as Redundant Array of Independent Disks.
- Disk subsystem – A general term that refers to the disks on the server.
- Spindle – Spindles are another way to refer to the physical disk drives that make up the RAID array.
- I/O Ops – Input/Output operations, usually measured per second.
- Queuing – Number of I/O Ops that are pending completion by the disk subsystem.
- SAN – Storage area networks are collections of storage devices and fibre switches connected together along with the servers that access the storage on the device. SAN has also become a generic term, which refers to the physical storage drives such as EMC, 3PAR and Hitachi.
- LUN – Logical Unit Number – This is the identification number assigned to a volume when created on a SAN device.
- Physical drive – How Windows sees any RAID array, single drive or LUN that is attached to the server.
- Logical drive – How Windows presents drives to the user (C:, D:, E:, etc.).
- Block size – The amount of data read from the spindles in a single read operation. This size varies per vendor from 8 KB to 256 MB.
- Hardware array – A RAID array created using a physical RAID controller.
- Software array – A RAID array created within Windows using the computer management snap-in.
- Hot spare – A spindle that sits in the drive cage and is added to the array automatically in the event of a drive failure. While this does not increase capacity, it does reduce the amount of time that the array is susceptible to data loss because of a second failed drive.
- Recovery time – Amount of time needed for the RAID array to become fully redundant after a failed drive has been replaced, either manually or automatically via a hot spare.
What's the best RAID level for your system?
There are many RAID levels available on modern RAID controllers. Only a subset of these is most useful when configuring a Microsoft SQL Server. Each RAID level has a specific use and benefit. Using the wrong type of RAID level can not only hurt system performance, but also add more cost to your server configuration. Experts recommend that you never use software arrays on a database server. Use of software arrays requires additional CPU power from Windows in order to calculate which physical disk the data is written to. In hardware arrays, this overhead is offloaded to a physical PCI, PCIe or PCIx card within the computer (or within the SAN device), which has its own processor and software dedicated to this task.
RAID 1 – Mirror. A RAID 1 array is most useful for high write files, such as the page file, transaction logs and tempdb database. A RAID 1 array takes two physical disks and creates an exact duplicate of the primary drive on the backup drive. There is no performance gain or loss when using a RAID 1 array. This array can survive a single drive failure without incurring any data loss.
RAID 5 – Redundant Stripe Set. A RAID 5 array is most useful for high read files such as the database files (mdf and ndf files) and file shares. It is the most cost-effective, high-speed RAID configuration. With a RAID 5 array, there is a performance impact while writing data to the array because a parity bit must be calculated for each write operation performed. For read performance, the basic formula is (n-1)*o where n is the number of disks in the RAID 5 array and o is the number of I/O operations each disk can perform. Note: While this calculation is not perfectly accurate, it is generally considered close enough for most uses. A RAID 5 array can survive a single drive failure without incurring any data loss.
RAID 6 – Double Redundant Stripe Set. Like a RAID 5 array, a RAID 6 array is most useful for high read files such as the database and file shares. With RAID 6, there is also a performance impact while writing data to the array because two parity bits must be calculated for each write operation performed. The same basic formula is used to calculate the potential performance of the drives (n-2)*o. A RAID 6 array can survive two drive failures without incurring any data loss.
Because of the dual parity bits with RAID 6, it is more expensive to purchase than a RAID 5 array. However, RAID 6 offers a higher level of protection than RAID 5. When choosing between RAID 5 and RAID 6, consider the length of time to rebuild your array, potential loss of a second drive during that rebuild time, and cost.
RAID 10 – Mirrored Strip Sets. A RAID 10 array is most useful for high read or high write operations. RAID 10 is extremely fast; however, it is also extremely expensive (compared to the other RAID levels available). In basic terms, a RAID 10 array is several RAID 1 arrays stripped together for performance. As with a RAID 1 array, as data is written to the active drive in the pair, it is also written to the secondary drive in the pair. A RAID 10 array can survive several drive failures so long as no two drives in a single pair are lost.
More on storage and disk arrays topics in SQL Server
RAID 50 – Stripped RAID 5 Arrays. A RAID 50 array is an extremely high-performing RAID array useful for very high-load databases. This type of array can typically only be done in a SAN environment. Two or more RAID 5 arrays are taken and stripped together and data is then written to the various RAID 5 arrays. While there is no redundancy between RAID 5 arrays, it's unnecessary because the redundancy is handled within the RAID 5 arrays. A RAID 50 array can survive several drive failures so long as only a single drive per RAID 5 array fails.
Best practices for drive and physical file layout configuration
- With the introduction of larger and larger hard drives, storage administrators have been presented with a new issue: increased recovery times. In the past, 172 GB drives were the standard size deployed. Even if you had five drives in a RAID 5 array, the total storage of the array was only ~688 Gigs. At that size, recovery from a failed drive would take only hours. With those five drives now closing in on the 1TB mark per drive, they now create a 4 TB array. This larger array could take up to a day to recover (depending on system load, disk speed and so on).
Because of increased recovery time and, therefore, increased risk of data loss, it is recommended that smaller disks be used in greater numbers. Instead of a single 2 TB array, use smaller drives and make several 500 GB arrays. Not only will this improve your recovery time, but your system performance will increase, too, as you now have many more disks in your arrays. Whenever possible, connect each RAID array to its own RAID controller. That will minimize the possibility of overloading the RAID controller with I/O operations.
- When setting up your disk subsystems, there are five basic groups of files to keep in mind. These are the data files (mdf and ndf files), log files (ldf files), tempdb database, the SQL Server binaries (files that are the actual SQL Server software) and Windows. Windows and the SQL binaries will perform nicely on a single RAID 1 array, which should include the Windows page file. Note: Although it can be moved to another RAID 1 array, it's not necessary and will not provide any additional performance benefit.
Data files should be placed on one or more RAID 5 or RAID 6 arrays (based on system needs). In certain systems, you should place the data files on RAID 1 or RAID 10 arrays, but those systems are in the minority. Place the transaction log files on one or more RAID 1 or RAID 10 arrays (again, based on size and performance needs). The tempdb database should be placed on its own RAID 1 or RAID 10 array. None of these file groups (other than Windows and the SQL binaries) should share physical drives. By separating your files into these groups, you will see a definite improvement in performance.
- To determine the number of spindles, you need to first know a few things about your hardware and your database and application. For the hardware, find out the number of IOPs each physical drive can handle. For the application, you need to know the typical high watermark that your database will need to function. The number of spindles you have times the number of IOPs that each drive can handle must be higher than the high watermark number of IOPs that your database requires.
Drive layout examples for SQL Server systems
Now that we have covered the basics, let's put all this information together and look at some server drive layouts. These systems all have a RAID 1 drive holding the Operating System, page file and SQL Server binaries:
The small database server. This would be a database system having a small number of users and a low number of transactions per second. For a system of this type, putting the data files, transaction logs and tempdb database on a single RAID 5 array will be fine. With a load this low, there should be minimal to no performance loss because of having the files on the same array.
The medium database server. This would be a system that has a larger number of users and no more than 100 transactions per second. At this system size, you will begin to separate parts of the database. Start by separating the database files and log files onto different RAID arrays. The tempdb database can probably remain on the same array as the database files. But, if the tempdb database is heavily used for things like lots of temporary table usage, it should be moved to its own drive array.
The large database server. This system is for hundreds of users and hundreds or thousands of transactions per second. When designing systems of this size, break apart not only the database files, log files and tempdb database files, but also want database into smaller data files on multiple drives. This includes moving the indexes to separate files on RAID 1 or RAID 5 arrays, moving blob data (data stored in TEXT, NTEXT, IMAGE, VARCHAR(MAX) and NVARCHAR(MAX) data types) to a separate RAID 5 array. You can also place different types of tables on different drives by assigning them to different file groups.
It is recommended that for larger systems, each file group of the database should have between .25 to one physical file per physical CPU core in the server. The tempdb database should have one data file per physical CPU core in the server.
It is highly recommended that all partitions are configured by using the DISKPAR (Windows 2000) or DISKPART (Windows 2003) commands. When using DISKPAR, adjust the alignment by 512 bytes, and when using DISKPART, the disk should be aligned to 64. The reason for this is due to the original master boot record design of WinTel based systems. The master boot record for all drives is 63 blocks (1 block = 512 bytes).
The physical disks want to read and write data in 64 block chunks. Because the master boot record is only 63 blocks, this puts the first block of actual data in block location 64, where it should be in block location 65. That forces the disk to read 128 blocks for each 64 blocks read to the disk, thereby increasing the work needed to be done and decreasing performance.
It is so highly recommended that volumes be created with this 64 block offset that Microsoft is including this procedure as the standard when creating partitions starting in Microsoft Windows 2008 Server. There are no published figures on what sort of performance improvement will be seen by creating your disks using this method. It's because any numbers would be relevant to only the system they were taken against, as all databases are different. Unfortunately, once a partition has been created without the alignment offset, there is no easy way to change the offset. The only method for doing that is to create a new volume and partition with the offset, take down the SQL Server and manually migrate the files to the new drive in an offline manor.
When designing larger database servers, it is important to pay careful attention to the disk configuration. An improperly laid out disk configuration can easily have an adverse impact on your database performance.
Even though laying out the disk subsystem of a SQL Server appears to be a simple task, take great care to get the maximum amount of performance from your drive subsystem. A poorly configured array can and will have a detrimental effect on the performance of your database system. When planning and purchasing your storage hardware, think not only of the capacity, but also about the amount of I/O operations that the disks will need to handle -- not only today, but far into the future. Proper planning will give you a high-performing SQL Server today and years from now. When designing a system, your goal should be that it performs well beyond the expected lifetime of the hardware it resides on.
ABOUT THE AUTHOR
Denny Cherry is a DBA and database architect managing one of the largest SQL Server installations in the world, supporting more than 175 million users. Denny's primary areas of expertise are system architecture, performance tuning, replication and troubleshooting.