One of the easiest ways to improve the lifetime performance of a SQL Server database is proper setup of the physical and logical drives. While
it's an easy technique, proper disk subsystem configuration is often overlooked or handled by a member of the IT staff other than a SQL Server
DBA.
All too often the disk subsystem, including the disk array, is configured based only on storage capacity, with no question of drive
performance. Let's go beyond storage capacity requirements and design for drive performance.
Before you sit down with a storage administrator or engineer to map out your plans for disk configuration, here are basic preparation steps to
take. Start by getting familiar with the terms below and communicate your requirements much easier.
RAID – Redundant Array of Inexpensive Disks, also known as Redundant Array of Independent Disks.
Disk subsystem – A general term that refers to the disks on the server.
Spindle – Spindles are another way to refer to the physical disk drives that make up the RAID array.
I/O Ops – Input/Output operations, usually measured per second.
Queuing – Number of I/O Ops that are pending completion by the disk subsystem.
SAN – Storage area networks are collections of storage devices and fibre switches connected together along with the servers that access the storage on the device. SAN has also become a generic term, which refers to the physical storage drives such as EMC, 3PAR and Hitachi.
LUN – Logical Unit Number – This is the identification number assigned to a volume when created on a SAN device.
Physical drive – How Windows sees any RAID array, single drive or LUN that is attached to the server.
Logical drive – How Windows presents drives to the user (C:, D:, E:, etc.).
Block size – The amount of data read from the spindles in a single read operation. This size varies per vendor from 8 KB to 256 MB.
Hardware array – A RAID array created using a physical RAID controller.
Software array – A RAID array created within Windows using the computer management snap-in.
Hot spare – A spindle that sits in the drive cage and is added to the array automatically in the event of a drive failure. While this does not increase capacity, it does reduce the amount of time that the array is susceptible to data loss because of a second failed drive.
Recovery time – Amount of time needed for the RAID array to become fully redundant after a failed drive has been replaced, either manually or automatically via a hot spare.
There are many RAID levels available on modern RAID controllers. Only a subset of these is most useful when configuring a Microsoft SQL Server.
Each RAID level has a specific use and benefit. Using the wrong type of RAID level can not only hurt system performance, but also add more cost to
your server configuration. Experts recommend that you never use software arrays on a database server. Use of software arrays requires additional
CPU power from Windows in order to calculate which physical disk the data is written to. In hardware arrays, this overhead is offloaded to a
physical PCI, PCIe or PCIx card within the computer (or within the SAN device), which has its own processor and software dedicated to this
task.
RAID 1 – Mirror. A RAID 1 array is most useful for high write files, such as the page file, transaction logs and tempdb database. A RAID 1
array takes two physical disks and creates an exact duplicate of the primary drive on the backup drive. There is no performance gain or loss when
using a RAID 1 array. This array can survive a single drive failure without incurring any data loss.
RAID 5 – Redundant Stripe Set. A RAID 5 array is most useful for high read files such as the database files (mdf and ndf files) and
file shares. It is the most cost-effective, high-speed RAID configuration. With a RAID 5 array, there is a performance impact while writing data
to the array because a parity bit must be calculated for each write operation performed. For read performance, the basic formula is (n-1)*o where
n is the number of disks in the RAID 5 array and o is the number of I/O operations each disk can perform. Note: While this
calculation is not perfectly accurate, it is generally considered close enough for most uses. A RAID 5 array can survive a single drive
failure without incurring any data loss.
RAID 6 – Double Redundant Stripe Set. Like a RAID 5 array, a RAID 6 array is most useful for high read files such as the database and
file shares. With RAID 6, there is also a performance impact while writing data to the array because two parity bits must be calculated for each
write operation performed. The same basic formula is used to calculate the potential performance of the drives (n-2)*o. A RAID 6 array can
survive two drive failures without incurring any data loss.
Because of the dual parity bits with RAID 6, it is more expensive to purchase than a RAID 5 array. However, RAID 6 offers a higher level of
protection than RAID 5. When choosing between RAID 5 and RAID 6, consider the length of time to rebuild your array, potential loss of a second
drive during that rebuild time, and cost.
RAID 10 – Mirrored Strip Sets. A RAID 10 array is most useful for high read or high write operations. RAID 10 is extremely fast;
however, it is also extremely expensive (compared to the other RAID levels available). In basic terms, a RAID 10 array is several RAID 1 arrays
stripped together for performance. As with a RAID 1 array, as data is written to the active drive in the pair, it is also written to the secondary
drive in the pair. A RAID 10 array can survive several drive failures so long as no two drives in a single pair are lost.
RAID 50 – Stripped RAID 5 Arrays. A RAID 50 array is an extremely high-performing RAID array useful for very high-load databases. This
type of array can typically only be done in a SAN environment. Two or more RAID 5 arrays are taken and stripped together and data is then written
to the various RAID 5 arrays. While there is no redundancy between RAID 5 arrays, it's unnecessary because the redundancy is handled within the
RAID 5 arrays. A RAID 50 array can survive several drive failures so long as only a single drive per RAID 5 array fails.
With the introduction of larger and larger hard drives, storage administrators have been presented with a new issue: increased
recovery times. In the past, 172 GB drives were the standard size deployed. Even if you had five drives in a RAID 5 array, the total storage of
the array was only ~688 Gigs. At that size, recovery from a failed drive would take only hours. With those five drives now closing in on the 1TB
mark per drive, they now create a 4 TB array. This larger array could take up to a day to recover (depending on system load, disk speed and so
on).
Because of increased recovery time and, therefore, increased risk of data loss, it is recommended that smaller disks be used in greater
numbers. Instead of a single 2 TB array, use smaller drives and make several 500 GB arrays. Not only will this improve your recovery time, but
your system performance will increase, too, as you now have many more disks in your arrays. Whenever possible, connect each RAID array to its own
RAID controller. That will minimize the possibility of overloading the RAID controller with I/O operations.
When setting up your disk subsystems, there are five basic groups of files to keep in mind. These are the data files (mdf and ndf
files), log files (ldf files), tempdb database, the SQL Server binaries (files that are the actual SQL Server software) and Windows. Windows and
the SQL binaries will perform nicely on a single RAID 1 array, which should include the Windows page file. Note: Although it can be moved to
another RAID 1 array, it's not necessary and will not provide any additional performance benefit.
Data files should be placed on one or more RAID 5 or RAID 6 arrays (based on system needs). In certain systems, you should place the data files
on RAID 1 or RAID 10 arrays, but those systems are in the minority. Place the transaction log files on one or more RAID 1 or RAID 10 arrays
(again, based on size and performance needs). The tempdb database should be placed on its own RAID 1 or RAID 10 array. None of these file groups
(other than Windows and the SQL binaries) should share physical drives. By separating your files into these groups, you will see a definite
improvement in performance.
To determine the number of spindles, you need to first know a few things about your hardware and your database and application. For
the hardware, find out the number of IOPs each physical drive can handle. For the application, you need to know the typical high watermark that
your database will need to function. The number of spindles you have times the number of IOPs that each drive can handle must be higher than the
high watermark number of IOPs that your database requires.
Now that we have covered the basics, let's put all this information together and look at some server drive layouts. These systems all have a RAID
1 drive holding the Operating System, page file and SQL Server binaries:
The small database server. This would be a database system having a small number of users and a low number of transactions per second.
For a system of this type, putting the data files, transaction logs and tempdb database on a single RAID 5 array will be fine. With a load this
low, there should be minimal to no performance loss because of having the files on the same array.
The medium database server. This would be a system that has a larger number of users and no more than 100 transactions per second. At
this system size, you will begin to separate parts of the database. Start by separating the database files and log files onto different RAID
arrays. The tempdb database can probably remain on the same array as the database files. But, if the tempdb database is heavily used for things
like lots of temporary table usage, it should be moved to its own drive array.
The large database server. This system is for hundreds of users and hundreds or thousands of transactions per second. When designing
systems of this size, break apart not only the database files, log files and tempdb database files, but also want database into smaller data files
on multiple drives. This includes moving the indexes to separate files on RAID 1 or RAID 5 arrays, moving blob data (data stored in TEXT, NTEXT,
IMAGE, VARCHAR(MAX) and NVARCHAR(MAX) data types) to a separate RAID 5 array. You can also place different types of tables on different drives by
assigning them to different file groups.
It is recommended that for larger systems, each file group of the database should have between .25 to one physical file per physical CPU core
in the server. The tempdb database should have one data file per physical CPU core in the server.
It is highly recommended that all partitions are configured by using the DISKPAR (Windows 2000) or DISKPART (Windows 2003) commands.
When using DISKPAR, adjust the alignment by 512 bytes, and when using DISKPART, the disk should be aligned to 64. The reason for this is due to
the original master boot record design of WinTel based systems. The master boot record for all drives is 63 blocks (1 block = 512 bytes).
The physical disks want to read and write data in 64 block chunks. Because the master boot record is only 63 blocks, this puts the first block
of actual data in block location 64, where it should be in block location 65. That forces the disk to read 128 blocks for each 64 blocks read to
the disk, thereby increasing the work needed to be done and decreasing performance.

It is so highly recommended that volumes be created with this 64 block offset that Microsoft is including this procedure as the standard
when creating partitions starting in Microsoft Windows 2008 Server. There are no published figures on what sort of performance improvement will be
seen by creating your disks using this method. It's because any numbers would be relevant to only the system they were taken against, as all
databases are different. Unfortunately, once a partition has been created without the alignment offset, there is no easy way to change the offset.
The only method for doing that is to create a new volume and partition with the offset, take down the SQL Server and manually migrate the files to
the new drive in an offline manor.
When designing larger database servers, it is important to pay careful attention to the disk configuration. An improperly laid out disk
configuration can easily have an adverse impact on your database performance.
Even though laying out the disk subsystem of a SQL Server appears to be a simple task, take great care to get the maximum amount of performance
from your drive subsystem. A poorly configured array can and will have a detrimental effect on the performance of your database system. When
planning and purchasing your storage hardware, think not only of the capacity, but also about the amount of I/O operations that the disks will
need to handle -- not only today, but far into the future. Proper planning will give you a high-performing SQL Server today and years from now.
When designing a system, your goal should be that it performs well beyond the expected lifetime of the hardware it resides on.
| ABOUT THE AUTHOR: |
|
Denny Cherry is a DBA and database architect managing one of the largest SQL Server installations in
the world, supporting more than 175 million users. Denny's primary areas of expertise are system architecture, performance tuning, replication and
troubleshooting.
Copyright 2007 TechTarget
|
|