Database administrators take SQL Server storage for granted all too often. The storage area network (SAN) is something
a lot of DBAs need to learn more about. By understanding the concept of SANs, we can better manage a SQL Server environment and create the best possible system performance for dollars spent.
Selecting your RAID level and storage tier
When requesting storage from your SAN admin or Windows admin, tell them what RAID level and storage tier you need. That's important because the RAID level determines just how much read and write I/O you need, as well as the amount of redundancy you require. For definitions of the various RAID levels that SAN arrays support, check out this tip I wrote on optimizing disk configuration in SQL Server. It's important that a DBA is involved in the decision process for RAID level volumes -- after all, you know the data within the system much better than any storage engineer does.
Knowing the storage tier that your data will be in (assuming that your storage administrator has set up tiered storage) will make you the storage administrator's best friend. Most databases fall into Tier 1 storage. Tier 1 storage is the fastest, most expensive storage in the system. Lots of Fibre Channel drives, probably 146 Gig or 300 Gig in size, go behind the database to make it run well. Tier 2 storage will usually be larger Fibre Channel drives. This storage is not as expensive and not as fast as they are allocating more data per spindle. Tier 3 storage is the least expensive storage. It is usually made up of very large, very inexpensive disks such as 750 SATA drives.
Note: This tiered layout is by no means a standard. Talk to your storage administrator and see what sort of layout he used.
If you know the system doesn't need to be ultra fast – it will only be used by a small number of people – then you might want to put the system onto Tier 2 storage. The storage won't be as fast as Tier 1, but you'll have built the system for a lot less money and left the Tier 1 storage for systems that need the blazing fast speed.
If you are going to use SATA drives for your database, be very careful. SATA drives perform great in desktop computers, but when you start placing several requests to them at a time, you won't get the performance you're looking for.
Don't be afraid to mix and match storage tiers in a single database system. If you have one drive that needs fast storage and a second drive that needs slower storage for archive data, then request storage from two different tiers. This way, you can get the storage you need and the storage admin can balance storage needs across the entire environment.
Redundancy is critical
When moving your local storage to a storage area network, redundancy -- both within the SAN array as well as the Fibre Channel – is critical to keeping your SQL Server up and running. Within the storage array itself, there should be several hot spare hard drives. For Fibre Channel drives, it's recommended to have one hot spare for every 30 Fibre Channel drives in the system (one hot spare per two shelves). For SATA drives, it's recommended to have one hot spare for every 15 drives (one hot spare per shelf). While this does reduce the amount of total usable storage within the array, it will allow your array to recover automatically from a disk failure.
Within the Fibre Channel fabric itself, all production systems should have two HBAs, with each HBA connected to a different Fibre Channel network. Then each Fibre network should be connected to the storage array. The Fibre networks should not be connected to each other. To remain redundant you have to keep them disconnected from each other. Connecting the networks will turn them into a single Fibre network. This will cause the entire Fibre network to go offline in the event that a single Fibre switch goes offline. When you look at your Fibre network, remember this: If a single Fibre switch goes offline, all Fibre switches connected to it will go offline. So be sure to keep those redundant networks separate.
Backups are no less important
Just because the databases are stored on the SAN, backups are just as important as always. While the SAN environment is very redundant and multiple drive failures are rare, they are still possible. And so is data corruption or accidental data deletion by a user.
However, because you are storing your data on a SAN, you have an option called snapshots. Snapshots in the SAN world are different from snapshots in the database world. In the SAN world, a snapshot means taking an exact duplicate of the LUN, which is presented to the host. Vendors do this slightly differently, so be sure to read
up on what you're getting before you decide to use this method. When using SAN snapshots, make sure the storage array supports consistency groups. An array uses consistency groups to ensure that all the disks are in a consistent state when backed up. This is extremely important in the database world because if the files are out of sync, you won't be able to reattach them.
You also need to know in what state the backup will be restored. There are basically two options: crash recoverable and crash restartable. A crash recoverable backup looks to the SQL Server as if the power cable was pulled from the system, then a backup was taken. A crash restartable backup looks to the SQL Server as if the SQL Server was stopped, then the backup was taken. A crash restartable restore has a much better chance of restoring than a crash recoverable database.
The SAN is not a magical device
Many IT folks treat SANs as if they are the solution to every problem. I've heard time and time again, "Just stick it on the SAN, that'll fix the performance problem." While this can be the case, it's not the right technique. The SAN is a powerful tool, but you need to manage it correctly. Unless you are spending huge amounts of money on your SAN, it's not going to self-diagnose or self-heal (except for failed hard drives). It's not going to find the parts that are overused and move the data somewhere else. That needs to be done manually. Be wary of SANs that claim to do this automatically, as they may end up spending more time cleaning themselves up than actually serving I/O requests. I've heard horror stories of SANs running at peak capacity, and when you look into the system, it's spending 80% of the time trying to optimize itself and only 20% of the time serving I/O requests.
As a DBA, be aware of all these things when working with your SAN array. SQL Server MVP Hilary Cotter has an excellent tip on SAN considerations, which goes over several other items.
Work with your storage engineer. He's there to work with you, not against you. After all, you both have the same goal: to get the systems up and running the best you can with the budget you've been given.
ABOUT THE AUTHOR
Denny Cherry has over a decade of experience managing SQL Server, including MySpace.com's over 175-million-user installation, one of the largest in the world. Denny's areas of expertise include system architecture, performance tuning, replication and troubleshooting. He currently holds several Microsoft certifications related to SQL Server and is a Microsoft MVP.
Check out his blog: SQL Server with Mr. Denny.