Database administrators upgrading from locally attached drives to a fancy storage area network (SAN) can get intimidated by the new technology. SANs are definitely complicated and expensive relative to local disk storage for SQL Server, but there are only a few things database administrators really need to know to get started.
Network Redundancy, Redundancy, Redundancy
When hard drives are physically inside a database server, we don't lose a lot of sleep
wondering what might happen if the drives become disconnected. However, when the drives are located halfway across the datacenter, and they're connected with a couple of cables, the connection stability becomes more important.
To prevent servers from losing communications with their drives, a storage area network typically consists of two separate and redundant networks called fabrics. Each server and each storage controller will have at least two connections, one to each fabric. Those two fabrics are completely different switches that do not communicate with each other. This holds true for fiber SANs and iSCSI SANs. In the case of iSCSI SANs, it means using two physically separate networks with two sets of switches.
This two-fabric approach means that none of these events will take down a server:
- If a switch dies, the other fabric is still fine.
- If a server's connection to the SAN (called a Host Bus Adapter, or HBA) dies, the other HBA still works.
- If a storage controller's HBA dies, the other HBA still works.
- If any single cable breaks, the other fabric still works.
With some vendors' products, we can even upgrade firmware of the HBA or a single storage controller without taking down the server's drive connections.
This level of redundancy isn't cheap, and some shops choose to run without it, but the redundant fabric assures that servers will always have access to their data. SQL Server's I/O handling is pretty robust, but database administrators should still insist on two redundant fabrics.
Multipathing software keeps it together
When a server is connected to two separate switch networks, multipathing software tells it which route to take to reach its drives. Multipathing software looks at all paths to the storage at any one time and decides which paths to use for which requests. The multipathing software chosen depends on the SAN vendor, because the two are tightly tied together.
Shops with EMC SANs will use EMC's multipathing software, shops with NetApp will use NetApp multipathing software and so on. Microsoft does have generic MPIO support built into newer versions of Windows, but read the documentation for both MPIO and the vendor's SAN firmware carefully, because not all versions are compatible with each other.
Clustered database servers make the setup even more complex because not all multipathing software works with Microsoft Cluster Services. With SQL Server clustering, two servers share the same set of hard drives, but only one server (the active one) can access the drives at a time. Bad multipathing software might have side effects, and I've experienced one personally: After I rebooted my passive node in the cluster, the multipathing software on the passive node said, "Hi, I just turned on! I'd like access to the cluster drives please. Now. Give them to me." As a result, every time the passive node rebooted, the cluster failed. The solution is to upgrade the multipathing software.
After multipathing is configured, the most important step is to test it. Test a failure of each part of your new SQL Server storage subsystem. Start a large database backup and pull a fiber cable (or network cable for iSCSI) out, power a switch off or unplug a storage controller. Watch to see how the operating system handles the failover. At no point should SQL Server crash, nor should the backup cancel. It's normal to see errors in the Windows event log from the multipathing drivers because the multipathing software should be alerting whenever a path fails. But these disruptions should never take down the application. If they do, the multipathing software has failed.
Multipathing doesn't mean load balancing
So if the server has multiple connections to its drives, it must get faster connections, right? Not necessarily. With most entry-level and midrange SANs, a single drive letter -- or "LUN" in SAN terms -- is usually only active on a single path. In some cases, servers that use multiple drive letters may be able to load balance them across the two fabric connections, but that's not a given either.
Some multipathing software allows for load balancing, whereas others are strictly for failover. Be very specific with the vendor. Lay out the exact make and model of server, the type of clustering used, the drive configurations (data, transaction logs, backups, TempDB, etc.) and ask how the multipathing works before making a purchase.
Think outside the subnet
iSCSI SAN paths don't have to be in the same building. iSCSI SANs allow the server's hard drives to be located anywhere TCP/IP traffic can reach. Ideally, of course, we want the equipment to be as close as possible to reduce the latency of traffic back and forth -- faster traffic means faster drive access.
However, in times of disaster recovery, sometimes any disk is better than no disk at all. That's where SAN-based data replication comes in handy. Many iSCSI storage controllers can replicate data between each other for redundancy. A database server's primary drives might be located in the same data center, but in the event of a catastrophic failure on that SAN, the data can be accessed at a disaster recovery SAN in another building. The server doesn't change, but it points to disks in another location. The response time will be slower, but sometimes a slow server is better than no server.
When you use this type of replicated storage in conjunction with virtual servers, the offsite DR capabilities get even better. Imagine having the entire server replicated between sites: If the data center goes down, the virtual server can be brought online in minutes at the disaster recovery data center with minimal data loss.
Using SANs for SQL Server provides more flexibility
These concepts of multiple fabrics, advanced redundancy, multipathing and geographic redundancy are worlds away from locally attached storage, and they're empowering for database administrators. Don't be afraid of the multitude of options. Make your life easier and your networks faster and more reliable with technology from storage area networks.
ABOUT THE AUTHOR
Brent Ozar is a SQL Server DBA based in Houston with a leading global financial services firm. Brent has a decade of broad IT experience in systems administration and project management before moving into database administration. In his current role, he specializes in database performance tuning, SANs and data warehousing. Brent conducts training sessions, has written several technical articles and blogs at BrentOzar.com. He is also a member of Quest's Association of SQL Server Experts.