When most people hear the phrase "hardening a server," they think of security practices that make it difficult for anyone to gain unauthorized access to their database. That is only one kind of hardening. From the standpoint of Windows server technology, hardening includes not only application and system settings but also the technologies you implement in hardware.
The whole push to develop Windows Datacenter Edition is based on the concept that you can qualify hardware in a way that makes an enterprise application like SQL Server less likely to fail, and therefore more available. Chances are pretty good that you aren't working with Windows Datacenter Edition, but there are still lessons to be learned from that program.
When Microsoft studied the most common causes of hardware problems, the number one issue besides operator error was device driver failure. If you have ever wondered why Microsoft bothers with its "Certified for Windows" program, that is the reason. So the very first consideration is to pay strict attention to the Windows Hardware Compatibility List (HCL). Qualify equipment such as NICs, memory and disk subsystems; set a standard for which vendor you use for each, and stick to it.
When you harden your server, think in terms of simplifying things. Don't use NICs from multiple vendors, for example. If a one-of-a-kind NIC fails, you have nothing to compare it against, and you might not be able to diagnose the problem. When the same failure shows up in two or more identical cards, the pattern is easier to recognize and the part is easier to swap out. Standardizing also means you keep fewer parts in inventory and are more likely to have the part you need when you need it. For most server admins, when a server goes down, time is of the essence.
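One way to keep a vendor standard honest is to check your hardware inventory against it. The following is only a minimal sketch in Python: the `APPROVED` table, the vendor names, the server names and the inventory layout are all hypothetical placeholders, and in practice the inventory would come from whatever asset-tracking records you already keep.

```python
from collections import defaultdict

# Hypothetical standard: one approved vendor per component class.
APPROVED = {"nic": "VendorA", "disk_controller": "VendorB", "memory": "VendorC"}

def find_nonstandard(inventory, approved=APPROVED):
    """Return {server: [(component, vendor), ...]} for parts that break the standard."""
    violations = defaultdict(list)
    for server, parts in inventory.items():
        for component, vendor in parts:
            expected = approved.get(component)
            # Component classes with no standard defined are ignored.
            if expected is not None and vendor != expected:
                violations[server].append((component, vendor))
    return dict(violations)

# Hypothetical inventory: each server maps to a list of (component, vendor) pairs.
inventory = {
    "sql01": [("nic", "VendorA"), ("nic", "VendorA"), ("memory", "VendorC")],
    "sql02": [("nic", "VendorA"), ("nic", "VendorX"), ("disk_controller", "VendorB")],
}

for server, bad_parts in find_nonstandard(inventory).items():
    print(server, "has nonstandard parts:", bad_parts)
```

Here only the second server would be flagged, because one of its NICs comes from a vendor outside the standard.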
Another helpful principle is to avoid stressing your system. Don't run your server at full or near-full capacity. When equipment is maxed out, it runs hot and components are more likely to fail. The same goes for people: when work is demanding and administrators are running at full tilt, they get stressed and make hardware-related mistakes. You can't fully protect yourself against administrator errors, but you can take steps to minimize problems. If there is a switch that shuts off the system, make sure it isn't easy to flip. Cover it if you can, tape it if you have to, but make a person think three times before taking an action that will cause a server to reboot and perhaps force you to roll back transactions or perform data validity checks.
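To make "near full capacity" concrete, you can flag a server when too many of its utilization samples sit above a headroom limit. This is a rough sketch only: the 80 percent limit and the 25 percent tolerance are assumed values chosen for illustration, not recommendations from Microsoft or any vendor, and the samples would come from whatever performance counters you already collect.

```python
def over_capacity(samples, limit=0.80, min_fraction=0.25):
    """Return True if more than min_fraction of the samples exceed limit.

    samples      -- utilization readings as fractions between 0.0 and 1.0
    limit        -- assumed headroom threshold (illustrative default)
    min_fraction -- assumed share of hot samples that counts as sustained load
    """
    if not samples:
        return False
    hot = sum(1 for s in samples if s > limit)
    return hot / len(samples) > min_fraction

# Hypothetical CPU readings for one server over a sampling window.
cpu_samples = [0.90, 0.95, 0.50, 0.40]
if over_capacity(cpu_samples):
    print("Server is running with too little headroom")
```

Counting a fraction of samples rather than reacting to a single peak keeps a brief spike from triggering the warning while still catching sustained load.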
The most hardened systems are those with multiple levels of redundancy. Cluster technology is one way of mitigating risk, and clustering SQL Server is an affordable solution for many SQL installations. However, if you can't cluster entire systems, look at the other options you can afford. For example, IBM's X-Architecture machines maintain a spare processor you can switch to, and you could set up a disk failover system or make your server multihomed by giving it multiple network connections. Truly high-availability servers like the ones Stratus Technologies has developed have become a lot more affordable of late and are worth another look. Many other vendors sell fault-tolerant solutions; NEC Corp., for one, offers a system with a lower entry price that is based on Stratus' design.
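The payoff from redundancy can be estimated with the standard formula for components in parallel: if each of n components is available a fraction a of the time, and failures are independent, at least one is up with probability 1 - (1 - a)^n. The Python sketch below applies it; the 99 percent figure is an illustrative assumption, not a measured value for any product named here, and the independence assumption is optimistic when components share power, cooling or a driver.

```python
def combined_availability(a, n):
    """Availability of n redundant components, each with availability a.

    Assumes failures are independent and that one working component is enough.
    """
    return 1.0 - (1.0 - a) ** n

def downtime_hours_per_year(availability, hours_per_year=8760):
    """Expected hours of downtime per year at the given availability."""
    return (1.0 - availability) * hours_per_year

single = 0.99  # illustrative availability of one component
for n in (1, 2, 3):
    avail = combined_availability(single, n)
    print(f"{n} component(s): {avail:.6f} available, "
          f"{downtime_hours_per_year(avail):.2f} hours down per year")
```

Under these assumptions, adding a second component cuts the expected downtime by a factor of 100, which is why even partial redundancy, such as a second NIC or a failover disk, is worth considering when a full cluster is out of reach.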
Barrie Sosinsky is president of consulting company Sosinsky and Associates (Medfield, Mass.). He has written extensively on a variety of computer topics. His company specializes in custom software (database and Web related), training and technical documentation.