Multi-instance, Active/Active SQL Server cluster design is an involved process. To optimize high availability and...
performance, you must consider a number of components in ensuring both instances can run on a single node. I'll walk you through these considerations including options for multi-instance clusters, choosing adequate hardware and what to do before and after cluster installation.
More on Microsoft SQL Server installation
Get advice for planning a Microsoft SQL Server installation
Check out this tip for a no-risk SQL Server upgrade for SSRS
Learn about SQL Server Upgrade Advisor and Upgrade Assistant
Clustering is a high availability technology in which one or more servers (called nodes) connect to shared resources to appear to clients as if they are a single server called a virtual server. If one node fails or is taken offline as part of a planned or unplanned event, the service resources will be provided by another node in the cluster without users knowing. It is important to realize that clients connected to the virtual server running on the first node will be disconnected and when they reconnect to the virtual server again, the services they access will be provided by the failover node. So there is some potential for data loss should they be doing work in the middle of a failover. The applications using a cluster must have some logic to minimize data loss during the failover period. This failover appears as a services stop and restart to existing clients.
|TABLE OF CONTENTS
Distinct cluster configurations
Minimize fail overtime
SQL Server clusters can be configured in four distinct configurations:
- Single instance – formerly called Active/Passive – presents to client applications as a single virtual server where the one node is online and the other node is in a standby state. The node online (called the primary) has all the resources, while the standby node (called the secondary) is offline and does not have ownership of any of the resources.
- Multi-instance – formerly called Active/Active – presents to the client as multiple virtual servers where multiple nodes can be online simultaneously, and resources are shared in distinct units called resource groups. At any one time one node can have no resource groups, one resource group or all resource groups. In this example we are talking about a two node cluster, but clustering is not limited to two nodes. Depending on your version, you can have two node, four node, or eight node clusters. SQL Server 2000 Enterprise Edition supports up to four nodes in a cluster, SQL Server 2005 Standard Edition supports up to two nodes in a failover cluster, and SQL Server 2005 Enterprise Edition supports as many nodes as the OS version will support: Currently this is up to eight nodes in a failover cluster using Windows Server 2003 Enterprise Edition. Note that clustering has special SQL licensing exceptions for certain cluster configurations. Regardless of the SQL configuration, you will need OS licenses for every node in your cluster. With single instance clusters you only need licensing as if it was a stand alone SQL Server.
- N+1 – multiple instance clusters, where multiple nodes all share the same failover node. This is considered more cost effective than multiple single instance clusters as it requires fewer physical machines. However, storage and licensing costs tend to be more expensive than individual machine costs, so this configuration is not a popular choice for very highly available scenarios.
- N+M – multiple instance clusters where multiple nodes all share the same failover nodes. This is commonly deployed as six to two, or five to three combinations on eight node clusters. It is used when some performance degradation is acceptable during severe failover events but the system must remain online. Refer to this article for more on clustering best practices.
Verify the cluster is configured properly. Once you have built your Windows cluster, ensure the following:
Check cluster disk resource group. You will need to add resource groups for each SQL virtual server. For example, your default instance might need to own disks D and E, while your named instance might need to own disks F and G. Create these disk resource groups in advance to make sure they can move around and will come online for all nodes.
Choose and record the network name and IP address you will use for your virtual server. Follow these steps to Configure MSDTC.
If you are installing a SQL 2000 multi-instance cluster, make sure to use the SQL 2000a CD. There is an issue with the RTM version that you may encounter.
Ensure you install off a local CD on a cluster node, or copy the files off the CD locally to the node on the cluster and install there. If you are installing off a network share, ensure all nodes have permission to read and list files and directories on the share without specifying network credentials. It is preferable to always attempt the installation from media locally on the node. There have been cases when installing of installation media on a network has failed.
In SQL Server 2000, you must create a named pipes alias to any named instance before you install it. For example, you want to install a named instance called SQLInstance; you would create an alias to SQLInstance\SQLInstance using the SQL client network utility and then start the installation. For SQL 2000, the virtual server name and the instance name are the same. Consult this article for more on the Installation of a named instance.
The client alias limitation has been removed in SQL 2005. Also removed in SQL 2005 is the 16 node limit for instances running per server (clustered or standalone). You can now run up to 50 instances, should your server have the hardware to support it.
Start the installation from the node that owns the disk resource you want the SQL instance to use. The SQL installer will prompt you for the network name and the IP address, then it will create the actual virtual server. Test failing over the virtual server to each allowed node. Move any additional disk resources into this SQL instance's resource group – for example a disk resource group for the data files and another for the log files. Make the SQL service dependent on any new disk resources or it will be unable to place SQL data or log files on these resources.
Make sure that max server memory is capped for each instance, and the total of all the max server memory is less than total physical RAM. For example, on SQL Server Instance A, you may set max server memory at 1 Gig; on SQL Server Instance B, you may set it at 1 Gig, where the total is 2 Gigs; which is less than your physical RAM.
Force the TCP port – When you fail over the SQL Virtual Server to another node and then fail back to the original node, SQL Virtual Server sometimes fails to bind to the port on which it is listening. Check out this Microsoft article on Binding to the TCP/IP port for more information. Test connection to your SQL Server virtual instances through Query Analyzer by entering TCP:ServerName in the SQL Server dialog box from your desktop. If this fails, then make sure you can connect locally from the virtual server itself.
Verify performance counters are on all nodes. Consult this Mircrosoft page on PRB: SQL Performance Monitor Counters Missing for assistance.
Minimize failover time
When you are using any high availability technology, you want to minimize failover times. There are two factors that impact the failover times:
- Length of outstanding transactions
During the failover process, the database files and transactions logs are transferred to the second node. When the SQL Server starts, it recovers the databases and rollback/forward uncommitted/committed transactions in the transaction log. If your application using the cluster has long-running transactions, this can delay the recovery times, meaning the cluster will take longer to fail over and come back online.
- Network latency
Network latency is best improved by optimizing your network connections. Clustering has two separate networks – one is private for communication between nodes, and the second is a public network, which is how the clients connect to the virtual server. Clustering requires a 150 ms or better ping response time between nodes in a cluster. Should the ping time exceed this, the standby node may decide the primary node has failed and begin the failover process. To prevent this, ensure the private network is isolated from the public network. On your private network, consider isolating traffic by implementing a VLAN (virtual LAN), a separate subnet or use a crossover cable for two-node clusters. Make sure you follow the below guidelines for your private network:
- Use TCP/IP as the only protocol bound to the NIC.
- No default gateway is configured.
- No DNS servers are configured unless the cluster nodes are DNS servers, in which case 127.0.0.1 should be configured.
- No DNS registration or DNS suffix is configured.
- No WINS servers are configured.
- Static IP addresses are used for all nodes.
- NetBIOS over TCP/IP is disabled.
- No NIC teaming is used, where two network interface cards are aggregated together to act as a single NIC card. (Note that NIC teaming can be used on any public NICs).
Single instance clustering is used to provide high availability and high performance. Multiple instance clustering is used to also provide high availability, but is used when high performance is not as critical as high availability. During a failure event, a single node in your multi-instance cluster may be supporting all instances in the cluster. You must test to ensure that the performance delivered by all instances running on a single node will meet the SLAs of your clients. To do this, your multi-instance clusters need to be considerably more beefy than their single instance counterparts. Some cluster engineers suggest each node in a cluster should have twice the processors and twice the RAM as its single instance counterpart.
The cluster hardware must be modular and able to accept hot pluggable components to service the increased load, as a single node supports more and more instances. SQL Server 2005 Enterprise Edition utilizes Hot Swappable Memory should the hardware support it. You should limit the default worker threads for each instance of SQL Server. By default, on a 32 bit server (clustered or unclustered) with four or less processors, you should set the max worker threads to 256 threads. On an eight-way, set the default worker threads to 288 threads. On a 64-bit server with four or less processors set the max worker threads to 512 threads, and on a 64 bit eight-way, set the default worker threads to 576 threads. Use sp_configure 'max worker threads' to adjust the worker threads. By default it will be 0. On a multi-instance cluster, you may find there are not enough system resources to run all instances on a single node with these settings. You may have to adjust the number of threads to a lower amount. You can also set the affinity mask so each instance only uses its own subset of the available processors.
Creating SQL Server multi-instance clusters is complex. You have to ensure that you have adequate hardware for both instances to run on a single node with acceptable performance degradation on both instances. The installation procedure is more complex as you are installing two virtual instances of SQL Server that must coexist. Installed correctly, multi-instance clusters can provide high performance and high availability and can be cheaper than two single instance clusters.
Linchi Shea and Geoff Hiten contributed to this article.
ABOUT THE AUTHOR
Hilary Cotter has been involved in IT for more than 20 years as a Web and database consultant. Microsoft first awarded Cotter the Microsoft SQL Server MVP award in 2001. Cotter received his bachelor of applied science degree in mechanical engineering from the University of Toronto and studied economics at the University of Calgary and computer science at UC Berkley. He is the author of a book on SQL Server transactional replication and is currently working on books on merge replication and Microsoft search technologies.
Copyright 2006 TechTarget