This is the first in a series of articles describing the case history of a database upgrade from a SQL Server 2000 Active/Active cluster running on Windows 2000 Server to a Windows Server 2003/SQL Server 2005 Active/Active cluster. Consultant Matthew Schroeder walks you through the technical and decision-making process of real-world IT and database management teams. The article is based on two online upgrades: a commercial website and an eBay ordering system. For confidentiality reasons, certain details of the actual project have been changed.
Working in a 24/7 environment is demanding, particularly when money is involved. Rarely do you have the luxury of a large maintenance window. In examining our options for this upgrade, we decided our principal goals were to minimize downtime and preserve the image of IT as a reliable business partner.
Project team composition
The project team comprises members spread across multiple departments. The networking team is engaged to monitor the additional load placed on the network if the database is located at a remote site separate from the website. Hardware personnel are involved to build any necessary infrastructure, such as clusters, SANs and virtual machines (Microsoft or VMware), and to monitor system performance while heavy system loads are moved between the various systems. Application support, or call center personnel, are responsible for the initial line of support, because application issues not related to the migration must be filtered out and the advanced engineering groups isolated from frivolous calls. Depending on the type of application moved, either vendor support personnel or internal development personnel need to be on call to identify potential problems. Point-of-contact end users will be in touch with application support or call center personnel to verify testing. Picking key end users reduces the load on application support.
In all, an upgrade of one application in this manner will require help from the following personnel: one networking person (initial spec/on-call), one hardware person (full-time involvement for approximately three days), one DBA (full-time
involvement for a week), two application support people, and typically two or three end users.
The first decision our team made was to transition to a temporary server to run the application while we upgraded the local SQL Server cluster. At that point, we would use database mirroring to sync the transitional database with a database on the newly rebuilt local cluster.
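Under SQL Server 2005, a mirroring session like the one described above can be sketched in a few T-SQL statements. This is a minimal sketch only: the server, share and database names (`TRANSITIONAL`, `REBUILTCLUSTER`, `OrdersDB`) are illustrative assumptions, not details from the project, and it presumes the database uses the FULL recovery model and that a database mirroring endpoint already exists on each server.

```sql
-- 1. On the transitional server (principal): take a full backup plus a log backup.
--    (Database must be in FULL recovery model for mirroring.)
BACKUP DATABASE OrdersDB TO DISK = N'\\share\OrdersDB_full.bak';
BACKUP LOG OrdersDB TO DISK = N'\\share\OrdersDB_log.trn';

-- 2. On the rebuilt cluster (mirror): restore both WITH NORECOVERY so the
--    database stays in a restoring state, ready to receive the mirror stream.
RESTORE DATABASE OrdersDB FROM DISK = N'\\share\OrdersDB_full.bak' WITH NORECOVERY;
RESTORE LOG OrdersDB FROM DISK = N'\\share\OrdersDB_log.trn' WITH NORECOVERY;

-- 3. Point each side at its partner's mirroring endpoint -- mirror first,
--    then principal -- here using the common mirroring port 5022.
-- On the mirror (rebuilt cluster):
ALTER DATABASE OrdersDB SET PARTNER = N'TCP://transitional.corp.local:5022';
-- On the principal (transitional server):
ALTER DATABASE OrdersDB SET PARTNER = N'TCP://rebuiltcluster.corp.local:5022';
```

Once the session is established, the rebuilt cluster stays in sync with the transitional server until we are ready to fail the database over.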
Weighing our upgrade options
Option 1: Schedule the downtime required to upgrade Windows Server and SQL Server in-place
Pros:
- The upgrade process would be fairly simple.
- The upgrade downtime would be shorter than a total rebuild of the box.
- Existing naming conventions, security, etc. would be preserved.

Cons:
- We all know how clean Microsoft's upgrades are. That said, we wouldn't be sure at the end of the upgrade process that we had a stable box, exactly comparable to a fresh install.
- Because our environment has evolved through the tenure of many IT staff members, our systems lack consistency in naming conventions, security and so on. This approach would not force us to implement a new, more consistent approach.
- The extended downtime would cause losses of hundreds of thousands of dollars.
Option 2: Schedule enough downtime to install a fresh copy of Windows Server 2003/SQL Server 2005
This approach would take between eight and ten hours of downtime to complete, which is not fiscally feasible.
Option 3: Run the application on a transitional server built with Windows Server 2003/SQL Server 2005
Pros:
- Downtime would be limited to whatever is required by the specific technology used to get the database established and running on the transitional server.
- No downtime should be required to transition from the transitional server back to the rebuilt Windows Server 2003/SQL Server 2005 Active/Active cluster.
- The original cluster can be completely rebuilt with a fresh install of Windows Server 2003/SQL Server 2005. This would let us implement consistent standards and security at a relaxed pace, since the applications will run on the transitional server in the meantime.

Cons:
- The technology used to transition the database from the original server to the transitional server could be complicated, depending on the method chosen.
- The same is true of the technology used to transition the database from the transitional server back to the rebuilt original server.
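One simple technology for the first transition is a backup/restore sequence timed to keep the outage short. The sketch below is illustrative only (names such as `OrdersDB` and the share path are assumptions, not project details); it relies on the fact that restoring a SQL Server 2000 backup onto a SQL Server 2005 instance upgrades the database during the restore.

```sql
-- While the application is still running against the old cluster (no downtime yet):
BACKUP DATABASE OrdersDB TO DISK = N'\\share\OrdersDB_full.bak';
-- On the transitional 2005 server: restore WITH NORECOVERY so the
-- database can still accept further log backups.
RESTORE DATABASE OrdersDB FROM DISK = N'\\share\OrdersDB_full.bak' WITH NORECOVERY;

-- Downtime window starts: stop the application, then capture the final log tail
-- on the old cluster.
BACKUP LOG OrdersDB TO DISK = N'\\share\OrdersDB_tail.trn';
-- On the transitional server: apply the tail and bring the database online.
RESTORE LOG OrdersDB FROM DISK = N'\\share\OrdersDB_tail.trn' WITH RECOVERY;
-- Repoint the application at the transitional server; downtime window ends.
```

With this approach the outage is only as long as the final log backup, its restore, and the application repoint, since the bulk of the data moves while the system is still live.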
And the winner is…
We're going with option 3 and running the application on a transitional server while we rebuild the original servers – cleanly. Stay tuned for the next article where we'll cover the task of transitioning to a temporary server to run the application while we upgrade the local cluster.
Upgrading Active/Active cluster to Windows Server 2003/SQL Server 2005
Part 1: Team composition and upgrade option pros and cons
Part 2: Restoring a SQL Server database to a transition server
Part 3: SQL Server high availability when upgrading to SQL Server 2005
Part 4: Upgrade live applications to SQL Server 2005 for high availability
Part 5: Monitor database mirroring and replication after upgrade
ABOUT THE AUTHOR
Matthew Schroeder is a senior software engineer who works on SQL Server database systems ranging in size from 2 GB to 3+ TB, with between 2k and 40+k trans/sec. He specializes in OLTP/OLAP DBMS systems as well as highly scalable processing systems written in .NET. Matthew is a Microsoft Certified IT Professional (MCITP): Database Developer, has a master's degree in computer science and more than 12 years of experience in SQL Server/Oracle. He can be reached at firstname.lastname@example.org.
Do you have comments of your own? Let us know.
Hi, I've just read your article with interest, as we are doing exactly this at the moment. Our scenario is greatly complicated by having merge replication running, so we need a next-to-no-downtime solution. One of the methods we are using (we don't have a transitional/new server available for every cluster) is to remove the redundancy temporarily and build a new cluster. This does have the downside that the two active instances will run on one physical node for a while, which has a large impact on performance. However, the reduction in speed is outweighed by keeping the system going. We then take the rescued node and create a brand new cluster on it – with one node for the time being – installed with W2K3 and SQL2005. The downtime is now reduced to the time it takes to move the data from the old 2000 node to the new 2005 node. When this is completed, we clear the old SQL2000 node, install W2K3 and SQL2005, and add it into the new cluster. We are then back to active/active.
I accept this is possibly not the most resilient solution; however, if you do not have the luxury of new hardware, it does reduce the downtime to the absolute minimum.