There comes a point in any DBA's life – whether you want to be the DBA or not – when you start thinking about SQL Server replication.
This article is designed to rapidly introduce you to replication so that you can make smarter decisions about it. I've seen a lot of replication-oriented projects go poorly, but making the right decisions up front is the best way to avoid failure.
What is SQL Server replication?
Simply put, replication is SQL Server's way of getting copies of your data in multiple places at the same time. There are a few different forms of replication to be aware of:
- Snapshot. This is a simple copy of a database. Well, maybe not so simple. The original database actually has to be locked -- which makes it pretty much unusable -- while the snapshot is generated. This ensures that the resulting copy is absolutely identical.
Snapshots are usually not used by themselves; instead, they serve as a starting point for other forms of replication. You take a snapshot to get another copy up and running, and then use some other form of replication to keep the copy up-to-date over time.
Transactional. This is a one-way form of replication, meaning data flows from a master "publisher" to one or more "subscribers." Any changes made to the subscribers are not pushed back to the publisher. Raw data isn't replicated; instead, transactions from the publisher's transaction log are sent to each subscriber, which "replays" those transactions to bring their copy of the data up-to-date.
Transactional with updating subscribers. This variation of transactional replication allows subscribers to send their transactions to the publisher as well. There is absolutely no provision for data conflicts; this system works best when subscriber databases are only updating a subset of the data. In other words, if each subscriber can "own" a chunk of the database – such as data related to a specific geographic region – then this form of replication can work well.
Merge. This is a more complex form of replication that allows for each participating copy of the database to be both a publisher and a subscriber, meaning each copy of the database is fully read-write.
Special resolution modules are needed to deal with any data conflicts – that is, data which changes in more than one location at the same time. A default merge resolver simply lets the change with the most recent timestamp "win."
Where replication goes wrong
Think for a moment about what happens when you have five updating subscribers to a transactional replication publisher. Subscriber A changes some data, and sends that transaction to the Publisher. The Publisher runs the transaction and applies that change to its copy of the database, which results in changes being sent to Subscribers B, C, D, and E. In the meantime, Subscribers B, C, D, and E are applying their own changes, and sending transactions to the publisher – which must redistribute them to everyone else.
In a busy database application, this can quickly generate a lot of traffic, and even if your network links are up to the task, you're exponentially increasing the workload on each SQL Server computer. Each server has to not only keep up with its own changes, but also the changes on every other server. The Publisher is also tasked with consolidating all the changes.
Does it work? Sure – to a point. It isn't magic, though – at some point you'll exceed your network, server workload capacities, or some other bottleneck, and things will stop working. Repairing replication is a pain, usually involving breaking the replication and starting from scratch with a new snapshot.
Replication, in my experience, is not a good way to have a dozen identical copies of a busy database spread across the planet, connected by relatively slow (and inevitably expensive) wide-area network (WAN) links. In most of the cases where I've seen replication fail, it's because the folks didn't understand all the work that replication was doing, and they therefore had unreasonable expectations of it.
If you need a busy database to be globally accessible, a cloud computing platform like Windows Azure is a better bet.
Miss a tip? Check out the archive page for Don's series, SQL Server for the Reluctant DBA.
ABOUT THE AUTHOR:
Don Jones is a co-founder of Concentrated Technology LLC, the author of more than 30 IT books and a speaker at technical conferences worldwide. Contact him through his website at www.ConcentratedTech.com.