This tip is the first in a three-part series on merge replication. Contributor and Microsoft SQL Server MVP Hilary Cotter explains how merge replication works. In tips to be featured next week, he offers merge replication tricks and performance tips.
How merge replication works
Merge replication is a database technology that combines changes occurring on two or more nodes in a publisher/subscriber topology. It is important to note that each node in a merge replication topology isn't completely autonomous; it's not true multi-master replication. You must have a node designated as the Publisher, which acts as a clearing house. It's where all changes on the nodes are collected, reconciled and distributed to all other nodes, the Subscribers.
Changes that have occurred between merges or synchronizations are tracked by triggers on the tables being replicated. These triggers write tracking information to metadata tables: for insert and update activity, the metadata is recorded in the MSmerge_contents table; for delete activity, it is recorded in the MSmerge_tombstone table. In the tables you are replicating, each row is identified by a globally unique identifier (GUID).
Consider a case in which you are replicating the authors table in the pubs database. The first row is for a Johnson White. This row could have a GUID value of 40353CBD-C085-4953-9F46-B06BC704AFC1. On every Subscriber, this row will have the same GUID value.
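To make the trigger-based tracking concrete, here is a minimal sketch that uses Python's built-in sqlite3 module rather than SQL Server. The table names merge_contents, merge_tombstone and current_generation are simplified, hypothetical stand-ins for the real metadata tables, and the trigger bodies are assumptions made for illustration, not the triggers that merge replication actually generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (rowguid TEXT PRIMARY KEY, au_lname TEXT, au_fname TEXT);

-- Hypothetical, simplified stand-ins for the merge metadata tables.
CREATE TABLE merge_contents (rowguid TEXT PRIMARY KEY, generation INTEGER);
CREATE TABLE merge_tombstone (rowguid TEXT PRIMARY KEY, generation INTEGER);
CREATE TABLE current_generation (value INTEGER);
INSERT INTO current_generation VALUES (1);

-- Inserts and updates are recorded in merge_contents.
CREATE TRIGGER authors_ins AFTER INSERT ON authors
BEGIN
    DELETE FROM merge_contents WHERE rowguid = NEW.rowguid;
    INSERT INTO merge_contents
    VALUES (NEW.rowguid, (SELECT value FROM current_generation));
END;

CREATE TRIGGER authors_upd AFTER UPDATE ON authors
BEGIN
    DELETE FROM merge_contents WHERE rowguid = NEW.rowguid;
    INSERT INTO merge_contents
    VALUES (NEW.rowguid, (SELECT value FROM current_generation));
END;

-- Deletes are recorded in merge_tombstone (a simplification in this sketch).
CREATE TRIGGER authors_del AFTER DELETE ON authors
BEGIN
    DELETE FROM merge_contents WHERE rowguid = OLD.rowguid;
    INSERT INTO merge_tombstone
    VALUES (OLD.rowguid, (SELECT value FROM current_generation));
END;
""")

row_id = "40353CBD-C085-4953-9F46-B06BC704AFC1"
conn.execute("INSERT INTO authors VALUES (?, 'White', 'Johnson')", (row_id,))
conn.execute("UPDATE authors SET au_fname = 'John' WHERE rowguid = ?", (row_id,))
tracked = conn.execute("SELECT rowguid, generation FROM merge_contents").fetchall()

conn.execute("DELETE FROM authors WHERE rowguid = ?", (row_id,))
remaining = conn.execute("SELECT rowguid FROM merge_contents").fetchall()
tombstones = conn.execute("SELECT rowguid, generation FROM merge_tombstone").fetchall()
```

After the insert and update, the row's GUID appears once in the tracking table; after the delete, it appears only in the tombstone table. The user's own statements never mention the metadata tables, which is the point of doing the tracking in triggers.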
When you meet up with an old friend and ask him or her what has happened since you last met, the first question you need to answer is when you last met. You use this point of reference as a basis to fill in what has happened since you last saw each other. Merge replication has a similar concept called a generation -- each node maintains its own generation number.
Each time the Merge Agent connects to a Subscriber, it determines the last generation number it synchronized with this Subscriber (it references the MSmerge_replinfo and MSmerge_genhistory tables for this) and asks the Subscriber what has changed since the last time it ran, which is everything above that generation value. It then gathers a list of GUIDs (from MSmerge_contents and MSmerge_tombstone) whose generation is above that value. This list represents the rows that have been modified (inserted, updated or deleted) since the last synchronization.
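The generation lookup can be sketched as a simple filter. In this illustration the two dictionaries are hypothetical in-memory stand-ins for MSmerge_contents and MSmerge_tombstone, mapping a row GUID to the generation in which it last changed; the function name changed_since is an assumption, not something the Merge Agent exposes.

```python
def changed_since(contents, tombstone, last_generation):
    """Return the GUIDs of rows changed after last_generation.

    contents  -- {guid: generation} for inserted or updated rows
    tombstone -- {guid: generation} for deleted rows
    """
    modified = {guid for guid, gen in contents.items() if gen > last_generation}
    deleted = {guid for guid, gen in tombstone.items() if gen > last_generation}
    return modified | deleted


# Hypothetical metadata: short letters stand in for real GUID values.
contents = {"A": 3, "B": 7, "C": 9}
tombstone = {"D": 8, "E": 2}

# The last synchronization with this Subscriber covered generation 6.
pending = changed_since(contents, tombstone, last_generation=6)
```

Rows A and E were already exchanged in an earlier synchronization, so only B, C and D are enumerated.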
The synchronization process brings the list of GUIDs to the Publisher and compares it with a list of what has changed on the Publisher since the last synchronization with that Subscriber. If the two lists have no GUIDs in common, the synchronization process merely fills in the missing data. It will fire stored procedures on the Subscriber with names like sp_ins_GUID, sp_upd_GUID and sp_del_GUID (where GUID stands in for an actual GUID value).
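The comparison of the two lists amounts to basic set arithmetic. The sketch below is an illustration only; split_changes and the "upload", "download" and "overlap" labels are hypothetical names chosen for this example.

```python
def split_changes(subscriber_guids, publisher_guids):
    """Separate GUIDs changed on one side only from those changed on both."""
    overlap = subscriber_guids & publisher_guids
    return {
        "upload": subscriber_guids - overlap,    # changed only on the Subscriber
        "download": publisher_guids - overlap,   # changed only on the Publisher
        "overlap": overlap,                      # changed on both sides
    }


# Short letters stand in for real GUID values.
result = split_changes({"A", "B"}, {"B", "C"})
```

Here row A only needs to be sent to the Publisher and row C only to the Subscriber, while row B was changed on both sides and needs further attention.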
If one of the GUIDs occurs in the list originating at the Subscriber and it also occurs in the list that originates at the Publisher, you have a conflict. Conflicts can be classified into four groups:
1. A primary key collision -- the primary key value of a newly inserted row is the same on the Publisher and the Subscriber.
2. The same row is updated on the Publisher and deleted on the Subscriber between synchronizations.
3. The same row is updated on both the Publisher and Subscriber.
4. The same column is updated on the same row on both the Publisher and Subscriber (and you have enabled column-level tracking).
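The four groups above can be expressed as a small classification function. This is a sketch under stated assumptions: the Change type, the operation strings and the returned labels are all hypothetical, and combinations the list does not mention are reported as "unclassified" rather than guessed at.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    op: str                            # "insert", "update" or "delete"
    columns: frozenset = frozenset()   # columns touched by an update


def classify(publisher, subscriber, column_tracking=False):
    """Map a pair of changes to the same row onto one of the four groups."""
    ops = (publisher.op, subscriber.op)
    if ops == ("insert", "insert"):
        return "primary key collision"
    if ops == ("update", "delete"):
        return "update/delete conflict"
    if ops == ("update", "update"):
        if not column_tracking:
            return "row-level update conflict"
        if publisher.columns & subscriber.columns:
            return "column-level update conflict"
        # With column-level tracking, updates to different columns do not clash.
        return "no conflict"
    return "unclassified"


pub = Change("update", frozenset({"phone"}))
sub = Change("update", frozenset({"address"}))
row_level = classify(pub, sub)
column_level = classify(pub, sub, column_tracking=True)
```

The same pair of updates is a conflict under row-level tracking but not under column-level tracking, because the two nodes changed different columns.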
Conflicts can be viewed and rolled back using the Conflict Viewer. To view conflicts, connect to your Publisher, expand the Databases folder, expand your database, expand the Publications folder, right-click your publication and select View Conflicts.
You have some measure of control over how conflicts are handled. The first is by setting the priority of your publication. When you are creating your Subscription you will get to a dialog box in the Create Subscription Wizard entitled Set Subscription Priority. By default your Publisher will win; in other words, it will always win primary key collisions (and most other conflicts), and the Subscriber's row will be replaced by the Publisher's row. You can set priorities on individual Subscribers to make them more autonomous than others or the Publisher. You can also select Custom Resolvers for each article as illustrated in Figure 1.
Figure 1. Select Custom Resolvers.
In our case, we have chosen the DateTime resolver. This means that if we have a conflict (other than a primary key collision), the row that has the earlier value of the DateTimeColumn in the authors table will win. You also have the option of creating stored procedure and COM custom conflict resolvers (refer to Creating Merge Replication Custom Conflict Resolvers Using Visual Basic for more information).
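The two resolution strategies described here, priority-based and DateTime-based, can be sketched as follows. The function names, the dictionary row format, the numeric priority values and the tie-breaking rule (the Publisher wins a tie) are assumptions made for illustration; they are not the resolver interfaces SQL Server provides.

```python
from datetime import datetime


def resolve_by_priority(publisher_row, subscriber_row,
                        publisher_priority, subscriber_priority):
    """Higher priority wins; on a tie the Publisher wins (an assumption)."""
    if subscriber_priority > publisher_priority:
        return subscriber_row
    return publisher_row


def resolve_earlier_wins(publisher_row, subscriber_row, column="DateTimeColumn"):
    """The row with the earlier value in the chosen datetime column wins."""
    if subscriber_row[column] < publisher_row[column]:
        return subscriber_row
    return publisher_row


# Hypothetical conflicting versions of the same authors row.
publisher_row = {"au_lname": "White", "city": "Menlo Park",
                 "DateTimeColumn": datetime(2005, 8, 2, 9, 0)}
subscriber_row = {"au_lname": "White", "city": "Oakland",
                  "DateTimeColumn": datetime(2005, 8, 1, 17, 30)}

by_priority = resolve_by_priority(publisher_row, subscriber_row, 100.0, 75.0)
by_datetime = resolve_earlier_wins(publisher_row, subscriber_row)
```

With these sample values the Publisher's row wins under the priority rule, while the Subscriber's row wins under the earlier-wins rule because its DateTimeColumn value is older.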
Ideally you will reduce the probability of conflicts occurring by carefully partitioning your articles, by filtering to reduce the amount of data that is sent to your Subscriber, and by running your Merge Agents as frequently as possible.
In part two, we'll take a look at 10 merge replication tips and tricks.
About the author: Hilary Cotter has been involved in IT for more than 20 years as a Web and database consultant. Microsoft first awarded Cotter the Microsoft SQL Server MVP award in 2001. Cotter received his bachelor of applied science degree in mechanical engineering from the University of Toronto and subsequently studied both economics at the University of Calgary and computer science at UC Berkeley. He is the author of a book on SQL Server transactional replication and is currently working on books on merge replication and Microsoft search technologies.
This was first published in August 2005