SQL Server replication, a software package included in Microsoft SQL Server, is used to move data from server to server in a transitionally consistent state from one instance to another.
SQL Server replication can
- Be unidirectional or bidirectional
- Be transmitted on a schedule or in real time
- Involve moving data to a single instance or to multiple instances
Replication topology consists of three servers – the subscriber, publisher and distributor. The subscribers are the servers receiving the data. The publisher is the server that holds the initial set of data being made available to the subscribers. The distributor is the server that contains many of the settings. It also holds the data as it is moved from the publisher to the subscribers.
There are three different replication techniques available. They are snapshot replication, merge replication and transactional replication.
Snapshot replication is a single unidirectional push of data. When updated data is fed from the publisher to the subscribers, all of the data is sent each time.
Merge replication is a bidirectional replication that transmits data either in real time, or on a schedule. Merge replication is the only
Transactional replication is unidirectional from the publisher to the subscribers. Data can be sent on a schedule or in real time. As data is transmitted to the subscriber, all data changes are processed in the order they were made on the publisher.
When configuring replication, you can select which objects to replicate. Each database object you replicate is called an article. Articles can be tables, views, stored procedures, functions, rules, data types, etc. When selecting database objects, I recommend selecting all of the dependent objects,
although it's not required. And while it's not required to select child tables, you will want to make sure to replicate any user-defined data types and rules that are used by the tables. Data types can be moved manually to the subscriber, but they must exist on the remote server or the table creation will fail.
When configuring articles, you can configure filters on the articles. These filters are effectively WHERE clauses telling SQL Server replication to transmit only a subset of the data in the table. You can use any column that exists in the table as part of the vertical filter. Horizontal filters can be used too. Horizontal filters are a fancy way of saying that a WHERE clause is applied to the article when data is being replicated and only the data which matches the WHERE clause is transmitted to the subscribers.
When to use which topology
Your key decision here is selecting which replication topology to use. Selecting the wrong topology will give you very unsatisfactory end results.
Use snapshot replication when you want an occasional data push done. Because all data is moved each time the snapshot is pushed, it uses a good amount of bandwidth. Only use snapshot replication over a slow WAN if the amount of data changes to be replicated is greater than the size of the initial data set. In other words, if a large portion of the data is being updated over and over, snapshot will probably be the right technique. If that's not the case, then it won't be.
Merge replication should be used when you need to transmit changes from the publisher to the subscriber, as well as from the subscriber back to the publisher. When you have multiple subscribers, in time the changes are replicated out from the publisher to all the subscribers in the network.
The transactional technique is probably the most common form of replication. It is used to transmit data changes in near real time (or on a schedule) to one or more subscribers. Transactional replication is most often used to move data from server to server in real time.
No matter which SQL Server replication topology you use, each subscriber is completely independent from the other subscribers. Subscribers can fall behind for a number of reasons, including network congestion, disk I/O congestion and locking and blocking by user processes. Because they are independent of each other, a single subscriber slowing down will not cause the other subscribers to slow down.
Data is moved by replication agents and there are several replication agents set up for each publication. The snapshot agent is used by all three replication topologies. When replication is first set up, a snapshot of the articles is taken and uploaded to each of the subscribers. When using merge replication or transactional replication, a log reader agent will run on the distributor and capture the changes in the transaction log. These changes are then logged into the distribution database.
More on SQL Server replication and related topics
Check out this tip on splitting SQL Server backups to multiple files
Learn to use SQL Compare to compare and synchronize databases
Find out what code to use for SQL Server database mirroring
When using transactional replication, the changes are read from the distribution database and applied to the subscriber by the distribution agent. When using merge replication, the changes are read from the distribution database using the merge agent. The merge agent also pulls data changes from the subscriber and uploads them to the publisher for distribution out to the other subscribers.
When setting up subscriptions, you have two options for setting up each subscriber. You can use a push subscription or a pull subscription. A push subscription is when the distribution agent runs on the server hosting the distribution database. A pull subscription is when the distribution agent runs on the server hosting the subscription database.
The advantage of push subscriptions is that the subscription agents are centralized on the distributor. This limits your management overhead and keeps the load of the distribution (or merge) agent off of the subscriber. The advantage of pull subscriptions is that the workload of the distribution agents is spread out across all of the subscribers. Pull subscriptions are recommended for use when the distribution database is located on the same server or instance as the publisher.
When setting up your distributor, you'll be prompted to select a folder for storing the snapshots. When using all push subscriptions, this can be a local drive path. When using pull subscriptions or a combination of push and pull subscriptions, you must specify a network share. This network share should not be an administrative share, and the domain accounts running the SQL Server agent on each of the subscribers need to have read and write access to the network share.
When running all push subscriptions and you have more than 30 or so subscriptions, you'll see timeout errors when the subscriptions attempt to start. The quick fix for this is to edit the job that is the distribution or merge agent. Edit step 2 and change the job type to Operating System command. Then put the full path and executable name of the replication command in front of the existing parameters. The default path for SQL Server 2008 is C:\Program Files\Microsoft SQL Server\100\COM (replace 100 with 80 for SQL 2000, and 90 for SQL 2005). When running a distribution agent, use distrib.exe, and when running a merge agent, use replmerg.exe.
Troubleshooting SQL Server replication can be tricky. The agents don't give much error data by default. You can adjust the amount of error data you receive by changing the -OutputVerboseLevel switch in the job properties. By increasing the number from the default, more error data will be logged to the job step. You can also stop the SQL Agent job that runs the agent, and run the command from a DOS command prompt to easily see more error data.
When SQL Server replication has a lot of data to transmit, a decent amount of network bandwidth is required in order to keep it up to date. If enough bandwidth is not available, then replication will slowly fall further and further behind. If you have a low-bandwidth, low-latency network, it can help to increase the number of threads being used by adding the –SubscriptionStreams switch (or increasing the number if the switch is already there). If you have a high-latency network, then increasing this setting may not increase performance because transactional integrity is maintained between the streams.
ABOUT THE AUTHOR
Denny Cherry has over a decade of experience managing SQL Server, including MySpace.com's over 175-million-user installation, one of the largest in the world. Denny's areas of expertise include system architecture, performance tuning, replication and troubleshooting. He currently holds several Microsoft certifications related to SQL Server and is a Microsoft MVP.
Check out his blog: SQL Server with Mr. Denny.
Do you have a comment on this tip? Let us know.
This was first published in October 2008