There are some issues associated with managing identity columns with replication in your SQL Server database. As...
with previous releases of software, SQL Server 2005 requires that database administrators use special care when replicating tables with identity columns.
First, let me offer a little background about identity columns to help you understand why they're different from any other column with a numeric data type. Identity columns have monotonously increasing numeric values that SQL Server assigns to each row automatically when the row is created. Normally, identity columns have INT or BIGINT data types, although you could use other numeric data types as well. By default, SQL Server seeds such columns at 1 and increments by 1, but you can change to the seed and increment of your liking.
Identity columns are good candidates for a table's primary key because they're unique for each row and cannot be updated without deleting and re-creating the row.
SQL Server replication scenarios
To make this tip easier to follow, let's imagine we're trying to replicate a table with the following schema:
Next, let's consider various replication scenarios where this table could be used:
Publisher and subscriber have the same data, and the subscribing database is used for read-only purposes. This is the simplest scenario; you don't need AccountKey column to have an identity property on the subscriber because you'll never add any rows to it. Instead, as records are added to the publisher database, they'll also be added to the subscriber through replication.
Multiple publishers replicate data to a single subscriber. With this scenario, we still presume that data is replicated in one direction, from publishers to subscriber(s), and no direct data changes occur on the subscriber. Now things are a bit more complicated, though, because we don't want duplicate values for primary key column. No worries – we can seed the AccountKey column at different values on each publisher. For example, if we expect a lot of records to be inserted on each of the three publishers, we can seed them as follows:
Publisher 1:[AccountKey] [int] IDENTITY(1,1)
Publisher 2: [AccountKey] [int] IDENTITY(100000000,1)
Publisher 3: [AccountKey] [int] IDENTITY(200000000,1)
This configuration would allow users to add up to 100 million records to the DimAccount table in each database. Furthermore, it also gives us an easy way of identifying records created on each server. What if we need to add more records on any server? What if our application grows by leaps and bounds and we need to add dozens of new servers with dimAccount tables? No need to panic. First, we can change the identity seed using DBCC CHECKIDENT statement at any time. So if we reach the identity seed of 199,999,999 on publisher 1 and we're about to step into the range of the second publisher, we can change the identity seed on the first server as follows:
DBCC CHECKIDENT (dimAccount, RESEED, 400000000)
Keep in mind that INT data type accepts negative values as well, and this data type can support up to 4 billion records (-2 billion to 2 billion). If you need to store more than 2 billion records, you'll need to switch to the BIGINT data type, which supports a huge range of values, between –9,223,372,036,854,775,808 and 9,223,372,036,854,775,807. I told you there was no need to worry!
Records can be added to the subscriber database, but they don't need to be replicated to publisher(s). This case is somewhat tricky. At first you might think we can add AccountKey with identity property and set its seed to a huge value, perhaps 1,000,000,000; but that alone won't work, for two reasons. First, with such architecture, an attempt to add a replicated record to the subscriber will fail. This is because you cannot explicitly specify the value of an identity column unless you issue SET IDENTITY_INSERT ON statement.
Second, if you enable the IDENTITY INSERT and add a record with identity value of 200,000,001, you'll effectively reset the identity seed on the subscriber. The next record you add directly (not through replication) to the subscriber will have the identity value of 200,000,002, which overlaps with the range of values we assigned to a publisher. Fortunately, we can use IDENTITY, NOT FOR REPLICATION option when defining the column on the subscriber. This option advises SQL Server not to override the current identity seed when records are added to the subscriber database through replication.
Records can be added on the subscriber and must also be delivered to publishers. Allow me to digress for a second and offer a personal opinion about this scenario. Although updateable subscriptions have been supported for years, I highly recommend using this option sparingly, i.e., only when absolutely necessary. The typical developer mentality is to use this (or any other) option because it's available. This is why it's crucial to separate developer and DBA duties.
As a DBA, you need to minimize the overhead on your server, because when systems behave poorly, all fingers point at you. Always require a valid business requirement for updateable subscriptions. "Do it because we want you to" is not a valid reason. Replicating transactions bi-directionally involves a fair amount of overhead. Realize that a transaction cannot be committed on the publisher until it is also committed on the subscriber. This functionality is implemented through replication triggers and what is referred to as the two-phase commit.
In this case, we must use IDENTITY, NOT FOR REPLICATION option both on publisher(s) and on subscribers. Once again, please do not read this tip as a recommendation to use updateable subscriptions when they're not necessary. I provide additional guidance for this scenario in the following section.
A special case of scenario 4 is the "peer-to-peer" publication, available only with SQL Server 2005. I'll save the discussion of peer-to-peer publications for another tip.
Options to manage identity seeds for replicated tables
If you must use updateable subscriptions, you need to define identity ranges on publisher and subscriber servers to avoid the creation of duplicate primary keys.
SQL Server 2005 supports several options for managing identity seed ranges. Note that you can set these options only when you first add the table article to the publication. If you need to change the identity management options, you must remove the article from the publication and add it back. Depending on the table size and your application's availability requirements, dropping and re-adding articles on demand might not be an option. Be sure to carefully choose the proper option.
More on SQL replication
- Guide: Replication techniques in SQL Server
- Podcast: SQL Server replication
Here is the summary of identity management options:
Manual – default and self-explanatory option. SQL Server doesn't manage identity seeds for you. Database administrator must explicitly configure ranges on publisher and subscriber. You can implement identity ranges by simply adding CHECK constraints to the replicated table on the publisher and subscriber(s). The same process works if you're subscribing to transactions replicated by multiple publishers. For example, I could add the following check constraints to dimAccount:
-- on the publisher:
ALTER TABLE DimAccount
ADD CONSTRAINT ck_AcctKey
CHECK NOT FOR REPLICATION (AccountKey < 1000)
-- on the subscriber:
ALTER TABLE DimAccount
ADD CONSTRAINT ck_AcctKey
CHECK NOT FOR REPLICATION (AccountKey >= 1000)
Now, if identity value reaches 1,000 on the publisher, SQL Server will return the following error:
The INSERT statement conflicted with the CHECK constraint "ck_AcctKey".
The conflict occurred in database "AdventureWorksDW", table "dbo.DimAccount",
To resolve the problem, I should find the highest identity value on the subscriber, then re-seed the identity of the replicated table on the publisher so that its primary key values do not overlap with those found on the subscriber.
-- execute on the subscriber:
SELECT MAX(AccountKey) FROM dimAccount
--let's suppose the previous statement returned 1510
-- next reseed the publisher and allow the
subscriber plenty of room to grow:
-- execute this on the publisher so that the
next identity value is 5000:
DBCC CHECKIDENT('DimAccount', RESEED, 5000)
Keep in mind that replication will copy check constraints from publisher to subscriber(s) by default. If you're managing identity seeds manually, be sure to change this default behavior (using article properties' dialog) so that these check constraints aren't replicated.
Automatic – SQL Server automatically assigns identity ranges on publisher and subscriber based on additional parameters you specify. With automatic identity management, you need to provide the following parameters:
Publisher range size – range of identity values on the publisher
Subscriber range size – range of identity values on the subscriber
Once you specify these values, SQL Server automatically adds a check constraint to the replicated table on the publisher, as well as subscriber servers. The check constraint on the publisher allows identity values between the current highest value plus identity seed (for example 500 + 1= 501) and current max value plus publisher size range (for example, 500 + 10,000 = 10,500). The subscriber is seeded at one plus the maximum value allowed on the publisher; continuing from previous examples, the seed on the subscriber would be 10,501. The value of "subscriber range size" parameter is used to determine the upper limit. If the identity range "fills up" and you attempt to add a new record, you will get the following error message:
The insert failed. It conflicted with an identity range check constraint in database 'XYZ', replicated table 'dbo.DimAccount', column 'AccountKey'. If the identity column is automatically managed by replication, update the range as follows: for the Publisher, execute sp_adjustpublisheridentityrange; for the Subscriber, run the Distribution Agent or the Merge Agent.
As the error message indicates, you can fix the problem by executing a system procedure as follows:
This will adjust the check constraint and give the table on the publisher server a new range of identity values to work with.
None – this option is supported only for backward compatibility with previous versions. If you use a wizard to migrate your replicated databases from prior versions to SQL Server 2005, by default this option will be chosen for tables with identity columns. The net effect of this option is that you must manage identity values manually.
In this tip, I discussed various scenarios for replicating tables that have identity columns and options for identity management. You don't have to ditch identity columns to use replication, just handle them with care.
ABOUT THE AUTHOR:
Baya Pavliashvili is a database consultant helping his customers develop highly available and scalable applications with SQL Server and Analysis Services. Throughout his career, he has managed database administrator teams and databases of terabyte caliber. Baya's primary areas of expertise are performance tuning, replication and data warehousing. He can be reached at [email protected]