Home > SQL Server Tips > Database Management and Administration > Managing identity columns with replication in SQL Server
SQL Server Tips:
EMAIL THIS
 TIPS & NEWSLETTERS TOPICS 

DATABASE MANAGEMENT AND ADMINISTRATION

Managing identity columns with replication in SQL Server


Baya Pavliashvili
08.03.2007
Rating: -3.75- (out of 5)


Expert advice on database administration
Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us    Add to Google


There are some issues associated with managing identity columns with replication in your SQL Server database. As with previous releases of software, SQL Server 2005 requires that database administrators use special care when replicating tables with identity columns.

First, let me offer a little background about identity columns to help you understand why they're different from any other column with a numeric data type. Identity columns have monotonously increasing numeric values that SQL Server assigns to each row automatically when the row is created. Normally, identity columns have INT or BIGINT data types, although you could use other numeric data types as well. By default, SQL Server seeds such columns at 1 and increments by 1, but you can change to the seed and increment of your liking.

Identity columns are good candidates for a table's primary key because they're unique for each row and cannot be updated without deleting and re-creating the row.

SQL Server replication scenarios

To make this tip easier to follow, let's imagine we're trying to replicate a table with the following schema:

Click here to view schema.

Next, let's consider various replication scenarios where this table could be used:

  1. Publisher and subscriber have the same data, and the subscribing database is used for read-only purposes. This is the simplest scenario; you don't need AccountKey column to have an identity property on the subscriber because you'll never add any rows to it. Instead, as records are added to the publisher database, they'll also be added to the subscriber through replication.
  2. Multiple publishers replicate data to a single subscriber. With this scenario, we still presume that data is replicated in one direction, from publishers to subscriber(s), and no direct data changes occur on the subscriber. Now things are a bit more complicated, though, because we don't want duplicate values for primary key column. No worries – we can seed the AccountKey column at different values on each publisher. For example, if we expect a lot of records to be inserted on each of the three publishers, we can seed them as follows:

    This configuration would allow users to add up to 100 million records to the DimAccount table in each database. Furthermore, it also gives us an easy way of identifying records created on each server. What if we need to add more records on any server? What if our application grows by leaps and bounds and we need to add dozens of new servers with dimAccount tables? No need to panic. First, we can change the identity seed using DBCC CHECKIDENT statement at any time. So if we reach the identity seed of 199,999,999 on publisher 1 and we're about to step into the range of the second publisher, we can change the identity seed on the first server ...


    Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us    Add to Google



    RELATED CONTENT
    SQL Server Replication
    SQL Server Mailbag: Migrating down to Standard Edition
    Push vs. pull: Configuring SQL Server replication
    Top load balancing methods for SQL Server
    New replication features in SQL Server 2008 and what they mean to you
    SQL Server database replication tutorial
    Licensing a standby server for SQL Server replication
    Upgrade live applications to SQL Server 2005 for high availability
    Tool to synchronize two SQL Server databases
    Simplify SQL Server replication
    Replication techniques in SQL Server

    Microsoft SQL Server 2005
    End of life comes for SQL Server 2005 SP2, 2008
    SQL Server Reporting Services Fast Guide
    SQL Server Service Broker Tutorial and Reference Guide
    Tips for tuning SQL Server 2005 to improve reporting performance
    SQL Server consolidation: Why it's an optimization technique
    Parent-child dimensions in SQL Server 2005 with Analysis Services MDX
    Enforcing data integrity in a SQL Server database
    SSIS error message due to installation problem on SQL Server 2005
    Should you upgrade to SQL Server 2005 or SQL Server 2008?
    Basics for working with DATETIME and SMALLDATETIME in SQL Server 2005
    Microsoft SQL Server 2005 Research

    Microsoft SQL Server Performance Monitoring and Tuning
    SQL Server Mailbag: CALs, witnesses and unwanted changes
    SQL Server Mailbag: Data restoration and DB property management
    Working with IntelliSense in SQL Server 2008 Management Studio
    SQL Server Mailbag: Stored procedures, triggers and SSRS reports
    Troubleshooting Distributed Transaction Coordinator errors in SQL Server
    Clearing the Windows page file and its effect on server performance
    Optimizing SQL Server indexes –- even when they're not your indexes
    Performance implications of transaction log autogrowth in SQL Server
    The short course on how SQL Server really works
    Determining the source of full transaction logs in SQL Server

    RELATED GLOSSARY TERMS
    Terms from Whatis.com − the technology online dictionary
    replication  (SearchSQLServer.com)

    RELATED RESOURCES
    2020software.com, trial software downloads for accounting software, ERP software, CRM software and business software systems
    Search Bitpipe.com for the latest white papers and business webcasts
    Whatis.com, the online computer dictionary


    as follows:

    Keep in mind that INT data type accepts negative values as well, and this data type can support up to 4 billion records (-2 billion to 2 billion). If you need to store more than 2 billion records, you'll need to switch to the BIGINT data type, which supports a huge range of values, between –9,223,372,036,854,775,808 and 9,223,372,036,854,775,807. I told you there was no need to worry!

  3. Records can be added to the subscriber database, but they don't need to be replicated to publisher(s). This case is somewhat tricky. At first you might think we can add AccountKey with identity property and set its seed to a huge value, perhaps 1,000,000,000; but that alone won't work, for two reasons. First, with such architecture, an attempt to add a replicated record to the subscriber will fail. This is because you cannot explicitly specify the value of an identity column unless you issue SET IDENTITY_INSERT ON statement.
  4. Second, if you enable the IDENTITY INSERT and add a record with identity value of 200,000,001, you'll effectively reset the identity seed on the subscriber. The next record you add directly (not through replication) to the subscriber will have the identity value of 200,000,002, which overlaps with the range of values we assigned to a publisher. Fortunately, we can use IDENTITY, NOT FOR REPLICATION option when defining the column on the subscriber. This option advises SQL Server not to override the current identity seed when records are added to the subscriber database through replication.

  5. Records can be added on the subscriber and must also be delivered to publishers. Allow me to digress for a second and offer a personal opinion about this scenario. Although updateable subscriptions have been supported for years, I highly recommend using this option sparingly, i.e., only when absolutely necessary. The typical developer mentality is to use this (or any other) option because it's available. This is why it's crucial to separate developer and DBA duties.
  6. As a DBA, you need to minimize the overhead on your server, because when systems behave poorly, all fingers point at you. Always require a valid business requirement for updateable subscriptions. "Do it because we want you to" is not a valid reason. Replicating transactions bi-directionally involves a fair amount of overhead. Realize that a transaction cannot be committed on the publisher until it is also committed on the subscriber. This functionality is implemented through replication triggers and what is referred to as the two-phase commit.

    In this case, we must use IDENTITY, NOT FOR REPLICATION option both on publisher(s) and on subscribers. Once again, please do not read this tip as a recommendation to use updateable subscriptions when they're not necessary. I provide additional guidance for this scenario in the following section.

  7. A special case of scenario 4 is the "peer-to-peer" publication, available only with SQL Server 2005. I'll save the discussion of peer-to-peer publications for another tip.

Options to manage identity seeds for replicated tables

If you must use updateable subscriptions, you need to define identity ranges on publisher and subscriber servers to avoid the creation of duplicate primary keys.

SQL Server 2005 supports several options for managing identity seed ranges. Note that you can set these options only when you first add the table article to the publication. If you need to change the identity management options, you must remove the article from the publication and add it back. Depending on the table size and your application's availability requirements, dropping and re-adding articles on demand might not be an option. Be sure to carefully choose the proper option.

Here is the summary of identity management options:

  1. Manual – default and self-explanatory option. SQL Server doesn't manage identity seeds for you. Database administrator must explicitly configure ranges on publisher and subscriber. You can implement identity ranges by simply adding CHECK constraints to the replicated table on the publisher and subscriber(s). The same process works if you're subscribing to transactions replicated by multiple publishers. For example, I could add the following check constraints to dimAccount:
  2. Now, if identity value reaches 1,000 on the publisher, SQL Server will return the following error:

    To resolve the problem, I should find the highest identity value on the subscriber, then re-seed the identity of the replicated table on the publisher so that its primary key values do not overlap with those found on the subscriber.

    Keep in mind that replication will copy check constraints from publisher to subscriber(s) by default. If you're managing identity seeds manually, be sure to change this default behavior (using article properties' dialog) so that these check constraints aren't replicated.

  3. Automatic – SQL Server automatically assigns identity ranges on publisher and subscriber based on additional parameters you specify. With automatic identity management, you need to provide the following parameters:
    • Publisher range size – range of identity values on the publisher
    • Subscriber range size – range of identity values on the subscriber

    Once you specify these values, SQL Server automatically adds a check constraint to the replicated table on the publisher, as well as subscriber servers. The check constraint on the publisher allows identity values between the current highest value plus identity seed (for example 500 + 1= 501) and current max value plus publisher size range (for example, 500 + 10,000 = 10,500). The subscriber is seeded at one plus the maximum value allowed on the publisher; continuing from previous examples, the seed on the subscriber would be 10,501. The value of "subscriber range size" parameter is used to determine the upper limit. If the identity range "fills up" and you attempt to add a new record, you will get the following error message:

    As the error message indicates, you can fix the problem by executing a system procedure as follows:

    This will adjust the check constraint and give the table on the publisher server a new range of identity values to work with.

  4. None – this option is supported only for backward compatibility with previous versions. If you use a wizard to migrate your replicated databases from prior versions to SQL Server 2005, by default this option will be chosen for tables with identity columns. The net effect of this option is that you must manage identity values manually.

Summary

In this tip, I discussed various scenarios for replicating tables that have identity columns and options for identity management. You don't have to ditch identity columns to use replication, just handle them with care.


ABOUT THE AUTHOR:   

Baya Pavliashvili is a database consultant helping his customers develop highly available and scalable applications with SQL Server and Analysis Services. Throughout his career, he has managed database administrator teams and databases of terabyte caliber. Baya's primary areas of expertise are performance tuning, replication and data warehousing. He can be reached at baya@bayasqlconsulting.com.
Copyright 2007 TechTarget


Rate this Tip
To rate tips, you must be a member of SearchSQLServer.com.
Register now to start rating these tips. Log in if you are already a member.




DISCLAIMER: Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.



SQL Server Development - .NET, C#, T-SQL, Visual Basic
HomeNewsTopicsITKnowledge ExchangeTipsAsk the ExpertsMultimediaWhite PapersIT Downloads
About Us  |  Contact Us  |  For Advertisers  |  For Business Partners  |  Site Index  |  RSS
SEARCH 
TechTarget provides technology professionals with the information they need to perform their jobs - from developing strategy, to making cost-effective purchase decisions and managing their organizations' technology projects - with its network of technology-specific websites, events and online magazines.

TechTarget Corporate Web Site  |  Media Kits  |  Site Map




All Rights Reserved, Copyright 2005 - 2009, TechTarget | Read our Privacy Policy
  TechTarget - The IT Media ROI Experts