Problem solve Get help with specific problems with your technologies, process and projects.

Performing incremental loads and extracting data

SQL Server Security Expert Joe Toscano shares source data tables designed to easily perform an incremental load / extract.

I am writing a package using SQL Server Integration Services (SSIS). After the first full run of Extract, Transform, Load (ETL), I don't want to pick up all the data again from my different data sources. Is there a way of performing an incremental load using SSIS packages, so next time I run the package it picks up only changed data from the source and appends it in the destination?
I've had to perform incremental loads when extracting data to populate a data mart. To do this I wrote extraction stored procedures that were executed as a SQL task within my Data Transformation Service (DTS) packages. Our source data tables were designed to allow this to happen quite easily. Here are the columns in each source table that allowed us to easily perform an incremental load/extract:

Identity Column as PK or Unique Constraint (int column)

This column allowed us to see if a row in the source was new or already existed by comparing the source ID value with those that exist in the destination.

LastUpdateDate (DateTime Column)

This column was automatically updated by our application. If we found that our row was an existing row (based on the identity column value above) we were able to compare the source LastUpdateDate column with the DestinationUpdateDate column to see if the row's data changed since the last extraction.

To make things easier for us, we created a table that held the current Id value and current update date values for each table we extracted. This allowed for easy comparison during each load and it turned out that this table was also heavily reported against. (Reports contained the extract datetime so report viewers could clearly see how current the data was.)

If you do not have the luxury of having these columns present in your source you may have to key off of the source table's primary key. You may also have to perform comparisons of the columns that can potentially change.

Dig Deeper on SQL Server Interoperability