In previous versions of SQL Server, you had to manage the process of implementing slowly changing dimensions through custom logic, often embedded in Data Transformation Services (DTS) packages. Within a DTS package, you had to manually compare each relevant column's value between the dimension table and the corresponding data source. If any differences were detected, you'd have to decide whether you wanted to create another row or overwrite the existing row to manage slowly changing dimensions (SCD). SQL Server 2005 Integration Services has a transformation that makes implementing SCD a matter of running a wizard.
Slowly changing dimensions are a part of almost every data warehouse. Rarely does one see dimensions whose member values remain constant. For example, customer dimension attributes such as last name, occupation, home address and others are likely to change from time to time. Some changes are relatively minor. For example, if the department name changes from "marketing" to "marketing and sales," you probably won't need to track the history of changes. However, the adjustments history of customers' income level or income-to-debt ratio can be very significant for financial organizations that sell consumer loans. In such cases, data warehouse architects must come up with a way of tracking the changes.
There are multiple methods for maintaining changes in dimensions member values, but here are the most commonly used methods:
- Overwrite the existing dimension member values. This method does not provide a way of tracking the history of changes. This method is commonly referred to as "type 1 SCD."
- Create a new row with the current dimension members. Tag the existing row as expired. This method is commonly referred to as "type 2 SCD."
Here's how the Slowly Changing Dimensions Wizard works in SSIS and what it can do for a data warehouse architect:
- Before you run the SCD transformation wizard, you should configure your data source and destination. For this example, I used a SQL Server relational database as the data source and a data warehouse (which also happens to be a SQL Server database) as the destination. You could also use the SCD transformation for data sources and destinations other than SQL Server.
- SSIS projects are normally developed through Business Intelligence Development Studio (BIDS). First, create an SSIS project (File→New→Project, and choose Integration Services Project template), then add a data flow task to the control flow. Activate the data flow tab, then identify your data source and destination within BIDS. Next drag the SCD transformation to the Data Flow designer and drag the green arrow, which denotes output from your data source, to the SCD transformation. Then double-click SCD transformation to activate the SCD wizard.
- The initial screen simply welcomes you and informs you of the Slowly Changing Dimensions
wizard's capabilities in SQL Server Integration Services 2005. You can choose to omit this screen
in the future. The next screen allows you to pick one or multiple business keys. A business key
uniquely identifies each record in the table that is used to populate your dimension table.
Business keys are presumed to be static – once the key is assigned to the record it should not
change. Each dimension table could have one or multiple business keys.
Typically your dimension will also have a surrogate key – a column not found in the data source, but rather added to the dimension table during extraction, transformation and loading (ETL) process. Typically, surrogate keys are implemented as identity columns – these columns have no business meaning, but they uniquely identify each dimension record.
- Once you have identified the business key it's time to choose the columns (referred to as attributes) you wish to maintain within your SCD. The wizard gives you three options:
- Fixed attribute – values in this column should not change. If SSIS detects a change in the value of a column tagged as a fixed attribute, it could raise an exception and fail the execution of the package. You'll get an option to configure your package as such on the next screen. Detecting changes in values of a fixed attribute isn't really part of slowly changing dimensions implementation, but this functionality is useful in case you want to identify problems in your source data while your SSIS package is executing.
- Changing attribute – is a type 1 SCD; the values of a changing attribute are simply overwritten. History of changes is not recorded.
- Historical attribute – this is type 2 SCD. The modified value is saved in a new record and the existing record is tagged as expired.
For example, in Figure 1 (Slowly Changing Dimension Columns), the screenshot tags AccountDescription column as a changing attribute, AccountType as a historical attribute and Operator as a fixed attribute.
Figure 1. Change types: fixed attribute, changing attribute and historical attribute.
In Figure 2 (Historical Attribute Options), you can configure details of your implementation. You can advise SSIS to fail the package when the changes in fixed attributes are detected, and you
More on Business Intelligence in SQL Server: SQL Server Integration Services how-to
The second option is useful for attributes that will have duplicate values repeated across multiple dimension members. For example, the same account type will apply to multiple accounts. If the Account Type is modified from "Liabilities" to "Current Liabilities," you might wish to apply this change to all accounts that have this type.
- Next, the wizard allows you to choose how you want to identify current and expired records. You have two options: Tag the record as expired (or obsolete, outdated or another adjective of your choosing) or identify the expired records by adding the date of expiry to such records. If you use the latter option, you can use one of SSIS global variables to determine the date and time value for updating the expired dimension record. I prefer using a flag to identify the current record. All other records are considered expired. However, you can choose from a couple of options.
Figure 2. Configuring details of your implementation.
- The screen in Figure 3 (Data Flow Task) allows you to configure support for inferred dimension members. Inferred members are created when you load a record into the fact table and it has no corresponding dimension record. For correct analyses, each fact table record must be associated with a record in a dimension table.
There are multiple ways of handling this in a data warehouse. The SCD wizard allows you to create an inferred member record with all dimension attributes set to NULL. It is often more appropriate to have a special "unknown" member in your dimension (perhaps with the dimension surrogate key equal to -999 or another value that stands out) as opposed to creating inferred members. In this example we're only concerned with type 1 and type 2 SCD and not with the inferred members. So let's uncheck the default selection and click next.
That is all the information the wizard needs in order to come up with the entire data flow required to maintain the SCD. As you can see from the screenshot, the wizard does quite a bit of work for you:
Figure 3. Use the Data Flow Task to configure support for inferred dimension members.
- You can customize the data flow the wizard generated to suit your application's needs. But
first, let's examine the newly created transformations: The OLE DB command transformation updates
the record_expired attribute if AccountType value has been altered. This transformation receives
input from the Derived Column transformation that determines the value to insert into the
record_expired column. If the value in the data source changes, SSIS creates a new record so the
Union All transformation combines this record with the existing data source records. The OLE DB
Command 1 transformation updates the value of AccountDescription – the changing attribute based on
the business key (AccountCodeAlternateKey) with a command similar to the following:
UPDATE [dbo].[DimAccount] SET [AccountDescription] = ?
WHERE [AccountCodeAlternateKey] = ?
Finally, the Insert Destination transformation populates the destination table DimAccount.
- Now, let's update a few members in the transactional database and then run the package to see if the SCD wizard maintains the changing dimensions correctly. Indeed we can confirm that changing AccountType attribute value from "assets" to "wonderful assets" triggers the creation of a new record in the account dimension and tags the newly created record as current:
AccountCodeAlternateKey AccountDescription AccountType Operator record_expired 1120 Receivables Wonderful Assets + Current 1120 Receivables Assets + Null
Changes in AccountDescription column, on the other hand, do not create a new row – they simply overwrite the existing value:
AccountCodeAlternateKey AccountDescription AccountType Operator record_expired 10 Active Asset Assets + Null
As I mentioned at the beginning of this tip, there are other methods for maintaining slowly changing dimensions in your data warehouse that are not available through the wizard. However, the ability to implement type 1 and type 2 slowly changing dimensions by virtue of a few clicks within SSIS certainly could speed up ETL development.
ABOUT THE AUTHOR
Baya Pavliashvili is a database consultant helping his customers develop highly available and scalable applications with SQL Server and Analysis Services. Throughout his career he has managed database administrator teams and databases of terabyte caliber. Baya's primary areas of expertise are performance tuning, replication and data warehousing. He can be reached at firstname.lastname@example.org.
This was first published in October 2007