Home > SQL Server Tips > Data Warehousing and Business Intelligence > Handle slowly changing dimensions with SSIS 2005 wizard
SQL Server Tips:
EMAIL THIS
 TIPS & NEWSLETTERS TOPICS 

DATA WAREHOUSING AND BUSINESS INTELLIGENCE

Handle slowly changing dimensions with SSIS 2005 wizard


Baya Pavliashvili
10.04.2007
Rating: -3.71- (out of 5)


Expert advice on database administration
Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us    Add to Google


In previous versions of SQL Server, you had to manage the process of implementing slowly changing dimensions through custom logic, often embedded in Data Transformation Services (DTS) packages. Within a DTS package, you had to manually compare each relevant column's value between the dimension table and the corresponding data source. If any differences were detected, you'd have to decide whether you wanted to create another row or overwrite the existing row to manage slowly changing dimensions (SCD). SQL Server 2005 Integration Services has a transformation that makes implementing SCD a matter of running a wizard.

Slowly changing dimensions are a part of almost every data warehouse. Rarely does one see dimensions whose member values remain constant. For example, customer dimension attributes such as last name, occupation, home address and others are likely to change from time to time. Some changes are relatively minor. For example, if the department name changes from "marketing" to "marketing and sales," you probably won't need to track the history of changes. However, the adjustments history of customers' income level or income-to-debt ratio can be very significant for financial organizations that sell consumer loans. In such cases, data warehouse architects must come up with a way of tracking the changes.

There are multiple methods for maintaining changes in dimensions member values, but here are the most commonly used methods:

  • Overwrite the existing dimension member values. This method does not provide a way of tracking the history of changes. This method is commonly referred to as "type 1 SCD."
  • Create a new row with the current dimension members. Tag the existing row as expired. This method is commonly referred to as "type 2 SCD."
  • Here's how the Slowly Changing Dimensions Wizard works in SSIS and what it can do for a data warehouse architect:

    1. Before you run the SCD transformation wizard, you should configure your data source and destination. For this example, I used a SQL Server relational database as the data source and a data warehouse (which also happens to be a SQL Server database) as the destination. You could also use the SCD transformation for data sources and destinations other than SQL Server.
    2. SSIS projects are normally developed through Business Intelligence Development Studio (BIDS). First, create an SSIS project (File→New→Project, and choose Integration Services Project template), then add a data flow task to the control flow. Activate the data flow tab, then identify your data source and destination within BIDS. Next drag the SCD transformation to the Data Flow designer and drag the green arrow, which denotes output from your data source, to the SCD transformation. Then double-click SCD transformation to activate the SCD wizard.
    3. The initial screen simply welcomes you and informs you of the Slowly Changing Dimensions wizard's capabilities in SQL ...

      Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us    Add to Google



      RELATED CONTENT
      SQL Server Business Intelligence (BI) and Data Warehousing
      Using the Pivot transformation in SQL Server Integration Services
      DBA career paths could lead to business intelligence
      Are data warehouses made for the cloud?
      Q&A: Business intelligence gets a facelift in SQL Server 2008 R2
      Project Gemini gets a new name, Madison earns buzz
      Speed up reports in SQL Server Reporting Services with caching
      Data Transformation Services vs. SSIS: The key differences
      Using package configurations in SQL Server Integration Services (SSIS)
      How SQL Server 2008 components impact SharePoint implementations
      Achieving high availability and disaster recovery with SharePoint databases

      SQL Server Interoperability
      Utilize SSAS for data predictions and classification using Excel
      How to create a SQL Server linked server to DB2
      Export SQL Server data to an Excel file using SSIS and Visual Studio
      Performance tuning for SQL Server 2005 and Exchange running on SBS
      Custom VB.Net scripting in SQL Server Integration Services
      Can SQL Server 2000 work on Windows 2003 platform?
      Query to search text in old DTS packages in SQL Server?
      Run DTS packages within SQL Server Integration Services
      SQL Server Integration Services how-to
      Compatibility of SQL Server 2005 and 2000 coexisting

      Microsoft SQL Server 2005
      End of life comes for SQL Server 2005 SP2, 2008
      SQL Server Reporting Services Fast Guide
      SQL Server Service Broker Tutorial and Reference Guide
      Tips for tuning SQL Server 2005 to improve reporting performance
      SQL Server consolidation: Why it's an optimization technique
      Parent-child dimensions in SQL Server 2005 with Analysis Services MDX
      Enforcing data integrity in a SQL Server database
      SSIS error message due to installation problem on SQL Server 2005
      Should you upgrade to SQL Server 2005 or SQL Server 2008?
      Basics for working with DATETIME and SMALLDATETIME in SQL Server 2005
      Microsoft SQL Server 2005 Research

      RELATED GLOSSARY TERMS
      Terms from Whatis.com − the technology online dictionary
      data aggregation  (SearchSQLServer.com)
      data preprocessing  (SearchSQLServer.com)
      data warehouse  (SearchSQLServer.com)
      FileMaker  (SearchSQLServer.com)
      GIS  (SearchSQLServer.com)
      MOLAP  (SearchSQLServer.com)
      pivot table  (SearchSQLServer.com)
      Quiz: SQL Server 2000  (SearchSQLServer.com)
      SQL  (SearchSQLServer.com)
      T-SQL  (SearchSQLServer.com)

      RELATED RESOURCES
      2020software.com, trial software downloads for accounting software, ERP software, CRM software and business software systems
      Search Bitpipe.com for the latest white papers and business webcasts
      Whatis.com, the online computer dictionary


      Server Integration Services 2005. You can choose to omit this screen in the future. The next screen allows you to pick one or multiple business keys. A business key uniquely identifies each record in the table that is used to populate your dimension table. Business keys are presumed to be static – once the key is assigned to the record it should not change. Each dimension table could have one or multiple business keys.

      Typically your dimension will also have a surrogate key – a column not found in the data source, but rather added to the dimension table during extraction, transformation and loading (ETL) process. Typically, surrogate keys are implemented as identity columns – these columns have no business meaning, but they uniquely identify each dimension record.

    4. Once you have identified the business key it's time to choose the columns (referred to as attributes) you wish to maintain within your SCD. The wizard gives you three options:
      • Fixed attribute – values in this column should not change. If SSIS detects a change in the value of a column tagged as a fixed attribute, it could raise an exception and fail the execution of the package. You'll get an option to configure your package as such on the next screen. Detecting changes in values of a fixed attribute isn't really part of slowly changing dimensions implementation, but this functionality is useful in case you want to identify problems in your source data while your SSIS package is executing.
      • Changing attribute – is a type 1 SCD; the values of a changing attribute are simply overwritten. History of changes is not recorded.
      • Historical attribute – this is type 2 SCD. The modified value is saved in a new record and the existing record is tagged as expired.

      For example, in Figure 1 (Slowly Changing Dimension Columns), the screenshot tags AccountDescription column as a changing attribute, AccountType as a historical attribute and Operator as a fixed attribute.

      [IMAGE]

      Figure 1. Change types: fixed attribute, changing attribute and historical attribute.

      In Figure 2 (Historical Attribute Options), you can configure details of your implementation. You can advise SSIS to fail the package when the changes in fixed attributes are detected, and you can change all matching records, including the expired records when modifications are detected in the changing attribute. See my brief discussion earlier regarding the first option.

      The second option is useful for attributes that will have duplicate values repeated across multiple dimension members. For example, the same account type will apply to multiple accounts. If the Account Type is modified from "Liabilities" to "Current Liabilities," you might wish to apply this change to all accounts that have this type.

    5. Next, the wizard allows you to choose how you want to identify current and expired records. You have two options: Tag the record as expired (or obsolete, outdated or another adjective of your choosing) or identify the expired records by adding the date of expiry to such records. If you use the latter option, you can use one of SSIS global variables to determine the date and time value for updating the expired dimension record. I prefer using a flag to identify the current record. All other records are considered expired. However, you can choose from a couple of options.
    6. [IMAGE]

      Figure 2. Configuring details of your implementation.

    7. The screen in Figure 3 (Data Flow Task) allows you to configure support for inferred dimension members. Inferred members are created when you load a record into the fact table and it has no corresponding dimension record. For correct analyses, each fact table record must be associated with a record in a dimension table.
    8. There are multiple ways of handling this in a data warehouse. The SCD wizard allows you to create an inferred member record with all dimension attributes set to NULL. It is often more appropriate to have a special "unknown" member in your dimension (perhaps with the dimension surrogate key equal to -999 or another value that stands out) as opposed to creating inferred members. In this example we're only concerned with type 1 and type 2 SCD and not with the inferred members. So let's uncheck the default selection and click next.

      That is all the information the wizard needs in order to come up with the entire data flow required to maintain the SCD. As you can see from the screenshot, the wizard does quite a bit of work for you:

      [IMAGE]

      Figure 3. Use the Data Flow Task to configure support for inferred dimension members.

    9. You can customize the data flow the wizard generated to suit your application's needs. But first, let's examine the newly created transformations: The OLE DB command transformation updates the record_expired attribute if AccountType value has been altered. This transformation receives input from the Derived Column transformation that determines the value to insert into the record_expired column. If the value in the data source changes, SSIS creates a new record so the Union All transformation combines this record with the existing data source records. The OLE DB Command 1 transformation updates the value of AccountDescription – the changing attribute based on the business key (AccountCodeAlternateKey) with a command similar to the following:

      UPDATE [dbo].[DimAccount] SET [AccountDescription] = ?
      WHERE [AccountCodeAlternateKey] = ?

      Finally, the Insert Destination transformation populates the destination table DimAccount.

    10. Now, let's update a few members in the transactional database and then run the package to see if the SCD wizard maintains the changing dimensions correctly. Indeed we can confirm that changing AccountType attribute value from "assets" to "wonderful assets" triggers the creation of a new record in the account dimension and tags the newly created record as current:

    AccountCodeAlternateKey AccountDescription AccountType Operator record_expired
    1120 Receivables Wonderful Assets + Current
    1120 Receivables Assets + Null

    Changes in AccountDescription column, on the other hand, do not create a new row – they simply overwrite the existing value:

    AccountCodeAlternateKey AccountDescription AccountType Operator record_expired
    10 Active Asset Assets + Null

    As I mentioned at the beginning of this tip, there are other methods for maintaining slowly changing dimensions in your data warehouse that are not available through the wizard. However, the ability to implement type 1 and type 2 slowly changing dimensions by virtue of a few clicks within SSIS certainly could speed up ETL development.


    ABOUT THE AUTHOR:   

    Baya Pavliashvili is a database consultant helping his customers develop highly available and scalable applications with SQL Server and Analysis Services. Throughout his career he has managed database administrator teams and databases of terabyte caliber. Baya's primary areas of expertise are performance tuning, replication and data warehousing. He can be reached at baya@bayasqlconsulting.com.
    Copyright 2007 TechTarget


    Rate this Tip
    To rate tips, you must be a member of SearchSQLServer.com.
    Register now to start rating these tips. Log in if you are already a member.




    DISCLAIMER: Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.



    SQL Server Development - .NET, C#, T-SQL, Visual Basic
    HomeNewsTopicsITKnowledge ExchangeTipsAsk the ExpertsMultimediaWhite PapersIT Downloads
    About Us  |  Contact Us  |  For Advertisers  |  For Business Partners  |  Site Index  |  RSS
    SEARCH 
    TechTarget provides technology professionals with the information they need to perform their jobs - from developing strategy, to making cost-effective purchase decisions and managing their organizations' technology projects - with its network of technology-specific websites, events and online magazines.

    TechTarget Corporate Web Site  |  Media Kits  |  Site Map




    All Rights Reserved, Copyright 2005 - 2009, TechTarget | Read our Privacy Policy
      TechTarget - The IT Media ROI Experts