Home > SQL Server Tips > Data Warehousing and Business Intelligence > An introduction to data warehousing, part 2: Data extraction
SQL Server Tips:
EMAIL THIS
 TIPS & NEWSLETTERS TOPICS 

DATA WAREHOUSING AND BUSINESS INTELLIGENCE

An introduction to data warehousing, part 2: Data extraction


Baya Pavliashvili
03.14.2001
Rating: --- (out of 5)


Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us   


In my previous article, I gave you a quick crash course on the steps you need to take to build a data warehouse. This article will dive into the details of the first step: extracting data from various transactional systems.

So your company or your client needs a data warehouse (DW). Where do you start? Believe it or not, the users of your data warehouse will be the ones that do most of the work. As a database architect, you're there to provide the technical expertise--and to do this, you'll need a good understanding of the business. In fact, by the time the DW project is completed, you'll probably know more about the business than the business owners themselves (whether you want to or not). If you have a luxury of having a business analyst (BA) on the project, that's where they come in handy. However, keep in mind that as a database architect and administrator, you'll need to have as good a knowledge of business as any BA.

First, determine who your DW users will be. Most of the time, these will be upper-level managers. Let's be honest: these types of folks can often be intimidating to work with. However, if you can demonstrate that you can help them manage their business better, they will gladly find a good portion of their precious time to talk to you. The types of questions to ask depends on the project goals. However, with any DW project you need to know the following:

Let's stop here for a minute and think of the tasks you might have to undertake to extract the data.

First and foremost, you'll have to speak with folks who are responsible for each source system (remember step 2?). Most of the time, after explaining your mission, they'll be happy to grant you access to their systems. However, at times you might have to rely on upper management to get such permissions.

Next, you will have to relate data in multiple systems and try to come up with some sort of mapping of data fields. For instance, the


Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us   


RELATED CONTENT
SQL Server Business Intelligence (BI) and Data Warehousing
Ensuring high availability of SSAS databases
Building a data warehousing and BI solution
An overview of SQL Server Report Builder 2.0
An introduction to SQL Server data warehousing concepts
Sharing SSAS cube data in SharePoint with PerformancePoint Server 2007
Microsoft SQL Server Integration Services primer
A short history of SQL Server Integration Services
SQL Server Reporting Services Fast Guide
New data profiling tools in SQL Server 2008
Utilize SSAS for data predictions and classification using Excel

Data Warehousing and Business Intelligence
Sharing SSAS cube data in SharePoint with PerformancePoint Server 2007
New data profiling tools in SQL Server 2008
Utilize SSAS for data predictions and classification using Excel
SQL Server 2008 Integration Services delivers new features
Parent-child dimensions in SQL Server 2005 with Analysis Services MDX
Change data capture in SQL Server 2008 improves BI reporting accuracy
Manage traces in SQL Server 2005 Analysis Services with XMLA commands
Tutorial: SQL Server 2005 Analysis Services
Open SSIS packages without validation using these SQL properties
How to process SQL Server 2005 Analysis Services for data availability

RELATED GLOSSARY TERMS
Terms from Whatis.com − the technology online dictionary
data aggregation  (SearchSQLServer.com)
data preprocessing  (SearchSQLServer.com)
data warehouse  (SearchSQLServer.com)
FileMaker  (SearchSQLServer.com)
GIS  (SearchSQLServer.com)
MOLAP  (SearchSQLServer.com)
pivot table  (SearchSQLServer.com)
Quiz: SQL Server 2000  (SearchSQLServer.com)
SQL  (SearchSQLServer.com)
T-SQL  (SearchSQLServer.com)

RELATED RESOURCES
2020software.com, trial software downloads for accounting software, ERP software, CRM software and business software systems
Search Bitpipe.com for the latest white papers and business webcasts
Whatis.com, the online computer dictionary


sales information might be kept in one source and the inventory data might be in another data store. If the company needs to analyze their expenses against their revenues you'll have to grab data from both systems and come up with a way to compare inventory costs to the revenue that was generated against that cost.

At this point it is helpful to build an entity relationship diagram (ERD). If you've ever built a relational database before, you should be familiar with the ERD concept. However, the ERD for a DW differs in that it is not really concerned with primary and foreign key relationships. More important are cardinality rules: for example, each store might receive inventory supplies from multiple offices, or each office provides supplies to multiple stores. Therefore, you'll have to build a mapping table that tracks which stores receive supplies from each office. More importantly, this table will tell you which costs to compare to revenues from each store.

When analyzing reporting requirements, you're very likely to find that some of the requested data is not in the systems that you've been advised to utilize for the DW. At this point the list of all transactional systems utilized (from step 1) within the company is your best friend. Go back through this list and find a system that contains the necessary data elements. If you cannot find what you're looking for, you have no choice but to go back to step 1 and interview the business owners again.

When you do find a system that contains the fields you need, try to get at least a small sample data extract from that system. Understanding the data format is crucial and will save you much time when coding your own extract routines and the dimensional data model.

Once you've identified all the source systems--this is not a trivial task and might take anywhere from several weeks to several months, depending on the size of the project--you're ready to build data extract utilities. If you are a strictly database person, not knowing much about programming languages other than SQL, it's time to call your fellow developers. Data Transformation Services with SQL Server 2000 (and 7.0), for example, will help you in many instances, but more often than not, you will have to build your own application that connects to the third party system and extracts certain portions of the data.

Most of the time you'll have to extract data from a mainframe system or a client-server application. However, it's not unusual to find much of company's data located in spreadsheets, desktop databases, or mail systems. If you were looking for a developer to write your extraction routines, experience with one of the source systems would be a huge plus.

Keep in mind that the DW only needs those data elements that need to appear on the reports--nothing more, nothing less. The implication here is that if you extract all of the data from each source system you won't build anything but a very large, unmanageable mess. Conversely, if you don't get all the important data elements, then two weeks after your DW is in production, the business owners will ask for the missing fields and you'll have to rebuild your data extraction utilities. That is why performing step 4 thoroughly is so important. Be sure to allow sufficient time for this step, since not having all data elements will end up costing much more than extra week or two of requirements gathering.

When you get your hands on the extracted data, you have to have a way of storing it in some kind of a staging area. It is best to use the same database engine for the staging area as you will use for the final DW. This staging area should be a relational database, which has all the appropriate constraints (primary and foreign keys) and indexes defined appropriately. However, don't spend much time normalizing the staging database--after all, you won't ever need to worry about multiple users inserting and updating this data. Instead, the staging database should be de-normalized and optimized for querying.

Some data warehousing projects use the staging database as a base for building cubes. Generally, this is not a good idea, because the data in the staging db is in the same shape as in the source systems, which is not optimized for report generation. After data from the source systems is in the staging area you can build your dimensional model and populate it with data. The dimensional model is what the cubes need to be based on. I will discuss the ABCs of dimensional modeling in the next article.

So in a nutshell, the first step of any data warehousing project is a combination of business analysis, staging database architecture, and development of data extraction utilities. Laying some good groundwork at this point will save you many headaches and dollars at later stages of your project.

Part 1: The basics
Part 2: Data extraction
Part 3: Dimensional modeling

About the Author

Baya Pavliashvili is a MCSE, MCSD and MCDBA with three years experience with SQL Server.

For More Information


Rate this Tip
To rate tips, you must be a member of SearchSQLServer.com.
Register now to start rating these tips. Log in if you are already a member.


Submit a Tip




DISCLAIMER: Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.



SQL Server Development - .NET, C#, T-SQL, Visual Basic
HomeNewsTopicsITKnowledge ExchangeTipsAsk the ExpertsMultimediaWhite PapersIT Downloads
About Us  |  Contact Us  |  For Advertisers  |  For Business Partners  |  Site Index  |  RSS
SEARCH 
TechTarget provides technology professionals with the information they need to perform their jobs - from developing strategy, to making cost-effective purchase decisions and managing their organizations' technology projects - with its network of technology-specific websites, events and online magazines.

TechTarget Corporate Web Site  |  Media Kits  |  Site Map




All Rights Reserved, Copyright 2005 - 2009, TechTarget | Read our Privacy Policy
  TechTarget - The IT Media ROI Experts