Home > SQL Server Tips > Data Warehousing and Business Intelligence > An introduction to data warehousing, part 2: Data extraction
SQL Server Tips:
EMAIL THIS
 TIPS & NEWSLETTERS TOPICS 

DATA WAREHOUSING AND BUSINESS INTELLIGENCE

An introduction to data warehousing, part 2: Data extraction


Baya Pavliashvili
03.14.2001
Rating: --- (out of 5)


Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us   


In my previous article, I gave you a quick crash course on the steps you need to take to build a data warehouse. This article will dive into the details of the first step: extracting data from various transactional systems.

So your company or your client needs a data warehouse (DW). Where do you start? Believe it or not, the users of your data warehouse will be the ones that do most of the work. As a database architect, you're there to provide the technical expertise--and to do this, you'll need a good understanding of the business. In fact, by the time the DW project is completed, you'll probably know more about the business than the business owners themselves (whether you want to or not). If you have a luxury of having a business analyst (BA) on the project, that's where they come in handy. However, keep in mind that as a database architect and administrator, you'll need to have as good a knowledge of business as any BA.

First, determine who your DW users will be. Most of the time, these will be upper-level managers. Let's be honest: these types of folks can often be intimidating to work with. However, if you can demonstrate that you can help them manage their business better, they will gladly find a good portion of their precious time to talk to you. The types of questions to ask depends on the project goals. However, with any DW project you need to know the following:

  1. What are the transactional systems used by the company? Whether or not the management wishes to use any of these systems for the DW effort does not matter. Often they don't realize where the most valuable data of the company resides. After all, that's your job.
  2. Who owns the data? This is a tough one since some of the transactional systems could have been purchased from a third party vendor who does not wish to give out their data structure or the data format. Besides, there might be several people responsible for data upkeep within an organization. When they see a data warehouse architect, the first thing they do is update their resumes--they wonder if you're a sign of things to come. In any case, this is a very important steps so don't take it lightly. Be sure to get not only the names, but also the job titles and duties of each person or organizational unit responsible for data.
  3. What types of reports are expected to be derived from a data warehouse? This could be by far the most exciting step. This is where you get to interview all the future users of DW and get to know all of their expectations. Do not make a mistake of telling them what a DW could do for them! You're just gathering requirements, so the final result of the project is not yet determined. Be sure to document all requirements with the name and title of the person that requested the report.
  4. After you've got the data sources and types of reports you need, it's time to go after the data At this point, it's a good idea to generate a list of all the data fields (columns) that you will need to satisfy reporting requirements.
  5. Once you have a list of fields, look at the source systems. In some cases, if you're lucky, extracting data will be a piece of cake--all you have to do is some formatting and the data is ready to be loaded into your staging area. But approximately 95% of the time this won't be the case.

Let's stop here for a minute and think of the tasks you might have to undertake to extract the data.

First and foremost, you'll have to speak with folks who are responsible for each source system (remember step 2?). Most of the time, after explaining your mission, they'll be happy to grant you access to their systems. However, at times you might have to rely on upper management to get such permissions.

Next, you will have to relate data in multiple systems and try to come up with some sort of mapping of data fields. For instance, the sales information might be kept in one source and the inventory data might be in another data store. If the company needs to analyze their expenses against their revenues you'll have to grab data from both systems and come up with a way to compare inventory costs to the revenue that was generated against that cost.

At this point it is helpful to build an entity relationship diagram (ERD). If you've ever built a relational database before, you should be familiar with the ERD concept. However, the ERD for a DW differs in that it is not really concerned with primary and foreign key relationships. More important are cardinality rules: for example, each store might receive inventory supplies from multiple offices, or each office provides supplies to multiple stores. Therefore, you'll have to build a mapping table that tracks which stores receive supplies from each office. More importantly, this table will tell you which costs to compare to revenues from each store.

When analyzing reporting requirements, you're very likely to find that some of the requested data is not in the systems that you've been advised to utilize for the DW. At this point the list of all transactional systems utilized (from step 1) within the company is your best friend. Go back through this list and find a system that contains the necessary data elements. If you cannot find what you're looking for, you have no choice but to go back to step 1 and interview the business owners again.

When you do find a system that contains the fields you need, try to get at least a small sample data extract from that system. Understanding the data format is crucial and will save you much time when coding your own extract routines and the dimensional data model.

Once you've identified all the source systems--this is not a trivial task and might take anywhere from several weeks to several months, depending on the size of the project--you're ready to build data extract utilities. If you are a strictly database person, not knowing much about programming languages other than SQL, it's time to call your fellow developers. Data Transformation Services with SQL Server 2000 (and 7.0), for example, will help you in many instances, but more often than not, you will have to build your own application that connects to the third party system and extracts certain portions of the data.

Most of the time you'll have to extract data from a mainframe system or a client-server application. However, it's not unusual to find much of company's data located in spreadsheets, desktop databases, or mail systems. If you were looking for a developer to write your extraction routines, experience with one of the source systems would be a huge plus.

Keep in mind that the DW only needs those data elements that need to appear on the reports--nothing more, nothing less. The implication here is that if you extract all of the data from each source system you won't build anything but a very large, unmanageable mess. Conversely, if you don't get all the important data elements, then two weeks after your DW is in production, the business owners will ask for the missing fields and you'll have to rebuild your data extraction utilities. That is why performing step 4 thoroughly is so important. Be sure to allow sufficient time for this step, since not having all data elements will end up costing much more than extra week or two of requirements gathering.

When you get your hands on the extracted data, you have to have a way of storing it in some kind of a staging area. It is best to use the same database engine for the staging area as you will use for the final DW. This staging area should be a relational database, which has all the appropriate constraints (primary and foreign keys) and indexes defined appropriately. However, don't spend much time normalizing the staging database--after all, you won't ever need to worry about multiple users inserting and updating this data. Instead, the staging database should be de-normalized and optimized for querying.

Some data warehousing projects use the staging database as a base for building cubes. Generally, this is not a good idea, because the data in the staging db is in the same shape as in the source systems, which is not optimized for report generation. After data from the source systems is in the staging area you can build your dimensional model and populate it with data. The dimensional model is what the cubes need to be based on. I will discuss the ABCs of dimensional modeling in the next article.

So in a nutshell, the first step of any data warehousing project is a combination of business analysis, staging database architecture, and development of data extraction utilities. Laying some good groundwork at this point will save you many headaches and dollars at later stages of your project.

Part 1: The basics
Part 2: Data extraction
Part 3: Dimensional modeling

About the Author

Baya Pavliashvili is a MCSE, MCSD and MCDBA with three years experience with SQL Server.

For More Information

  • What do you think about this tip? E-mail the editor at tdichiara@techtarget.com with your feedback.
  • Have a tip to offer your fellow DBAs and developers? The best tips submitted will receive a cool prize--submit your tip today!
  • Ask your technical DBMS questions--or help out your peers by answering them--in our live discussion forums.
  • Check out our Ask the Experts feature: Our SQL, database design, Oracle, SQL Server, DB2, metadata, and data warehousing gurus are waiting to answer your toughest questions.

Rate this Tip
To rate tips, you must be a member of SearchSQLServer.com.
Register now to start rating these tips. Log in if you are already a member.


Submit a Tip




Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us   



RELATED CONTENT
SQL Server Business Intelligence (BI) and Data Warehousing
Programming report generation with SQL Server Reporting Services 2008
Using the Pivot transformation in SQL Server Integration Services
DBA career paths could lead to business intelligence
Are data warehouses made for the cloud?
Q&A: Business intelligence gets a facelift in SQL Server 2008 R2
Project Gemini gets a new name, Madison earns buzz
Speed up reports in SQL Server Reporting Services with caching
Data Transformation Services vs. SSIS: The key differences
Using package configurations in SQL Server Integration Services (SSIS)
How SQL Server 2008 components impact SharePoint implementations

Data Warehousing and Business Intelligence
Programming report generation with SQL Server Reporting Services 2008
Recommended practices for SQL Server Analysis Services aggregations
Creating and managing SQL Server Analysis Services partitions
Sharing SSAS cube data in SharePoint with PerformancePoint Server 2007
New data profiling tools in SQL Server 2008
Utilize SSAS for data predictions and classification using Excel
SQL Server 2008 Integration Services delivers new features
Parent-child dimensions in SQL Server 2005 with Analysis Services MDX
Change data capture in SQL Server 2008 improves BI reporting accuracy
Manage traces in SQL Server 2005 Analysis Services with XMLA commands

RELATED GLOSSARY TERMS
Terms from Whatis.com − the technology online dictionary
data aggregation  (SearchSQLServer.com)
data preprocessing  (SearchSQLServer.com)
data warehouse  (SearchSQLServer.com)
FileMaker  (SearchSQLServer.com)
GIS  (SearchSQLServer.com)
MOLAP  (SearchSQLServer.com)
pivot table  (SearchSQLServer.com)
Quiz: SQL Server 2000  (SearchSQLServer.com)
SQL  (SearchSQLServer.com)
T-SQL  (SearchSQLServer.com)

RELATED RESOURCES
2020software.com, trial software downloads for accounting software, ERP software, CRM software and business software systems
Search Bitpipe.com for the latest white papers and business webcasts
Whatis.com, the online computer dictionary

DISCLAIMER: Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.



SQL Server Development - .NET, C#, T-SQL, Visual Basic
HomeNewsTopicsITKnowledge ExchangeTipsAsk the ExpertsMultimediaWhite PapersIT Downloads
About Us  |  Contact Us  |  For Advertisers  |  For Business Partners  |  Site Index  |  RSS
SEARCH 
TechTarget provides technology professionals with the information they need to perform their jobs - from developing strategy, to making cost-effective purchase decisions and managing their organizations' technology projects - with its network of technology-specific websites, events and online magazines.

TechTarget Corporate Web Site  |  Media Kits  |  Site Map




All Rights Reserved, Copyright 2005 - 2009, TechTarget | Read our Privacy Policy
  TechTarget - The IT Media ROI Experts