Home > SQL Server Tips > Data Warehousing and Business Intelligence > An introduction to data warehousing, part 1: The basics
SQL Server Tips:
EMAIL THIS
 TIPS & NEWSLETTERS TOPICS 

DATA WAREHOUSING AND BUSINESS INTELLIGENCE

An introduction to data warehousing, part 1: The basics


Baya Pavliashvili, Contributor
02.14.2001
Rating: --- (out of 5)


Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us   


Data warehousing (DW) has been around for approximately 20 years. Even so, many people do not have a good understanding of what DW is for, what it is supposed to do, and what it takes to build one. DW has become a popular term, largely due to the increasing number of companies that would like to take advantage of their historical data to make better decisions. Some credit must also go to Microsoft because SQL Server has made building data warehouses easier than ever. Let's talk about DW in general first.

The goal of DW is to provide your company with an easy and quick look at its historical data. Advanced OLAP (on-line analytical processing) tools let DW users generate reports at a click of a mouse and look at the company's performance from various angles. How far back you look at the data depends on the company. If you're in a manufacturing plant producing thousands of parts a day, it might make sense to build a DW to look at performance each week. On the other hand, if you have a handful of customers each day in a rental business you might wish to look at longer periods of time--for example, the past six months or so at a time. If you're into doing trends analysis you'll most likely want your reports spanning at least several years.

The process of building a DW can be fairly complicated or straightforward. A lot depends on where your data currently reside. Regardless of your data sources, building any DW will involve the following steps:

  1. Extracting the transactional data from the data sources into a staging area
  2. Transforming the transactional data
  3. Loading the transformed data into a dimensional database
  4. Building pre-calculated summary values to speed up report generation
  5. Building (or purchasing) a front-end reporting tool

The first step, extracting the data itself, can be the most difficult one to accomplish. Companies that have been in business for a long time (and therefore could benefit the most from a DW) most likely have kept their data in some type of a mainframe or proprietary system. Depending on the type of system, extracting these transactional data can be very difficult. Data Transformation Services (DTS), which is part of SQL Server 7.0 and 2000, lets you import and export data from any OLE DB or ODBC compliant database as long as you have an appropriate provider. However, the reality is that most of the time you won't have the luxury of having such providers in place.

An important follow-up step after data extraction is transforming and relating the data from multiple data sources. The IT environment is not homogenous. Most companies will have their data spread out in a number of various database management systems: MS Access, MS SQL Server, Oracle, Sybase, etc. Many companies will also have much of their data in flat files, spreadsheets, mail systems and other types of data stores. When building a data warehouse you need to relate data from all of these sources and build some type of a staging area that can handle data extracted from any of these source systems. The staging area can be a group of files--especially if you're getting extracts from mainframe systems--or, better yet, it will be a semi-relational database residing on the same server where the dimensional database will be hosted. I call the staging database "semi-relational" because it does not have to be completely normalized.

The third step is building a dimensional database and populating it with data from the source systems. Most of the modern transactional systems are built using relational database models. A relational model is optimized for tracking a large number of transactions and storing the transactional data in an efficient format. However, the relational format is not very efficient when it comes to building reports with summary and aggregate values. The dimensional approach, on the other hand, provides a way to improve query performance without affecting data integrity. However, the query performance improvement comes with a storage space penalty--a dimensional database will generally take up much more space than its relational counterpart. These days, storage space is fairly inexpensive and most companies will be able to afford large hard disks with a minimal effort.

Once you have built a dimensional model you have to find a way to fit the relational data into the format you've just built. This step is not as trivial as it might sound. It might mean combining several fields together, looking up values in one data source depending on the values in another data source, building business logic code to derive values for certain columns, and much more.

The data transformations can be done at one of the two stages: while extracting data from the sources and moving it to the staging area, or while loading data into the dimensional database.

The next step--generating the pre-calculated summary values--has been tremendously simplified by SQL Server Analysis Services (or OLAP Services as it was referred to in SQL Server 7.0). If you've populated your dimensional database and you've built your dimensions correctly, SQL Server Analysis Services does all the aggregate generation work for you. Keep in mind however, that depending on the number of dimensions you have in your DW, building aggregations can take a long time. This is when you need to decide on the dimensional model you wish to use: ROLAP (relational OLAP), MOLAP (Multidimensional OLAP), or HOLAP (Hybrid OLAP). The ROLAP model builds additional tables for storing the aggregates--but this takes much more storage space than a dimensional database, so be careful! The MOLAP model stores the aggregations as well as the data in multidimensional format, which is far more efficient than ROLAP. However, there are some "gotchas" that you need to be aware of when using MOLAP and I'll try to cover these in subsequent articles. The HOLAP approach keeps the data in the relational format, but builds aggregations in multidimensional format, so it's a combination of ROLAP and MOLAP.

Once you've built the dimensional database and the aggregations you can decide how sophisticated your reporting tools need to be. The cost of building a custom reporting (and OLAP) tool will usually outweigh the purchase price of a third party tool. There are several major vendors on the market that have top-notch analytical tools. Consider purchasing one of these suites before delving into the process of developing your own software because reinventing the wheel is not always beneficial or affordable. Building OLAP tools is not a trivial exercise.

This article only covered the very tip of the iceberg. In future articles I'd like to provide more details about each of these steps. Considering the fact that the smallest DW project takes at least six months to complete, you need to do your homework and learn as much as possible about data warehousing before going ahead with such a costly project.

Part 1: The basics
Part 2: Data extraction
Part 3: Dimensional modeling

About the Author

Baya Pavliashvili is a MCSE, MCSD and MCDBA with three years experience with SQL Server.

For More Information

  • What do you think about this tip? E-mail the editor at tdichiara@techtarget.com with your feedback.
  • Have a tip to offer your fellow DBAs and developers? The best tips submitted will receive a cool prize--submit your tip today!
  • Ask your technical DBMS questions--or help out your peers by answering them--in our live discussion forums.
  • Check out our Ask the Experts feature: Our SQL, database design, Oracle, SQL Server, DB2, metadata, and data warehousing gurus are waiting to answer your toughest questions.

Rate this Tip
To rate tips, you must be a member of SearchSQLServer.com.
Register now to start rating these tips. Log in if you are already a member.




Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us   



RELATED CONTENT
SQL Server Business Intelligence (BI) and Data Warehousing
Programming report generation with SQL Server Reporting Services 2008
Using the Pivot transformation in SQL Server Integration Services
DBA career paths could lead to business intelligence
Are data warehouses made for the cloud?
Q&A: Business intelligence gets a facelift in SQL Server 2008 R2
Project Gemini gets a new name, Madison earns buzz
Speed up reports in SQL Server Reporting Services with caching
Data Transformation Services vs. SSIS: The key differences
Using package configurations in SQL Server Integration Services (SSIS)
How SQL Server 2008 components impact SharePoint implementations

Data Warehousing and Business Intelligence
Programming report generation with SQL Server Reporting Services 2008
Recommended practices for SQL Server Analysis Services aggregations
Creating and managing SQL Server Analysis Services partitions
Sharing SSAS cube data in SharePoint with PerformancePoint Server 2007
New data profiling tools in SQL Server 2008
Utilize SSAS for data predictions and classification using Excel
SQL Server 2008 Integration Services delivers new features
Parent-child dimensions in SQL Server 2005 with Analysis Services MDX
Change data capture in SQL Server 2008 improves BI reporting accuracy
Manage traces in SQL Server 2005 Analysis Services with XMLA commands

SQL Server Business Intelligence (BI) and Data Warehousing
Programming report generation with SQL Server Reporting Services 2008
Using the Pivot transformation in SQL Server Integration Services
DBA career paths could lead to business intelligence
Are data warehouses made for the cloud?
Q&A: Business intelligence gets a facelift in SQL Server 2008 R2
Project Gemini gets a new name, Madison earns buzz
Speed up reports in SQL Server Reporting Services with caching
Data Transformation Services vs. SSIS: The key differences
Using package configurations in SQL Server Integration Services (SSIS)
How SQL Server 2008 components impact SharePoint implementations

RELATED GLOSSARY TERMS
Terms from Whatis.com − the technology online dictionary
data aggregation  (SearchSQLServer.com)
data preprocessing  (SearchSQLServer.com)
data warehouse  (SearchSQLServer.com)
FileMaker  (SearchSQLServer.com)
GIS  (SearchSQLServer.com)
MOLAP  (SearchSQLServer.com)
pivot table  (SearchSQLServer.com)
Quiz: SQL Server 2000  (SearchSQLServer.com)
SQL  (SearchSQLServer.com)
T-SQL  (SearchSQLServer.com)

RELATED RESOURCES
2020software.com, trial software downloads for accounting software, ERP software, CRM software and business software systems
Search Bitpipe.com for the latest white papers and business webcasts
Whatis.com, the online computer dictionary

DISCLAIMER: Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.



SQL Server Development - .NET, C#, T-SQL, Visual Basic
HomeNewsTopicsITKnowledge ExchangeTipsAsk the ExpertsMultimediaWhite PapersIT Downloads
About Us  |  Contact Us  |  For Advertisers  |  For Business Partners  |  Site Index  |  RSS
SEARCH 
TechTarget provides technology professionals with the information they need to perform their jobs - from developing strategy, to making cost-effective purchase decisions and managing their organizations' technology projects - with its network of technology-specific websites, events and online magazines.

TechTarget Corporate Web Site  |  Media Kits  |  Site Map




All Rights Reserved, Copyright 2005 - 2009, TechTarget | Read our Privacy Policy
  TechTarget - The IT Media ROI Experts