Home > SQL Server Tips > Data Warehousing and Business Intelligence > New data profiling tools in SQL Server 2008
SQL Server Tips:
EMAIL THIS
 TIPS & NEWSLETTERS TOPICS 

DATA WAREHOUSING AND BUSINESS INTELLIGENCE

New data profiling tools in SQL Server 2008


Mark Kromer
01.26.2009
Rating: -4.14- (out of 5)


Expert advice on database administration
Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us    Add to Google


If you want to use your company's vast amount of data to design better business intelligence (BI) strategies, you need to understand the nature and context of the data source that generates this information. Before you can create computations, summaries and time-phased aggregates, it's important to spend the time up front by using data-profiling tools to analyze your data, better understand your data source and prevent data-quality problems from emerging later on.

Generally, the best BI architects and the most valuable BI systems are those where 75% or more of the project time is spent on data collection, data cleansing and all aspects of extract, transform and load (ETL). This level of commitment means observing approved common practices, including using a data-profiling tool to examine data sources, identifying patterns in data fields and recognizing the best candidates for keys. In particular, data profiling in SQL Server is an important method to analyze information.

[IMAGE]
Figure 1. Data profiling is used prior to writing data integration routines.
Click on image for larger version

Moving beyond building your own data profiling tools
Prior to SQL Server 2008, only companies such as DataFlux and IBM offered complete data-profiling options. You could also, however, hand-code your own data profiler in T-SQL and .NET. In fact, resourceful database administrators have created their own processes to perform identification and profiling of SQL Server data sources. Many T-SQL programmers have scripted personalized routines to identify null counts in address tables, out-of-sequence patterns in zip codes and deduplication routines.

A library post on data quality in SQL Server 2005 is ...


Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us    Add to Google



RELATED CONTENT
Data Warehousing and Business Intelligence
Recommended practices for SQL Server Analysis Services aggregations
Creating and managing SQL Server Analysis Services partitions
Sharing SSAS cube data in SharePoint with PerformancePoint Server 2007
Utilize SSAS for data predictions and classification using Excel
SQL Server 2008 Integration Services delivers new features
Parent-child dimensions in SQL Server 2005 with Analysis Services MDX
Change data capture in SQL Server 2008 improves BI reporting accuracy
Manage traces in SQL Server 2005 Analysis Services with XMLA commands
Tutorial: SQL Server 2005 Analysis Services
Open SSIS packages without validation using these SQL properties

SQL Server Business Intelligence (BI) and Data Warehousing
Are data warehouses made for the cloud?
Q&A: Business intelligence gets a facelift in SQL Server 2008 R2
Project Gemini gets a new name, Madison earns buzz
Speed up reports in SQL Server Reporting Services with caching
Data Transformation Services vs. SSIS: The key differences
Using package configurations in SQL Server Integration Services (SSIS)
How SQL Server 2008 components impact SharePoint implementations
Achieving high availability and disaster recovery with SharePoint databases
Recommended practices for SQL Server Analysis Services aggregations
SQL Server Reporting Services 2008 offers faster speeds, new variations

Microsoft SQL Server 2008 (Katmai)
Q&A: SQL Server 2008 a better fit for consolidation
End of life comes for SQL Server 2005 SP2, 2008
Microsoft releases SQL Server 2008 R2 CTP
What's new for installation with SQL Server 2008?
SQL Server Reporting Services 2008 offers faster speeds, new variations
Microsoft SQL Server 2008 Learning Guide
A first look at Microsoft SQL Server 2008 R2
Understanding transparent data encryption in SQL Server 2008
Working with sparse columns in SQL Server 2008
Implementing SQL Server 2008 FILESTREAM functionality

RELATED GLOSSARY TERMS
Terms from Whatis.com − the technology online dictionary
data aggregation  (SearchSQLServer.com)
data preprocessing  (SearchSQLServer.com)
data warehouse  (SearchSQLServer.com)
FileMaker  (SearchSQLServer.com)
GIS  (SearchSQLServer.com)
MOLAP  (SearchSQLServer.com)
pivot table  (SearchSQLServer.com)
Quiz: SQL Server 2000  (SearchSQLServer.com)
SQL  (SearchSQLServer.com)
T-SQL  (SearchSQLServer.com)

RELATED RESOURCES
2020software.com, trial software downloads for accounting software, ERP software, CRM software and business software systems
Search Bitpipe.com for the latest white papers and business webcasts
Whatis.com, the online computer dictionary


one of my favorites. It was written before Microsoft increased focus on data profiling, and it provides a great example of the complexities involved in hand-coding your own solution.

[IMAGE]
Figure 2. A screenshot from this manually created data profiler.
Click on image for larger version

The Data Profiling task in SQL Server 2008, by contrast, is a Microsoft-produced product that increases efficiency and offers myriad new capabilities. It comes as part of SQL Server Integration Services (SSIS), and it eliminates the need to worry about hand-coding specifics. In fact, the pre-built profiles in SSIS are quite similar to the constraints in the manually created expression builder:

As is the case with most SSIS capabilities, the Data Profiler task works only with SQL Server data sources. It is still, however, an effective tool to reduce the number of tedious processes involved in the data collection stages of building a BI solution. Without profiling data sources and providing this level of data inspection, you leave your business solution open to incorrect data, invalid reporting and bad decision making.

Using the SSIS 2008 Data Profiling task and Data Profile Viewer
Once you grasp the importance of profiling data from your SQL Server data sources, it's clear how the SSIS 2008 Data Profiling task and associated Data Profile Viewer can help. Below is an example from the sample database AdventureWorks, where the Column Value Distribution Profiles tab is highlighted:

[IMAGE]
Figure 3. A sample of Column Value Distribution Profiles tab from the AdventureWorks database.
Click on image for larger version

SSIS includes this group of profiles out-of-the-box, and it proves useful to examine data contents in table fields. In this example from the Data Profile Viewer, I highlighted the City field from the Address table. The top-right-hand panel on the window indicates that the SSIS data profile task found 575 unique values in the City field, and the bottom pane demonstrates the number of occurrences of each unique value. Having this information easily accessible frees time that you would otherwise spend figuring it out manually. In turn you can spend more time comparing the value distributions to determine the best source for your staging tables.

[IMAGE]
Figure 4. The Data Profiling task provides the following out-of-the-box profiles.
Click on image for larger version

To use the Data Profiling task with your databases, create a new SSIS project. You need SQL Server 2008, although the task can be run against data stored in older versions of SQL Server. Following the sample screenshot below, connect your data source from a data flow task to a data profiling task:

[IMAGE]
Figure 5. Start with a simple SSIS package of a data source and a target of the data profiling task.
Click on image for larger version

The Data Profiling task wizard prompts you to enter information about the source table, and then you will have the option to select from the list of profiles. In the sample figures for this tip, I asked SQL Server to execute each of the profiles available. You can then direct the output to an XML file, which can be opened from the Data Profile Viewer shown below.

[IMAGE]
Figure 6. A SSIS 2008 Data Profile Viewer tool.
Click on image for larger version

You can execute the Data Profile Viewer from the list of SQL Server client tools, but it is also accessible from the SQL Server Integration Services program group. You may notice that the screen shows the executed profiles and the table. The Viewer tool does so by parsing the results of the XML file that has been created by running the SSIS package shown above. For this sample data, I ran the package from the Business Intelligence Development Studio in debug mode and in just a few seconds, the AdventureWorks table's results displayed.

Improvement trend for Microsoft BI features and functionality
This example demonstrates that the new Data Profiler task in SSIS 2008 is Microsoft's effort to provide a complete, native toolkit to BI professionals. Microsoft's advances, however, mainly serve to strengthen its own capabilities, so do not expect SSIS- and its BI-centric additions to emerge as a viable replacement for Informatica, DataStage and other preferred options for managing large, heterogeneous data centers. Nevertheless, a feature such as native data profiling does make it possible to build stronger data warehouse and BI solutions in SQL Server without using a third-party technology for filling gaps in functionality.

ABOUT THE AUTHOR:   

[IMAGE]Mark Kromer is an internationally recognized leader in the business intelligence and database communities, having authored articles and blogs for TDWI, Microsoft, MSDN and Technet. Mark has more than 15 years' experience in IT, focusing on database technologies and solutions, including spending the past 2.5 years with Microsoft as the lead product manager for business intelligence customer solutions. In the past, he was responsible for training classes, partner readiness and product development of BI initiatives at Microsoft, Agilent Technologies and Cingular Wireless. He is currently the lead product manager for enterprise BI reporting solutions at Primavera Systems. Check out his blog at http://blogs.msdn.com/makrom



Rate this Tip
To rate tips, you must be a member of SearchSQLServer.com.
Register now to start rating these tips. Log in if you are already a member.




DISCLAIMER: Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.



SQL Server Development - .NET, C#, T-SQL, Visual Basic
HomeNewsTopicsITKnowledge ExchangeTipsAsk the ExpertsMultimediaWhite PapersIT Downloads
About Us  |  Contact Us  |  For Advertisers  |  For Business Partners  |  Site Index  |  RSS
SEARCH 
TechTarget provides technology professionals with the information they need to perform their jobs - from developing strategy, to making cost-effective purchase decisions and managing their organizations' technology projects - with its network of technology-specific websites, events and online magazines.

TechTarget Corporate Web Site  |  Media Kits  |  Site Map




All Rights Reserved, Copyright 2005 - 2009, TechTarget | Read our Privacy Policy
  TechTarget - The IT Media ROI Experts