SQL Server 2008 Integration Services delivers new features
Integration Services in SQL Server 2008 provides three new features for better performance
and ease of use for administrators and developers. In this tip, SQL Server BI expert Mark Kromer
discusses how data integration has been kicked up a notch with improved scripting capabilities,
data profiling and data flow.
SQL Server has become a very large and all-encompassing enterprise database package over the
years, particularly with the addition of the business intelligence features: Analysis Services, DTS
and now SSIS (SQL Server Integration Services). With SQL Server 2008, you get off-the-shelf
developer tools (BIDS), administrator tools (SSMS), performance tools, systems for data
warehousing, business intelligence, report-building services and so on.
That's
Premium Access
Register now for unlimited access to our premium content across our network of over 70 information Technology web sites.
By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States.
Privacy
Dig Deeper
-
People who read this also read...
-
This was first published in October 2008
good and bad news for those whose everyday task is to administer and manage a SQL Server
environment. In this tip, I'll focus on the data integration tool, SQL Server Integration Services
(SSIS), that ships with SQL Server 2008 and three new exciting features. These new features in SSIS
2008 are improved scripting, data profiling and data flow. There are other improvements in SSIS
2008, including change data capture (CDC) and merge capabilities in SQL Server 2008, that I will
not have the space to detail here. To find out more about CDC, visit my previous tip on how the
change data
capture feature improves BI reporting accuracy.
Let's start with the new C# scripting capabilities in SSIS 2008. In the SQL Server 2005 release
of SSIS, scripting tasks were only available in Visual Basic as part of the VSA (Visual Studio for
Applications) component in SSIS. After a lot of customer requests for making C# an optional task
scripting language, Microsoft now makes that available by fully incorporating Visual Studio Tools
for Applications (VSTA) into SSIS as a script task (see Figure 1). Now you have access to the full
.NET Framework and can perform much more complex actions through the scripting task. Drop the
scripting task onto the Control Flow design surface and you will be prompted to select either
Visual C# or Visual Basic as your language of choice. So now SSIS programmers and administrators
are able to move away from VB scripts like this:
'Set new internal variable to DTS/SSIS package variable
Dim fromDate As Date = Me.Variables.FromDate
Do While fromDate <= Me.Variables.ToDate
fromDate = DateAdd(DateInverval.Day, 1, fromDate)
Loop
…to the somewhat more elegant new C# variant:
DateTime fromDate = this.Variables.FromDate; // From SSIS
Variable
While (fromDate <= this.Variables.ToDate)
{
fromDate=fromDate.AddDays(1);
}
Figure 1: New Script Task in SSIS 2008 is VSTA and includes both Visual Basic as well as
Visual C# scripting.
One of the most exciting new components in SSIS 2008 is the data profile task. With this new
feature, you can sample SQL Server data and create profiles of that data to look for patterns, look
for NULLs and duplicate data, among other capabilities. A major drawback of this first data quality
tool to be made available by Microsoft is that the only data source it works with is SQL Server --
at this time.
There are many uses for data profiling, including analyzing statistics from the output and
identifying candidate keys, but for this tip, let's focus on data movement in business intelligence
scenarios.
 |
| More on SQL Server Integration Services: |
|
|
|
|
 |
 |
A data profiler is something that is very helpful in the
time-consuming and tedious exercise of mapping data from one SQL Server database to another or when
looking for the primary candidate for fields such as address, name or product in building
dimensions for your data warehouse.
Many standalone ETL products on the market today include similar functionality and it is almost
a necessity in the toolbox of data integration specialists. It is good to see Microsoft include
data profiling in SSIS. But do not look for SSIS to become a standalone powerhouse ETL tool on its
own anytime soon. This is likely to remain a tool attached to SQL Server and part of the overall
SQL Server package value.
What you see in Figure 2 is the result of a data profiler task that has collected data samples
from a database. When the task has completed in SSIS, you open up this viewer to display the
results. The output from the package will be an XML file, and the best way to view and make sense
of the output is to execute this viewer. The viewer is accessed outside of Visual Studio as a
standalone application that is installed as a client tool with Integration Services. This sample
output in Figure 2 is a look at the column length distribution profile.
Figure 2: Here is sample output from the SSIS package. (Click on image for enlarged
view.)
Next is an enhancement to existing functionality. Microsoft has made buffer and caching
improvements to its SSIS data flow components. Additional asynchronous capabilities in data flow
components will easily provide low-hanging fruit performance gains in your SSIS pipeline. If you're
familiar with SSIS in 2005 and have used data flow components in large, complex packages, then
you've undoubtedly run into situations where performance quickly degrades when piping data. Most
likely you were forced to make design decisions on your pipeline based on emphasizing parallelism.
Well, the good news with SSIS 2008 is you don't need to worry as much about these issues and you
can begin eliminating intricately designed performance-gain tricks.
Most of the bottlenecks I've seen in past SSIS routines occur when it's performing too many
row-level operations or blocking components such as aggregates and sorting or when data needs to be
copied to a new buffer instead of passing the same buffer onto the next transformation. In SSIS
2008, there is now the cache transform, which you can use to persist data throughout your dataflow.
This allows you, for example, to reuse this cache in multiple lookups without requiring the SSIS
engine to repopulate the cache each time. Figure 3 shows setup screens for the cache transform,
which I have attached to my data source:
Figure 3: Create a new cache manager and give it a descriptive name. (Click on image for
enlarged view.)
Figure 4: Now configure the metadata about each column you wish to store in cache. (Click
on image for enlarged view.)
This technique can be used as a nice timesaver and the improvements in data flow components are
significant, but you still must test and benchmark your packages for performance.
SQL Server 2008 and SSIS 2008 are not revolutionary releases for Microsoft. Instead, they expand
upon the breakthrough foundation of SQL Server 2005 and continue to move the product toward
becoming a more complete enterprise-ready, industrial-strength database option for businesses. As
Microsoft moves further away from the old Sybase/SQL Server 7.0 years, SQL Server perceptions
continue to improve, and those in the data warehouse and business intelligence areas will benefit
from the increasing functionality, ease of use and performance that SSIS 2008 has for data
integration. SSIS 2008 is still not a standalone ETL option because it does not include critical
functionality such as managing metadata or high-speed connectors for the plethora of heterogeneous
data sources in the market today.
But another real advantage that SQL Server administrators and developers should be aware of is
that migrating from SQL Server 2005 SSIS packages to SQL Server 2008 packages is going to be
nowhere near the pain involved for those that went through the fun of migrating from
DTS in SQL Server 2000 to SSIS 2005. The fundamentals of the server and engine are, thankfully,
the same this time around.
ABOUT THE AUTHOR
Mark Kromer is an internationally recognized leader in the business intelligence and
database communities, having authored articles and blogs for TDWI, Microsoft, MSDN and Technet.
Mark has more than 15 years' experience in IT, focusing on database technologies and solutions,
including spending the past 2.5 years with Microsoft as the lead product manager for business
intelligence customer solutions. Check out his blog at http://blogs.msdn.com/makrom.
MEMBER FEEDBACK TO THIS TIP
Do you have comments on this tip? Let
us know.
Disclaimer:
Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.
Join the conversationComment
Share
Comments
Results
Contribute to the conversation