SQL Server Integration Services, or SSIS, provides a number of different mechanisms to create and pull data from...
By submitting your personal information, you agree that TechTarget and its partners may contact you regarding relevant content, products and special offers.
a data source. One is the ExecuteSQL task, which lets you use a T-SQL statement, much as you might pass a bit of T-SQL from a front-end application to SQL Server on the backend. The T-SQL in question can be a full statement or a reference to an existing stored procedure.
T-SQL, however, is not the only way to get data into SSIS.
There's also the Data Flow task, which leads people to ask: Why use T-SQL in SSIS? Why not just use data flows? What's the advantage to T-SQL, if any? I went searching for a response to those questions, and found a number of very well explained answers at Jamie Thomson's blog at Conchango. I'll rephrase a few of them here, along with some of my own commentary gleaned from personal experience.
Using an existing piece of T-SQL or stored procedure is a timesaver, rather than recreating the whole thing in a data flow. If you have an existing stored procedure that took a good deal of sweat and concentration to build, there's no point in tearing it down and rebuilding it; you can simply use it as-is. There are some slight differences in the syntax for evoking a stored procedure (mostly the way parameters are passed), but very little that would prevent existing stored procedures from being reused in ExecuteSQL. Later, I will go into a couple of things that might cause problems.
Your SQL Server programmers don't have to learn much of anything new to make their code work in SSIS. They can write T-SQL code, and it can simply be dropped into SSIS workflows, with relatively little modification. If you're taking over someone else's work and migrating it progressively into SSIS, you may be more inclined to reuse the existing work instead of trying to re-engineer something from scratch.
The majority of work is done wherever your database is, instead of where SSIS itself is running (which may not be the same machine). This may not be an advantage, but it is a behavior worth noting. It can work for you or against you.
T-SQL uses transactions within the database itself instead of across MSDTC (as SSIS does).
Now, here are some reasons why T-SQL in an SSIS data flow may not be a good idea:
Some stored procedures do not expose data in a preview. The simple reason is that it's not always possible to predict what columns a given stored procedure will produce when it's run. That said, some stored procedures can't be used in place of a data flow if you need a predictable "output contract" for that data flow. However, you can use a view to expose a data flow preview. The view is bound tightly to the schema(s) it works with and doesn't change, unless you explicitly recast it (i.e., you edit it). So if you have an existing view that you want to use as a data source in SSIS, you'll have previewing as a bonus feature.
- SSIS data flows are better for more complex, multi-step operations. These operations include jobs that require:
- a lot of programmatic work
- aggregation from different data sources or types
- structured exception handling
- additional transformation
- or, jobs that can't be accessed from SQL itself
These are things that are not always done elegantly or efficiently in the context of T-SQL.
- SSIS data flows are self-documenting, which is another possible advantage of SSIS data flows over T-SQL. But, it's true only to a point. If you don't have any understanding of the schema being accessed, you're just as likely to be in the dark. On the other hand, stored procedures and T-SQL in general are not self-documenting at all -- unless whoever wrote them took the time to document each in detail (and how often does that happen?).
From all of this, it's possible to derive two simple rules about using T-SQL versus data flow:
T-SQL is best suited for operations where you're simply gleaning data from an existing set, where the process for doing so is not likely to change anytime soon and doesn't involve a lot of additional transformation.
- The Data Flow task is best for when you're devising an entirely new data transformation -- something where existing T-SQL (or T-SQL itself) won't comfortably do the job. This is not something you can really gauge unless you've worked a great deal with T-SQL and know its limitations. So, you need to know at least as much about T-SQL as you do about SSIS in order to take advantage of both.
ABOUT THE AUTHOR
Serdar Yegulalp has been writing about computers and information technology for more than 15 years for a variety of publications, including InformationWeek and Windows Magazine.