If you've ever moved data into or out of SQL Server, chances are you've done it through Microsoft's DTS or Microsoft's Data Transformation Services tools. DTS is an ETL tool (short for extraction, transforming and loading), which allows database administrators to create sets of steps that describe how data is to be transformed. These "packages," as they're called, can then be executed via a number of different programming languages or directly within SQL Server itself.
The idea was and still is a good one, but ultimately too limited -- it remained a glorified if convenient version of the bulk-copy system used in earlier versions of SQL Server. With SQL Server 2005, Microsoft decided to turn DTS into a whole new kind of ETL tool — SQL Server Integration Services (SSIS). Moving data from one database to another or exporting a database in another format are only the most basic things you can do with SSIS. Here area additional reasons to use SSIS:
If you have a data set that isn't "clean," i.e., it hasn't been normalized, SSIS has a broad tool set for cleaning and normalizing data. It includes functions like "fuzzy" matching and grouping, so data that is meant to be the same (but has been input or expressed differently) can be filtered and normalized. One good example of this would be data from different geographic regions that has different numeric formats: The same ideas that are expressed differently, need to be normalized before
Connectivity to Analysis Services cubes.
You can use SSIS to pipe data directly into a cube for data mining. Instead of importing the data into a native database and then mining it there, an SSIS package itself can be used as the data source.
Integration with SQL Server Reporting Services.
As with analysis cubes, SSIS packages can be used as direct data sources for creating reports.
Full .NET support.
This is probably a given, but SSIS programming can be done either natively or through COM object interfaces in any of the .NET languages — or, for that matter, any language that supports COM objects.
The end result is a system where the data sets you import and the packages you create can work more like programmatic objects than static data. This is precisely the idea, and it
allows the user to programmatically integrate the way the reporting services, the analysis services, the data and the database itself all talk to each other. For instance, archived legacy data can continue to be used for trend analysis or data mining even if it's technically offline. Another example is live integration of data from multiple static sets, such as the case I mentioned earlier about data from multiple geographic locations.
Because SSIS is programmable, its functions can be invoked in a broad range of circumstances. For instance, an ASP.NET-driven Web site could be a management front end to run SSIS packages, or one could write a standalone application to connect to SSIS and obtain and refine data for its own use. The idea of being able to drive SQL Server programmatically has been around for a long time, but SSIS makes it possible to talk to everything else that SQL Server uses as well -- all at once and in a unified way. A programmer can also create custom SSIS extensions such as custom data transforms or enumerators. You can reuse them in other applications or other environments entirely.
Packages in SSIS can be written using the Business Intelligence Design Studio, which ships with
SQL Server 2005. The resulting files are mostly XML, so they can be inspected and edited by hand if
need be. Note that some of the functions in SSIS — data mining connectivity and fuzzy grouping and
lookup, to name two — are only available in the Enterprise Edition of SQL Server 2005. SSIS isn't
available at all in SQL Server 2005 Express Edition, so if you create SSIS packages, you can only
use them in the full edition of SQL Server. To that end, when you're creating programmatic
solutions in SSIS, they're really designed to work in other environments that fully support SSIS as
ABOUT THE AUTHOR
Serdar Yegulalp has been writing about computers and information technology for more than 15 years for a variety of publications, including InformationWeek and Windows Magazine.
This was first published in November 2006