If most of your data resides in SQL Server databases and your source data is readily accessible, chances are you’re using SQL Server Integration Services (SSIS) to perform your extract, transform and load (ETL) operations. SSIS comes free with SQL Server, it’s easy to install and it’s an excellent ETL tool.
Yet, many organizations are going with third-party ETL tools, even licensed SQL Server shops. Some might be familiar with a specific product; others have a lack of knowledge about SSIS or have data management needs that SSIS cannot easily address, such as enterprise resource planning (ERP) and customer relationship management (CRM) capabilities.
There’s certainly no shortage of products out there. In fact, there are so many ETL tools that selecting one can be a daunting task. With that in mind, I looked at five of the more popular products for this ETL tools comparison.
In selecting these tools, I reviewed numerous online discussion threads and article comments to get a sense of which products are generating the biggest buzz. I eliminated any tools affiliated with a specific relational database management system, such as Oracle Warehouse Builder. Note, however, I’m not endorsing or recommending any one product, nor am I presenting them in any particular order. I’m simply providing an overview of what I’ve discovered.
Few products garner as much attention as Informatica PowerCenter. With its easy-to-use interface and service-oriented architecture, PowerCenter supports data migration and consolidation, replication and synchronization, and data warehousing. You can use PowerCenter to retrieve data from most business systems and deliver that data throughout the enterprise in batches or in near real time, real time or on demand.
The product also supports comprehensive reporting and auditing capabilities. In addition, development teams throughout the organization can share and reuse data definitions and data mapping logic across projects and platforms. Because of PowerCenter’s metadata-driven architecture, definitions can be standardized, and technical and business metadata can be integrated into a single data integration catalog.
SAS Enterprise Data Integration Server
Data integration is the name of the game with SAS Enterprise Data Integration Server. With its integrated development and workflow structure, this ETL tool lets you access a variety of databases, enterprise applications, mainframe data sources, message-oriented middleware, structured and semi-structured data, static and streaming Web data, and numerous file types -- all using both native formats and open standards.
You can also migrate and synchronize data between databases, applications, mainframe legacy files and other sources. Because of its multithreaded parallel processing architecture, Integration Server lets you move data quickly and efficiently, and there’s plenty you can do with that data. The transformation library includes more than 300 predefined table and column transformations, and during the data conversions, you can capture and document metadata and data integration processes. A common metadata repository supports centralized storage and process management so you can reuse work and minimize development efforts.
BusinessObjects Data Integrator
When organizations are looking for a system to address their ERP, CRM, business intelligence (BI), and data migration needs, they often seek out products like BusinessObjects Data Integrator. The product provides a single view of enterprise metadata and supports advanced data profiling so you can understand the structure, content and quality of your data. This metadata integration lets you perform end-to-end impact analyses that show you how changes to the source data can affect the ETL and BI environments.
The integration also lets you audit data throughout the ETL process. Another benefit of the metadata integration is the product’s inherent data lineage, which lets users see how data was computed, when it was updated and where it originated. Like other products, Data Integrator supports team-based development and provides a central metadata repository.
Pentaho Data Integration
Many organizations now turn to open source technologies to meet their ETL needs. One of the most popular of these products is the Community Edition of Pentaho Data Integration, also known as Kettle. This product takes a metadata-driven approach to creating complex jobs and transformations in a drag-and-drop GUI environment. As a result, you don’t have to generate custom code, and you have access to ERP connectors, data quality plug-ins, and over 150 out-of-the-box mapping objects that support advanced warehousing components such as slowly changing dimensions and junk dimensions.
Kettle also provides a unified ETL, modeling and visualization development environment that is built on a scalable, standards-based architecture. Note that the Community Edition is a self-supported product. For technical support, managed upgrades and enterprise features, you’ll have to upgrade to the Enterprise Edition.
Talend Open Studio
Another popular open source ETL product is Talend Open Studio. Like Kettle, Open Studio is a metadata-driven solution that supports data migration, integration, and synchronization. Open Studio takes a top-down approach to business modeling that lets line-of-business stakeholders participate in the design of integration processes and monitor the development of those processes.
The development environment provides both a graphical and functional view of the integration processes and includes a graphical palette of open source components and connectors that address all types of tasks and operations. You can also track data throughout the transformation processes and automatically generate technical reference documentation. In addition, Open Studio supports comprehensive connectivity to packaged applications such as ERP and CRM solutions as well as to data warehouses and online analytical processing (OLAP) applications.
Choosing a Third-Party ETL Tool
As you can see, third-party ETL tools can be quite varied, and in addition to the ones I’ve described here, there are many more from which to choose. If you’ve already implemented SQL Server in your organization, it’s worth the effort to try to make SSIS work. However, if you’re data management needs go beyond the ETL functionality available in SSIS, you might consider one of the many products out there.
Before you select a tool, however, assess the types of business systems from which you’ll be extracting data and analyze the specifics of that data. You’ll also need to determine what functionality beyond your basic ETL requirements you’ll need to implement in your system. For example, you might need to incorporate ERP and CRM capabilities. You’ll also want to determine whether your organization is willing to work with open source. Some stakeholders are not comfortable with implementing open source technology. In the end, however, it’s up to you to do the evaluation necessary to determine the best ETL tool for your organization.
ABOUT THE AUTHOR
Robert Sheldon is a technical consultant and the author of numerous books, articles and training material related to Microsoft Windows, various relational database management systems, and business intelligence design and implementation. Find more information at http://rhsheldon.com.