Microsoft and Talend have unveiled Stitch Data Loader support for Microsoft Azure SQL Data Warehouse. This loader for Azure arises out of data integration vendor Talend's late-2018 acquisition of Stitch Inc.
Stitch's tools are intended to help a wider spectrum of data users create multiple-source data integrations, particularly on the cloud. The Stitch loader works with the Talend Data Fabric and can move data into Azure SQL Data Warehouse while managing schema changes and security. Talend supports other clouds, as well.
Last year, Talend released a bulk uploader that supported data movement into Azure back-end cloud storage formats. The latest update with Stitch provides an easier means for users to set up diverse data ingestion on the Azure cloud.
The Stitch Data Loader for Azure SQL Data Warehouse was introduced on Feb. 7.
Easing move to cloud
This tooling is an example of the growth in extract, load and transform (ELT) approaches that complement or replace traditional extract, transform and load (ETL) methods of data preparation.
"Historically, you would have transformed the data before loading it into the data warehouse," said Dylan Baker, principal at London-based DBAnalytics and a user of the Stitch software. "You did that because storage and compute in the data warehouse were expensive, and you wanted to do that work before the data got there."
Reduced cost and new tooling for cloud data warehouses now make it more feasible to dump data into the data warehouse for quicker analytical iterations, and tools like Stitch play a role in handling that data more efficiently, Baker said.
In Baker's practice, the data sources going into the cloud data warehouses are diverse, including Shopify, HubSpot and Salesforce data. JSON is a commonly used format, but in-house proprietary formats are also used.
Baker has used AWS and its Amazon Redshift system, as well as BigQuery on Google Cloud Platform, for cloud data warehousing. While he hasn't employed Microsoft Azure or Azure SQL Data Warehouse thus far, he said customers he works with see Azure cloud capabilities becoming more popular and catching up to the Redshift franchise.
Baker said tools like Stitch help make it easier and more economical to set up and manage data warehouse analytics for businesses of various sizes.
The call of cloud computing
"The growth of cloud is rapid," said Stitch co-founder Jake Stein, who is now senior vice president for Stitch at Talend. "What we see is an explosion of cloud data warehouses."
"They only work if you get your data into them. We see data ingestion as the one common theme," he said. "Data lives in a lot of places, but there should be one place for analyzing it."
With cloud-based data warehouses like Amazon Redshift and Azure SQL Data Warehouse, a new implementation pattern is emerging, Stein said. The new generation of cloud data warehouses makes it easier to do ELT, he added.
"The transform happens after the load. That allows us to be laser-focused on the extraction and the loading, and the transform can happen later, using a variety of tools," Stein said, naming the Apache Spark processing engine among such tools.
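The load-first, transform-later pattern Stein describes can be illustrated with a minimal sketch. It assumes an in-memory SQLite database standing in for a cloud data warehouse, and the record fields, table names and function names are hypothetical, chosen only for illustration. It is not Stitch's or Talend's implementation.

```python
import json
import sqlite3

# Hypothetical raw export from source systems; field names are illustrative only.
RAW_EVENTS = [
    '{"order_id": 1, "source": "shop", "amount": "19.99"}',
    '{"order_id": 2, "source": "shop", "amount": "5.01"}',
    '{"order_id": 3, "source": "crm", "amount": "40.00"}',
]


def extract_and_load(conn, raw_lines):
    """Extract and load: copy records into a staging table without reshaping them."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders "
        "(order_id INTEGER, source TEXT, amount TEXT)"
    )
    records = [json.loads(line) for line in raw_lines]
    rows = [(r["order_id"], r["source"], r["amount"]) for r in records]
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)
    return len(rows)


def transform(conn):
    """Transform: runs later, inside the warehouse, as SQL over the staged rows."""
    conn.execute("DROP TABLE IF EXISTS revenue_by_source")
    conn.execute(
        "CREATE TABLE revenue_by_source AS "
        "SELECT source, ROUND(SUM(CAST(amount AS REAL)), 2) AS revenue "
        "FROM raw_orders GROUP BY source"
    )
    return conn.execute(
        "SELECT source, revenue FROM revenue_by_source ORDER BY source"
    ).fetchall()


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")  # stand-in for a cloud data warehouse
    extract_and_load(warehouse, RAW_EVENTS)
    print(transform(warehouse))
```

Because the staged rows are kept as they arrived, the transform step can be rewritten and rerun without going back to the source systems, which is the quicker iteration the ELT approach is meant to allow.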
Pay by the drip
Talend's acquisition of Stitch was a bet on a trend toward the consumerization of IT and wider use of data preparation tools, according to Ashley Stirrup, chief marketing officer at Talend, based in Redwood City, Calif.
Besides taking a place in the Talend enterprise software lineup, Stitch will continue to offer a free Stitch Data Loader trial. It lets line-of-business users "do some research, start using the software for free and prove out concepts," Stirrup said.
"People can try something, get all sorts of data, get success, 'pay by the drip' and grow as their needs grow," Stirrup said.
Matching those capabilities with Talend software will ensure better data cleansing and data governance can be done on top of the fast Stitch data ingestion, he said.
Customers are finding such governance capabilities of greater use for cloud data, he said, as organizations adapt their data practices to comply with data privacy legislation such as GDPR.
Beyond the cavalier
Baker agreed GDPR support is important. He said the early days of cloud data saw a rush to incorporate new sources that wasn't always accompanied by thoughtful governance.
"People might have been more cavalier in the past. Now, GDPR is something we think about a lot," he said. "Data governance is top of mind now -- teams increasingly will have to think more about it."