While cloud computing is still a relatively new term, the concept of outsourced data analytics has been around...
for a while. Still, recent buzz around cloud databases has ramped up interest in the potential of "data warehousing as a service" – for obvious reasons.
Because of the rapid scalability made possible by cloud-based models, along with the potentially cheaper alternative it provides for smaller businesses, data warehousing seems like a natural fit for the cloud. But is this really the case?
"It depends. I think there is certainly a natural fit in that the cloud provides an easy way for data warehouses to scale," said John Welch, a chief architect with Mariner, a North Carolina-based business intelligence consulting firm. "But there's still a fundamental issue in that the cloud today has some real problems transferring data to and from the cloud, particularly when you're talking about large amounts. So there are pros and cons of looking at it from that standpoint."
"When you start thinking about taking all that data and pushing it across a much slower internet-based connection to move data up to a cloud-based server … your window gets a lot longer, just in terms of the pure data transmittal that has to be done," he said. "So that's really the big drawback from the traditional data warehousing standpoint."
Welch added that for this reason, several cloud services are moving toward a trickle feed model, where each individual transaction is sent to the cloud server as it happens, rather than having one large, daily batch load. Unfortunately, trickle feed models have drawback as well.
"The challenge there is that it typically isn't the most efficient way to process the data," said Welch, adding that in some cases, certain data quality or profiling techniques may rely on a large batch of data coming in as a single unit. Otherwise, the data might be profiled or assessed inaccurately, he said.
Microsoft has taken some steps to address the issue of moving data in the cloud. The most recent community technical preview (CTP) of SQL Azure Database includes BULK INSERT support to speed up the process.
"I've seen reports of up to four, six and even 10 times faster using [BULK INSERT]," Welch said. "Obviously, the mileage varies depending on your Internet connection and exactly what you're doing with it. But it does [account for] a pretty significant difference in speed as far as doing those bulk batch-type operations."
The BULK INSERT addition is just one example of the many changes Microsoft has made to Azure in the past year, which has been a welcome sight for DBAs and developers.
"I applaud how closely Microsoft has worked with the community to adapt their product to a rapidly changing marketplace," said Brent Ozar, a SQL Server expert with Quest Software Inc. "Knowing that Microsoft has been willing to change Azure has been reassuring for those of us who are developing applications that will count on its future expansion."
Another barrier to SQL Server data warehouses in the cloud involves the amount of storage available with SQL Azure Database. Currently, there is a cutoff of 10 GB of data for Azure Business Edition, which Welch described as "fairly limiting" for BI applications and data warehouses.
"That's enough for a small departmental application, but if you're looking at doing larger projects, the 10 GBs is limiting. So certainly increasing those limits over the next couple of years will be helpful," he said.
Welch added that Microsoft is already talking about increasing the amount of data that can be stored on Azure. The result could be a major boost for Microsoft in the battle with other cloud platforms with an eye on data warehousing, such as Greenplum's Enterprise Data Cloud (EDC) initiative or nCluster Cloud Edition from Aster Data Systems.
There are other steps Welch said he hopes to see Microsoft take with Azure in the future, such as the deployment of other parts of the SQL Server stack – like SQL Server Analysis Services or Reporting Services -- into a cloud-based model.
"I'd like to see them make those available in the same way that they are making the SQL Azure Database available, in a highly-scalable, on-demand type of model," he explained. "There are no definite plans around this, but it's something I look forward to."
The current CTP for SQL Azure Database is feature-complete. David Robinson, a senior program manager on Microsoft's SQL Azure team, said in a blog post that the Azure CTP will remain free until the service goes live in November. SQL Azure Database Business Edition is currently set to be priced at $99.99 per month.
Check out more preview coverage of PASS Summit 2009.