Microsoft's push on the analytics front continued this week as the company rolled out a "go-live" preview version of SQL Server 2017 accompanied by an updated version of its R programming services and a new Python language interface.
The moves are intended to extend SQL Server's use in analytics, especially advanced predictive analytics employing machine learning and other artificial intelligence (AI) capabilities. The release -- originally referred to as SQL Server vNext before formally being dubbed SQL Server 2017 this week -- has many other features, including elements that improve IT shops' abilities to move databases to the cloud and back again. Not incidentally, SQL Server 2017 will be the first version of the software to run on Linux, as well as Windows. It also supports Docker containers.
With the Community Technology Preview (CTP) 2.0 update of SQL Server 2017, Microsoft's data management platform includes in-database support for Python, a language that finds wide use in machine learning. A renamed Microsoft Machine Learning Services component enables Python to run directly on the database server, or along with embedded T-SQL scripts.
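As a rough illustration of what in-database Python involves: in Machine Learning Services, a Python script is passed to the stored procedure sp_execute_external_script, where the rows returned by a query arrive as a pandas DataFrame named InputDataSet and results are handed back in OutputDataSet. The sketch below mimics only that input-to-output contract in plain Python, with lists of dictionaries standing in for the DataFrames. The column names and the scoring rule are invented for illustration and do not come from Microsoft.

```python
# Hypothetical stand-in for the body of an in-database Python script.
# In SQL Server the input would be a pandas DataFrame called InputDataSet;
# here a list of dicts plays that role so the sketch runs anywhere.

def score_rows(input_data_set):
    """Return one output row per input row, with a made-up risk flag."""
    output_data_set = []
    for row in input_data_set:
        # Invented rule: flag any transaction above 1,000 as high risk.
        flagged = row["amount"] > 1000
        output_data_set.append({"id": row["id"], "high_risk": flagged})
    return output_data_set


# Sample rows standing in for the result of a T-SQL query.
rows = [
    {"id": 1, "amount": 250.0},
    {"id": 2, "amount": 4200.0},
]
result = score_rows(rows)
print(result)
```

The point of running such logic on the database server is that the rows never have to leave it; only the scored output travels back to the caller.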
"Today, it's either Python or R for advanced analytics," said Warner Chaves, a principal consultant for SQL Server at The Pythian Group Inc. and a Microsoft MVP. "Microsoft realizes it can't freeze out Python."
He said it is very common today for people to use R for experiments or personal prototypes. But many people will move that work to Python when it is time to go into production for operations.
News of in-database Python library support shows some bet hedging by Microsoft. Its biggest analytical language bet to date has been on R, which has grown in use in recent years among data scientists and statisticians working on analytics.
Hedging is a realistic approach for the company to take, according to Chaves, because Python finds as much use as R when it comes to analytics these days.
What's in a name?
R Services for SQL Server 2016 has also become part of Microsoft Machine Learning Services in SQL Server 2017 CTP 2.0, encompassing both R and Python libraries.
The R focus at the company has been considerable, beginning with the company's 2015 acquisition of distributed R server maker Revolution Analytics. But Microsoft also began to bring Python into the fold that same year to help power the Spark analytical processing engine as part of its cloud-based Azure HDInsight big data platform.
In 2015, the company turned to Python specialist Continuum Analytics and its Python distribution to broaden the developer tool set for AI on Azure. Both R and Python on SQL Server 2017 can be used to run AI jobs in the database using NVIDIA GPUs for acceleration, according to Microsoft.
Python as part of SQL Server could help it reach a yet wider audience. "The story is more complete now," Chaves said. "Python is not an outsider."
Microsoft has worked to enhance its R capabilities as well. This week, the company said it would include pre-trained cognitive models for sentiment analysis as part of the Microsoft R Server 9.1 release. With this release, work done in the Sparklyr and H2O open source machine learning kits can be incorporated into R Server.
Also with this release, a wider selection of algorithms can run in parallel processing mode, model scoring is sped up, and support is now offered for Optimized Row Columnar, or ORC, file formats.
The Microsoft R distribution adds a great deal of value over the basic distribution from The R Foundation, according to Thomas Dinsmore, an independent consultant. "I'm impressed by the operationalizing work that Microsoft has done with R," he said. "R 9.1 has a number of new tools that make it easier to use and that improve performance."
That is critical, he continued, especially for applications like fraud detection, where sub-second response time can be a requirement.
Dinsmore said R and Python are both finding considerable use in analytics, with R perhaps earning more favor among statisticians, and Python often a preference among programmers. "As a rule, Python is preferred if you're developing commercial applications," he said, noting that Python licensing shows benefits versus R in commercial deployments.
Novel graphic data
Running R and Python in the database means these languages can leverage the power of the data management system both on the cloud and on premises, Joseph Sirosh, corporate vice president of data platforms at Microsoft, told a SQL Server 2017 launch audience as part of this week's Microsoft Data Amp 2017 online event.
He listed other SQL Server enhancements for analytics, including an adaptive query processing feature that continually fine-tunes repeated queries to speed performance.
Sirosh also highlighted SQL Server 2017's ability to handle graph data objects. Such technology has arisen in recent years as an alternative to relational methods for handling relations between data points and, in Sirosh's estimation, it is likely to open up a whole new set of applications.
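To make the contrast with relational tables concrete, the following sketch stores a few hypothetical accounts as nodes and their shared attributes as edges, then walks the edges to find everything linked to one account, the sort of traversal fraud detection tends to rely on. It is plain Python, not SQL Server's graph syntax, and every name in it is invented.

```python
from collections import deque

# Hypothetical edges: pairs of accounts that share a device or an address.
edges = [
    ("acct_a", "acct_b"),
    ("acct_b", "acct_c"),
    ("acct_d", "acct_e"),
]


def build_adjacency(edge_list):
    """Turn a list of undirected edges into a node -> neighbors mapping."""
    adjacency = {}
    for left, right in edge_list:
        adjacency.setdefault(left, set()).add(right)
        adjacency.setdefault(right, set()).add(left)
    return adjacency


def connected_to(start, adjacency):
    """Breadth-first search: every node reachable from start, excluding start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(start)
    return seen


adjacency = build_adjacency(edges)
linked = connected_to("acct_a", adjacency)
print(sorted(linked))
```

In a purely relational layout, following links of unknown depth like this generally means repeated self-joins, which is the kind of work a graph representation is meant to simplify.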
"Graph processing is important for fraud detection, social network analysis and modeling internet of things networks," he said. "That's where a lot of modeling data is being generated in cutting-edge applications."
All in the SQL Server family
Pythian's Chaves said SQL Server has grown to be more a family of products and less a single server offering. His broad breakdown comprises SQL Server itself, running on premises or on the cloud; Azure SQL Data Warehouse; and Azure SQL Database, a managed database as a service.
One upcoming update to Azure SQL Database was of particular interest to Chaves: Azure SQL Managed Instances. The update, he said, could ease migrations of SQL Server to the Azure cloud, as the new managed instance approach appears to be more compatible with approaches common on premises today.
"The new service will map a lot more closely to how SQL Server is actually done on premises," he said.
Meanwhile, Chaves cautioned that while the Microsoft go-live designation for SQL Server 2017 CTP 2.0 means developers are licensed to deploy it in production, for now it is still beta software, and better employed for experiments than for operations.