The integration of R into SQL Server 2016 was big news, but you could be forgiven for not seeing clearly how the...
open source programming language and analytics environment fits into the larger SQL Server technology picture at Microsoft.
In addition to R itself, we now have Microsoft R Open, R Client and R Server, as well as SQL Server R Services. It's difficult to know where one ends and the other begins.
And Microsoft isn't stopping there. SQL Server 2017, initially referred to as vNext, incorporates the Python language along with R. As part of that addition, the company is creating a combined platform and changing the name of R Server to Microsoft Machine Learning Server and renaming SQL Server R Services as SQL Server Machine Learning Services.
Regardless of the changes, incorporating the R technology into its offerings represents a significant step for Microsoft in its bid to dominate the world of advanced analytics and machine learning. R is a mature language with a large and loyal user base, and expanding on it could prove to be a win for both Microsoft and its customers. With that in mind, let's make sense of each of the existing products and how they fit together to define Microsoft's R strategy.
The R analytical language
Microsoft acquired R vendor Revolution Analytics in 2015; a deal that provided the foundation for putting R in SQL Server 2016. The R programming language started in academia, but has steadily made its way into the enterprise. Data scientists in many fields use R for statistical analysis, data mining, machine learning and advanced graphics. The language includes a set of sophisticated functions that support linear and nonlinear modeling, as well as clustering, classifications and other forms of data manipulation.
In addition to being well-suited to advanced statistics, R's open source nature and free availability under the GNU General Public License (GPL) have also contributed to its rising star power. The language is supported by a large and active community of developers, contributors, data scientists and other users around the globe.
Despite its popularity, R faces some challenges. Applying the language to large data sets is difficult because it relies on single-threaded processing, unlike newer big data tools that take advantage of parallel processing technologies and leverage multiple CPUs simultaneously.
Another limiting factor is that R is an in-memory tool, so it is restricted to the memory available to the host system at any given time. Implementing R-based applications in a production environment can also be difficult. So Microsoft has come up with R Open, its own version of R that addresses some of the language's limitations.
Microsoft R Open
R Open is an enhanced R distribution that is also freely available under the GPL. And because it's based on standard R, R Open is compatible with existing R scripts, packages and applications.
But R Open also includes multithreaded math libraries that are designed to improve performance and resource usage. Operations such as matrix inverse, multiply and decomposition can take advantage of the parallel processing capabilities of the host system.
R Open is available for both Windows and Linux platforms and is a requirement for using R Client, R Server and SQL Server R Services, though you can also use it separately from those products. For example, you can use R Open in RStudio, an open source integrated development environment for building R applications. However, you will get the best performance out of R Open in conjunction with Microsoft's own R products.
Microsoft R Client
R Client is a free development tool built on top of R Open. Data scientists can use the tool to develop and test predictive models and carry out high-performance analytics, incorporating open source R packages.
Like R Open, R Client is available for both Windows and Linux platforms. Each version supports ScaleR, a proprietary computing and analytical engine that can partition massively large data sets into smaller ones to be distributed and analyzed in parallel. ScaleR makes it possible for R Client to support advanced multithreaded processing, as well as remote computing.
Even so, R Client has its own limitations. As with R, it's restricted to local memory, so all data associated with a project must fit into what is available to the local system. In addition, multiprocessing is limited to only two threads, no matter what the computer's capabilities. Fortunately, data scientists can build analytical models in R Client and then push the computational operations to R Server or SQL Server R Services to take advantage of their more robust processing capabilities.
Microsoft R Server
R Server is an enterprise-class analytics platform that supports big data analytics, predictive modeling and machine learning. The commercial counterpart to R Client, it addresses many of the latter's limitations. For example, R Server can take full advantage of the ScaleR technology's capabilities and can handle larger workloads at greater scale.
In addition, R Server supports chunked processing, which breaks up data into smaller units, making it easier to process the data and work with it in memory. You can implement R Server on client-server architectures that support remote access over authenticated connections. It can host and manage parallel and distributed R workloads across Windows and Linux servers, as well as Hadoop and Apache Spark clusters.
R Server is a stand-alone product that is separate from both SQL Server and SQL Server R Services, and it is built directly into SQL Server 2016. This has caused some confusion around how R Server fits in with SQL Server, in part because you can use the SQL Server installation media to install the Windows version of R Server. To confuse matters even further, you need a SQL Server 2016 Enterprise Edition license to use R Server on Windows, even though it's a separate product.
You should consider R Server if you don't need to integrate with SQL Server, but require the kind of processing, distribution and scalability capabilities that come with an enterprise tool. However, if you require these capabilities and you want SQL Server and R integration, you must turn to SQL Server R Services.
SQL Server R Services
Microsoft documentation often refers to R Services as In-Database and R Server as Standalone. There's good reason for the In-Database label. With R Services, Microsoft has integrated R into the SQL Server database engine, facilitating secure communications between the R runtime and other SQL Server features.
To implement R in SQL Server 2016, you must include R Services as part of your SQL Server installation, selecting it as you would other database engine features. During the installation process, SQL Server installs extensions to the database engine to support R execution. The setup operation also installs R Open and the ScaleR libraries.
Data scientists can connect to R Services in order to run R code or deploy R tools created in R Client to SQL Server 2016 for processing. R code can also be embedded in user-defined stored procedures, which can then be called from applications to carry out R-based computations.
R Services is available in most editions of SQL Server 2016, except for the basic version of the lightweight Express Edition. However, performance and scalability are limited in the Standard Edition. If you want to take full advantage of the capabilities offered by R Services, you need the Enterprise Edition.
The same applies to the renamed Machine Learning Services in SQL Server 2017, currently available in a Community Technology Preview 2.0 release. And with Python support on the horizon as part of that update, the world of SQL Server analytics is likely to only become more interesting.
Python becomes first class
R's commercial analytics appeal
Amazon Web Services tackles R language