Python enthusiasts will be able to use the programming language with Microsoft's SQL Server 2017 database -- within limits.
At first, the idea of integrating an analytics programming language into SQL Server might seem a little odd; however, it isn't entirely unprecedented. Microsoft added the R language to SQL Server 2016 because it can handle complex mathematical data analysis. Now, with Python-SQL Server integration, R, Python and T-SQL (Microsoft's implementation of standard SQL) can all be used together in the same database.
Microsoft's support for Python, beginning with the Community Technology Preview 2.0 release of SQL Server 2017, will be somewhat limited, at least initially. The preview of Python support is only available in the Windows version of SQL Server 2017, although Python integration will likely make it into the new Linux version of SQL Server eventually.
To get started, SQL Server users can download and install version 3.5 of the open source CPython interpreter during the SQL Server setup process, along with a set of Python data-science packages containing pre-built code modules. Microsoft only supports CPython 3.5; it was selected to avoid some of the compatibility issues that exist in newer versions of the interpreter, according to the company.
Microsoft's decision to integrate Python into SQL Server raises a couple of questions. First and foremost, how do SQL Server users benefit from Python integration? And what is Microsoft doing to protect SQL Server databases against malicious Python scripts?
What a Python-SQL Server combo adds
Python is a general-purpose programming language that is known for being especially easy to learn. It can be used for many purposes, but it is primarily used for data analytics and machine learning. Useful as those capabilities are, they alone don't explain why Microsoft is enabling Python-SQL Server integration. Why not simply continue to let users write stand-alone Python code that queries a back-end SQL Server database from a client system, something Microsoft already supports?
The answer is performance and scalability. While it is possible to write a Python script that queries SQL Server from a client system, doing so requires SQL Server data to be sent to the client for analysis. Not only is that inefficient, but in some cases it can also limit the volume of data that can be analyzed.
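The data-movement cost can be illustrated with a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server, purely so the code runs anywhere, and the `sales` table and its columns are hypothetical. It compares how many rows leave the database when the analysis happens on the client versus inside the database.

```python
import sqlite3

# Stand-in database: sqlite3 is used only so the sketch runs anywhere.
# The table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
rows = [("east" if i % 2 == 0 else "west", float(i)) for i in range(10_000)]
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# Client-side analysis: every row leaves the database before any math happens.
pulled = conn.execute("SELECT region, amount FROM sales").fetchall()
client_totals = {}
for region, amount in pulled:
    client_totals[region] = client_totals.get(region, 0.0) + amount

# In-database analysis: the database does the work and returns only results.
server_rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"
).fetchall()
server_totals = dict(server_rows)

print(f"rows moved for client-side analysis: {len(pulled)}")
print(f"rows moved for in-database analysis: {len(server_rows)}")
```

Both approaches produce the same totals; the difference is how much data has to travel to get them.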
By comparison, running Python directly on SQL Server makes the following possible:
- It enables the compute process to be executed at the server, with only the results sent back to the machine that initiated the query. That reduces data movement, and, because SQL Server databases tend to run on high-performance hardware, taking advantage of server resources should speed processing itself.
- As a result, data scientists and other analysts are more likely to be able to analyze full data sets, instead of having to limit themselves to samples.
- It should also be easier for both data scientists and application developers to put Python-based analytical models into production. Python models and scripts can be embedded in T-SQL stored procedures for execution, without having to translate them into another language first.
- In addition, the integration of Python into SQL Server lets users run parallel analytics jobs that would otherwise be difficult to set up. A SQL query could, for example, send data to multiple Python processes that run simultaneously on the server.
- With the integration, Python users can take advantage of SQL Server features such as in-memory tables.
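As a rough sketch of what embedding Python in a T-SQL stored procedure can look like, the helper below assembles the T-SQL text around a short Python script. The `sp_execute_external_script` procedure and its default `InputDataSet` and `OutputDataSet` data frame names come from SQL Server's external script support; the procedure name, table, columns and the `build_procedure` helper itself are hypothetical, and the generated statement is only printed here, not run against a server.

```python
import textwrap

# Python that would run inside SQL Server. InputDataSet and OutputDataSet are
# the default data frame names used by sp_execute_external_script.
PYTHON_BODY = textwrap.dedent("""\
    # Average the hypothetical 'amount' column per region, on the server.
    OutputDataSet = InputDataSet.groupby("region", as_index=False)["amount"].mean()
""")


def build_procedure(proc_name: str, input_query: str, python_body: str) -> str:
    """Return T-SQL text that wraps a Python script in a stored procedure."""

    def quote(text: str) -> str:
        # T-SQL escapes a single quote inside a string literal by doubling it.
        return "N'" + text.replace("'", "''") + "'"

    return (
        f"CREATE PROCEDURE {proc_name} AS\n"
        "BEGIN\n"
        "    EXEC sp_execute_external_script\n"
        "        @language = N'Python',\n"
        f"        @script = {quote(python_body)},\n"
        f"        @input_data_1 = {quote(input_query)};\n"
        "END"
    )


tsql = build_procedure(
    "dbo.AverageSalesByRegion",              # hypothetical procedure name
    "SELECT region, amount FROM dbo.Sales",  # hypothetical source table
    PYTHON_BODY,
)
print(tsql)
```

Once a procedure like this exists, an application could call it the same way it calls any other stored procedure, with the Python analysis running on the server.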
Python in SQL Server a security risk?
Whether data is being transmitted back and forth through SQL Server queries or exported to an application server, the movement of data raises questions about its security. The ability to analyze data in place by running Python code directly on SQL Server, without the data ever being transmitted across the wire to another machine, can certainly help with security in and of itself.
Nonetheless, Microsoft has also been careful to implement the Python integration in a secure manner. Storing Python code within a database as a T-SQL stored procedure provides a security boost. More important, this code doesn't execute within the SQL Server address space, thereby helping to ensure that poorly written Python code can't harm SQL Server.
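The general idea behind out-of-process execution can be sketched in a few lines. This is not how SQL Server launches Python internally; it is only an illustration, using the standard subprocess module, of a host process surviving a failure in a script that runs in a separate process with its own address space. The failing script is a made-up stand-in for poorly written user code.

```python
import subprocess
import sys

# A deliberately broken script standing in for poorly written user code.
BAD_SCRIPT = "raise RuntimeError('bad analytics script')"

# Run the script in a separate interpreter process, not in the host process.
result = subprocess.run(
    [sys.executable, "-c", BAD_SCRIPT],
    capture_output=True,
    text=True,
    timeout=30,
)

# The child failed, but the host is unaffected and can simply report the error.
print(f"child exit code: {result.returncode}")
last_line = result.stderr.strip().splitlines()[-1]
print(f"child error: {last_line}")
print("host process is still running")
```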
Although Python-SQL Server integration may seem like an enterprise-class feature, all SQL Server 2017 editions will support Python, including the free Express Edition. Microsoft has also extended support for both R and Python in Azure SQL Database.