Dynamic data masking is one of many new security features coming in SQL Server 2016. It is also in the Azure SQL database V12 preview. This feature lets you hide part of the data from normal users. For example, an application may display a Social Security Number (SSN) and mask out all but the last four digits of the number. Instead of displaying 123-45-6789, it would display XXX-XX-6789. If a privileged user is viewing the information, the masking is not applied. This is not encryption. The data is not masked on the storage device; it is masked or obfuscated on output based on whether the caller is normal or privileged. This is not a total security solution, but it can be deployed as part of your security strategy.
Dynamic data masking is useful in a couple of scenarios. The first is probably obvious: for regulatory compliance. Whether the regulation is governmental or dictated by an industrial body, protecting the display of sensitive data is often demanded or considered a best practice. Another scenario where this feature could be leveraged is internally with developers. It is not uncommon for developers to request real data for testing and development purposes. Often, when supplying data for this situation, you need to scrub sensitive data, whether through scrambling or actual removal. Scrubbed data can affect the effectiveness of testing by skewing query results. By applying dynamic data masking in this scenario, the data can be protected while providing a more realistic test bed. Often, static data masking, where data is stored in a masked format instead of only being masked upon output, is used in this situation, but this method introduces other problems, such as breaking data integrity.
In this article, I'll provide an example of how to set up dynamic data masking in SQL Server 2016 CTP2 and in Azure SQL database V12. An appealing characteristic of dynamic data testing is that it can be applied by adding to a table's definition without having to rewrite application code.
Setting up dynamic data masking
This article is based on the publicly available SQL Server 2016 CTP 2.2 release. The first step in dynamic data masking is to create a database like the one shown in Figure 1.
Create a table for testing
Insert a few rows for testing
Verify that the data is there
Because no masking rules are defined, all data is visible.
Create a test user and test permissions to the table
Because no masking has been defined, the TestUser login sees everything.
Masking is part of the table definition in SQL Server 2016
The Employee table definition was modified to add a customized mask to the HomePhone column (Figure 5). The function used in this case is the partial function. The partial(0,"XXX-XXX-",4) means to grab zero leftmost characters from the column to start with, use the string defined, and then display the last four characters of the column. There is nothing inherently meaningful about the Xs or the hyphens. The mask string was crafted to be similar to the data already in that column. Let's see what happens when the WorkPhone column is masked with a slightly different string (Figure 6).
The definition of the mask for WorkPhone is similar to the definition of the mask for HomePhone except for the characters in the string within the partial function.
Now that it is shown that there are mask definitions on each of the phone columns, can a query be run against the table referencing a masked column as the search criteria? Let's find out.
Since the mask is applied on display of the data, the ability to search on a masked column is not affected.
Let's add a mask to SSN.
There are also a few predefined functions to add a mask to a column. For example, there is a predefined mask for email. Let's take a look at it.
Notice that the predefined mask for email address displays the first character of the column, but the rest of the mask is XXX@XXXX.com to obfuscate the entire domain name (Figure 9).
Does a mask cause a sort to not work properly? Let's find out.
It is sorted properly, but the first character of the mask is presented (Figure 10). What if we took that away and applied a custom mask to the email address?
As shown above, the data is still sorted properly. Also note that I dropped the mask from the column and added a customized mask using the partial function. Again, the mask is being deployed upon display.
At this point, I've shown that a masked column can be used in a where clause for searching for specific records and that correct sorting is occurring. If a developer wanted to get at the unmasked data, could this be done (Figure 12)?
I needed to provide additional permissions to TestUser on the dbo schema, but the SELECT INTO copied the unmasked data into another table. The unmasked data is now in that table.
Dynamic data masking in Azure SQL database v12 (preview)
In Azure SQL database v12, the portal is used to set masking for columns in a GUI. Here, I'll use the Azure preview portal to demonstrate dynamic data masking setup. I assume you know how to create and access an Azure SQL database from SQL Server Management Studio (SSMS). Note that if you are using a "Downlevel Client" (TDS 7.3 and below) for accessing the data, you will want to use a modified connection string from servername.database.windows.net to servername.secure.windows.net.
The first step is to create a table in the Azure SQL database using the same syntax used in the SQL Server 2016 example (Figure 1). The syntax is shown again in Figure 13.
I'll also insert the same data used in Figure 1 into the Azure SQL Database table. Use the same INSERT statements from Figure 2 and check to see that the data is correct, as shown in Figure 14.
Defining a mask
From the Azure preview portal, select Browse ALL / SQL databases / <databasename> / Dynamic Data Masking (preview). This will display a summary of any masking rules already defined. From this point, select Add Mask. Select the HomePhone choice in the Column dropdown. Choose Custom String from the Masking Field Format dropdown. Use 0 for Exposed Prefix, XXX-XXX- for the Padding String, and 4 for the Exposed Suffix. Click Save on the Add Masking Rule blade and then Save again on the Dynamic Data Masking (preview) blade. The portal will display a message to indicate the changes are saved and ask you to acknowledge by clicking OK.
Let's look at the employee table to see if it worked. Note that it might take a minute or two for the masking rule to be applied fully.
The masking worked in the same manner in Azure SQL database as it did in SQL Server 2016 CTP2. At this point, it is clear that other masking rules can be defined as I did earlier but through the GUI. Currently, it appears that the GUI in the portal must be used to define masking rules because when I used the DDL to add a mask, the syntax appeared to be accepted, but the output was unchanged.
Some common masking rules are built in to Azure SQL database. One of these is for Social Security Number. Let's add it to the employee table's SSN column, as shown below.
The results are below in Figure 18.
Although the data is masked, it is not necessarily in the format that may have been desired. This mask would work fine for a SSN column that was wider and had the dashes included in the data instead of inferred. Figure 19 shows the masking rule for using the Number (random number range) option for the EmpID for the employee table.
Obviously, a random number mask for EmpID is not realistic, but it shows the built-in mask feature.
Also, note that there is a field on the dynamic data masking blade for privileged logins. These are logins that do not apply masking rules when displaying the data. A login would need to be created and another connection to the Azure SQL database established to demonstrate this. This is somewhat different from how this feature was presented earlier in the article where a user (without a login) could easily be created and impersonated for the demonstration.
About the author
Rick Heiges is a SQL Server MVP and Principal Solutions Architect for DB Best Technologies LLC. He works with customers educating them about the Microsoft Data Platform to solve business problems and to maximize the value of their data. Rick is very involved in the SQL Server Community via PASS where he spent nine years on the Board of Directors. You can read his blog on and follow him on Twitter @heigesr2.
Find out how to implement row-level security in Azure SQL Database
Learn about Always Encrypted, a new security feature for SQL Server 2016