alex_aldo - Fotolia
SQL Server 2017 adds support for graph databases to help streamline the processing and querying of data sets with...
relationships that can be mapped in graph form, such as healthcare, customer and social networking data. In many cases, a SQL Server graph database can better handle this increasingly abundant type of data than a traditional relational model can.
Microsoft has integrated graph database features into the SQL Server Database Engine to leverage familiar core components, such as the storage engine, the query processor and the relational database system's metadata functions. You can also use many other SQL Server features in conjunction with graph databases, including Machine Learning Services, columnstore indexes, and various tools and utilities, including SQL Server Management Studio.
Like similar technologies, a SQL Server graph database consists of three building blocks that define the structure of data sets and the relationships between the data elements in them:
- Nodes, which represent an entity such as a person, location, product or organization
- Edges. which delineate a relationship or connection between two nodes
- Properties, which are attributes assigned to individual nodes and edges in the form of key-value pairs
Inside a SQL Server graph database
In a graph database, you group related nodes into a node table. For example, if you build a graph database to support an online forum for a group of book lovers, you might create node tables for books, authors and readers. The tables would include one node for each book, author and reader, respectively.
Similar to how you handle nodes, you group related edges into an edge table in a graph database. Returning to the example of the book lovers' forum, you might create a WrittenBy edge table to show which books are written by which authors. You might then create a RecommendedBy table to show which readers have recommended which books and which authors. The two edge tables represent the relationships between the three types of nodes.
To better understand how nodes and edges work, consider the graphic in Figure 1, which shows three nodes and the relationships defined between them.
Although the figure shows only three nodes, we can incorporate more complex, many-to-many relationships into a SQL Server graph database. For example, one author can write multiple books, and multiple authors can write one book or multiple books. In addition, readers can recommend different books and authors, and perhaps even other readers based on the recommendations that they've made.
Building graph database tables
A node table includes an auto-generated ID column and one or more columns that describe the nodes. In graph theory, which underlies graph databases, those additional columns are referred to as properties or, sometimes, attributes. For example, the Authors table might include properties for first, middle and last names. The values in properties columns are just like column values in traditional SQL Server tables.
Figure 2 shows what the node tables might look like in a SQL Server graph database for the book lovers' forum.
The default ID column, referred to as $node_id, is actually a pseudo-column that maps to an internal name that includes $node_id_, followed by a globally unique identifier, Microsoft's parlance for a unique identifier. You can use the pseudo-column name in any of your queries when referencing that column.
SQL Server's database engine automatically generates the values in the $node_id column, which combine the node table's ID along with an integer-based bigint value. SQL Server displays the values as JSON strings. In Figure 2, you can see only part of those strings; the complete values are much longer. For example, the following JSON string shows the actual value for the last row of the Authors table:
The string identifies the table type (node), database schema (dbo), table (Authors) and individual node ID (3). In this case, the latter is for the author Virginia Woolf.
An edge table takes a slightly different approach. It includes three auto-generated columns: $edge_id, $from_id and $to_id. The $edge_id column works just like the $node_id column in a node table. It serves as a pseudo-column for easy referencing, and it automatically generates unique IDs for each edge in a database.
On the edges of a relationship
The $from_id and $to_id columns reference the nodes at both ends of a relationship; together, they define the connections between nodes in the graph database. An edge table can also include property columns, but these are optional.
Figure 3 shows the WrittenBy and RecommendedBy edge tables mentioned above; the latter also includes a RecommendedDate property column.
As with the $edge_id and $node_id columns, the values in the $from_id and $to_id columns are shortened versions of the JSON strings that represent the referenced nodes. For example, the first row of the WrittenBy table shows a relationship between book 0 and author 1, indicating that the book One Hundred Years of Solitude was written by Gabriel García Márquez.
As part of SQL Server 2017, Microsoft has updated several Transact-SQL statements to make it possible to create node and edge tables and to populate the relationship data in the edge tables. In addition, you can use statements written in Transact-SQL to modify table definitions and to query and analyze the data in the tables.
There is, of course, a lot more to building and managing a SQL Server graph database than what we've covered here, but this should give you a good idea of how they work. If you're working with application data that can benefit from the graph structure, you now have one more reason to give SQL Server 2017 serious consideration.
The potential business benefits of using graph databases
How graph databases can help identify and prevent fraud
A look at the integration of Python into SQL Server 2017