microworks - Fotolia
Real-time operational analytics is a new capability in SQL Server 2016 that lets you run analytical queries directly against a transactional database. It's a powerful tool, but also introduces several challenges for database developers and administrators.
To enable real-time analytics against operational data, Microsoft added support for what it calls updateable, nonclustered columnstore indexes to SQL Server 2016, and also enhanced the database engine to take advantage of them.
In general terms, columnstore indexes are column-based structures for storing and querying tables of data. Introduced in SQL Server 2012, they were initially designed to speed up query processing in SQL Server data warehouses.
The new type of columnstore index makes it possible to run transactional and analytical workloads against the same database, without the need to implement a data warehouse and complex extract, transform and load (ETL) operations.
The indexes maintain separate copies of the transaction data being analyzed, in a columnar format, instead of the conventional row-based tables. If a transaction modifies data in the base tables, SQL Server automatically propagates the changes to the associated columnstore indexes. That keeps the data in the indexes up to date and available for real-time analytics at all times, while not affecting transaction processing workloads.
This doesn't mean that an analytical query never touches the base tables. When you run a query, SQL Server's query optimizer determines the best way to access the data, dipping into the base tables when necessary. From the application's perspective, the process is transparent and requires no special coding.
To implement real-time operational analytics, you need to create only the necessary columnstore indexes on the targeted tables. You don't need to modify the existing transaction processing applications at all, or set up special types of analytical queries.
You can also create the columnstore indexes on both disk-based and memory-optimized tables. For disk-based tables, you can use the updateable, nonclustered columnstore indexes, which can be filtered to exclude data that's updated frequently in order to avoid taxing the database server. For in-memory tables, on the other hand, you can create clustered columnstore indexes that must include all the columns in the base table. While that seems counterintuitive, Microsoft says the clustered indexes serve as nonclustered indexes, in this particular case.
The advantages of real-time analytics
Organizations often segregate analytical workloads from their transactional counterparts, implementing sophisticated processing platforms to support the ETL and data warehousing processes. But there are many drawbacks to doing this.
Such infrastructures are complex systems that require significant IT resources, which makes them costly to implement and maintain. Data sets must be carefully transformed and kept in sync, without resulting in corrupt or inaccurate data, and while ensuring that the numerous technology components involved in the ETL process function properly.
Data latency can also be challenging. ETL jobs are usually scheduled during off hours, and can take a long time to run, depending on the volume and complexity of the data. For many organizations, refreshing the data in a data warehouse more than once a day is simply not doable -- yet, for some companies, even a delay of 24 hours is too much.
That's where real-time operational analytics can help. You don't need to implement a separate technology infrastructure just to store and access analytics data. Once you've added the necessary columnstore indexes, you're good to go -- and the data being analyzed will be completely up to date.
Operational data analytics challenges
As promising as real-time operational analytics sounds, though, it's not without its challenges.
Transactional databases are highly normalized to ensure data integrity and to eliminate duplicate data. Analytical queries generally perform better against a dimensional model, such as a star or snowflake schema, which you may find in a data warehouse.
That said, a columnstore index can deliver substantially better analytics performance than the conventional B-tree indexes used in transactional databases. The analytical queries likely won't perform as well as those running against a stand-alone data warehouse, according to Microsoft. But you can analyze real-time data without all the complexity of traditional data warehousing.
However, you must also contend with the potential impact of the analytical queries on your transactional workloads. Maintaining the columnstore indexes and processing queries against them can take its toll on database server resources. For many organizations, even the slightest disruption to their transaction processing performance isn't feasible.
SQL Server 2016 offers several methods to mitigate the performance impact. As mentioned earlier, you can filter columnstore indexes in disk-based tables to minimize the processing overhead from indexing hot data -- that is, data that is often changed or updated. A filtered index lets you leave the hot data in a B-tree index and limits the columnstore index to cold data that rarely changes, if at all.
If that isn't doable on certain data, or if you're using a columnstore index in a memory-optimized table, a compression delay feature can postpone compression on the hot data for a specified period of time to reduce processing loads. In both cases, the hot data will automatically be incorporated into analytical queries.
Another strategy to mitigate real-time operational analytics overhead is to offload the analytics processing to one or more secondary replica databases created with AlwaysOn Availability Groups, Microsoft's high availability technology. The analytical queries can then read data from those replicas, rather than from the primary database. Of course, this also means you need an AlwaysOn structure in place.
Making the move to operational analytics
If the ability to analyze real-time data isn't a high priority for your organization, an operational analytics deployment would likely be overkill. And even if you do have a need for such data, and successfully implement a real-time operational analytics setup, there's no guarantee that you won't require ETL and data warehousing infrastructures as well. If you need to maximize the performance of your analytical queries, maintain large amounts of historical data for analysis, or consolidate data from multiple sources, including nonrelational ones, you'll still need a separate analytics platform.
Microsoft recommends real-time operational analytics when you want to store and analyze relational data from a single source. For example, that might involve a customer relationship management system that maintains all its customer data in SQL Server.
Under such circumstances, the real-time approach could be ideal, as long as you take into account the possible impact that analytical queries can have on your transactional throughput. You can easily add support for real-time analytics workloads without making significant investments -- unless, of course, you still haven't upgraded to SQL Server 2016.
Relational database management system highlights in SQL Server 2016
The future for SQL Server database administrators
Should you upgrade to SQL Server 2016?