Guide to SQL business intelligence, analytics and data visualization
A comprehensive collection of articles, videos and more, hand-picked by our editors
The physical storage design for your multidimensional applications affects the latency, size and performance of...
your project. Therefore, when designing the physical storage for it, you must determine how to store the dimensions and measures, and how to design an effective physical storage strategy for multidimensional applications. In this article, I will review the standard online analytical processing (OLAP) storage models, and discuss the considerations involved when choosing a storage model for multidimensional applications.
Multidimensional applications store three types of information: metadata, detail data and aggregations. The metadata defines the multidimensional structure, including what facts and dimensions are used. The detail data contains the business metrics for analysis. And the aggregations summarize the detail data. The metadata is always stored in the multidimensional structure, but the storage locations for the detail data and aggregations are defined by the storage model.
Microsoft SQL Server Analysis Services (SSAS) supports three main physical storage models for multidimensional platforms: multidimensional OLAP (MOLAP), relational OLAP (ROLAP), and hybrid OLAP (HOLAP). In practice, most SQL Server OLAP systems are implemented with the combination of these three storage models. However, considerations for choosing a storage model for SQL Server OLAP depend on various factors, such as data latency, query speed, processing time and the total size. The following are the main features of each OLAP storage model.
MOLAP, ROLAP and HOLAP
The default storage model for multidimensional platforms is MOLAP. In this storage model, a copy of the detail data and any aggregations are stored in a multidimensional structure. Therefore, they do not require a permanent connection to the underlying relational data store.
The MOLAP model also uses compression to store data in the multidimensional structure. It is the fastest model for queries because all the required data is actually stored in the multidimensional structure, and all calculations are pre-generated when the multidimensional structure is processed.
Data in the multidimensional structure is refreshed only when the multidimensional structure is processed, so latency is high. That means any subsequent changes to the relational data store will reflect in the multidimensional structure only once it is reprocessed. It also requires additional storage on the SQL Server OLAP server because the copy of the relational data is stored on OLAP server.
In the ROLAP storage model, the aggregations are stored in indexed views or tables in the source database. When a dimension is stored using ROLAP, the dimensions must be processed immediately if the schema of the source table changes.
Under the ROLAP model, data is accessed directly from the relational source database if it is not in the cache. It is usually the slowest of the three storage options for queries. It also has the slowest processing time, unless there are no aggregations.
In the HOLAP storage model, aggregations are stored in a multidimensional structure, and noncached data is accessed directly from the relational data source. It offers query performance equivalent to that of MOLAP when only summary aggregations are accessed. When detail data is required, its query performance is slower than MOLAP's and ROLAP's because the data must be retrieved from the relational data source.
Decisions, decisions, decisions
The proactive caching settings in SSAS enable you to balance the issues of higher latency, usually associated with MOLAP solutions, and lower performance, usually associated with ROLAP solutions. This is possible because proactive caching can dramatically and intelligently update dimensions and measures when the underlying data changes. For more information, see Proactive Caching (Partitions).
Dimensions can use either the ROLAP or MOLAP storage model. Measure groups and partitions can use any of the three storage models. As mentioned earlier, MOLAP is the default storage model for both dimensions and measure groups.
The following table compares typical latency, query performance, processing performance and overall database size characteristics for the three storage models.
You should also be aware of the following facts:
- The proactive caching setting affects the degree of latency for each storage model. Proactive caching is useful in a transaction-oriented system in which changes are unpredictable or when real-time access to the data is required. Furthermore, proactive caching can be configured to access the ROLAP data until the MOLAP cache is rebuilt when maximum latency is reached. However, if ROLAP data is accessed while a MOLAP cache is being rebuilt, this slows the performance on the MOLAP system.
- The original size of the detail data and the number of aggregations that are created affect the overall size of the storage model solution.
- The processing performance for ROLAP storage is not slow if the "Do not design aggregations" option is selected.
- The overall database size when ROLAP is used is small if the "Do not design aggregations" option is selected.
- For HOLAP, summary queries to the aggregates are fast. Queries from the relational data source are slow.
About the author
Basit Farooq is a lead database administrator, trainer and technical author. He has more than a decade of IT experience in development, technical training and database administration on Microsoft SQL Server platforms.