SQL Server 2005 Analysis Services (and later) offers numerous processing options. By now, most developers and database administrators have a basic grasp of the functionality implemented by each processing option. Let's take a deeper look at Analysis Services processing from hardware and data availability perspectives and answer some relevant questions. What resources are used during each phase of processing? Should you worry about running out of resources when you process your objects? Is multidimensional data available during processing? Is your query performance likely to suffer when you use particular types of Analysis Services cube processing?
This tip addresses those questions and helps you procure – or at least plan for – the most appropriate hardware for Analysis Services. Find out how to manage your users' expectations about application performance. Typically, Analysis Services dimensions contain fewer records than partitions, so the majority of processing time is spent on partitions. Therefore, we'll focus on partition processing options when using Multidimensional Analytical Processing (MOLAP) -- the most popular and most efficient way of storing Analysis Services data.
Although SQL Server Analysis Services allows you to process the entire database using a single command, it also allows processing objects at a much more granular level. If you're working with Analysis Services databases that are a few dozen megabytes in size or if you can afford to have lengthy downtime each time you process objects, then you don't need to worry about different processing options.
However, if you're working with large databases and you're trying to make your analytical data highly available, you should avoid processing a database in a single transaction. Instead, break measure group processing down into manageable chunks using one or more of the following options:
Process Clear option deletes measure group data and aggregation files and is referred to as "Un-process" in SQL Server Management Studio (SSMS). This processing option is very quick. It leaves the cube metadata intact, but once a cube or measure group is un-processed, it is useless from the users' perspective because you can't use it for querying. You must use either Process Full or Process Data options to load data back into the Analysis Services cube. This option is very useful if you have a partition that contains incorrect data and you simply want to delete the data without affecting the rest of the partitions in that measure group.
Process Clear is also useful if you don't have a lot of disk space available. Normally, when you process a measure group, Analysis Services creates a second copy of the measure group, builds all necessary files and then swaps the existing and new files. So if you have a measure group that is 100 GB, and you process it in a single transaction (not recommended, but completely possible), you will need 200 GB of storage during processing.
With Process Clear, you can minimize the processing footprint at the cost of making some or all of the cube data unavailable during processing.
Process Data does exactly what it says. It loads Analysis Services data into the cube. This processing option consists of several stages, however. Analysis Services must first construct a relational query for each partition -- presuming you're processing the entire measure group -- then execute the query, store query results in memory, convert data into a multidimensional format and finally, write the multidimensional files to disk.
Process Data is memory intensive during the querying stage. This will be particularly imposing if you host the relational database and Analysis Services on the same host. If you do use this less than desired configuration, procure as much memory as you can for your server.
On the other hand, if you host the relational database on another server, you will have to account for the overhead of transferring data over the network. When data is written to disk, you can expect spikes in disk utilization (not surprisingly) as well as elevated CPU usage. When data is converted from relational to multidimensional format, CPU utilization is also elevated. The amount of resources used will depend on whether the relational database runs on the same host, how much data you process and how many partitions you process in parallel.
If you use default processing options in SSMS, Analysis Services attempts to choose the most appropriate number of partitions to be processed in parallel. If you have multiple CPUs and plenty of memory on the server, it is prudent to override the default option. Unfortunately, there are no hard-and-fast rules for suggesting the number of partitions that should be processed in parallel. However, Analysis Services 2005 performance guide has recommendations you can use as the starting point for tuning processing performance.
Keep in mind that the Process Data option does not build indexes or aggregations. Although these objects can have a dramatic impact on query performance, they're not required. Data can be queried as soon as it is loaded -- aggregations and indexes can be built later.
As long as you don't un-process the partition before running Process Data against it, users can continue querying the partition. Queries will reflect the data that was available in the partition before processing commenced.
Process Indexes option builds bitmap indexes and aggregations after data is loaded into the cube. Bitmap indexes help Analysis Services find data requested by queries within partition files. Aggregations are separate objects that contain summary values. Aggregation files are generally smaller than partition files. Therefore, a query scanning an aggregation file can be considerably faster than the corresponding query, which must scan a single partition or multiple partitions. If your database has no aggregations – perhaps during the initial development and data quality assurance stages of your project – the Process Indexes option is very fast. Expect CPU resource utilization to increase during this phase since Analysis Services has to examine and index data.
On the other hand, if you're only building indexes and not aggregations, memory use will be minimal. If you have defined aggregations for your measure group (or partition), then Process Indexes can be resource intensive from SQL Server memory, CPU and disk perspective. The more aggregations you want to build, the longer it will take to process indexes. Fortunately, building aggregations doesn't preclude your users from running queries; however, these queries won't be
able to take advantage of any aggregations being computed until Process Indexes phase is complete. If your server has limited memory and you're processing multiple partitions in parallel, you should separate Process Data and Process Indexes phases in order to process objects quicker. If you have plenty of memory, then separating the two phases will have no performance impact.
It is not uncommon to see multiple instances of Analysis Services running on the same server. If you're careful, you can process multiple databases on the same instance or separate instances in parallel. For example, once you have loaded data into Database A, you could start building indexes on it while you un-process Database B and run Process Data against it.
Process Clear Indexes option is not documented and isn't available in SSMS. But you can use this option with an XMLA script to drop existing aggregations and indexes. This is a very useful option for experimenting with various aggregation schemes without preventing users from executing queries. Much like Process Clear, Process Clear Indexes is very fast and doesn't use a lot of hardware resources.
Process Full option is a combination of Process Data and Process Indexes options. This is the easiest option for re-processing the entire cube, measure group or partition. As mentioned earlier, if your server has limited memory and you have numerous aggregations to build, then you should split the Process Data and Process Indexes into separate transactions. Much like with Process Data option, users can continue querying existing data while the measure group or partition is being processed. Analysis Services will create new data and aggregation files in addition to existing files. Once processing is complete, existing files will be replaced with new files.
Process Add option allows you to add data to an existing partition based on a relational table, view or query. Within SQL Server Management Studio, this option is referred to as Process Incremental. It is a very useful option for adding data to an existing partition while allowing users to continue to query the existing data. In the background, MSAS will build a temporary partition – including data and aggregations – and then merge it with the existing partition.
The temporary partition includes the string "INCR_UPDATE_TEMP" in its name. This temporary partition is being processed fully, so Analysis Services must read data and aggregation files. The hardware resource usage will be similar to what you see with Process Full -- but likely on a smaller scale, depending on how much data you're adding to an existing partition. Note that incremental processing allows you to specify an alternate source for partition data. It doesn't have to be the same table, view or query as what you had defined in Business Intelligence Development Studio.
Process Default option brings the partition to fully processed state. If the partition is unprocessed, then Process Default would be equivalent to running Process Full. If you have already executed Process Data, and you subsequently execute Process Default, this would be equivalent to running Process Indexes. Therefore, Process Default can use different hardware resources depending on the type of processing that's necessary to ensure that the object is fully processed.
ABOUT THE AUTHOR
Baya Pavliashvili is a DBA manager with HealthStream, the leader in online healthcare education. In this role, he oversees database operations supporting over one million users. His primary areas of expertise include performance tuning, replication and data warehousing. You can reach Pavliashvili at email@example.com.