Use cube partitions to improve Analysis Services performance
Baya Pavliashvili, Contributor
Microsoft Analysis Services (MSAS) is a great tool for enabling business stakeholders to analyze
data from various perspectives -- and its growing popularity can be primarily attributed to its
exceptional performance.
Using pre-calculated summary value (aggregations),
MSAS helps you generate reports within seconds whereas running similar reports from a relational
database management system (RDBMS)
can take hours. But even with a multitude of aggregations, MSAS can struggle to provide optimal
performance when dealing with huge volumes of data. In this
Premium Access
Register now for unlimited access to our premium content across our network of over 70 information Technology web sites.
By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States.
Privacy
Dig Deeper
-
People who read this also read...
-
This was first published in June 2006
tip, I will discuss the advantages of
using cube partitions – a powerful mechanism for improving query performance and reducing downtime
associated with cube processing.
Partition benefits
Partitions offer many potential benefits summarized here:
1. Shorter downtime: A portion of a typical data warehouse is relatively volatile and
needs to be processed often. Other portions are relatively static. For example, in a sales cube,
you're likely to change the current year's data nightly (or perhaps even hourly), but sales from
previous years might change only occasionally to account for merchandise returns and exchanges. You
can create a partition for current year sales and another partition containing all sales from
previous years. If your warehouse tracks sales that occurred in the past 10 years, processing only
the current partition may be 10 times quicker than processing the entire cube.
2. Better query performance: Queries that summarize data over 10 years could take
considerably longer than those that only search through the current year data. If you have
appropriate partitions, Analysis Services only has to scan a small subset of all data to return
query results, which can translate into dramatic performance improvements compared to queries
running against a cube with a single (default) partition.
3. More focused aggregations: Both MSAS 2000 and MSAS 2005 allow you to have different
sets of aggregations for each partition. The partition that is queried frequently could benefit
from additional aggregations, which in turn could warrant better performance. Partition(s) that
aren't used heavily can be processed less frequently -- perhaps monthly or quarterly -- with
considerably fewer aggregations. Usage-based optimization wizard can be executed for each partition
to determine any additional aggregations that can benefit performance of queries actually executed
against a given partition.
4. Customized storage and processing settings: Frequently accessed partitions might
benefit from proactive caching and ROLAP storage. On the other hand, other forms of storage and
processing might be better for less frequently queried partitions.
5. Distributed query and processing load across servers: MSAS allows you to create remote
partitions; a remote partition resides on a server different from its parent cube. This way the
queries that affect the remote partition are processed on a server separate from its parent cube,
allowing you to take advantage of additional processing power.
6. Partitions processed in parallel: With MSAS 2005, you can process multiple partitions
in parallel on a server that has multiple processors. This can further reduce the total cube
processing time.
Partition drawbacks
I mentioned that MSAS 2000 allows you to include the same data in multiple partitions, which may
skew calculations on your reports. Fortunately, Microsoft fixed that in MSAS 2005. Perhaps what's
more important is that partitions require more work. You must determine the appropriate data sets
typically represented on reports and then partition data accordingly. Also, you have to decide
which storage mode and aggregation scheme makes sense for each partition. More partitions mean
additional maintenance overhead because you may have to merge partitions over time.
Let's suppose most of your analytical reports query current month's data; you could have a
partition for the current month that you refresh daily (or perhaps even hourly), a partition for
all previous months of the current year and a partition for all previous years. Once the current
month is over, you'll have to merge the "current" partition with the current year's partition and
create a new partition for the current month. As I mentioned earlier, MSAS only allows you to merge
two partitions with identical aggregations -- so you might have to rebuild your aggregations before
merging.
Even with the drawbacks, partitions present an invaluable tool for improving the performance of
your analytical queries. And, more importantly, they reduce cube processing time and increase cube
availability.
Figuring how many aggregations you need
The number of aggregations you create for your cubes depends on your users' analytical needs.
More aggregations typically translate into better query performance, but they also use more disk
space and extend cube processing time.
Disk space, typically, is not a huge concern for two reasons:
1. Storage costs decrease daily and costs of building a data warehouse will far outweigh storage
costs.
2. MSAS offers the efficient multidimensional
OLAP (MOLAP) storage option: A relational warehouse typically measured in a few gigabytes would
only require several hundred megabytes to store aggregations.
On the other hand, long cube processing times present a more serious limitation. Full cube
processing requires downtime, meaning no one can execute queries against cubes being processed.
Longer processing time means lengthier downtime. As your warehouse grows larger, so will the cube
processing time. An eight-hour downtime window (during non-business hours) may be acceptable for
some environments, but users won't be happy when that window opens to 16 hours. Furthermore,
downtime is not feasible at all if your business provides analytical solutions to customers across
the globe, 24 hours a day, seven days a week.
MSAS also reaches a point of diminishing performance improvements with additional aggregations.
Creating one-third of all possible aggregations typically provides the best performance; any
aggregations beyond that achieve marginal performance improvement at best.
Implementing partitions
So how do you keep cube processing time to a minimum and boost analytical query performance? One
option is to partition cubes based on certain criteria commonly used to filter queries. A partition
is simply a portion of the full cube that can be limited by some criteria.
For example, if analytical queries normally limit output by customers' marital status, you could
create two partitions -- one for married customers and one for single customers. Cubes are more
commonly partitioned by time dimension members, as is the case when you create a separate partition
for each year or each quarter.
Partitions are available in both MSAS 2000 and MSAS 2005, but they are only supported in the
Enterprise Editions. This fact shouldn't come as a surprise because partitions typically make sense
with large-scale data warehouses that would be more likely to use Enterprise Edition. Note also
that partitions are seamless to cube users: They execute queries against the cube and don't have to
specify partition names that contain particular data sets. Instead, MSAS determines which partition
to poll at query runtime.
By default, each cube has a single partition containing all data in the fact table. To create a
new partition, you can invoke the Partition Wizard and specify a portion of the data you wish to
include in the partition. For example, if you have a time dimension that spans years 2000 through
2006, you can create a separate partition for each year.
You can also partition your cubes using multiple dimensions. For instance, the sales cube may be
partitioned by promotion category, customers' education level and yearly income. Note that you can
create different partitions based on the same fact table or different fact tables. For example, you
could have a fact table for each year of sales, or perhaps a view for each year of sales data based
on a single fact table, that can be used to build each partition.
One limitation of MSAS 2000 is that it allows you to create a partition containing data also
included with other partitions. Suppose we create a partition for customers who make more than 150K
per year; customers who make more than 150K will be counted twice -- once in the default partition
and second in the new partition. MSAS doesn't automatically remove such data from the default
partition, nor do you get a warning that you may be counting some data twice. To avoid incorrect
calculations, you must edit the default partition. In addition, there is no way to exclude
dimension members when designing partitions; you couldn't advise the Partition Wizard to include
all income levels except those greater than $150K.
MSAS 2005 allows you to create partitions based on fact tables found in your data source views.
You can also write a SQL query that can be used as the source for your partitions. MSAS 2005 warns
you if you are using a fact table for an existing partition, and prevents you from counting the
same data twice.
Analysis Services allows you to merge partitions, which is helpful when the current partitioning
scheme is no longer effective. Suppose you have a partition for year 2006. In 2007, you may wish to
combine this partition with the "historical" partition and create a new partition for 2007. You can
merge two partitions if both have the same aggregations, same structure and belong to the same
cube. In order to ensure that two partitions have the same aggregations with MSAS 2000, you can
copy the aggregation design from an existing partition when you create a new partition. MSAS 2005
lets you copy aggregation design from one partition to another after partitions have been
created.
Note that Analysis Services calculates the total number of dimension members and fact table size
when dimensions and cubes are first created. Initial size of the dimension can be considerably
smaller than the size of the same dimension six months later. However, MSAS 2000 never recalculates
the size of the dimensions or facts. You should modify dimension level member counts and fact table
size properties manually within the dimension editor and cube editor. Furthermore, MSAS 2000
doesn't have a way of determining the total number of members per partition; and you cannot specify
the total number of rows for the partition. If the member counts are incorrect, chances are your
aggregations will not be optimal. You should use the partition aggregation utility to specify
each partition's fact table size.
About the author: Baya Pavliashvili is a DBA manager overseeing database
operations supporting over one million users. Baya's primary areas of expertise include performance
tuning, replication and data warehousing.
Disclaimer:
Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.
Join the conversationComment
Share
Comments
Results
Contribute to the conversation