Utilize SSAS for data predictions and classification using Excel
There is a tendency to think of predictive analytics, data mining and data classification as a
set of technologies that belong solely in the domain of mathematicians, statisticians and other
PhD-level professionals. But Microsoft has emphasized use of these powerful data mining mechanisms
in applications and solutions by everyday users through the Analysis
Services (SSAS) engine in SQL Server 2005. This has been the case since SQL Server 2000 and has
become a core strategy in the SQL Server product team.
I will focus on how to take advantage of those powerful aspects of SSAS from Microsoft Office
Excel 2007. The advantage to this approach is that you, and your users, can remain
Premium Access
Register now for unlimited access to our premium content across our network of over 70 information Technology web sites.
By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States.
Privacy
Dig Deeper
-
People who read this also read...
This was first published in December 2008
comfortable in a
common tool (Excel) and access data and data mining features of SSAS directly from Excel easily
without the need to write code, access SSAS or write DMX scripts. Using Excel as a tool for data
mining and predictions is very powerful; however, we will only scratch the surface of SSAS
capabilities. This tip is targeted at users who are not typically creating data mining algorithms
and SAS/SCL programmers. The inner workings of the SQL Server data mining algorithms will not be
examined in this tip.
Let's look at this through two different use cases. First, we'll start with segmentation in the
context of a marketing campaign. The goal is to predict whether the recipient of a targeted mailing
campaign will buy the product being advertised. To create this segmentation model, we can use the
SSAS out-of-the-box data mining algorithm: the Clustering
Algorithm for segmentation. Segmentation algorithms divide data into groups, or clusters, of
items that have similar properties. This is a level of understanding of customer behaviors and
buying tendencies that results from complex research and analysis. Not all businesses perform this
level of research, but all companies would benefit from it. Because of the complexities of putting
together such a research methodology, this is often not performed. Therefore, these techniques can
be quite helpful because you can use SSAS as the data mining engine to perform clustering and
segmentation from Excel without writing any code.
To start, you should have your data set in a Microsoft Excel worksheet. This can be tables of
raw static data in Excel sheets, or by pulling data into Excel from a database source and running
the analysis from Excel. In order to perform analysis on the data, you will follow these steps:
- Point the Excel data mining add-in to your SSAS instance
- Select which data mining algorithm to use
- Select the data to be used to for analysis and training
- Select the specific attributes and parameters for the data mining engine from your Excel
data
Click on image for larger version
Figure 1: Select data in your spreadsheet and use the Data Mining tab to model, prepare and
validate data from your spreadsheet. You can also clean-up temporary models from the server here,
establish server connection and execute data mining scenarios.
Before you can make use of this feature in Excel, you must first obtain the Excel 2007 add-in (download the installation files). This will give you the
ability to use a SQL Server 2005 Analysis Service engine as the analysis tool for Excel tables from
Microsoft Office 2007. Excel will automatically create temporary data mining models in SSAS, train
the model and provide the results of the data mining to you using the SSAS visual tools found in SQL Server
BI Development Studio (Visual Studio). These user-interface tools will be familiar to data
mining developers who already use these tools in Visual Studio where you are able navigate and
adjust the parameters such as in the following examples:
Click on image for larger version
Figure 2: Browsing the Cluster results allows you to look at this network diagram and
interact with it by selecting the weak to strong links and the variable used for the shading. In
this sample, we are looking at the "Population" attribute for clustering.
Click on image for larger version
Figure 3: If you select the Cluster Characteristics tab on the interactive data mining
results from the cluster scenario, you can view the probability for each variable.
To set this up from Microsoft Excel 2007:
- Run the setup installation files for the data mining add-in from the link provided above
- After the install, navigate to the new program group added called Microsoft SQL Server 2005 DM
Add-ins
- Launch the Server Configuration utility to establish the connection to a SQL Server Analysis
Services 2005 (SSAS) server where the models will be stored and generated.
The sample data used for this tip came from the sample data that is provided from the add-in
installation. I highly recommend that you open that spreadsheet, read through the introduction and
instructions and play with the sample data and data mining algorithms on the Data Mining ribbon.
You can find the spreadsheet in the program group created by the installation utility for the
add-in.
Click on image for larger version
Click on image for larger version
We'll look at one other scenario through the Excel 2007 add-in for SSAS. Here, we'll use
predictions through the Decision Trees algorithm to look at future customer buying patterns. The
Decision Tree algorithm takes the data sets you supply and finds trends to determine what will
happen in the future. The future predictions will appear graphically as dotted-lines, and can be
copied to Excel from the data mining user interface. This is an important step because the output
UI is only used to interact with during the generation of the model.
Click on image for larger version
Figure 6: The result of the model generation is a look at the data graph and the future
predictions to the right of the vertical line break.
You can then select the Decision Tree tab to view the split in the trees based on the prediction
attributes to see the results of the model generation and the nodes that were created during
processing.
Click on image for larger version
Figure 7: The "tree" structure in the Decision Tree
During the selection of attributes from the source spreadsheet, you can provide this data
quickly and easily to business decision- makers directly in Excel to examine marketing campaigns,
sales trends, future promotions, etc. Keep in mind that the screen shots in these samples are
interactive and you can navigate through them as they pop-up by activating the Excel data mining
add-in. But when presenting reports back to the business, use the "Copy to Excel" button and
present the static results.
ABOUT THE AUTHOR
Mark Kromer is an internationally recognized leader in the business intelligence and
database communities, having authored articles and blogs for TDWI, Microsoft, MSDN and Technet.
Mark has more than 15 years' experience in IT, focusing on database technologies and solutions,
including spending the past 2.5 years with Microsoft as the lead product manager for business
intelligence customer solutions. Check out his blog at blogs.msdn.cpm/makrom
Disclaimer:
Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.
Join the conversationComment
Share
Comments
Results
Contribute to the conversation