There is a tendency to think of predictive analytics, data mining and data classification as a set of technologies...
that belong solely in the domain of mathematicians, statisticians and other PhD-level professionals. But Microsoft has emphasized use of these powerful data mining mechanisms in applications and solutions by everyday users through the Analysis Services (SSAS) engine in SQL Server 2005. This has been the case since SQL Server 2000 and has become a core strategy in the SQL Server product team.
I will focus on how to take advantage of those powerful aspects of SSAS from Microsoft Office Excel 2007. The advantage to this approach is that you, and your users, can remain comfortable in a common tool (Excel) and access data and data mining features of SSAS directly from Excel easily without the need to write code, access SSAS or write DMX scripts. Using Excel as a tool for data mining and predictions is very powerful; however, we will only scratch the surface of SSAS capabilities. This tip is targeted at users who are not typically creating data mining algorithms and SAS/SCL programmers. The inner workings of the SQL Server data mining algorithms will not be examined in this tip.
Let's look at this through two different use cases. First, we'll start with segmentation in the context of a marketing campaign. The goal is to predict whether the recipient of a targeted mailing campaign will buy the product being advertised. To create this segmentation model, we can use the SSAS out-of-the-box data mining algorithm: the Clustering Algorithm for segmentation. Segmentation algorithms divide data into groups, or clusters, of items that have similar properties. This is a level of understanding of customer behaviors and buying tendencies that results from complex research and analysis. Not all businesses perform this level of research, but all companies would benefit from it. Because of the complexities of putting together such a research methodology, this is often not performed. Therefore, these techniques can be quite helpful because you can use SSAS as the data mining engine to perform clustering and segmentation from Excel without writing any code.
To start, you should have your data set in a Microsoft Excel worksheet. This can be tables of raw static data in Excel sheets, or by pulling data into Excel from a database source and running the analysis from Excel. In order to perform analysis on the data, you will follow these steps:
- Point the Excel data mining add-in to your SSAS instance
- Select which data mining algorithm to use
- Select the data to be used to for analysis and training
- Select the specific attributes and parameters for the data mining engine from your Excel data
Click on image for larger version
Figure 1: Select data in your spreadsheet and use the Data Mining tab to model, prepare and validate data from your spreadsheet. You can also clean-up temporary models from the server here, establish server connection and execute data mining scenarios.
Before you can make use of this feature in Excel, you must first obtain the Excel 2007 add-in (download the installation files). This will give you the ability to use a SQL Server 2005 Analysis Service engine as the analysis tool for Excel tables from Microsoft Office 2007. Excel will automatically create temporary data mining models in SSAS, train the model and provide the results of the data mining to you using the SSAS visual tools found in SQL Server BI Development Studio (Visual Studio). These user-interface tools will be familiar to data mining developers who already use these tools in Visual Studio where you are able navigate and adjust the parameters such as in the following examples:
Click on image for larger version
Figure 2: Browsing the Cluster results allows you to look at this network diagram and interact with it by selecting the weak to strong links and the variable used for the shading. In this sample, we are looking at the "Population" attribute for clustering.
Click on image for larger version
Figure 3: If you select the Cluster Characteristics tab on the interactive data mining results from the cluster scenario, you can view the probability for each variable.
To set this up from Microsoft Excel 2007:
- Run the setup installation files for the data mining add-in from the link provided above
- After the install, navigate to the new program group added called Microsoft SQL Server 2005 DM Add-ins
- Launch the Server Configuration utility to establish the connection to a SQL Server Analysis Services 2005 (SSAS) server where the models will be stored and generated.
The sample data used for this tip came from the sample data that is provided from the add-in installation. I highly recommend that you open that spreadsheet, read through the introduction and instructions and play with the sample data and data mining algorithms on the Data Mining ribbon. You can find the spreadsheet in the program group created by the installation utility for the add-in.
We'll look at one other scenario through the Excel 2007 add-in for SSAS. Here, we'll use predictions through the Decision Trees algorithm to look at future customer buying patterns. The Decision Tree algorithm takes the data sets you supply and finds trends to determine what will happen in the future. The future predictions will appear graphically as dotted-lines, and can be copied to Excel from the data mining user interface. This is an important step because the output UI is only used to interact with during the generation of the model.
You can then select the Decision Tree tab to view the split in the trees based on the prediction attributes to see the results of the model generation and the nodes that were created during processing.
During the selection of attributes from the source spreadsheet, you can provide this data quickly and easily to business decision- makers directly in Excel to examine marketing campaigns, sales trends, future promotions, etc. Keep in mind that the screen shots in these samples are interactive and you can navigate through them as they pop-up by activating the Excel data mining add-in. But when presenting reports back to the business, use the "Copy to Excel" button and present the static results.
ABOUT THE AUTHOR
Mark Kromer is an internationally recognized leader in the business intelligence and database communities, having authored articles and blogs for TDWI, Microsoft, MSDN and Technet. Mark has more than 15 years' experience in IT, focusing on database technologies and solutions, including spending the past 2.5 years with Microsoft as the lead product manager for business intelligence customer solutions. Check out his blog at blogs.msdn.cpm/makrom