Home > SQL Server Tips > Data Warehousing and Business Intelligence > Utilize SSAS for data predictions and classification using Excel
SQL Server Tips:
EMAIL THIS
 TIPS & NEWSLETTERS TOPICS 

DATA WAREHOUSING AND BUSINESS INTELLIGENCE

Utilize SSAS for data predictions and classification using Excel


Mark Kromer
12.23.2008
Rating: -3.00- (out of 5)


Expert advice on database administration
Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us    Add to Google


There is a tendency to think of predictive analytics, data mining and data classification as a set of technologies that belong solely in the domain of mathematicians, statisticians and other PhD-level professionals. But Microsoft has emphasized use of these powerful data mining mechanisms in applications and solutions by everyday users through the Analysis Services (SSAS) engine in SQL Server 2005. This has been the case since SQL Server 2000 and has become a core strategy in the SQL Server product team.

I will focus on how to take advantage of those powerful aspects of SSAS from Microsoft Office Excel 2007. The advantage to this approach is that you, and your users, can remain comfortable in a common tool (Excel) and access data and data mining features of SSAS directly from Excel easily without the need to write code, access SSAS or write DMX scripts. Using Excel as a tool for data mining and predictions is very powerful; however, we will only scratch the surface of SSAS capabilities. This tip is targeted at users who are not typically creating data mining algorithms and SAS/SCL programmers. The inner workings of the SQL Server data mining algorithms will not be examined in this tip.

Let's look at this through two different use cases. First, we'll start with segmentation in the context of a marketing campaign. The goal is to predict whether the recipient of a targeted mailing campaign will buy the product being advertised. To create this segmentation model, we can use the SSAS out-of-the-box data mining algorithm: the Clustering Algorithm for segmentation. Segmentation algorithms divide data into groups, or clusters, of items that have similar properties. This is a level of understanding of customer behaviors and buying tendencies that results from complex research and analysis....


Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us    Add to Google



RELATED CONTENT
Data Warehousing and Business Intelligence
Recommended practices for SQL Server Analysis Services aggregations
Creating and managing SQL Server Analysis Services partitions
Sharing SSAS cube data in SharePoint with PerformancePoint Server 2007
New data profiling tools in SQL Server 2008
SQL Server 2008 Integration Services delivers new features
Parent-child dimensions in SQL Server 2005 with Analysis Services MDX
Change data capture in SQL Server 2008 improves BI reporting accuracy
Manage traces in SQL Server 2005 Analysis Services with XMLA commands
Tutorial: SQL Server 2005 Analysis Services
Open SSIS packages without validation using these SQL properties

SQL Server Business Intelligence (BI) and Data Warehousing
Using the Pivot transformation in SQL Server Integration Services
DBA career paths could lead to business intelligence
Are data warehouses made for the cloud?
Q&A: Business intelligence gets a facelift in SQL Server 2008 R2
Project Gemini gets a new name, Madison earns buzz
Speed up reports in SQL Server Reporting Services with caching
Data Transformation Services vs. SSIS: The key differences
Using package configurations in SQL Server Integration Services (SSIS)
How SQL Server 2008 components impact SharePoint implementations
Achieving high availability and disaster recovery with SharePoint databases

SQL Server Interoperability
How to create a SQL Server linked server to DB2
Export SQL Server data to an Excel file using SSIS and Visual Studio
Performance tuning for SQL Server 2005 and Exchange running on SBS
Custom VB.Net scripting in SQL Server Integration Services
Can SQL Server 2000 work on Windows 2003 platform?
Query to search text in old DTS packages in SQL Server?
Handle slowly changing dimensions with SSIS 2005 wizard
Run DTS packages within SQL Server Integration Services
SQL Server Integration Services how-to
Compatibility of SQL Server 2005 and 2000 coexisting

RELATED GLOSSARY TERMS
Terms from Whatis.com − the technology online dictionary
data aggregation  (SearchSQLServer.com)
data preprocessing  (SearchSQLServer.com)
data warehouse  (SearchSQLServer.com)
FileMaker  (SearchSQLServer.com)
GIS  (SearchSQLServer.com)
MOLAP  (SearchSQLServer.com)
pivot table  (SearchSQLServer.com)
Quiz: SQL Server 2000  (SearchSQLServer.com)
SQL  (SearchSQLServer.com)
T-SQL  (SearchSQLServer.com)

RELATED RESOURCES
2020software.com, trial software downloads for accounting software, ERP software, CRM software and business software systems
Search Bitpipe.com for the latest white papers and business webcasts
Whatis.com, the online computer dictionary


Not all businesses perform this level of research, but all companies would benefit from it. Because of the complexities of putting together such a research methodology, this is often not performed. Therefore, these techniques can be quite helpful because you can use SSAS as the data mining engine to perform clustering and segmentation from Excel without writing any code.

To start, you should have your data set in a Microsoft Excel worksheet. This can be tables of raw static data in Excel sheets, or by pulling data into Excel from a database source and running the analysis from Excel. In order to perform analysis on the data, you will follow these steps:

  1. Point the Excel data mining add-in to your SSAS instance
  2. Select which data mining algorithm to use
  3. Select the data to be used to for analysis and training
  4. Select the specific attributes and parameters for the data mining engine from your Excel data

[IMAGE]
Click on image for larger version
Figure 1: Select data in your spreadsheet and use the Data Mining tab to model, prepare and validate data from your spreadsheet. You can also clean-up temporary models from the server here, establish server connection and execute data mining scenarios.

Before you can make use of this feature in Excel, you must first obtain the Excel 2007 add-in (download the installation files). This will give you the ability to use a SQL Server 2005 Analysis Service engine as the analysis tool for Excel tables from Microsoft Office 2007. Excel will automatically create temporary data mining models in SSAS, train the model and provide the results of the data mining to you using the SSAS visual tools found in SQL Server BI Development Studio (Visual Studio). These user-interface tools will be familiar to data mining developers who already use these tools in Visual Studio where you are able navigate and adjust the parameters such as in the following examples:

[IMAGE]
Click on image for larger version
Figure 2: Browsing the Cluster results allows you to look at this network diagram and interact with it by selecting the weak to strong links and the variable used for the shading. In this sample, we are looking at the "Population" attribute for clustering.

[IMAGE]
Click on image for larger version
Figure 3: If you select the Cluster Characteristics tab on the interactive data mining results from the cluster scenario, you can view the probability for each variable.

To set this up from Microsoft Excel 2007:

  1. Run the setup installation files for the data mining add-in from the link provided above
  2. After the install, navigate to the new program group added called Microsoft SQL Server 2005 DM Add-ins
  3. Launch the Server Configuration utility to establish the connection to a SQL Server Analysis Services 2005 (SSAS) server where the models will be stored and generated.

The sample data used for this tip came from the sample data that is provided from the add-in installation. I highly recommend that you open that spreadsheet, read through the introduction and instructions and play with the sample data and data mining algorithms on the Data Mining ribbon. You can find the spreadsheet in the program group created by the installation utility for the add-in.

[IMAGE]
Click on image for larger version

[IMAGE]
Click on image for larger version

We'll look at one other scenario through the Excel 2007 add-in for SSAS. Here, we'll use predictions through the Decision Trees algorithm to look at future customer buying patterns. The Decision Tree algorithm takes the data sets you supply and finds trends to determine what will happen in the future. The future predictions will appear graphically as dotted-lines, and can be copied to Excel from the data mining user interface. This is an important step because the output UI is only used to interact with during the generation of the model.

[IMAGE]
Click on image for larger version
Figure 6: The result of the model generation is a look at the data graph and the future predictions to the right of the vertical line break.

You can then select the Decision Tree tab to view the split in the trees based on the prediction attributes to see the results of the model generation and the nodes that were created during processing.

[IMAGE]
Click on image for larger version
Figure 7: The "tree" structure in the Decision Tree

During the selection of attributes from the source spreadsheet, you can provide this data quickly and easily to business decision- makers directly in Excel to examine marketing campaigns, sales trends, future promotions, etc. Keep in mind that the screen shots in these samples are interactive and you can navigate through them as they pop-up by activating the Excel data mining add-in. But when presenting reports back to the business, use the "Copy to Excel" button and present the static results.

ABOUT THE AUTHOR:   

[IMAGE]Mark Kromer is an internationally recognized leader in the business intelligence and database communities, having authored articles and blogs for TDWI, Microsoft, MSDN and Technet. Mark has more than 15 years' experience in IT, focusing on database technologies and solutions, including spending the past 2.5 years with Microsoft as the lead product manager for business intelligence customer solutions. In the past, he was responsible for training classes, partner readiness and product development of BI initiatives at Microsoft, Agilent Technologies and Cingular Wireless. He is currently the lead product manager for enterprise BI reporting solutions at Primavera Systems. Check out his blog at http://blogs.msdn.com/makrom



Rate this Tip
To rate tips, you must be a member of SearchSQLServer.com.
Register now to start rating these tips. Log in if you are already a member.


Submit a Tip




DISCLAIMER: Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.



SQL Server Development - .NET, C#, T-SQL, Visual Basic
HomeNewsTopicsITKnowledge ExchangeTipsAsk the ExpertsMultimediaWhite PapersIT Downloads
About Us  |  Contact Us  |  For Advertisers  |  For Business Partners  |  Site Index  |  RSS
SEARCH 
TechTarget provides technology professionals with the information they need to perform their jobs - from developing strategy, to making cost-effective purchase decisions and managing their organizations' technology projects - with its network of technology-specific websites, events and online magazines.

TechTarget Corporate Web Site  |  Media Kits  |  Site Map




All Rights Reserved, Copyright 2005 - 2009, TechTarget | Read our Privacy Policy
  TechTarget - The IT Media ROI Experts