Home > Using the SQL Server Data Mining Designer
Book Excerpt:
EMAIL THIS

Using the SQL Server Data Mining Designer

16 Jul 2006 | Wiley

Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us   

The Data Mining Designer is where most of the work with your models will take place. It contains the facilities for editing, browsing, querying, and comparing models distributed in five panes, the Mining Structure pane, the Mining Models pane, the Mining Model Viewer pane, the Mining Accuracy Chart pane, and the Mining Model Prediction pane. In this section, we will focus on editing only, with the Mining Structure and Mining Models panes, leaving the other functionality for the "Using Your Models" section of this chapter.

Working with the Mining Structure Editor

The Mining Structure Editor allows you to add and remove columns to and from your mining structure and also set the properties of each mining structure column. You need to use the Structure Editor to perform modeling operations that are not possible in the Mining Model Wizard. Even if the Wizard-generated structure suits your needs, it is a good idea to inspect your mining structure after running the wizard to be sure that it contains everything you want. The three components of the Mining Structure Editor are the structure tree, the DSV view, and the Properties window, as shown in Figure 3.16. Clicking columns in the structure tree or the DSV will cause their properties to show in the Properties window. Dragging columns from the DSV to the structure tree will add the column to the mining model. Right-clicking almost any item produces a menu providing a list of actions to be performed on that item. You can browse your data and explore your DSV. Note that to edit the DSV you must return to the DSV Designer.

The editor allows you to perform certain operations that are not available using the wizard. You cannot perform the following modeling operations in the wizard:

  • Set discretization properties: The Mining Model Wizard will automatically set the content type of continuous columns to DISCRETIZED if the selected algorithm does not support continuous attributes. However, you may want to be able to specify the discretization method or the number of buckets that the attribute will be divided into. To set these values, click the column you want to modify, and set the DiscretizationMethod and DiscretizationBuckets properties in the Properties window. The various discretization methods that are available are described in Chapter 2.

  • Add a column to the structure multiple times: Since a structure may contain multiple models, you may find that you want to model a specific column in different ways to see how it impacts the results. For example, you may want to compare how Age influences the results when it's treated as a continuous value or when it's discretized into three, five, or seven buckets. You can add a column multiple times to the model simply by dragging the source column from the DSV to the structure tree. Each column will be given an incremental name, which you can change in the Properties window.

  • Create hierarchies among your attributes: In Chapter 2, we described the RELATED TO construct in DMX that allows you to create hierarchies. In the Structure Editor, you can use the Classified Column property of a structure column to perform the same functionality. For example, to create a Product Category-Product Name hierarchy, set the Classified Column property of the Product Category column to Product Name.


    Figure 3.16 Mining Structure Editor in the Data Mining Designer

  • Add a column that has its values looked up in another table: If your data is normalized, it is likely that the table you want to mine contains foreign keys to lookup tables instead of the actual data labels you want to appear in the model. Using the Structure Editor, you can add these columns directly to your model. To add such a column, right-click the table that has the foreign key and select Show Related Tables. The table that contains the primary key will appear in the DSV area of the Structure Editor. If the relationship is not specified in the DSV, you will have to return to the DSV editor and add it. From this new table, drag the column that contains the data name you want to use in your model to the structure tree.

    For example, assume that you were mining a Purchases table that had a Product Id column and another table Products that related Product Id to Product Name. To create a structure that used the Product Name column you would right-click the Purchases table and select Show Related Tables to introduce the Products table. Then you would click and drag the Product Name column to your structure.

    Tip: The easiest way to add a nested table to a mining structure is to drag the key of the nested table to the structure tree. When you drop the key, the editor will automatically create a nested table with the key you specified.

    Working with the Mining Models Editor

    The Mining Models Editor is where you can create multiple models on the structure. You use the editor to set the algorithm and algorithm parameters for each model, as well as to select which columns are used in each model, how they are used, and setting algorithm-specific modeling flags on each column. The editor consists of a table showing the models and their columns and again the Properties window, as shown in Figure 3.17. This configuration allows you to quickly see how each column is used in each model and set properties appropriately.


    Figure 3.17 Mining Models Editor in the SQL BI Development Studio

    Setting Column Properties

    Setting the usage of each column involves selecting the column and choosing whether you want this column to be used as Input, Predict, PredictOnly, or Ignore. Selecting Input is analogous to selecting the Input column in the Mining Model Wizard. Selecting PredictOnly is analogous to selecting the Predictable column in the wizard. Generally this usage implies that this column will not be used as input for other predictable targets; however, you should check the chapters on each algorithm for the exact semantics. Selecting Predict is analogous to selecting both the Input and Predictable columns in the wizard and implies that the column will be treated both as an input for other targets and as a target in and of itself. Again, the exact semantics should be checked for each algorithm. Setting a column to Ignore creates a model that simply does not contain the specified column. Additional, model-specific properties for each column can be set in the Properties window.

    Tip: You can multiple select columns by using the Shift and Ctrl keys. This allows you to set properties on many columns at the same time. Since setting a column to Ignore removes it from the model, you can set Ignore only in the column grid and not in the Properties window. Also, you cannot change the usage of any ignored columns in the Properties window.

    To change multiple columns to or from Ignore, select them in the column grid using Shift or Ctrl and then press the F2 key to show the combo box where you can make the change.

    You can also change the properties of structure columns in the Mining Models Editor by selecting the column and setting the properties in the Properties window. In this editor, you can also change the properties of multiple structure columns simultaneously, using the same multiple selection methods you used on mining columns.

    Setting Model Properties

    To edit and set the algorithm parameters, select the mining model itself. You do this by selecting the column header so that the model properties are shown in the Properties window. Here, you can set the name and algorithm used, annotate your model with a description, enable drill-through if supported, and set the algorithm parameters. Setting the algorithm parameters brings up a dialog box showing you the available parameters with defaults and descriptions, as shown in Figure 3.18. See the chapter on each individual algorithm for detailed discussions of each parameter.


    Figure 3.18 Algorithm Parameters Dialog for Microsoft_Decision_Trees

    Tip: n easier way to set algorithm parameters is to right-click the column header for the desired algorithm and select Set Algorithm Parameters.

    Creating Additional Models

    To create multiple models on the same structure, simply select the New Mining Model item in the Mining Model menu. You are prompted to enter a name and select the algorithm, and the editor creates a new mining model in the structure. The new model you create maintains the settings of the model that you selected when you chose the creation operation. The new model will use the same inputs, have the same targets, and use any additional settings that are compatible with the new algorithm you selected.

    Creating and Modifying Additional Models

    We are going to set all of the case-level columns of the MovieClick model to be predictable, then we will create a new model in the same structure, using the Microsoft Naive Bayes algorithm

    1. Switch to the Mining Models Editor by clicking the Mining Models icon in the View column of the Data Mining Designer.

    2. Click the row for the Age column in the Movie Trees model column to select the table cell.

    3. While holding down the Shift key, click the row for the Bedrooms column in the Movie Trees model. Now the usage for both Age and Bedrooms should be selected.

    4. Press the F2 key to bring up a drop-down box where you can select the type of usage.

    5. Change the usage to Predict. All the selected columns' usage will change.

    6. Use Ctrl, Shift, and F2 to select the remaining case-level columns and change their usage to Predict.

    7. Select New Mining Model from the Mining Model menu.

    8. In the dialog that appears, type Movie Bayes for the name and select Microsoft Naive Bayes as the algorithm. Click OK.

    9. A warning appears that the Age, Num Bathrooms, Num Children, and Num TVs columns contain a content type not supported by the new algorithm and asks if you want to continue. Click Yes, and the new model will be created with those columns set to Ignore.

    At this point, you have a mining structure containing two models. The new model has all of the same columns set to Predict as the first, with the exception of the columns that had a content type not supported by the selected algorithm, which were set to Ignore.

    Click here to return to the complete list of book excerpts from Chapter 3, 'Using SQL Server 2005 data mining,' from the book Data Mining with SQL Server 2005.

    Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us   



    RELATED CONTENT
    SQL Server Business Intelligence (BI) and Data Warehousing
    Programming report generation with SQL Server Reporting Services 2008
    Using the Pivot transformation in SQL Server Integration Services
    DBA career paths could lead to business intelligence
    Are data warehouses made for the cloud?
    Q&A: Business intelligence gets a facelift in SQL Server 2008 R2
    Project Gemini gets a new name, Madison earns buzz
    Speed up reports in SQL Server Reporting Services with caching
    Data Transformation Services vs. SSIS: The key differences
    Using package configurations in SQL Server Integration Services (SSIS)
    How SQL Server 2008 components impact SharePoint implementations

    Microsoft SQL Server 2005
    End of life comes for SQL Server 2005 SP2, 2008
    SQL Server Reporting Services Fast Guide
    SQL Server Service Broker Tutorial and Reference Guide
    Tips for tuning SQL Server 2005 to improve reporting performance
    SQL Server consolidation: Why it's an optimization technique
    Parent-child dimensions in SQL Server 2005 with Analysis Services MDX
    Enforcing data integrity in a SQL Server database
    SSIS error message due to installation problem on SQL Server 2005
    Should you upgrade to SQL Server 2005 or SQL Server 2008?
    Basics for working with DATETIME and SMALLDATETIME in SQL Server 2005
    Microsoft SQL Server 2005 Research

    RELATED GLOSSARY TERMS
    Terms from Whatis.com − the technology online dictionary
    data aggregation  (SearchSQLServer.com)
    data preprocessing  (SearchSQLServer.com)
    data warehouse  (SearchSQLServer.com)
    FileMaker  (SearchSQLServer.com)
    GIS  (SearchSQLServer.com)
    MOLAP  (SearchSQLServer.com)
    pivot table  (SearchSQLServer.com)
    Quiz: SQL Server 2000  (SearchSQLServer.com)
    SQL  (SearchSQLServer.com)
    T-SQL  (SearchSQLServer.com)

    RELATED RESOURCES
    2020software.com, trial software downloads for accounting software, ERP software, CRM software and business software systems
    Search Bitpipe.com for the latest white papers and business webcasts
    Whatis.com, the online computer dictionary




  • Secure SQL - Data Security for Your Database
    HomeNewsTopicsITKnowledge ExchangeTipsAsk the ExpertsMultimediaWhite PapersIT Downloads
    About Us  |  Contact Us  |  For Advertisers  |  For Business Partners  |  Site Index  |  RSS
    SEARCH 
    TechTarget provides technology professionals with the information they need to perform their jobs - from developing strategy, to making cost-effective purchase decisions and managing their organizations' technology projects - with its network of technology-specific websites, events and online magazines.

    TechTarget Corporate Web Site  |  Media Kits  |  Site Map




    All Rights Reserved, Copyright 2005 - 2009, TechTarget | Read our Privacy Policy
      TechTarget - The IT Media ROI Experts