Book Excerpt

Using the SQL Server Data Mining Designer

The Data Mining Designer is where most of the work with your models will take place. It contains the facilities for editing, browsing, querying, and comparing models distributed in five panes, the Mining Structure pane, the Mining Models pane, the Mining Model Viewer pane, the Mining Accuracy Chart pane, and the Mining Model Prediction pane. In this section, we will focus on editing only, with the Mining Structure and Mining Models panes, leaving the other functionality for the "Using Your Models" section of this chapter.

Working with the Mining Structure Editor

The Mining Structure Editor allows you to add and remove columns to and from your mining structure and also set the properties of each mining structure column. You need to use the Structure Editor to perform modeling operations that are not possible in the Mining Model Wizard. Even if the Wizard-generated structure suits your needs, it is a good idea to inspect your mining structure after running the wizard to be sure that it contains everything you want. The three components of the Mining Structure Editor are the structure tree, the DSV view, and the Properties window, as shown in Figure 3.16. Clicking columns in the structure tree or the DSV will cause their properties to show in the Properties window. Dragging columns from the DSV to the structure tree will add the column to the mining model. Right-clicking almost any item produces a menu providing a list of actions to be performed on that item. You can browse your data and explore your DSV. Note that to edit the DSV you must return to the DSV Designer.

The editor allows you to perform certain operations that are not available using the wizard. You cannot perform the following modeling operations in the wizard:

  • Set discretization properties: The Mining Model Wizard will automatically set the content type of continuous columns to DISCRETIZED if the selected algorithm does not support continuous attributes. However, you may want to be able to specify the discretization method or the number of buckets that the attribute will be divided into. To set these values, click the column you want to modify, and set the DiscretizationMethod and DiscretizationBuckets properties in the Properties window. The various discretization methods that are available are described in Chapter 2.

  • Add a column to the structure multiple times: Since a structure may contain multiple models, you may find that you want to model a specific column in different ways to see how it impacts the results. For example, you may want to compare how Age influences the results when it's treated as a continuous value or when it's discretized into three, five, or seven buckets. You can add a column multiple times to the model simply by dragging the source column from the DSV to the structure tree. Each column will be given an incremental name, which you can change in the Properties window.

  • Create hierarchies among your attributes: In Chapter 2, we described the RELATED TO construct in DMX that allows you to create hierarchies. In the Structure Editor, you can use the Classified Column property of a structure column to perform the same functionality. For example, to create a Product Category-Product Name hierarchy, set the Classified Column property of the Product Category column to Product Name.


    Figure 3.16 Mining Structure Editor in the Data Mining Designer

  • Add a column that has its values looked up in another table: If your data is normalized, it is likely that the table you want to mine contains foreign keys to lookup tables instead of the actual data labels you want to appear in the model. Using the Structure Editor, you can add these columns directly to your model. To add such a column, right-click the table that has the foreign key and select Show Related Tables. The table that contains the primary key will appear in the DSV area of the Structure Editor. If the relationship is not specified in the DSV, you will have to return to the DSV editor and add it. From this new table, drag the column that contains the data name you want to use in your model to the structure tree.

    For example, assume that you were mining a Purchases table that had a Product Id column and another table Products that related Product Id to Product Name. To create a structure that used the Product Name column you would right-click the Purchases table and select Show Related Tables to introduce the Products table. Then you would click and drag the Product Name column to your structure.

    Tip: The easiest way to add a nested table to a mining structure is to drag the key of the nested table to the structure tree. When you drop the key, the editor will automatically create a nested table with the key you specified.

    Working with the Mining Models Editor

    The Mining Models Editor is where you can create multiple models on the structure. You use the editor to set the algorithm and algorithm parameters for each model, as well as to select which columns are used in each model, how they are used, and setting algorithm-specific modeling flags on each column. The editor consists of a table showing the models and their columns and again the Properties window, as shown in Figure 3.17. This configuration allows you to quickly see how each column is used in each model and set properties appropriately.


    Figure 3.17 Mining Models Editor in the SQL BI Development Studio

    Setting Column Properties

    Setting the usage of each column involves selecting the column and choosing whether you want this column to be used as Input, Predict, PredictOnly, or Ignore. Selecting Input is analogous to selecting the Input column in the Mining Model Wizard. Selecting PredictOnly is analogous to selecting the Predictable column in the wizard. Generally this usage implies that this column will not be used as input for other predictable targets; however, you should check the chapters on each algorithm for the exact semantics. Selecting Predict is analogous to selecting both the Input and Predictable columns in the wizard and implies that the column will be treated both as an input for other targets and as a target in and of itself. Again, the exact semantics should be checked for each algorithm. Setting a column to Ignore creates a model that simply does not contain the specified column. Additional, model-specific properties for each column can be set in the Properties window.

    Tip: You can multiple select columns by using the Shift and Ctrl keys. This allows you to set properties on many columns at the same time. Since setting a column to Ignore removes it from the model, you can set Ignore only in the column grid and not in the Properties window. Also, you cannot change the usage of any ignored columns in the Properties window.

    To change multiple columns to or from Ignore, select them in the column grid using Shift or Ctrl and then press the F2 key to show the combo box where you can make the change.

    You can also change the properties of structure columns in the Mining Models Editor by selecting the column and setting the properties in the Properties window. In this editor, you can also change the properties of multiple structure columns simultaneously, using the same multiple selection methods you used on mining columns.

    Setting Model Properties

    To edit and set the algorithm parameters, select the mining model itself. You do this by selecting the column header so that the model properties are shown in the Properties window. Here, you can set the name and algorithm used, annotate your model with a description, enable drill-through if supported, and set the algorithm parameters. Setting the algorithm parameters brings up a dialog box showing you the available parameters with defaults and descriptions, as shown in Figure 3.18. See the chapter on each individual algorithm for detailed discussions of each parameter.


    Figure 3.18 Algorithm Parameters Dialog for Microsoft_Decision_Trees

    Tip: n easier way to set algorithm parameters is to right-click the column header for the desired algorithm and select Set Algorithm Parameters.

    Creating Additional Models

    To create multiple models on the same structure, simply select the New Mining Model item in the Mining Model menu. You are prompted to enter a name and select the algorithm, and the editor creates a new mining model in the structure. The new model you create maintains the settings of the model that you selected when you chose the creation operation. The new model will use the same inputs, have the same targets, and use any additional settings that are compatible with the new algorithm you selected.

    Creating and Modifying Additional Models

    We are going to set all of the case-level columns of the MovieClick model to be predictable, then we will create a new model in the same structure, using the Microsoft Naive Bayes algorithm

    1. Switch to the Mining Models Editor by clicking the Mining Models icon in the View column of the Data Mining Designer.

    2. Click the row for the Age column in the Movie Trees model column to select the table cell.

    3. While holding down the Shift key, click the row for the Bedrooms column in the Movie Trees model. Now the usage for both Age and Bedrooms should be selected.

    4. Press the F2 key to bring up a drop-down box where you can select the type of usage.

    5. Change the usage to Predict. All the selected columns' usage will change.

    6. Use Ctrl, Shift, and F2 to select the remaining case-level columns and change their usage to Predict.

    7. Select New Mining Model from the Mining Model menu.

    8. In the dialog that appears, type Movie Bayes for the name and select Microsoft Naive Bayes as the algorithm. Click OK.

    9. A warning appears that the Age, Num Bathrooms, Num Children, and Num TVs columns contain a content type not supported by the new algorithm and asks if you want to continue. Click Yes, and the new model will be created with those columns set to Ignore.

    At this point, you have a mining structure containing two models. The new model has all of the same columns set to Predict as the first, with the exception of the columns that had a content type not supported by the selected algorithm, which were set to Ignore.

    Click here to return to the complete list of book excerpts from Chapter 3, 'Using SQL Server 2005 data mining,' from the book Data Mining with SQL Server 2005.

    This was first published in July 2006

  • There are Comments. Add yours.

     
    TIP: Want to include a code block in your comment? Use <pre> or <code> tags around the desired text. Ex: <code>insert code</code>

    REGISTER or login:

    Forgot Password?
    By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy
    Sort by: OldestNewest

    Forgot Password?

    No problem! Submit your e-mail address below. We'll send you an email containing your password.

    Your password has been sent to: