The Data Mining Designer is where most of the work with your models will take place. It contains the facilities for editing, browsing, querying, and comparing models distributed in five panes, the Mining Structure pane, the Mining Models pane, the Mining Model Viewer pane, the Mining Accuracy Chart pane, and the Mining Model Prediction pane. In this section, we will focus on editing only, with the Mining Structure and Mining Models panes, leaving the other functionality for the "Using Your Models" section of this chapter.
Working with the Mining Structure Editor
The Mining Structure Editor allows you to add and remove columns to and from your mining structure and also set the properties of each mining structure column. You need to use the Structure Editor to perform modeling operations that are not possible in the Mining Model Wizard. Even if the Wizard-generated structure suits your needs, it is a good idea to inspect your mining structure after running the wizard to be sure that it contains everything you want. The three components of the Mining Structure Editor are the structure tree, the DSV view, and the Properties window, as shown in Figure 3.16. Clicking columns in the structure tree or the DSV will cause their properties to show in the Properties window. Dragging columns from the DSV to the structure tree will add the column to the mining model. Right-clicking almost any item produces a menu providing a list of actions to be performed on that item. You can browse your data and explore your DSV. Note that to edit the DSV you must return to the DSV Designer.
The editor allows you to perform certain operations that are not available using the wizard. You cannot perform the following modeling operations in the wizard:
Figure 3.16 Mining Structure Editor in the Data Mining Designer
For example, assume that you were mining a Purchases table that had a Product Id column and another table Products that related Product Id to Product Name. To create a structure that used the Product Name column you would right-click the Purchases table and select Show Related Tables to introduce the Products table. Then you would click and drag the Product Name column to your structure.
Tip: The easiest way to add a nested table to a mining structure is to drag the key of the nested table to the structure tree. When you drop the key, the editor will automatically create a nested table with the key you specified.
Working with the Mining Models Editor
The Mining Models Editor is where you can create multiple models on the structure. You use the editor to set the algorithm and algorithm parameters for each model, as well as to select which columns are used in each model, how they are used, and setting algorithm-specific modeling flags on each column. The editor consists of a table showing the models and their columns and again the Properties window, as shown in Figure 3.17. This configuration allows you to quickly see how each column is used in each model and set properties appropriately.
Figure 3.17 Mining Models Editor in the SQL BI Development Studio
Setting Column Properties
Setting the usage of each column involves selecting the column and choosing whether you want this column to be used as Input, Predict, PredictOnly, or Ignore. Selecting Input is analogous to selecting the Input column in the Mining Model Wizard. Selecting PredictOnly is analogous to selecting the Predictable column in the wizard. Generally this usage implies that this column will not be used as input for other predictable targets; however, you should check the chapters on each algorithm for the exact semantics. Selecting Predict is analogous to selecting both the Input and Predictable columns in the wizard and implies that the column will be treated both as an input for other targets and as a target in and of itself. Again, the exact semantics should be checked for each algorithm. Setting a column to Ignore creates a model that simply does not contain the specified column. Additional, model-specific properties for each column can be set in the Properties window.
Tip: You can multiple select columns by using the Shift and Ctrl keys. This allows you to set properties on many columns at the same time. Since setting a column to Ignore removes it from the model, you can set Ignore only in the column grid and not in the Properties window. Also, you cannot change the usage of any ignored columns in the Properties window.
To change multiple columns to or from Ignore, select them in the column grid using Shift or Ctrl and then press the F2 key to show the combo box where you can make the change.
You can also change the properties of structure columns in the Mining Models Editor by selecting the column and setting the properties in the Properties window. In this editor, you can also change the properties of multiple structure columns simultaneously, using the same multiple selection methods you used on mining columns.
Setting Model Properties
To edit and set the algorithm parameters, select the mining model itself. You do this by selecting the column header so that the model properties are shown in the Properties window. Here, you can set the name and algorithm used, annotate your model with a description, enable drill-through if supported, and set the algorithm parameters. Setting the algorithm parameters brings up a dialog box showing you the available parameters with defaults and descriptions, as shown in Figure 3.18. See the chapter on each individual algorithm for detailed discussions of each parameter.
Figure 3.18 Algorithm Parameters Dialog for Microsoft_Decision_Trees
Tip: n easier way to set algorithm parameters is to right-click the column header for the desired algorithm and select Set Algorithm Parameters.
Creating Additional Models
To create multiple models on the same structure, simply select the New Mining Model item in the Mining Model menu. You are prompted to enter a name and select the algorithm, and the editor creates a new mining model in the structure. The new model you create maintains the settings of the model that you selected when you chose the creation operation. The new model will use the same inputs, have the same targets, and use any additional settings that are compatible with the new algorithm you selected.
Creating and Modifying Additional Models
We are going to set all of the case-level columns of the MovieClick model to be predictable, then we will create a new model in the same structure, using the Microsoft Naive Bayes algorithm
1. Switch to the Mining Models Editor by clicking the Mining Models icon in the View column of the Data Mining Designer.
2. Click the row for the Age column in the Movie Trees model column to select the table cell.
3. While holding down the Shift key, click the row for the Bedrooms column in the Movie Trees model. Now the usage for both Age and Bedrooms should be selected.
4. Press the F2 key to bring up a drop-down box where you can select the type of usage.
5. Change the usage to Predict. All the selected columns' usage will change.
6. Use Ctrl, Shift, and F2 to select the remaining case-level columns and change their usage to Predict.
7. Select New Mining Model from the Mining Model menu.
8. In the dialog that appears, type Movie Bayes for the name and select Microsoft Naive Bayes as the algorithm. Click OK.
9. A warning appears that the Age, Num Bathrooms, Num Children, and Num TVs columns contain a content type not supported by the new algorithm and asks if you want to continue. Click Yes, and the new model will be created with those columns set to Ignore.
At this point, you have a mining structure containing two models. The new model has all of the same columns set to Predict as the first, with the exception of the columns that had a content type not supported by the selected algorithm, which were set to Ignore.
Click here to return to the complete list of book excerpts from Chapter 3, 'Using SQL Server 2005 data mining,' from the book Data Mining with SQL Server 2005.