As a named calculation is a virtual column on a DSV table, a named query is nothing but a virtual view on your data source. Again, this allows you to change the data you are mining without making any changes to your original data. Even when modifications to the source are possible, creating Named Queries directly in your DSV is quick and easy and allows you to maintain these views alongside the models where they are used instead of polluting your databases with single-use objects.
The Named Query editor provides a standard query builder user interface to assist in creating queries which is very useful for complicated joins. Note that Named Queries can only be built upon database tables and not other DSV objects.
Typical queries that are useful for data mining are filtering, joins, and sampling.
SELECT * FROM [Movies] WHERE [Movie] != 'Star Wars'
SELECT [CustomerID], [Movie] FROM [Movies] WHERE [Movie] IN (SELECT DISTINCT [Movie] FROM [Movies] GROUP BY [Movie] HAVING COUNT([Movie]) > 20)
SELECT Customers.*, Education.[Education Level] FROM Customers JOIN Education ON Customers.[Education Id] = Education.[Education Id]
SELECT * FROM CUSTOMERS TABLESAMPLE (30 PERCENT) REPEATABLE (1)
Note: Analysis Services issues several queries to the named query, so when using TABLESAMPLE, it is necessary to use the REPEATABLE clause to guarantee Analysis Services retrieves the same rows each time. Also, since there is no efficient method for querying the complement of a sample, this method is useful for reducing the data size, but not for splitting data into training and testing sets. For that operation you should use SQL Server Integration Services (SSIS).
Creating a Named Query Based on the Customers Table
We want to create a named query based on the Customers table in the DSV that contains only homeowners. Since this table contains a named calculation, we have to manually add the calculation into the query.
1. Double-click the Bedrooms Named Calculation in the Customers table to open the Named Calculation dialog and copy the SQL text.
2. Right-click in the DSV Designer, and select New Named Query.
3. Enter the query name Homeowners, and optionally enter a description.
4. Click the Add Table button, select the Customers table, and close the Add Table dialog box.
5. Select the * (All Customers) check box on the Customers table.
6. To add the calculated column:
Your named query can now be explored from the context menu using Explore Data, like any other DSV table.
Figure 3.6 Homeowners named query
Organizing the DSV
When a named query is created, any relationships that the original table had are not carried over to the new table. This means that to use those relationships, you have to recreate them by dragging from the foreign key to the primary key for each table relationship. When you complete the plethora of crossing relationships in your DSV, it will be very difficult to read and comprehend, as in Figure 3.7.
You can remedy this problem by using DSV diagrams. The DSV Designer allows you to create any number of diagrams allowing you to select a subset of DSV tables and arrange them as you see fit. To create these diagrams, you click the New Diagram button, name the diagram, and drag tables from the list on the left to it. You can also add tables to the view by right-clicking a table already in the view and selecting Show Related Tables. After adding tables, you can clean up your arrangement by right-clicking in the design area and selecting Arrange Tables.
Figure 3.7 DSV with many relationships in the DSV Designer
Named queries and named calculations rely on the processing power of your relational server. The additional resources required to process the generated queries will have an impact on the initial processing time of your mining models. If this increase becomes too severe, and you will be frequently repeating these queries, you should consider materializing these views on the relational server.
Click here to return to the complete list of book excerpts from Chapter 3, 'Using SQL Server 2005 data mining,' from the book Data Mining with SQL Server 2005.