Manage Learn to apply best practices and optimize your operations.

Working with named queries

A named query is a virtual view on your data source that allows you to change the data you are mining in SQL Server without making changes to your original data.

As a named calculation is a virtual column on a DSV table, a named query is nothing but a virtual view on your data source. Again, this allows you to change the data you are mining without making any changes to your original data. Even when modifications to the source are possible, creating Named Queries directly in your DSV is quick and easy and allows you to maintain these views alongside the models where they are used instead of polluting your databases with single-use objects.

The Named Query editor provides a standard query builder user interface to assist in creating queries which is very useful for complicated joins. Note that Named Queries can only be built upon database tables and not other DSV objects.

Typical queries that are useful for data mining are filtering, joins, and sampling.

  • To filter rows based on column values:
  • SELECT * FROM [Movies] WHERE [Movie] != 'Star Wars'

  • To filter out unpopular items from a nested table:
         SELECT [CustomerID], [Movie] FROM [Movies]
         WHERE [Movie] IN
                    (SELECT DISTINCT
                 FROM [Movies] GROUP BY [Movie]
                          HAVING COUNT([Movie]) > 20)
  • To join information from a foreign table:
                    Customers.*, Education.[Education Level]
           FROM Customers JOIN Education
           ON Customers.[Education Id] = Education.[Education Id]
  • To sample rows from a SQL Server database:
                   TABLESAMPLE (30 PERCENT)
                   REPEATABLE (1)

    Note: Analysis Services issues several queries to the named query, so when using TABLESAMPLE, it is necessary to use the REPEATABLE clause to guarantee Analysis Services retrieves the same rows each time. Also, since there is no efficient method for querying the complement of a sample, this method is useful for reducing the data size, but not for splitting data into training and testing sets. For that operation you should use SQL Server Integration Services (SSIS).

    Creating a Named Query Based on the Customers Table

    We want to create a named query based on the Customers table in the DSV that contains only homeowners. Since this table contains a named calculation, we have to manually add the calculation into the query.

    1. Double-click the Bedrooms Named Calculation in the Customers table to open the Named Calculation dialog and copy the SQL text.
    2. Right-click in the DSV Designer, and select New Named Query.
    3. Enter the query name Homeowners, and optionally enter a description.
    4. Click the Add Table button, select the Customers table, and close the Add Table dialog box.
    5. Select the * (All Customers) check box on the Customers table.
    6. To add the calculated column:

  • Enter a comma after Customers.* in the query window, and paste the contents of the clipboard into the second row of the Column column.
  • Type as Bedrooms after the pasted text.
  • The query will expand to replace Customers.* with the list of all the table columns.
  • 7. To filter on homeowners:
  • In the grid control, find the row containing [Home Ownership] in the Column column.
  • Clear the checkmark in the second row of the Output column.
  • Enter ='Own' in the criteria column
  • 8. Your final query should look like Figure 3.6. Click OK to close the dialog box.

    Your named query can now be explored from the context menu using Explore Data, like any other DSV table.

    Figure 3.6 Homeowners named query

    Organizing the DSV

    When a named query is created, any relationships that the original table had are not carried over to the new table. This means that to use those relationships, you have to recreate them by dragging from the foreign key to the primary key for each table relationship. When you complete the plethora of crossing relationships in your DSV, it will be very difficult to read and comprehend, as in Figure 3.7.

    You can remedy this problem by using DSV diagrams. The DSV Designer allows you to create any number of diagrams allowing you to select a subset of DSV tables and arrange them as you see fit. To create these diagrams, you click the New Diagram button, name the diagram, and drag tables from the list on the left to it. You can also add tables to the view by right-clicking a table already in the view and selecting Show Related Tables. After adding tables, you can clean up your arrangement by right-clicking in the design area and selecting Arrange Tables.

    Figure 3.7 DSV with many relationships in the DSV Designer

    Named queries and named calculations rely on the processing power of your relational server. The additional resources required to process the generated queries will have an impact on the initial processing time of your mining models. If this increase becomes too severe, and you will be frequently repeating these queries, you should consider materializing these views on the relational server.

    Click here to return to the complete list of book excerpts from Chapter 3, 'Using SQL Server 2005 data mining,' from the book Data Mining with SQL Server 2005.

    Dig Deeper on SQL Server Business Intelligence Strategies