Book Excerpt

Working with named calculations

Named calculations are additional virtual columns on the tables in your DSV. This allows you to mine derived information in your data without having to change your source data. Anamed calculation consists of name, a SQL expression containing the calculation, and an optional description. The calculation can be any valid SQL expression. If you are not a SQL expert, here are some types of expressions that are useful in many data mining projects.


Figure 3.5 Completed MovieClick data source view

Arithmetic Operations

Standard SQL supports +, -, * , /, and % (modulo). For example you could create a Profit named calculation with the following formula:
[Sales Price] – [Item Cost]

Mathematical Functions

Mathematical functions are useful, especially when data in a column does not follow a uniform distribution. The SQL functions ABS, LOG, SIGN, and SQRT are particularly useful. Here are two examples:

  • To flatten out an exponentially increasing variable:
    LOG([Sales Quantity])
  • To create a flag for over/under budget quarters:
    SIGN([Actual Expenses] – [Budgeted Expenses])

    Compositing Expressions

    Often, the hypothesis you want to test depends on a variable that is a combination of two of the variables you already have. For example, it may not be interesting that a person is married or has children, but the combination of the two may provide valuable information. A composite expression for this situation could look like this:
    [Marital Status] + ' ' + [Has Children]

    CASE Expressions

    CASE expressions are an extremely flexible way to create meaningful variables for data mining. The CASE expression allows you to assign results based on the evaluation of one or more conditions. Useful applications of CASE are to change value labels, manually discretize columns, reduce the number of valid states, and convert an attribute from a nested table to the case level.

  • To change value labels:
  •      CASE [Category]
                   WHEN 1 THEN 'Food'
                   WHEN 2 THEN 'Beverage'
                   WHEN 3 THEN 'Goods'
         END CASE
    
  • Manually discretize a column:
  •      CASE
                 WHEN [Age] < 20 THEN 'Under 20'
                 WHEN [Age] <= 30 THEN 'Between 20 and 30'
                 WHEN [Age] <= 40 THEN 'Between 30 and 40'
                 ELSE 'Over 40'
         END
    
  • To reduce the number of valid states:
         CASE [Marital Status]
                 WHEN 'Married' THEN [Marital Status]
                 WHEN 'Never Married' THEN [Marital Status]
                 ELSE 'Other'
         END
    
  • To convert an attribute from a nested table to a case table:
  •      CASE
               WHEN EXISTS
                      (SELECT [Movie] FROM [Movies]
                             WHERE [Movie]='Star Wars' AND
                                                [Movies].[CustomerID]=[Customers].[CustomerID])
                             THEN 'True'
                 ELSE 'False'
         END
    

    This would be done, for instance, when you wanted to convert a nested attribute to a case-level attribute. Note that if you still want to use the nested table in the model, you will have to use a named query to filter the attribute from the nested table, as described in the next section.

    Creating a Named Calculation on the Customers Table

    To create a named calculation to discretize and reduce the number of states in the Num Bedrooms column:

    1. Right-click the Customers table and select Create a named calculation.
    2. Enter the calculation name Bedrooms and optionally enter a description.
    3. Enter the following expression:

          CASE
                  WHEN [Num Bedrooms] = 1 THEN 'One'
                  WHEN [Num Bedrooms] <= 3 THEN 'Two or Three'
                  WHEN [Num Bedrooms] >= 4 THEN 'Four or more'
                 ELSE 'None'
          END
    

    Upon closing the dialog box, the DSV Designer will validate your expression and return any applicable errors. Once you have successfully created your calculation you can see the results by right-clicking the table and selecting Explore Data.

    Click here to return to the complete list of book excerpts from Chapter 3, 'Using SQL Server 2005 data mining,' from the book Data Mining with SQL Server 2005.

    This was first published in July 2006

    There are Comments. Add yours.

     
    TIP: Want to include a code block in your comment? Use <pre> or <code> tags around the desired text. Ex: <code>insert code</code>

    REGISTER or login:

    Forgot Password?
    By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy
    Sort by: OldestNewest

    Forgot Password?

    No problem! Submit your e-mail address below. We'll send you an email containing your password.

    Your password has been sent to: