
Processing: Training data models on SQL Server

Analysis Services Data Mining processes all the models in a structure in parallel from a single read of the data by creating a compressed cache of that data. Several processing options are outlined here.

In Chapter 2, we described how to train a model using an INSERT INTO statement. Using the tools to train models on the server is called processing. Analysis Services Data Mining can process all the models in a structure in parallel on a single data read. It does this by creating a compressed cache of the data, which is then used to train each of the models in the structure. This functionality requires several processing options that control exactly what is processed and when, and how to clean up after you're done. The mechanism is described in more detail in Chapter 13.
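The single-read idea can be illustrated with a small Python sketch. This is a toy model only, not how Analysis Services is implemented: the function names (read_source, build_cache, train_mean), the sample columns, and the use of zlib and pickle for the cache are all assumptions made for illustration.

```python
import pickle
import zlib
from concurrent.futures import ThreadPoolExecutor

read_count = 0  # how many times the "source" has been scanned


def read_source():
    """Hypothetical stand-in for the one expensive pass over the source data."""
    global read_count
    read_count += 1
    return [{"age": 20 + i % 40, "movies": i % 7} for i in range(1000)]


def build_cache():
    """Read the source once and keep a compressed copy of the rows."""
    return zlib.compress(pickle.dumps(read_source()))


def train_mean(cache, column):
    """Toy 'model': the mean of one column, computed from the cache only."""
    rows = pickle.loads(zlib.decompress(cache))
    return sum(row[column] for row in rows) / len(rows)


cache = build_cache()
with ThreadPoolExecutor() as pool:
    # Each toy model trains in its own thread, all from the same cache.
    futures = {col: pool.submit(train_mean, cache, col) for col in ("age", "movies")}
    models = {col: fut.result() for col, fut in futures.items()}

print(read_count)  # 1: both models were trained from a single source read
```

The point of the sketch is only the shape of the work: one scan of the source, one cached copy, and any number of models trained from that copy.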

Note: Before processing a newly created or edited structure or model, you must first send the object to the server. In immediate mode, simply saving your work deploys the object. In offline mode, however, you must first deploy the project. To do so, select Deploy Solution from the Build menu. With the default settings, deploying the project also processes any objects in the project.

Mining Models and Structures can be in one of three states with regard to processing: processed, partially processed, and unprocessed. A processed object is completely finished and ready to use. Partially processed is an ambiguous state indicating that part of the object is processed and other parts are not. This may be acceptable for your circumstances. For example, if you have a mining structure with several mining models and you currently want to process only one of those models, the structure would then be partially processed. Unprocessed means that the object contains no data at all.
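As a rough illustration of the three states, the Python sketch below derives a structure's state from its parts. The rule it uses (a structure counts as fully processed only when its cache is loaded and every model is processed) is an assumption of this sketch, and the names State and structure_state are hypothetical; the server's actual bookkeeping may differ.

```python
from enum import Enum


class State(Enum):
    UNPROCESSED = "unprocessed"
    PARTIALLY_PROCESSED = "partially processed"
    PROCESSED = "processed"


def structure_state(cache_loaded, model_states):
    """Derive a structure's state from its cache and models (assumed toy rule)."""
    parts = [cache_loaded] + [s is State.PROCESSED for s in model_states]
    if all(parts):
        return State.PROCESSED            # every part holds data
    if any(parts):
        return State.PARTIALLY_PROCESSED  # some parts hold data, others do not
    return State.UNPROCESSED              # no data anywhere


# One of two models processed: the structure is partially processed.
print(structure_state(True, [State.PROCESSED, State.UNPROCESSED]).value)
```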

The processing options for Mining Structures and Mining Models are as follows:

Process Full: Process Full causes the object to be completely reprocessed from the source data. When this option is sent to a mining structure, the structure is processed and then each model within is processed in parallel. When sent to a model, the source data is only read if the structure has not been processed.

Process Default: Processing an object with Process Default causes the server to do whatever it takes to bring the object to a fully processed state. For example, if the object is already processed, the server performs no action. If you edit a model within a structure and send Process Default to the structure, the server processes that one model without rereading the source data.

Unprocess: Unprocess causes the object to be completely unprocessed, dropping all data associated with that object. Sending this command to a structure causes any caches to be cleared and contained models to be unprocessed.

Process Structure: Process Structure is only valid on a mining structure and causes the structure to read and cache the source data without processing the contained models. Executing subsequent Process Full and Process Default commands on the models will process information from this cache.

Process Clear Structure: Using this option on a structure causes the structure to drop any cached source data while leaving the contained models processed. This greatly reduces the disk footprint of your mining structure at the cost of having to reread the data on the next process command. Additionally, drill-through functionality on any contained models will be disabled until the models are reprocessed.
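To see how these options interact, here is a toy Python simulation of the behavior described above. It is a sketch under stated assumptions, not the Analysis Services API: the class ToyStructure, its method names, the placeholder rows, and the model names are all hypothetical, and each model is reduced to a trained/untrained flag.

```python
class ToyStructure:
    """Toy stand-in for a mining structure; not the Analysis Services API."""

    def __init__(self, model_names):
        self.cache = None  # cached source rows, or None when cleared
        self.trained = {name: False for name in model_names}
        self.source_reads = 0

    def _read_source(self):
        self.source_reads += 1
        self.cache = ["row1", "row2", "row3"]  # placeholder rows

    def process_structure(self):
        """Process Structure: read and cache the source, train nothing."""
        self._read_source()

    def process_full(self):
        """Process Full on the structure: reread the source, retrain all models."""
        self._read_source()
        self.trained = dict.fromkeys(self.trained, True)

    def process_model_full(self, name):
        """Process Full on one model: read the source only if nothing is cached."""
        if self.cache is None:
            self._read_source()
        self.trained[name] = True

    def process_default(self):
        """Process Default: do only the work needed to finish processing."""
        if all(self.trained.values()):
            return  # already processed, so no action
        if self.cache is None:
            self._read_source()
        self.trained = dict.fromkeys(self.trained, True)

    def unprocess(self):
        """Unprocess: drop the cache and untrain every model."""
        self.cache = None
        self.trained = dict.fromkeys(self.trained, False)

    def process_clear_structure(self):
        """Process Clear Structure: drop the cache, keep the trained models."""
        self.cache = None


s = ToyStructure(["DecisionTree", "Clusters"])  # hypothetical model names
s.process_structure()          # one source read, no models trained
s.process_default()            # trains both models from the cache
s.trained["Clusters"] = False  # simulate editing one model
s.process_default()            # retrains that model without rereading
s.process_clear_structure()    # cache dropped, models still trained
print(s.source_reads)          # 1
```

In this toy model the source is read once across the whole sequence, which is the benefit of Process Structure followed by Process Default. After process_clear_structure, the next command that needs the data has to reread it, mirroring the trade-off described for Process Clear Structure.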

Processing the MovieClick Mining Structure

Here, we will process the MovieClick Mining Structure.

In Immediate mode:

1. Save your structure by clicking the Save button on the toolbar.

2. Select Process Mining Structure and All Models from the Mining Model menu, or click the Process button on the Designer toolbar.

3. Click Run in the processing dialog.

In Offline mode:

1. Select the Deploy option from the Build menu. By default, deploying the solution will process all objects.

2. If the default has changed, deploy the solution and follow the instructions for Immediate mode.

At this point, the Processing Progress dialog will appear, providing status information for the processing operation. When the process is complete, you can view details about each step, including the processing time.

This article is an excerpt from Chapter 3, 'Using SQL Server 2005 data mining,' of the book Data Mining with SQL Server 2005.
