Manage Learn to apply best practices and optimize your operations.

Data mining project: Exploring data

Part of any data mining project is learning about and understanding the nature of your data. Learn ways to view your data with SQL Server 2005 data mining.

Part of any data mining project is learning about and understanding the nature of your data. By leveraging controls from Office Web Components (OWC), the DSV Designer provides the functionality to explore your data in four different views. By right-clicking a DSV table and selecting Explore Data, you can view your data as a table, pivot table, simple charts, and a pivot chart. By default, the Explore Data component will sample 5,000 points of your data. The option buttons in the upper left of the Explore Data window allow you to change this setting to a maximum of 20,000 points, due to a limitation of the OWC controls.

The tabular views allow you to do a simple exploration of your data. Clever use of the pivot table will allow you to get a better understanding of the data by arranging, slicing, and aggregating your data in different ways. For example, by exploring a pivot chart on the Customers table, you can find the average Age and its standard deviation by using the Bedrooms column we created previously. (See Figure 3.8.) This is possible because we are exploring the DSV table and not the actual source table as it is in the data. We can explore Named Queries in the DSV in precisely the same manner.

The graphical exploration offers a page of simple column, pie, and bar charts plus a pivot chart view. Using the simple charts you can see histograms and pies of various attributes side by side. If your data is continuous, the chart divides the continuous range into 10 buckets. The pivot chart, on the contrary, provides a wealth of graphing controls to analyze your data, from your standard line, bar, scatter, column, and pie charts, to more exotic types such as doughnut and radar charts, as shown in Figure 3.9.

The pivot table and chart have many configuration options to help you analyze your data in different ways. Many of these are available through the context-sensitive Command and Options dialog box, from the Context menu, or from embedded toolbars. Virtually every aspect of the tables and charts can be modified, either by graphically selecting the object or by using the selection box on the General tab of the dialog. Describing the full feature set of the OWC could easily fill another book and mastering the OWC controls for best value will take some practice, but with experience you will be able to manipulate the controls to find exactly the right view for you. Additionally, the pivot table and chart are linked, so you can switch back and forth, make edits, and see how the change affected the other view.

Figure 3.8 Exploring data with the pivot chart

Figure 3.9 Radar chart showing Age by Bedrooms and Home Ownership

One additional feature of the pivot chart that is important for data exploration is graphical named query generation. By clicking the Named Query button on the toolbar, you can use elements of the chart to define a named query. For instance you could select only those homeowners with one bedroom and renters with four or more on the chart and add them to the query. This named query becomes like any other and can be used as a source for exploring data.

Note: Although the Explore Data window looks like other document windows, it is, in fact, a tool window like the Solution Explorer and Properties windows. By right-clicking the Window tab you can change the Explore Data window into a floating or dockable window. You can also open up many Explore Data windows on different DSV tables to display charts and tables side by side.

Click here to return to the complete list of book excerpts from Chapter 3, 'Using SQL Server 2005 data mining,' from the book Data Mining with SQL Server 2005.

Dig Deeper on SQL Server Business Intelligence Strategies

Start the conversation

Send me notifications when other members comment.

Please create a username to comment.