Home > Checklist: Data mining 101 in SQL Server 2005
Checklist:
EMAIL THIS

Checklist: Data mining 101 in SQL Server 2005

21 Aug 2006 | By Laurence Schwarz, Contributor

Expert advice on database administration
Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us    Add to Google

Data. We have more of it now than ever before. And in the last 10 years, technology has provided us with the tools to collect ever more data. The question is what do we do with it? What does it mean?

The data we've collected tells us we sold x amount of widgets in product line y in the second quarter of 2006 in market z, but so what? What does it all mean in the grand scheme of things?

This is where data mining comes in. There has been a substantial leap in the data mining capabilities in the SQL Server 2005 edition compared to its predecessors. Data mining algorithms are the tools that provide insight into the "why" of what your clients, customers, data subjects and so on have done. The tools also help foresee, mathematically speaking, what these groups are likely to do. If you have an inclination that your business can benefit from SQL Server 2005 data mining features, use this checklist as a bellwether to determine if this is the direction you really want to move in.
 Checklist: SQL Server 2005 data mining
SQL Server version
Keep in mind that the data mining features are only available in the Standard & Enterprise versions of SQL Server 2005.
Defining the data source
Although the data mining algorithms in SQL Server 2005 are actually part of SQL Server 2005 Analysis Services, you are not required to pre-build an Analysis Services cube to mine from. The SQL Server 2005 data mining algorithms can also use a relational database as a source rather than a multidimensional one.
Understanding the toolset
   • There are nine data mining algorithms in SQL Server 2005, and each is used for a different purpose. Some do, however, cross over. The areas they cover include: What is the business problem at hand? Are you interested in greater insight into why you gain and lose customers? How about analyzing current borrower history to determine a new loan applicant's level of risk?
   • Fortunately, there is a Data Mining Wizard. This is the starting point for creating a new mining structure. In addition, the Data Mining Designer, which allows you to modify existing mining structures or create new ones based on already-existing structures, creates mining structures from scratch. Use the wizard, and be glad that it's there. Keep in mind, as it is with all things .NET, there are APIs that can be accessed directly if you are absolutely, fundamentally opposed to using a wizard.
   • Data mining extensions (DMX) -- DMX is the query language of data mining. It is part of the OLE DB specification and allows you to add custom data mining functionality to your applications.
Training the model
The basic premise in SQL Server 2005 is based upon data. That data needs to be trained. For the most part, there are three basic components to making data mining happen:
  1. First, create a data mining model. For this you will need input columns, predictable columns and an algorithm.
  2. Second, train the model. This is the process in which sample data passes through the algorithm and, depending on the efficiency of the algorithm, it will begin to recognize patterns within the data.
  3. Lastly is the prediction in which the new data that is the actual data to be analyzed is passed through the algorithm. Using rules it created during the training phase, the algorithm will make predictions on the outcome of data similar to it.
Keep in mind that training is an ongoing process, and as new production data comes in, you will want to continually re-train your model(s) to give the algorithms as broad of a dataset as possible to make predictions. It is a time-consuming process, but usually only occurs on a weekly or monthly basis.
Learning resources
SearchSQLServer.com recently posted some excerpts on July 16 from one of the best in-depth resources on the subject available right now. Do yourself a favor and pick up a copy of ZhaoHui Tang's and Jamie MacLennan's Data Mining with SQL Server 2005. The newly updated (as of July 2006) SQL Server 2005 Books Online also provides a wealth of knowledge on the subject. Don't forget to search online as well!

Summary

In summary, data mining is one of the most esoteric features of SQL Server 2005. It is probably one of the most difficult to understand unless you have a background in higher math or a former career as a statistician. I can also tell you from personal experience that the data mining features of SQL Server 2005 are far greater improved than its predecessor. If you don't happen to have a substantial background in higher math, but are fortunate enough to have folks in your organization who do, you would be best served to "buddy up" with them.


About the author: Laurence Schwarz is a Business Intelligence Consultant with Quilogy. For the last 10 years Schwarz has been involved with Microsoft SQL Server, in the roles of Database Administrator, Database Engineer and Production Support Analyst.


More information on SearchSQLServer.com
  • Resources for business intelligence and data mining in SQL Server 2005
  • Using SQL Server 2005 data mining
  • Learning Center: Data mining



    Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us    Add to Google



    RELATED CONTENT
    SQL Server Business Intelligence (BI) and Data Warehousing
    Programming report generation with SQL Server Reporting Services 2008
    Using the Pivot transformation in SQL Server Integration Services
    DBA career paths could lead to business intelligence
    Are data warehouses made for the cloud?
    Q&A: Business intelligence gets a facelift in SQL Server 2008 R2
    Project Gemini gets a new name, Madison earns buzz
    Speed up reports in SQL Server Reporting Services with caching
    Data Transformation Services vs. SSIS: The key differences
    Using package configurations in SQL Server Integration Services (SSIS)
    How SQL Server 2008 components impact SharePoint implementations

    .NET Development for SQL Server
    Creating Windows PowerShell scripts to manage SQL Server 2008 instances
    Manipulate column names in a SQL Server table
    Code to restore SQL Server databases in VB.NET
    Custom VB.Net scripting in SQL Server Integration Services
    Retrieve images from SQL Server and store in VB.Net
    Connect to SQL Server database with Visual Basics
    Top 10 SQL Server development questions
    Developing CLR database objects: 10 tips, 10 minutes
    CLR architecture
    CLR stored procedures
    .NET Development for SQL Server Research

    RELATED GLOSSARY TERMS
    Terms from Whatis.com − the technology online dictionary
    data aggregation  (SearchSQLServer.com)
    data preprocessing  (SearchSQLServer.com)
    data warehouse  (SearchSQLServer.com)
    FileMaker  (SearchSQLServer.com)
    GIS  (SearchSQLServer.com)
    MOLAP  (SearchSQLServer.com)
    pivot table  (SearchSQLServer.com)
    Quiz: SQL Server 2000  (SearchSQLServer.com)
    SQL  (SearchSQLServer.com)
    T-SQL  (SearchSQLServer.com)

    RELATED RESOURCES
    2020software.com, trial software downloads for accounting software, ERP software, CRM software and business software systems
    Search Bitpipe.com for the latest white papers and business webcasts
    Whatis.com, the online computer dictionary




  • Secure SQL - Data Security for Your Database
    HomeNewsTopicsITKnowledge ExchangeTipsAsk the ExpertsMultimediaWhite PapersIT Downloads
    About Us  |  Contact Us  |  For Advertisers  |  For Business Partners  |  Site Index  |  RSS
    SEARCH 
    TechTarget provides technology professionals with the information they need to perform their jobs - from developing strategy, to making cost-effective purchase decisions and managing their organizations' technology projects - with its network of technology-specific websites, events and online magazines.

    TechTarget Corporate Web Site  |  Media Kits  |  Site Map




    All Rights Reserved, Copyright 2005 - 2009, TechTarget | Read our Privacy Policy
      TechTarget - The IT Media ROI Experts