Checklist: Data mining 101 in SQL Server 2005

Data mining tools allow you to understand and foresee client needs. This checklist will introduce you to substantially improved SQL Server 2005 data mining capabilities and offer basic data mining lessons.

Data. We have more of it now than ever before. And in the last 10 years, technology has provided us with the tools to collect ever more data. The question is what do we do with it? What does it mean?

The data we've collected tells us we sold x amount of widgets in product line y in the second quarter of 2006 in market z, but so what? What does it all mean in the grand scheme of things?

This is where data mining comes in. There has been a substantial leap in the data mining capabilities in the SQL Server 2005 edition compared to its predecessors. Data mining algorithms are the tools that provide insight into the "why" of what your clients, customers, data subjects and so on have done. The tools also help foresee, mathematically speaking, what these groups are likely to do. If you have an inclination that your business can benefit from SQL Server 2005 data mining features, use this checklist as a bellwether to determine if this is the direction you really want to move in.

 Checklist: SQL Server 2005 data mining
SQL Server version
Keep in mind that the data mining features are only available in the Standard & Enterprise versions of SQL Server 2005.
Defining the data source
Although the data mining algorithms in SQL Server 2005 are actually part of SQL Server 2005 Analysis Services, you are not required to pre-build an Analysis Services cube to mine from. The SQL Server 2005 data mining algorithms can also use a relational database as a source rather than a multidimensional one.
Understanding the toolset
   • There are nine data mining algorithms in SQL Server 2005, and each is used for a different purpose. Some do, however, cross over. The areas they cover include: What is the business problem at hand? Are you interested in greater insight into why you gain and lose customers? How about analyzing current borrower history to determine a new loan applicant's level of risk?
   • Fortunately, there is a Data Mining Wizard. This is the starting point for creating a new mining structure. In addition, the Data Mining Designer, which allows you to modify existing mining structures or create new ones based on already-existing structures, creates mining structures from scratch. Use the wizard, and be glad that it's there. Keep in mind, as it is with all things .NET, there are APIs that can be accessed directly if you are absolutely, fundamentally opposed to using a wizard.
   • Data mining extensions (DMX) -- DMX is the query language of data mining. It is part of the OLE DB specification and allows you to add custom data mining functionality to your applications.
Training the model
The basic premise in SQL Server 2005 is based upon data. That data needs to be trained. For the most part, there are three basic components to making data mining happen:
  1. First, create a data mining model. For this you will need input columns, predictable columns and an algorithm.
  2. Second, train the model. This is the process in which sample data passes through the algorithm and, depending on the efficiency of the algorithm, it will begin to recognize patterns within the data.
  3. Lastly is the prediction in which the new data that is the actual data to be analyzed is passed through the algorithm. Using rules it created during the training phase, the algorithm will make predictions on the outcome of data similar to it.
Keep in mind that training is an ongoing process, and as new production data comes in, you will want to continually re-train your model(s) to give the algorithms as broad of a dataset as possible to make predictions. It is a time-consuming process, but usually only occurs on a weekly or monthly basis.
Learning resources
SearchSQLServer.com recently posted some excerpts on July 16 from one of the best in-depth resources on the subject available right now. Do yourself a favor and pick up a copy of ZhaoHui Tang's and Jamie MacLennan's Data Mining with SQL Server 2005. The newly updated (as of July 2006) SQL Server 2005 Books Online also provides a wealth of knowledge on the subject. Don't forget to search online as well!

Summary

In summary, data mining is one of the most esoteric features of SQL Server 2005. It is probably one of the most difficult to understand unless you have a background in higher math or a former career as a statistician. I can also tell you from personal experience that the data mining features of SQL Server 2005 are far greater improved than its predecessor. If you don't happen to have a substantial background in higher math, but are fortunate enough to have folks in your organization who do, you would be best served to "buddy up" with them.


About the author: Laurence Schwarz is a Business Intelligence Consultant with Quilogy. For the last 10 years Schwarz has been involved with Microsoft SQL Server, in the roles of Database Administrator, Database Engineer and Production Support Analyst.


More information on SearchSQLServer.com
  • Resources for business intelligence and data mining in SQL Server 2005
  • Using SQL Server 2005 data mining
  • Learning Center: Data mining

  • This was last published in August 2006

    Dig Deeper on SQL Server Business Intelligence Strategies

    Start the conversation

    Send me notifications when other members comment.

    By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy

    Please create a username to comment.

    -ADS BY GOOGLE

    SearchBusinessAnalytics

    SearchDataCenter

    SearchDataManagement

    SearchAWS

    SearchOracle

    SearchContentManagement

    SearchWindowsServer

    Close