Data. We have more of it now than ever before. And in the last 10 years, technology has provided us with the tools to collect ever more data. The question is what do we do with it? What does it mean?
The data we've collected tells us we sold x amount of widgets in product line y in the second quarter of 2006 in market z, but so what? What does it all mean in the grand scheme of things?
This is where data mining comes in. There has been a substantial leap in the data mining capabilities in the SQL Server 2005 edition compared to its predecessors. Data mining algorithms are the tools that provide insight into the "why" of what your clients, customers, data subjects and so on have done. The tools also help foresee, mathematically speaking, what these groups are likely to do. If you have an inclination that your business can benefit from SQL Server 2005 data mining features, use this checklist as a bellwether to determine if this is the direction you really want to move in.
|Checklist: SQL Server 2005 data mining|
|SQL Server version|
|Keep in mind that the data mining features are only available in the Standard & Enterprise versions of SQL Server 2005.|
|Defining the data source|
|Although the data mining algorithms in SQL Server 2005 are actually part of SQL Server 2005 Analysis Services, you are not required to pre-build an Analysis Services cube to mine from. The SQL Server 2005 data mining algorithms can also use a relational database as a source rather than a multidimensional one.|
|Understanding the toolset
|• There are nine data mining algorithms in SQL Server 2005, and each is used for a different purpose. Some do, however, cross over. The areas they cover include: What is the business problem at hand? Are you interested in greater insight into why you gain and lose customers? How about analyzing current borrower history to determine a new loan applicant's level of risk?|
|• Fortunately, there is a Data Mining Wizard. This is the starting point for creating a new mining structure. In addition, the Data Mining Designer, which allows you to modify existing mining structures or create new ones based on already-existing structures, creates mining structures from scratch. Use the wizard, and be glad that it's there. Keep in mind, as it is with all things .NET, there are APIs that can be accessed directly if you are absolutely, fundamentally opposed to using a wizard.|
|• Data mining extensions (DMX) -- DMX is the query language of data mining. It is part of the OLE DB specification and allows you to add custom data mining functionality to your applications.|
|Training the model|
|The basic premise in SQL Server 2005 is based upon data. That data needs to be trained. For the most part, there are three basic components to making data mining happen:|
|Keep in mind that training is an ongoing process, and as new production data comes in, you will want to continually re-train your model(s) to give the algorithms as broad of a dataset as possible to make predictions. It is a time-consuming process, but usually only occurs on a weekly or monthly basis.|
|SearchSQLServer.com recently posted some excerpts on July 16 from one of the best in-depth resources on the subject available right now. Do yourself a favor and pick up a copy of ZhaoHui Tang's and Jamie MacLennan's Data Mining with SQL Server 2005. The newly updated (as of July 2006) SQL Server 2005 Books Online also provides a wealth of knowledge on the subject. Don't forget to search online as well!|
In summary, data mining is one of the most esoteric features of SQL Server 2005. It is probably one of the most difficult to understand unless you have a background in higher math or a former career as a statistician. I can also tell you from personal experience that the data mining features of SQL Server 2005 are far greater improved than its predecessor. If you don't happen to have a substantial background in higher math, but are fortunate enough to have folks in your organization who do, you would be best served to "buddy up" with them.
About the author: Laurence Schwarz is a Business Intelligence Consultant with Quilogy. For the last 10 years Schwarz has been involved with Microsoft SQL Server, in the roles of Database Administrator, Database Engineer and Production Support Analyst.
More information on SearchSQLServer.com