Guide to SQL Server data management and data quality
A comprehensive collection of articles, videos and more, hand-picked by our editors
Have you ever been in that conference room during that meeting—when different managers are battling it out over widely-divergent sets of numbers that all allegedly came from the same place? In the data-saturated world of businesses, that situation is becoming all too common. The variations don’t just come from different interpretations of the same data; it is different people coming up with entirely different sets of data for what should be the exact same business function.
Not only is this frustrating, it’s also dangerous.
The focus on business intelligence (BI) has many companies scrambling to get their management dashboards in place. Companies want to see data in intuitive, simplified forms. However, until fairly recently, there wasn’t a big focus on the quality of the data in those systems.
The old saying of “Garbage in, garbage out” is more applicable than ever as businesses attempt to eke out every possible advantage by using the data at their disposal. In our thirst for data, however, we may have generated a bit too much of it. Master Data Services (MDS) is a new feature of SQL Server 2008 R2 that seeks to address this problem.
SQL Server MDS: Low-Cost Master Data Management?
Microsoft acquired the underlying MDS technologies when it acquired Stratature Inc. in 2007. At the time, Stratature was a leader in Master Data Management (MDM), which is the name applied to the market place that MDS lives in. It’s likely that future iterations of MDS will introduce deeper integration with SharePoint and SQL Server Analysis Services; connections to Microsoft Dynamics and BizTalk are possible.
MDM products in general have come under some criticism for their high cost and low return on investment. Including MDS in the SQL Server product base provides a fairly low cost of entry.
Master Data Management products always require some kind of relational database management system on the backend. By including MDS with the backend, you’re essentially getting two for one. A few major vendors compete with Microsoft in this space, including IBM with InfoSphere MDM and Oracle MDM Suite. A raft of smaller competitors also exists, such as Informatica, Talend, Data-Flux, iWay Software and Initiate (which is part of IBM).
Gaining Control and Access to Data
The idea behind MDM products is to create a single, authoritative, shared set of data sources within SQL Server and to give specific individuals rights to read this data in order to produce reports and other materials. You can allow certain individuals and deny others from making changes to the data, which helps to protect it from unwanted changes.
MDS, however, goes beyond just permissions. It version controls data so you can always browse backward and find older versions of the data—especially useful if you need to undo changes that someone else made.
Business rules can notify data custodians when changes are made, custodians can then act on unwanted changes. Those business rules can even detect specific types of changes, and only notify custodians when those occur.
A Web-based “stewardship portal” provides access to everything, making it easier for non-database administrators to become data custodians and to access master data sets. Additional business rules govern what kind of data can go into the master data set and help preserve consistency and accuracy. Microsoft bills Master Data Services as a tool that provides a “single version of the truth and that’s an accurate description for what MDS can do for your business’s overall data accuracy.
Something’s missing in MDS
As a relative newcomer to the BI market, Master Data Services lacks some of the more sophisticated features of more mature products.Currently, MDS’s approach toward data intelligence focuses entirely on defining and maintaining master data. It doesn’t provide a great deal of assistance for administrators creating that master data.
If you accept the fact that your business already has multiple data sets, then having tools to discover them, merge them, match them and so forth becomes important in initially populating a master data store.
For example, suppose you want to merge multiple customer databases from an order-entry system and a CRM system into a master data set. You’ll need tools for that, and Microsoft currently doesn’t provide those tools in the box. Technically, you could use SQL Server Integration Services to perform the process of unifying, matching, scoring and selecting data, but that would be incredibly labor intensive.
Other data-focused products have standardization tools. As you move data into a master data store, for example, these tools can help clean the data—aligning things like product names to industry registries—and bring all of your data into compliance.
This goes a bit beyond the rule sets that SQL Server MDS provides, but if you were willing to manually import registries and create the necessary matching rule sets, you could conceivably create this kind of data cleaning on your own.
Similarly, tools can help clean customer data by validating addresses and phone numbers, for example, as you move data into a master data store. Again, this is something you could manually program in SQL Server MDS, but doing so requires additional development work and deployment time.
Master Data Management is often used during mergers and acquisitions where multiple sets of data are bound to exist, often in different formats. Merging these data sets into a master data store provides a clean start as well as better ongoing data management. SQL Server 2008 R2’s Master Data Services lacks the sophistication of more mature tools.
SQL Server Integration Services can perform the extract/load/transform (ETL) operations needed to merge multiple data sets, but doing so can require significant development and debugging time. Tools with specific MDM ETL capabilities help accelerate that time and provide more accurate master data creation.
In fact, the maturity of many MDM solutions is apparent in their ability to provide domain-specific master data tools. IBM, for example, offers a specific InfoSphere MDM Server for Product Information Management that’s designed specifically to handle master data for products sold by a company.
That might seem like an incredibly niche product, but it enables IBM— using its base InfoSphere MDM product—to provide greater automation for a class of data that is quite complex. By predefining data hierarchies for this class of data, the product speeds deployment and has more domain-specific tools to make consumption of that master data more meaningful to the business.
When will MDS grow up?
Microsoft may eventually release solution accelerators that will tweak MDS to specific data domains similar to products from other vendors. Infact, if properly designed and developed, those accelerators would add a marked layer of sophistication and flexibility to MDS and would position it near the top of the market in terms of capability and business applicability. There’s also an open market for third-party independent software vendors that want to build the same type of domain-specific capabilities on top of MDS.
Master Data Services is an excellent start on Microsoft’s part, and bundling the feature with SQL Server2008 R2 puts MDM into the hands of a vastly broader audience. Microsoft still needs to develop MDS with regard to the creation of master data—the ability to integrate, select and filter existing data sources into that initial master data store.
About the Author
Don Jones is a co-founder of Concentrated Technology LLC, the author of more than 30 IT books and a speaker at technical conferences worldwide. Contact him through his website at www.ConcentratedTech.com.