How to build a master data index: Static vs. dynamic indexing

Expert David Loshin explores the differences between static and dynamic indexing in master data management systems, and which queries each approach can support.

By David Loshin, Knowledge Integrity Inc.

Published: 31 Oct 2018

Master data management systems are intended to present a unified view of information about key data domains, such as customers or products, using data pulled from original sources located within and outside the organization. Identity resolution and record linkage techniques are used to load all of the input source records, block the records according to predefined strategies, look for similarities and link records presumed to represent the same real-world entities. In some MDM systems, data collected from the linked records is combined into a single master record.

However, there is a risk that such artificially produced master records are inconsistent with the original source records. A different approach to provide accessibility to the information about a sought-after entity is to use a searchable master data index. The goal of the master index is to allow consumer applications to request information about a named entity and retrieve all the original records that have been linked together.

Identity resolution is typically performed as a batch operation. Data is pulled from the original sources, the values of the data attributes used for similarity scoring (we will call them the "matching attributes") are extracted, and sets of similar records are linked into groups. Each group of linked records is assigned a unique identifier, and this unique identifier becomes the key for building the master data index. That index consists of two mapping tables: the search table maps each record's matching attribute values to its unique identifier, and the index table maps each unique identifier to all of the records assigned that identifier.
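
As a rough illustration of the two mapping tables, the following minimal Python sketch builds both from records that a batch identity resolution run has already grouped. The record layout, the field names (record_id, entity_id, last, first, phone) and the function build_master_index are assumptions made for this example, not part of any particular MDM product.

```python
from collections import defaultdict

# Hypothetical source records, already linked by a batch identity resolution run.
# "entity_id" stands in for the unique identifier assigned to each linked group.
linked_records = [
    {"record_id": "crm-17", "entity_id": "E1", "last": "Smith", "first": "John", "phone": "555-0100"},
    {"record_id": "web-42", "entity_id": "E1", "last": "Smith", "first": "Jon", "phone": "555-0100"},
    {"record_id": "erp-03", "entity_id": "E2", "last": "Jones", "first": "Mary", "phone": "555-0199"},
]

# Assumed choice of matching attributes for this sketch.
MATCHING_ATTRIBUTES = ("last", "first", "phone")


def build_master_index(records):
    """Return (search_table, index_table) for a list of linked records."""
    search_table = {}                # matching attribute values -> unique identifier
    index_table = defaultdict(list)  # unique identifier -> all records in the group
    for rec in records:
        key = tuple(rec[attr] for attr in MATCHING_ATTRIBUTES)
        # Assumption: records with identical matching values were linked to the
        # same entity, so one identifier per key is enough here.
        search_table[key] = rec["entity_id"]
        index_table[rec["entity_id"]].append(rec)
    return search_table, dict(index_table)


search_table, index_table = build_master_index(linked_records)
```

In this sketch both spellings of the customer's first name map to the same identifier, and that identifier leads back to both original source records.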

Static indexing vs. dynamic indexing

This index configuration provides what could be called a static master index used for search and retrieval. The search process begins with a consumer request for any records associated with a set of presented matching values (such as a customer's last name, first name and telephone number). The search table is queried to find any records with the presented matching values. If any records are found, it means there was a match in the data set, and for each of the found records, the corresponding unique identifier is looked up in the index table to find all other records linked to the found record. All those associated records can be retrieved and assembled into a result set given back to the data consumer.
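
The static search-and-retrieve flow can be sketched in a few lines of Python. The table contents, the record identifiers and the function name static_lookup are hypothetical, and plain dictionaries stand in for what would normally be database tables.

```python
# Hypothetical tables: matching values -> identifier, identifier -> linked record IDs.
search_table = {
    ("Smith", "John", "555-0100"): "E1",
    ("Smith", "Jon", "555-0100"): "E1",
    ("Jones", "Mary", "555-0199"): "E2",
}
index_table = {
    "E1": ["crm-17", "web-42"],
    "E2": ["erp-03"],
}


def static_lookup(last, first, phone):
    """Exact-match search, followed by retrieval of every linked record."""
    entity_id = search_table.get((last, first, phone))
    if entity_id is None:
        return []  # nothing in the search table carries exactly these values
    return list(index_table[entity_id])


print(static_lookup("Smith", "Jon", "555-0100"))  # ['crm-17', 'web-42']
print(static_lookup("Brown", "Ann", "555-0123"))  # []
```

A request that matches either stored variant of the customer returns both linked source records, because the lookup goes through the shared identifier rather than through the individual record.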

This master index solution works well, as long as there is an exact match for the attributes provided by the consumer seeking the data. The challenge is that even though this configuration is designed to link records in the presence of data variation, it does not support approximate searching, in which there is tolerance for variation in the presented matching values. In other words, unless you know the exact values for at least one of the indexed records, you won't be able to find any matches.

This suggests the need for a second type of master data index that can be called a dynamic index. A dynamic indexing system uses the same two mapping tables, but it also relies on the same type of identity resolution techniques used to create the master data index in the first place.

The records in the search table need to be blocked according to the same blocking keys used for the identity resolution process that created the master data index. Any set of matching values presented by a data consumer is used to determine the blocks that might contain matching records, and the records in those blocks are selected from the search table. Instead of executing an exact match, each selected record is paired with the presented matching values and is subjected to the same similarity scoring method used for the batch identity resolution. The unique identifiers mapped to any search records that score at or above the matching threshold are then used to search the index table to find all other records that are linked to the found records. All the associated records can be retrieved and assembled into a result set given back to the data consumer.
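
A dynamic lookup along these lines might look like the following sketch. Everything specific in it is an assumption chosen for illustration: the blocking key (first letter of the last name), the similarity score (the mean of per-attribute ratios from Python's standard difflib.SequenceMatcher), the 0.85 threshold and the function names. A real system would reuse whatever blocking strategy and scoring method its batch identity resolution process uses.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical tables: matching values -> identifier, identifier -> linked record IDs.
search_table = {
    ("Smith", "John", "555-0100"): "E1",
    ("Smith", "Jon", "555-0100"): "E1",
    ("Jones", "Mary", "555-0199"): "E2",
}
index_table = {"E1": ["crm-17", "web-42"], "E2": ["erp-03"]}

MATCH_THRESHOLD = 0.85  # assumed matching threshold


def blocking_key(values):
    """Assumed blocking strategy: first letter of the last name."""
    last, _first, _phone = values
    return last[:1].upper()


def similarity(stored, presented):
    """Assumed scoring method: mean of per-attribute string similarity ratios."""
    ratios = [
        SequenceMatcher(None, s.lower(), p.lower()).ratio()
        for s, p in zip(stored, presented)
    ]
    return sum(ratios) / len(ratios)


# Block the search table once, using the same key function applied to queries.
blocks = defaultdict(list)
for stored_values in search_table:
    blocks[blocking_key(stored_values)].append(stored_values)


def dynamic_lookup(last, first, phone):
    """Approximate search within the candidate block, then retrieval by identifier."""
    presented = (last, first, phone)
    entity_ids = set()
    for candidate in blocks.get(blocking_key(presented), []):
        if similarity(candidate, presented) >= MATCH_THRESHOLD:
            entity_ids.add(search_table[candidate])
    return {eid: index_table[eid] for eid in sorted(entity_ids)}


# "Jhon" is not stored anywhere, so an exact-match lookup would find nothing.
print(dynamic_lookup("Smith", "Jhon", "555-0100"))
```

With these assumed settings, the misspelled request scores above the threshold against both stored variants and returns the linked group for E1. The blocking step keeps the number of scored pairs small, at the cost of missing records whose blocking key differs from that of the presented values.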

The response time for the static master data index is relatively fast, as a lookup can be executed using a standard query followed by table accesses driven by that query's results. The searching process for dynamic indexing may take longer, because each candidate record in the selected blocks must be scored, but it retrieves matching records more completely and accurately when the presented values vary from the stored ones, providing more complete visibility of information about the sought-after entity.
