By Michael Meyer
Published on October 15, 2024
Master Data Management (MDM) and data catalogs are both becoming ubiquitous as organizations integrate more systems, embark on data-hungry and compliance-fraught AI initiatives, comply with extensive privacy regulations, and address data quality concerns — all while making data easy to find, access, trust, and use. Organizations that use data catalogs to support subsequent MDM initiatives see greater organizational efficiency, faster decision-making, and improved customer satisfaction.
MDM is a discipline that takes a comprehensive approach to managing and organizing critical enterprise information, such as customer data, financial insights, and product details. Its crucial goals are to avoid duplication, inconsistency, and other data quality issues. Those responsible for MDM typically use dedicated technology, tools, and processes to ensure high-quality, accessible, and trusted enterprise data.
In many organizations, data is duplicated, stored in multiple locations, transferred and manipulated numerous times, and handled differently by different departments. This causes data inconsistencies that can reduce accuracy, trust, and confidence in the data.
In contrast, MDM strives to create a single source of truth for critical organizational data by integrating data from across the enterprise into a consistent set of records that teams, applications, and analytics tools pull from, ensuring everyone works from the same information.
Transactional systems and data warehouses can also use this “golden record” of data as the organization’s most current, trusted, and comprehensive source of information. Furthermore, this single-source approach increases efficiency for data users and technical teams who can focus on utilizing the data instead of trying to integrate it themselves.
As mentioned, effective MDM relies on dedicated technology, tools, and processes. A data catalog is an important technology component of MDM.
A data catalog is a metadata repository of information that helps analysts and other data users find the data they need, serves as an inventory of available data, and provides information to evaluate data's fitness for intended uses. Data catalogs surface information from sources across the enterprise, including data sets, business intelligence reports, visualizations, and conversations.
Fundamentally, metadata is “data about data.” Metadata helps people understand a dataset’s content, physical structure, and purpose, making it easier to organize the data and describe it to others so they can use it. Metadata can describe many data formats, including documents, images, videos, and databases.
For example, a customer record in a Customer Relationship Management (CRM) application might contain a customer’s name, email address, and phone number. In a data catalog, that information is augmented with metadata that details the source system, the date it was last cataloged, and the person responsible for maintaining the data. This metadata helps data consumers – workers using data – better understand the data’s quality, accuracy, trustworthiness, and more.
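To make that concrete, here is a minimal sketch in Python of how such a catalog entry might look. The field names and values are purely illustrative assumptions, not tied to any particular catalog product:

```python
# Hypothetical sketch: a CRM customer record augmented with catalog metadata.
# Field names and values are illustrative, not a specific catalog's schema.
customer_record = {
    "name": "Jane Doe",
    "email": "jane.doe@example.com",
    "phone": "+1-555-0100",
}

catalog_metadata = {
    "source_system": "CRM",                # where the record originates
    "last_cataloged": "2024-09-30",        # when the catalog last profiled it
    "data_owner": "crm-team@example.com",  # person/team responsible for the data
    "classification": "Master",            # master vs. transactional vs. reference data
}
```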
Modern data catalogs now empower entire organizations to tackle a broad range of data intelligence challenges, including:
Self-service analytics and supporting a robust data culture,
Data governance and data governance policy management,
Creating an AI-ready organization and adapting to AI’s data governance concerns,
Privacy and changing data privacy laws, and
Cloud modernization and cloud data migration.
At the most basic level, MDM ensures consistent, reliable data across an organization, while data catalogs help workers find the right data quickly. A data catalog benefits MDM by identifying data sources, assessing data quality, and managing data policies—all critical to responsible data usage. Without a data catalog, MDM offers users little guidance for building trust and confidence in the data.
Implementing a data catalog first will make MDM more successful. To understand more, let’s break down the critical capabilities of MDM and data catalogs.
A data catalog provides a centralized application for stakeholders to participate in and collaborate on a major undertaking such as MDM. Successful MDM initiatives must start with a complete understanding of the resources involved, such as the people, processes, technologies, and data.
Identifying and cataloging data sources that create/update data for the entity you are trying to master is essential. Stakeholders must understand the data and its journey before mapping its attributes to be mastered.
The data catalog can be invaluable to this process by capturing and storing valuable information, including:
Table and column descriptions
Key business people (domain owners) and application subject matter experts
Source-to-MDM-target mapping matrix of the attributes to be mastered (see the sketch after this list)
Data lineage detailing where data originated and how it may have transitioned (more critical for analytical MDM)
Downstream consumers
Attribute classification as Master, Transactional, or Reference data
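As a rough illustration of the source-to-MDM-target mapping matrix mentioned above, the rows might be captured along these lines. The source systems, columns, and target attributes are hypothetical examples, not output from any particular catalog:

```python
# Hypothetical source-to-MDM-target mapping rows for a customer entity.
# Systems, columns, and target attributes are illustrative examples only.
mapping_matrix = [
    {"source_system": "CRM",     "source_column": "cust_email",
     "target_attribute": "customer.email",     "classification": "Master"},
    {"source_system": "ERP",     "source_column": "bill_to_name",
     "target_attribute": "customer.full_name", "classification": "Master"},
    {"source_system": "Support", "source_column": "ticket_count",
     "target_attribute": None,                 "classification": "Transactional"},
]
```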
Starting with MDM typically forces organizations to spend more time analyzing data structures, reworking MDM assumptions and processes as new sources are found, and communicating the impacts to downstream data consumers.
It is critical to ensure data quality before incorporating it into MDM. Good data is crucial to creating accurate golden records because it directly impacts the reliability of information used for decision-making. While data quality can be qualitative, a data catalog can track the nine popular data quality characteristics and dimensions that distinguish high-quality from low-quality data. In today’s world, data quality is also critical for ensuring the success of AI-related initiatives.
A data catalog is a one-stop shop for an organization’s data quality policies, so employees at all levels understand data quality, how it’s measured, and what constitutes high-quality data. Documenting rule definitions and corrective actions further guides domain owners and data stewards in addressing quality issues. Using a data catalog to review data profiles can help discover other potential quality concerns.
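As a simple sketch of how a documented rule definition can translate into an executable check, consider the following. The fields, pattern, and metrics are assumptions chosen for illustration, not any catalog’s built-in functionality:

```python
import re

# Hypothetical data quality rules: completeness and validity of the email attribute.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def check_quality(records):
    """Return simple completeness/validity metrics for a batch of customer records."""
    total = len(records)
    complete = sum(1 for r in records if r.get("email"))
    valid = sum(1 for r in records if r.get("email") and EMAIL_PATTERN.match(r["email"]))
    return {
        "email_completeness": complete / total if total else 0.0,
        "email_validity": valid / total if total else 0.0,
    }

records = [
    {"email": "jane.doe@example.com"},
    {"email": "not-an-email"},
    {"email": None},
]
print(check_quality(records))  # completeness ~0.67, validity ~0.33
```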
In an MDM-first approach, it can be challenging to convey the importance of data quality to stakeholders, especially if they are encountering it for the first time. Without appropriate data quality, the testing phase of the mastered entity may reveal issues, causing stakeholders to question whether MDM is working. The time it takes to understand and resolve those data quality issues could derail the MDM project for an extended period.
Rules must be defined to construct the golden record. It is critical to outline how matching and merging will work for a mastered entity. The rules capture the information needed to construct golden records, such as what makes the entity unique, which attribute value to use when there are duplicate records, how reference data is aligned, and what amount of latency is acceptable.
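For example, a minimal, hypothetical expression of such match-and-merge rules for a customer entity could look like this. The match key and survivorship logic are assumptions chosen for illustration, not a prescribed MDM standard:

```python
# Hypothetical golden-record rules for a customer entity (illustrative only):
# - uniqueness: records describe the same customer when their normalized emails match
# - survivorship: when duplicates exist, keep the most recently updated record
golden_record_rules = {
    "match_key": lambda r: r["email"].strip().lower(),
    "survivorship": lambda candidates: max(candidates, key=lambda r: r["last_updated"]),
}
```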
While data-related policies typically imply data governance policies and rules, MDM policy definitions for domain entities can also be created in a data catalog. Having this information in a data catalog makes it easier to find and use. In addition, it helps communication with stakeholders and others so everyone understands how the mastered entity is constructed.
When a team starts with MDM, the rule definitions often become an exercise led by IT, and the details are buried inside the MDM software. This weakens transparency for stakeholders, and alignment is easily overlooked, causing longer testing cycles and rework.
When starting an MDM project, a data model must be created as the blueprint for the single source of truth. Most MDM tools provide the means to develop the model, which contains the tables, relationships, and attributes pertinent to the solution.
The model must reflect the analysis work captured in the data catalog to this point. Information such as the sources, the source-to-MDM-target mapping matrix, and the attribute classifications is essential to creating the model. Curated descriptions from the data catalog should be reused in the model for continued consistency and understanding of the mastered entity.
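As an illustrative sketch only, assuming a customer domain, the mastered-entity model derived from that catalog analysis might look roughly like this. The entity and attribute names are examples, not a prescribed MDM model:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Tuple

@dataclass
class MasterCustomer:
    """Hypothetical mastered customer entity built from cataloged source attributes."""
    customer_id: str                        # surrogate key issued by the MDM hub
    full_name: str                          # mapped from CRM and ERP name fields
    email: str                              # matched/merged per the golden-record rules
    phone: Optional[str] = None
    source_systems: Tuple[str, ...] = ()    # lineage: which sources contributed
    last_updated: Optional[date] = None
```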
In an MDM-first approach, modeling the mastered entity without the outputs of the data analysis captured in the data catalog is a recipe for rework and frustration. Having that information readily available in a data catalog streamlines and eases MDM efforts.
The next step is building the data objects, including tables and columns. The MDM tool should help with this.
Within the MDM tool, the team should reference the MDM policy and metadata maintained in the data catalog. This information is critical before making any future changes to the MDM solution.
A company that simply adopts and tweaks a base model from the MDM tool will likely find that the model does not meet its needs. Modeling is a critical process step that relies on analyzing and understanding the data. The best way to gain this knowledge is by having it easily accessible in a data catalog.
The final step is to build the MDM solution, which includes creating the code and pipelines to address data duplication, inconsistency, and quality issues to make the golden records.
Establishing the MDM policy in the data catalog allows engineers to use the rules as the requirements for building the solution. They can also collaborate in the data catalog to get answers to any questions that arise during construction.
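A highly simplified sketch of one such pipeline step, reusing the hypothetical match-and-merge rules from the earlier example, might look like this:

```python
from collections import defaultdict

# Assumes the hypothetical rules sketched earlier: match on normalized email,
# and keep the most recently updated record when duplicates exist.
golden_record_rules = {
    "match_key": lambda r: r["email"].strip().lower(),
    "survivorship": lambda candidates: max(candidates, key=lambda r: r["last_updated"]),
}

def build_golden_records(source_records, rules):
    """Group source records by the match key, then keep the surviving record per group."""
    groups = defaultdict(list)
    for record in source_records:
        groups[rules["match_key"](record)].append(record)
    return [rules["survivorship"](candidates) for candidates in groups.values()]

# Example: two source records that match on email collapse into one golden record.
sources = [
    {"email": "Jane.Doe@example.com", "full_name": "Jane Doe", "last_updated": "2024-09-30"},
    {"email": "jane.doe@example.com", "full_name": "J. Doe",   "last_updated": "2023-01-12"},
]
print(build_golden_records(sources, golden_record_rules))  # survivorship keeps the 2024 record
```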
When using an MDM-first approach, engineers often define the rules ad hoc while coding the solution. This might get the code to testing faster, but the result inevitably will not match stakeholders’ expectations. The project then has to return to defining the rules, and someone must explain to management why delivery dates have slipped.
It is clear that establishing the data catalog first provides a better path to success for an MDM initiative.
The benefits organizations can achieve by starting with a data catalog include:
Reducing the cost and time of MDM initiatives
Ensuring all data is identified and assessed for quality before it is incorporated into an MDM initiative
Increasing trust in and access to much-needed data for analysis, decision-making, AI-driven applications, and other critical business imperatives that rely on the MDM initiative
Schedule a personalized demo today to learn how Alation can help you accelerate and increase the value of your MDM initiatives.
MDM is a discipline that helps organize critical information to avoid duplication, inconsistency, and other data quality issues. The concept is to have a “golden record” for core entities in your organization, including customers, products, locations, and many others. Transactional systems and data warehouses can then use the golden records as the entity's most current, trusted representation.
Implementing a data catalog first will make MDM more successful. The reasons can be broken down into these critical areas: identifying resources, assessing data quality, and defining policies.