Data Lifecycle Management and the Role of the Data Catalog

By Joseph Perez

Published on June 26, 2024

What’s your data’s story? For answers, look to its lifecycle. Data lifecycle management is the process of overseeing and managing data throughout its entire lifecycle, from creation and initial storage to the time it becomes obsolete and is deleted. 

While this seems obvious, being intentional about data management as it cycles through its lifespan has grown increasingly important, as privacy rules mandate stringent storage and deletion practices that must be documented and observed.

This wasn’t always the case. In my past life, I was an Oracle database administrator (DBA), where I focused on data storage and operations. Back then, the data archival and deletion practices of certain clients (which included some of the world’s preeminent financial institutions) were not always transparent, documented, or consistently practiced. Crazy, right?

Luckily, that’s changing. As business leaders embrace data-driven leadership, more organizations are formalizing their data lifecycle management practices. Not only does this safeguard quality – it’s good for business and critical for compliance. 

It’s also what consumers demand. As of March 2023, 138 countries had enacted data privacy laws, and while the American Data Privacy and Protection Act (ADPPA) has not yet become law, more than 4 out of 5 US voters support its provisions.

In this blog, I’ll recap my brief on data lifecycle management (originally shared with the Alation Community), demonstrate its value, and show how the data catalog can help.

Data governance for data management

Data experts will be well acquainted with DAMA, which stands for the Data Management Association. This global body has been providing best practices, guidelines, and resources for data management professionals since its founding in 1980. One key aspect of DAMA is the Data Management Body of Knowledge (DMBOK). This thorough guide outlines the standard practices, principles, and terminology of data management, including data governance, data architecture, data quality management, and data security.

Image showing the DAMA wheel with data governance at its center

As you can see, DAMA places data governance at the center of its data management wheel. Data governance structures all other practices, serving as the framework of policies, processes, and standards that ensure effective data use within an organization. Its primary goals are to ensure data quality, promote consistency, enable data-driven decision-making, and safeguard regulatory compliance. 

Zooming out, proper data governance enables the proper use of data through storage and operations. It’s the data governance piece that enables things like data security and document and content management. It also shapes the shared definitions that help your data teams collaborate, including terms, articles, documents, and glossaries. Aligning on these terms is critical for fostering understanding and cross-functional collaboration (and I can tell you I really suffered without this clarity in my time as a DBA!).

In the Alation platform, data governance is overlaid on all data management functions. At its heart, the catalog is a platform that enables data governance to guide all other data practices – including storage and operations.

Data storage and operations

I encourage my clients to think of data operations as a fact or (better yet) a function – and it’s critical such operations live in the catalog. Why? Databases can exist in their own kingdom, with little visibility or governance overlaid. A catalog makes these obscure kingdoms visible, along with the lifecycle of the data that dwells within them – and ensures their integrity.

Image showing how inputs combine with activities and people to drive outcomes in a data storage and ops model.

As a function, data operations require inputs or activities, along with a set of people to generate an output or outcome. These inputs may be data models, SLAs, or data requirements (for example, the need to curate data with titles and descriptions). The right data architecture is key to appropriately managing the technology and its operations.

What goes into the management of data storage and operations? Management entails understanding, evaluating, and monitoring the technology, developing database instances, migrating, testing, decommissioning, and more. All these activities play a part in database storage and operations. The people who take part in data operations include database administrators, software developers (who create requirements), app teams, and end consumers, like business analysts. 

How can these data leaders and users leverage a platform like Alation to improve data operations? By clearly defining each stage of data lifecycle management and making it transparent in the catalog, leaders can ensure each stage is appropriately carried out.

The data management lifecycle: 6 steps

Let’s consider the full lifecycle, and demonstrate how a catalog can be overlaid at each step.

Image showing the 6 steps of data lifecycle management within a data catalog.

#1 Creation

Data creation tends to fall into three buckets:

  • Data entry is the most common, generated from humans or devices capturing or inputting data.

  • Data acquisition comes from a third party, such as the data company Dun & Bradstreet.

  • Data capture is generated through devices or sensors (think IoT).

In a data catalog, newcomers to data need to know how it came to be, and why. Not only does this signal quality (and what further steps need to be taken to cleanse the data and make it available for use), creation details inform appropriate usage. Background context is key for proper governance and operations. All data users, from analysts to stewards and engineers, need the context of creation to understand how best to maximize data’s value.

#2 Processing (AKA data maintenance)

Processing describes how data is massaged and moved through the organization. It may include extract-transform-load (ETL) or extract-load-transform (ELT), integration, cleansing, scrubbing, sorting, filtering, aggregating, and even applying algorithms in order to make it usable for a broader range of people.
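To make a few of these verbs concrete, here is a minimal sketch of cleansing, filtering, and aggregating in plain Python. The field names (`region`, `amount`) and the sample rows are hypothetical, chosen only for illustration; a real pipeline would typically run in an ETL tool or a warehouse rather than in a script like this.

```python
from collections import defaultdict


def cleanse(records):
    """Filter out rows with a missing amount and normalize the region label."""
    cleaned = []
    for row in records:
        if row.get("amount") is None:
            continue  # filtering: skip incomplete rows
        cleaned.append(
            {
                "region": row["region"].strip().lower(),  # cleansing
                "amount": float(row["amount"]),
            }
        )
    return cleaned


def aggregate_by_region(records):
    """Sum amounts per region and return the totals sorted by region name."""
    totals = defaultdict(float)
    for row in records:
        totals[row["region"]] += row["amount"]
    return dict(sorted(totals.items()))


# Hypothetical raw input with inconsistent labels and one missing value.
raw = [
    {"region": " East ", "amount": "10.5"},
    {"region": "west", "amount": None},
    {"region": "east", "amount": 4.5},
    {"region": "West", "amount": 3},
]

totals = aggregate_by_region(cleanse(raw))
print(totals)
```

Documenting steps like these in the catalog tells downstream users what was dropped or reshaped before the data reached them.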

#3 Storage

Location, location, location applies to more than just real estate: it’s critical for data assets, too! Clearly communicating storage details does more than ensure compliance for audits and safeguard against loss – it streamlines access for new users. This is why IT leaders need to understand where data exists and where it’s replicated. The catalog serves as the gateway to the environment of the source, and the place to show people how to gain access to this data, ensuring that’s a seamless process.

Image showing how data storage and usage is documented in the data catalog.

Key considerations for storage include:

  • Location: Does data live in the cloud, on-prem, or in a hybrid environment? Storage location informs format, as well as dev, QA, and production environments.

  • Security: Is the data encrypted? How do we control access to ensure compliance, by role, team, or domain?

  • Protection: Is the data backed up? Is there ADR recovery? Is it replicated, and if so, where? 

Clearly communicating these details in the catalog ensures that data is useful so that it may empower business users to make decisions, IT leaders to safeguard security, and governance leaders to ensure compliance.
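One way to picture these considerations is as a structured record attached to each dataset. The sketch below is an assumption for illustration only: `StorageRecord`, its fields, and `open_questions` are hypothetical names, not part of Alation or any other catalog's API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StorageRecord:
    """Hypothetical storage metadata for one dataset."""

    dataset: str
    location: str  # "cloud", "on-prem", or "hybrid"
    environment: str  # "dev", "qa", or "production"
    encrypted: bool
    access_roles: List[str] = field(default_factory=list)
    backed_up: bool = False
    replica_locations: List[str] = field(default_factory=list)


def open_questions(record: StorageRecord) -> List[str]:
    """List the security and protection questions the record leaves unanswered."""
    gaps = []
    if not record.encrypted:
        gaps.append("data is not encrypted")
    if not record.access_roles:
        gaps.append("no access roles defined")
    if not record.backed_up:
        gaps.append("no backup recorded")
    return gaps


orders = StorageRecord(
    dataset="orders",
    location="cloud",
    environment="production",
    encrypted=True,
    access_roles=["analyst"],
)
gaps = open_questions(orders)
print(gaps)
```

A check like this makes gaps visible before an audit does.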

#4 Usage

What does usage entail? If you consider data as a service or a product (and the many types of “customers” who will seek to use it) you can appreciate the various responsibilities that come with usage – and the questions likely to arise.

Is it findable? Is it certified as trusted? Is it packaged to communicate when and why it was created, how it’s been used, by whom, how, and why? A data catalog addresses these questions with metadata, informing future usage.

#5 Archiving

How do you know when data is ready to be archived? Simple: When it's no longer processed or used. Another sign is that it's not published, but it still exists, and users aren't really consuming it in any downstream applications. Regulations may require that you archive the data for ten years before destruction. 

In this case, leaders would remove it from all active production environments and copy the data to another environment/system, where it can live out its days in a (more affordable) archive. This enables data teams to remain compliant and cost-effective in their data usage while still staying aware of where key data sits in case they need to access it.
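The archiving rule above can be sketched as a small decision function. The ten-year retention period comes from the example in the text; the 365-day idle threshold, the function names, and the returned labels are my own assumptions for illustration, and your regulations and policies will set the real values.

```python
from datetime import date
from typing import Optional


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later, mapping Feb 29 to Feb 28."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, month=2, day=28)


def lifecycle_action(
    last_used: date,
    archived_on: Optional[date],
    today: date,
    idle_days: int = 365,  # assumed threshold for "no longer used"
    retention_years: int = 10,  # example retention period from the text
) -> str:
    """Suggest the next lifecycle step for a dataset."""
    if archived_on is not None:
        # Already archived: destroy only once the retention period has passed.
        if today >= add_years(archived_on, retention_years):
            return "destroy"
        return "keep_archived"
    if (today - last_used).days >= idle_days:
        return "archive"
    return "keep_active"


today = date(2024, 6, 1)
print(lifecycle_action(date(2024, 1, 1), None, today))
print(lifecycle_action(date(2022, 1, 1), None, today))
print(lifecycle_action(date(2010, 1, 1), date(2012, 1, 1), today))
```

Recording the inputs to this decision (last use, archive date, retention period) in the catalog is what makes the rule auditable.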

#6 Destruction

Data must be destroyed at the end of its usable lifecycle. This entails removing the data and every copy of the data from the organization.

“But what if I need this later?” your data scientists may ask. Even if you wanted to keep your data absolutely forever, there are a few reasons you can't. 

First, it would become extremely expensive. As data volumes explode over time, we're going to see those costs that come with capturing and storing this data for the long term explode as well.

Second, from a regulatory standpoint (think GDPR), certain types of data can’t live forever for legal reasons. What’s more, privacy laws have given consumers the ability to request that their personal data be destroyed. So when someone requests that an organization delete their personal data, there need to be strategies and policies in place to govern how the organization processes that request and sees it through. That’s why we need a strategy to ensure proper destruction.
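As a rough illustration of what "every copy" means in practice, the sketch below removes one person's records from each known store and returns a per-store count for the audit trail. The store names, the `subject_id` field, and the in-memory lists are hypothetical stand-ins; real systems would issue deletes against databases, replicas, and archives, which is exactly why the catalog needs to know where those copies live.

```python
from typing import Dict, List


def process_deletion_request(
    subject_id: str, stores: Dict[str, List[dict]]
) -> Dict[str, int]:
    """Delete a subject's rows from every store and report rows removed per store."""
    audit = {}
    for name, rows in stores.items():
        before = len(rows)
        # Modify the list in place so every holder of this store sees the deletion.
        rows[:] = [row for row in rows if row["subject_id"] != subject_id]
        audit[name] = before - len(rows)
    return audit


# Hypothetical copies of the same data across environments.
stores = {
    "production": [{"subject_id": "u1"}, {"subject_id": "u2"}],
    "replica": [{"subject_id": "u1"}],
    "archive": [{"subject_id": "u3"}],
}

audit = process_deletion_request("u1", stores)
print(audit)
```

The returned counts double as evidence that the request was carried out in each location.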

How to build your data management plan in 5 steps

This is all great in theory. But how do we put it into practice? For those just starting out with a data catalog, here are the steps I recommend you take to kick off your data management lifecycle.

Key steps to create a data lifecycle management plan in a data catalog.

Step 1: Identify the data to be collected 

To plan the appropriate data architecture, you need to know the types of data you’ll collect and have a rough idea of its volume. By estimating data volume, you can determine infrastructure costs and the time it will take your team to set up.
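A back-of-the-envelope volume estimate can be as simple as the arithmetic below. Every number here is a made-up assumption, including the price per gigabyte; plug in your own record counts, record sizes, and vendor pricing.

```python
def estimate_storage(
    records_per_day: int,
    avg_record_bytes: int,
    retention_days: int,
    copies: int = 1,  # production plus any replicas or backups
    cost_per_gb_month: float = 0.02,  # assumed price, not a real quote
):
    """Return (total gigabytes held, monthly storage cost) at full retention."""
    total_bytes = records_per_day * avg_record_bytes * retention_days * copies
    total_gb = total_bytes / 1e9
    return total_gb, total_gb * cost_per_gb_month


# Hypothetical workload: 1M records/day, 1 KB each, kept 365 days, 2 copies.
total_gb, monthly_cost = estimate_storage(1_000_000, 1_000, 365, copies=2)
print(f"{total_gb:.0f} GB, about {monthly_cost:.2f} per month")
```

Even a rough figure like this helps frame the architecture conversation before any tooling is chosen.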

Step 2: Define how the data will be organized 

What tools will you need throughout the data’s lifecycle, and which environment will support their usage? You might consider a data warehouse, on-prem, a cloud data lakehouse, or a hybrid environment. Also consider the types of data you seek to leverage, and the regulations governing their analysis. Understanding the implications of these choices will enable you to define your governance policies.

Step 3: Document your data storage and preservation strategy 

Form follows function. Now that you know the types of data you’ll collect and your environment, you can answer key questions like, How long should data be accessible, and who can access it? How will we store and protect data through its lifecycle? 

It’s key to understand when data must be destroyed, as well as what your system’s backup, retention, and recovery operations are.

Step 4: Define data policies 

Is the data you’re collecting subject to licensing and sharing agreements or restrictions? For example, is the analysis of sensitive data restricted for legal or ethical reasons? Ensure that data is clearly labeled so newcomers remain compliant in their usage.

Step 5: Define roles and responsibilities

How will you keep the data lifecycle train rolling and on time? By recognizing people who are already stewarding data, for example, you can build your team of data stewards (without having to hire new folks). 

Those just starting out on their data management journey have a lot to consider! A data catalog (AKA a data intelligence platform) like Alation is a critical foundation for data management, as it serves as a trusted gateway to certified, curated data for a range of users across an organization.

Curious to learn more? Join us for a demo. 
