Curate Your Catalog Effectively: Proven Methods and Tools

Applies to: Alation Cloud Service and customer-managed instances of Alation

Catalog curation enriches the data catalog with valuable information such as Titles, Descriptions, and field values, ensuring data assets are discoverable, trusted, and ready for use in analysis. It enables users to locate and understand data assets, supporting collaboration and data-driven decision-making. Led by Catalog Admins, the curation process integrates with governance practices to align metadata with organizational policies.

A curated catalog can transform technical metadata into a strategic resource.

Best Practice 1: Establish the End Goal

How do you approach curation?

Start by defining clear goals that will lay a strong foundation for the catalog curation process and broader team involvement. Misaligned goals can result in inconsistent progress. In contrast, a shared understanding of goals promotes measurable progress, helping contributors feel confident and productive.

Set Expectations

Your key questions are:

What are your curation standards?

Curation standards tell Stewards which fields to curate. Without standards, different teams may interpret curation differently.

From the outset, clarify whether curation in your catalog focuses on adding Titles, Descriptions, and field values, or also extends to governance through policies, glossary terms, or documents. All contributors need to understand the end goal in the same way.

What is a fully curated asset?

If all the fields defined in your curation standard are curated for an asset, it’s a fully curated asset.

Clearly define what qualifies as a fully curated asset in your case. Specify whether it involves basic Titles and Descriptions or includes comprehensive fields, links to policies, and glossary terms. Achieving even 70% of the goal is a big milestone.

What does full catalog curation mean?

Measure curation and evaluate progress through metrics. Define whether your goal is focused on curating critical assets or achieving comprehensive coverage of everything in the catalog. For larger catalogs, full coverage may be an unrealistic goal. Making the curation of critical assets a priority can be a practical benchmark for success.

Set Achievable Objectives

Curation is continuous, not a one-time task:

  • Break efforts into manageable phases that allow for time-boxed goals.

  • Scope curation efforts. For example, curate the top 10 out of 100 tables in each domain or schema.

  • Prioritize high-value or frequently used data assets.

Adopt a Tiered Approach

Use a phased approach to structure your curation efforts:

  • Basic—Add Titles and Descriptions.

  • Intermediate—Populate custom fields and additional metadata.

  • Advanced—Include policies, glossaries or documents, and governance standards.

Define fully curated assets based on these tiers. For example, an asset might count as fully curated with only a Title and Description, or it might also require the additional metadata of the intermediate and advanced tiers.
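
The tiers above can be expressed as a simple check. The following is a minimal sketch, not an Alation feature: the field names (region, pii_classification, policies, glossary_terms) and the TIERS layout are hypothetical and should be replaced with the fields from your own curation standard.

```python
from typing import Mapping

# Hypothetical tier definitions: each tier lists the fields it adds.
TIERS = [
    ("Basic", ["title", "description"]),
    ("Intermediate", ["region", "pii_classification"]),
    ("Advanced", ["policies", "glossary_terms"]),
]


def is_filled(value) -> bool:
    """Treat None, blank strings, and empty collections as not curated."""
    if value is None:
        return False
    if isinstance(value, str):
        return bool(value.strip())
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) > 0
    return True


def curation_tier(asset: Mapping[str, object]) -> str:
    """Return the highest tier reached; every lower tier must be complete too."""
    reached = "Uncurated"
    for name, fields in TIERS:
        if all(is_filled(asset.get(field)) for field in fields):
            reached = name
        else:
            break
    return reached


orders = {
    "title": "Customer Orders",
    "description": "One row per order.",
    "region": "EMEA",
}
print(curation_tier(orders))  # Basic: the Intermediate tier still lacks pii_classification
```

If your standard treats only the Basic tier as required, an asset is fully curated as soon as curation_tier returns anything other than "Uncurated".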

Adapt as You Make Progress

Curation is iterative. Regularly revisit definitions and adjust objectives based on the insights that emerge and the challenges you identify.

Best Practice 2: Design Thoughtful Custom Templates

Custom templates for catalog pages are a way to ensure curation consistency and relevance in your data catalog. By tailoring templates to meet the specific information requirements of each type of catalog page, you can optimize both data discovery and data understanding for your catalog users.

Understand Templates and Fields

  • Templates—Templates define the structure of catalog pages. Templates act as a mold that ensures consistency across all catalog pages of a specific type of data asset, such as a schema, table, or column.

  • Catalog pages—Catalog pages are the primary interface through which users access catalog metadata. They contain detailed information about your organization’s data assets. A page is a specific instance of the template.

  • Fields—Each field within a catalog page holds specific information about a data asset. Examples include Titles, Descriptions, lists, rich text fields, and references.

The information on catalog pages serves two essential purposes:

  • Facilitate discovery—Helping users efficiently find the data assets they need through Alation Search.

  • Clarify data—Providing users with detailed definitions and context for the data elements they find.

Facilitate Discovery

Think about how users search for data. For instance, if users often search by geography, include a custom field for Region to enable advanced filtering.

Include fields that support free-text and advanced faceted searches, such as picker fields and object sets.

Clarify Data

Consider adding fields that address common concerns or questions. For example:

  • A PII Classification field with options like PII, Non-PII, or Unclassified to clarify data sensitivity.

  • A Molecule field for a pharmaceutical company, helping analysts quickly identify data sets relevant to their specific research.

Collaborate on Template Design

Consider convening a core team of knowledgeable data users for workshops if possible. This team should:

  • Identify the custom fields required for both discovery and clarification.

  • Agree on the content structure for each type of catalog page.

  • Ensure that templates align with organizational goals and user needs.

By designing thoughtful custom templates, you can create a data catalog that is both highly navigable and richly informative, empowering your data users to find and understand the data.

Best Practice 3: Prioritize Your Curation Efforts

Metadata extraction can create tens of thousands of catalog pages. It’s crucial to prioritize your efforts. You can maximize the impact of your curation efforts by focusing on catalog pages that deliver the most business value, answer the most critical questions, or are most frequently accessed.

Define High Priority Assets

  • Business criticality—Identify data assets that are central to your business and technical users. These critical datasets should be prioritized for curation to ensure stakeholders have the clarity and confidence to use them effectively.

  • Complexity—Consider data assets that generate a disproportionate number of questions from users. These datasets often indicate areas of complexity or ambiguity. Prioritizing their curation reduces the burden on key resources by providing answers upfront.

  • Usage—High-usage data assets are another natural focus for curation. Tools like Alation’s Query Log Ingestion (QLI) measure the relative use of schemas, tables, and columns, presenting results as a Popularity score for each data object. These popularity indicators can highlight data assets that users frequently engage with and would benefit from immediate curation.

  • Page visits—Once foundational curation is in place, you can incorporate catalog page visits into your prioritization strategy. By querying Alation Analytics, you can identify the most-visited catalog pages and assess their curation status. Pages with high visits and incomplete curation may need to be added to your priority list and targeted for full curation.
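
One way to combine the four signals above into a single ranking is a weighted score. This is a sketch under stated assumptions: the Asset record, the weights, and the 0-to-1 popularity scale are illustrative choices, and the input values would come from your own exports, not from a specific Alation API.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    business_critical: bool
    open_questions: int   # questions raised by users (complexity signal)
    popularity: float     # relative usage, assumed scaled to 0..1
    page_visits: int
    fully_curated: bool


# Hypothetical weights; tune them to your own priorities.
WEIGHTS = {"critical": 0.4, "complexity": 0.2, "usage": 0.25, "visits": 0.15}


def priority_scores(assets):
    """Score each asset, normalizing counts against the largest value seen."""
    max_questions = max((a.open_questions for a in assets), default=0) or 1
    max_visits = max((a.page_visits for a in assets), default=0) or 1
    return {
        a.name: WEIGHTS["critical"] * (1.0 if a.business_critical else 0.0)
        + WEIGHTS["complexity"] * a.open_questions / max_questions
        + WEIGHTS["usage"] * a.popularity
        + WEIGHTS["visits"] * a.page_visits / max_visits
        for a in assets
    }


def curation_backlog(assets):
    """Return assets that still need curation, highest priority first."""
    scores = priority_scores(assets)
    pending = [a for a in assets if not a.fully_curated]
    return sorted(pending, key=lambda a: scores[a.name], reverse=True)


assets = [
    Asset("sales.orders", True, 12, 0.9, 400, False),
    Asset("tmp.scratch", False, 0, 0.1, 5, False),
    Asset("finance.ledger", True, 3, 0.6, 250, True),
    Asset("hr.headcount", False, 8, 0.4, 300, False),
]
for asset in curation_backlog(assets):
    print(asset.name)
```

Already curated assets still contribute to the normalization but are left out of the backlog, so the list shows only work that remains.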

Align Prioritization with Curation Goals

Ensure that the curation status of prioritized catalog pages complies with the definitions and expectations you established in the early stages. For example, if your definition of fully curated includes Titles, Descriptions, and custom fields, these elements should be present in all high-priority pages.

By prioritizing areas with the highest impact, curation efforts can build momentum and deliver significant value to data users.

Best Practice 4: Know Your Alation Tools

Get Up to Speed with Manual Curation

Manual curation is the most straightforward way to begin enriching your data catalog. It allows you to gain a hands-on understanding of catalog structure, templates, and fields while curating. By beginning with manual curation, you can both enrich your catalog and set the stage for a more scalable curation process in the future.

Manual curation involves:

  • Manually editing Titles and Descriptions directly on catalog pages to clarify the purpose and context of the data.

  • Manually setting specific values in custom fields.

  • Applying governance attributes, like linking policies, to catalog pages.

Manual curation helps you familiarize yourself with the user interface and establish a baseline understanding of how curated information is structured and displayed.

Leverage AI for Assistance

Applies to: Alation Cloud Service on the cloud-native architecture

If you’re using a cloud instance of the catalog, the ALLIE AI Suggested Descriptions feature can help with the curation process. This feature generates descriptions for tables so that you can curate them faster.

Bulk-Curate Objects Through Data Dictionaries

Use data dictionaries for bulk updates of curation information.

A data dictionary is a structured file in CSV or TSV format that captures the curation information for catalog objects. It allows you to fill or update multiple fields, such as Titles, Descriptions, and custom fields, across an object hierarchy in bulk. This may be especially useful when curation information is already stored externally or maintained in spreadsheets.

Understand the Data Dictionary Format

The data dictionary has a rigid format that must be adhered to for successful uploads. A data dictionary file includes metadata for the parent object (for example, data source) and all its child objects (schemas, tables, columns).

Start with a Downloaded Template

Begin by downloading a data dictionary template for the specific object you want to curate.

Important

Avoid starting at the data source level, as this includes all schemas, tables, and columns, which can be overwhelming due to the volume of objects.

Instead, batch your efforts by focusing on a single schema or a table. This approach is more manageable.

Upload and Review

Populate the data dictionary template with metadata. If working from an existing external source, map your data to the dictionary format. Once your data dictionary is prepared, upload it to the catalog.
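
Mapping an external spreadsheet to the dictionary layout can be scripted. The sketch below is illustrative only: the column names in COLUMNS (key, title, description) are assumptions, not the actual data dictionary format. Take the real header row from the template you downloaded.

```python
import csv
import io

# Assumed column layout; copy the real header row from the downloaded template.
COLUMNS = ["key", "title", "description"]


def to_dictionary_csv(external_rows, column_map):
    """Map rows from an external source onto the assumed dictionary columns.

    column_map maps each dictionary column to the matching external column.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in external_rows:
        writer.writerow({col: row.get(src, "") for col, src in column_map.items()})
    return buffer.getvalue()


external = [
    {
        "object": "sales.orders",
        "business_name": "Customer Orders",
        "notes": "One row per order, including returns.",
    }
]
column_map = {"key": "object", "title": "business_name", "description": "notes"}
print(to_dictionary_csv(external, column_map))
```

Using the csv module rather than string concatenation ensures that values containing commas or quotes are escaped correctly, which matters given the rigid format.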

Manage Similar Objects

You can streamline the management of similar catalog objects by using catalog sets or the Stewardship Workbench, reducing manual effort.

Leverage Catalog Sets

Use catalog sets for consistent metadata management across similar RDBMS objects, especially for rich text fields. When your catalog contains duplicate or similar schemas and tables, catalog sets provide a structured way to group and curate them. Catalog sets allow you to group similar catalog objects either manually or based on rules you define. Once grouped, you can update specific fields across all objects in a set, ensuring consistency in metadata such as titles, descriptions, and classifications.
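
Conceptually, a rule-based set selects objects by a pattern and then applies the same field value to every member. The sketch below illustrates that idea in plain Python. It is not how catalog sets are implemented; the glob-style rule and the dotted object names are assumptions made for the example.

```python
from fnmatch import fnmatchcase


def members(object_names, rule):
    """Select the objects whose qualified name matches a glob-style rule."""
    return [name for name in object_names if fnmatchcase(name, rule)]


def apply_to_set(object_names, rule, field, value):
    """Build one identical field update for every object matched by the rule."""
    return {name: {field: value} for name in members(object_names, rule)}


tables = ["prod.sales.orders", "dev.sales.orders", "prod.sales.customers"]
updates = apply_to_set(tables, "*.sales.orders", "title", "Customer Orders")
print(updates)
```

Here the same table replicated in the prod and dev schemas receives the same title, while the customers table is left untouched.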

Leverage the Stewardship Workbench

If your catalog includes the Governance App, the Stewardship Workbench offers another way to bulk-curate fields. Use it for straightforward custom field updates such as policy assignments and Steward designations. It allows you to curate objects identified through search and saved searches. Unlike catalog sets, the Stewardship Workbench is not limited to RDBMS objects.

The Workbench has restrictions on the number of objects and types of fields you can update:

  • Rich text fields are not supported for updates through the Workbench. For rich text fields or more complex updates, use catalog sets.

  • The Workbench can only handle a maximum of 10,000 updates.
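
When a change touches more objects than the limit allows, plan it as several batches. A minimal sketch, assuming the 10,000 limit applies to each bulk operation (the text above does not specify the scope); object_ids is a hypothetical list of the objects returned by a saved search.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

MAX_UPDATES = 10_000  # limit stated above; assumed to apply per bulk operation


def batches(items: Sequence[T], size: int = MAX_UPDATES) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


object_ids = list(range(25_000))  # hypothetical objects to update
print([len(batch) for batch in batches(object_ids)])  # [10000, 10000, 5000]
```

In practice, narrowing each saved search (for example, by schema or domain) so that it stays under the limit achieves the same result without any scripting.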

Plan Updates Strategically

  • Strategize how many catalog sets to create and which fields to manage with each set.

  • Organize saved searches to efficiently track objects curated through the Stewardship Workbench.

Use Lexicon to Suggest Business Titles

Alation’s Lexicon provides a machine-learning-driven approach to enriching your catalog with meaningful business titles. Lexicon analyzes technical names in the catalog and generates suggestions for user-friendly titles. This feature combines automation with human validation to ensure accuracy.

As a Catalog Admin, begin by running Lexicon centrally to analyze technical names throughout the catalog and create a dictionary. Then you can review the suggested expansions. Titles suggested by Lexicon can be confirmed, modified, or rejected to ensure they align with the business language and context. Lexicon automatically uses the dictionary to suggest business titles wherever they match technical names in the catalog. Users can further refine these suggestions directly on catalog pages.
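
The dictionary-based part of this workflow can be pictured as a lookup of confirmed expansions. The sketch below is only an illustration of that idea and not Lexicon's actual algorithm, which the text describes as machine-learning-driven. The CONFIRMED dictionary and the underscore-splitting rule are assumptions.

```python
# Hypothetical dictionary of expansions a Catalog Admin has confirmed.
CONFIRMED = {"cust": "Customer", "addr": "Address", "amt": "Amount", "dt": "Date"}


def suggest_title(technical_name, expansions=CONFIRMED):
    """Expand known abbreviations; capitalize any token without a confirmed match."""
    tokens = [token for token in technical_name.lower().split("_") if token]
    words = [expansions.get(token, token.capitalize()) for token in tokens]
    return " ".join(words)


print(suggest_title("CUST_ADDR"))  # Customer Address
print(suggest_title("order_amt"))  # Order Amount
```

A suggestion produced this way is still only a candidate: as described above, a person confirms, modifies, or rejects it.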

You can periodically revisit the Lexicon dictionary to refine or expand it as new technical names are added to the catalog or as business needs evolve. By leveraging Lexicon to suggest business titles, you can transform technical metadata into accessible and meaningful information faster and make your data catalog more user-friendly for both technical and non-technical stakeholders.

Best Practice 5: Assign Groups as Stewards

Effective curation involves assigning Stewards to domains or directly to cataloged assets to ensure that curation is supervised. A Steward is typically a representative from a business function who takes ownership of a specific data set. The Steward field is a built-in feature in domain and object templates, designed to help delegate curation responsibility. Stewards actively contribute to curation by:

  • Adding and maintaining curation information such as Titles, Descriptions, and custom fields.

  • Reviewing and validating metadata to ensure alignment with business standards.

  • Collaborating with teams to address questions or provide insights about the data.

By assigning Stewards strategically, you can distribute the workload of curation. Whenever possible, assign groups as Stewards rather than individuals. This ensures:

  • Broader coverage of responsibilities.

  • Continuity when personnel changes occur.

  • Greater collective expertise for managing the data asset.

You can use these tools for bulk assignments:

  • Stewardship Workbench:

    • Assign Stewards in bulk.

    • Use saved searches to track objects curated through the Workbench.

  • Catalog sets:

    • Assign Stewards across multiple catalog objects grouped by rules or manually added to a set.

    • Catalog sets allow you to manage and curate the Steward field consistently across the grouped objects.

Best Practice 6: Use APIs to Accelerate Curation

For organizations aiming to streamline the curation process, Alation provides a suite of public APIs that support faster curation across catalog objects. By leveraging APIs, teams can automate repetitive tasks and scale their curation efforts.

Getting Started with APIs

Access API Documentation

Visit the Developer Portal for detailed API documentation. Familiarize yourself with the endpoints, parameters, and examples provided for each API, along with recipes to help you get started faster.

Team Up with a Technical Resource

If possible, collaborate with a developer or technical team member to leverage the APIs. Ensure the technical resource understands the specific curation goals.

Learn and Iterate

Start with small, non-critical datasets to practice using the APIs and understand their capabilities. Use the learning curve to refine your approach before scaling up.

Automate Curation

Design API workflows to automate common curation tasks, such as:

  • Updating metadata across multiple catalog objects.

  • Applying consistent naming conventions to titles.

  • Integrating with external systems to pull or push metadata.
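
Before pushing metadata from an external system, a workflow typically works out which objects actually changed so that it sends only those. The sketch below shows that planning step in plain Python. It makes no API calls, and the record shape (a key plus title and description) is a hypothetical stand-in; consult the Developer Portal for the payload each API expects.

```python
def plan_updates(catalog, external, fields=("title", "description")):
    """Return one update per catalog object whose external values differ.

    Both arguments map an object key to a dict of field values.
    Objects missing from the catalog are skipped rather than created.
    """
    updates = []
    for key, ext in external.items():
        current = catalog.get(key)
        if current is None:
            continue  # not in the catalog, so there is nothing to update
        changed = {
            field: ext[field]
            for field in fields
            if field in ext and ext[field] != current.get(field)
        }
        if changed:
            updates.append({"key": key, **changed})
    return updates


catalog = {
    "sales.orders": {"title": "Orders", "description": ""},
    "sales.customers": {"title": "Customers", "description": "One row per customer."},
}
external = {
    "sales.orders": {"title": "Orders", "description": "One row per order."},
    "sales.customers": {"title": "Customers", "description": "One row per customer."},
    "sales.returns": {"title": "Returns"},
}
print(plan_updates(catalog, external))
```

Only sales.orders appears in the plan, because its description differs; unchanged and unknown objects produce no update.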

APIs Suitable for Curation

  • Relational Integration API

    • Designed for working with RDBMS objects such as data sources, schemas, tables, and columns. Enables automated updates to metadata fields.

  • BI Source API

    • Focused on BI tools, allowing updates to catalog objects such as reports, dashboards, and BI fields.

  • Custom Fields API

    • Allows admin users to read and create custom fields on an Alation instance.

  • Custom Field Values API

    • Allows asynchronously inserting and updating custom field values.

  • Data Dictionary API

    • Allows updating catalog objects with metadata provided in a data dictionary file.

Best Practice 7: Measure Progress and Revisit Goals

Measuring and revisiting goals is essential to ensure ongoing success. The goals and priorities for curation will naturally evolve over time as the catalog grows, and continuous monitoring and refinement help maintain progress:

  • New data sources—Incorporate new data sources into the prioritization and curation plan.

  • Adjust prioritization—Reassess priorities based on changes in business objectives, usage patterns, or governance requirements.

  • Reassign owners—Update Steward assignments as teams and responsibilities change.

Leverage Alation Analytics

Alation Analytics can be a powerful tool for tracking curation efforts and answering key questions, for example:

  • Track progress—Monitor the percentage of prioritized assets that are fully curated.

  • Identify gaps—Determine which fields on prioritized assets still need curation.

  • Assess popularity—Identify the most-visited catalog objects and their curation status.

  • Measure impact—Evaluate the curation percentage of your most-visited objects.
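
The first two questions reduce to simple arithmetic once you have exported the field values for your prioritized assets. A minimal sketch, assuming each asset is a plain dict and that REQUIRED lists the fields from your curation standard; both are illustrative and do not reflect the Alation Analytics schema.

```python
from collections import Counter

# Hypothetical curation standard: the fields that make an asset fully curated.
REQUIRED = ("title", "description", "steward")


def progress(assets, required=REQUIRED):
    """Return (percent fully curated, count of missing values per field)."""
    gaps = Counter()
    complete = 0
    for asset in assets:
        missing = [field for field in required if not asset.get(field)]
        gaps.update(missing)
        if not missing:
            complete += 1
    percent = 100.0 * complete / len(assets) if assets else 0.0
    return percent, dict(gaps)


prioritized = [
    {"title": "Orders", "description": "One row per order.", "steward": "Sales Ops"},
    {"title": "Customers", "description": "", "steward": "Sales Ops"},
    {"title": "Returns", "description": None, "steward": None},
    {"title": "Ledger", "description": "General ledger.", "steward": "Finance"},
]
percent, gaps = progress(prioritized)
print(percent, gaps)  # 50.0 {'description': 2, 'steward': 1}
```

The per-field gap counts show where to focus next: in this example, descriptions are the most common missing element.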

Alation Analytics resources available in the Alation Community (requires a separate login) include examples and best practices for querying Alation Analytics. While you may need to adapt these examples to fit your specific needs, they offer a solid foundation for tracking and reporting progress.

By monitoring progress and revisiting goals, you can ensure the data catalog remains a dynamic, valuable resource that evolves alongside business needs.