DQP Prerequisites

Applies to: Alation Cloud Service and customer-managed instances of Alation

Contact the Forward Deployed Engineering team or your Alation Account Team to obtain the Data Quality Processor (DQP) installer. They will provide a download link. You must then upload the installer to your Alation Service Manager (ASM) instance.

Source Systems

Data Quality Processor for Snowflake

Using Alation DQP with Snowflake requires Snowflake data metric functions (DMFs). Refer to the Snowflake DMF documentation for more information. Once DQ rules are created and executed in Snowflake, Alation’s DQP ingests the results, maps them to the corresponding catalog assets, applies RAG (Red/Amber/Green) thresholds, and overlays the insights with metadata.

Note

Snowflake Enterprise Edition is required to use Data Metric Functions (DMFs), and DMFs are mandatory for DQP.
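As a rough illustration of the kind of DQ rule setup this section refers to, the sketch below builds the two Snowflake statements that, as described in Snowflake's DMF documentation, schedule metric evaluation on a table and attach a system DMF to a column. The table and column names are hypothetical, and the helper `dmf_statements` is an assumption made for this example; it is not part of DQP or Snowflake. Confirm the exact syntax and the allowed schedule values against the Snowflake documentation before running anything.

```python
def dmf_statements(
    table: str,
    column: str,
    dmf: str = "SNOWFLAKE.CORE.NULL_COUNT",
    schedule: str = "60 MINUTE",
) -> list[str]:
    """Build SQL text that schedules DMF runs and attaches one DMF to a column.

    Hypothetical helper: it only formats strings and does not connect to
    Snowflake. Names are inserted as-is, so pass trusted identifiers only.
    """
    return [
        # Tell Snowflake how often to evaluate the DMFs attached to this table.
        f"ALTER TABLE {table} SET DATA_METRIC_SCHEDULE = '{schedule}';",
        # Associate the data metric function with the chosen column.
        f"ALTER TABLE {table} ADD DATA METRIC FUNCTION {dmf} ON ({column});",
    ]


# Example with a hypothetical table and column.
for statement in dmf_statements("SALES.PUBLIC.ORDERS", "CUSTOMER_ID"):
    print(statement)
```

The statements can be run in a Snowflake worksheet or any SQL client by a role with the required privileges. A DMF in the `SNOWFLAKE.CORE` schema would be recorded as CORE in the DMF Location custom field described later in this page, and a user-defined DMF as CUSTOM.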

Data Quality Processor for Databricks

Using Alation DQP with Databricks requires Lakehouse Monitoring. Your workspace must be enabled for Unity Catalog, and you must have access to Databricks SQL. Refer to the Databricks Lakehouse Monitoring documentation for details. Once Quality is enabled on the Databricks tables you have chosen, Alation’s DQP ingests the results, maps them to the corresponding catalog assets, applies RAG (Red/Amber/Green) thresholds, and overlays the insights with metadata.

Alation Service Manager (ASM)

ASM must be installed and configured first. DQP is then installed and configured on ASM. Contact the Forward Deployed Engineering team or your Alation Account Team to obtain ASM.

Network Requirements

Port 443 must be open on ASM for both inbound and outbound connections for DQP to function. DQP uses APIs that send HTTPS traffic in both directions between ASM and your Alation instance.
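One quick way to sanity-check this requirement is to attempt a TCP connection on port 443. The following is a minimal sketch using only the Python standard library; the function name `can_reach` is an assumption made for this example and is not part of ASM or DQP.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, running `can_reach("alation.example.com")` on the ASM host (with your own Alation hostname in place of the placeholder) checks the outbound direction only. To check the inbound direction, run the same call from a machine on the Alation side against the ASM host. A successful TCP connection does not prove that TLS or authentication will succeed, so treat the result as a first-pass network check.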

Alation Instance

Cataloged Data Sources

The source system that holds the data covered by your DQ rules must be set up as a cataloged Alation data source, and metadata extraction (MDE) must have been run on that source at least once.

Compose

Alation Compose must be configured for your data source. DQP uses Compose to automatically execute the SQL needed to retrieve and publish the data quality results. For example, if Snowflake is the source of your data quality information, Compose must be enabled and configured with the correct connection details and credentials for your Snowflake instance.

Alation Catalog Objects Customization

Before DQP can be deployed, you must create Custom Fields, Document Hubs, Document Templates, and Document Hub Folders. The names of these objects must exactly match the names specified below.

Custom Fields

Go to Settings > Custom Fields (or Settings > Customize Catalog > Custom Fields using the old UI). Scroll down to Rich Texts and create the following rich text custom fields. Note that some of the objects are specific to one data source type.

  • Custom field Name: DQ Processor ID (DO NOT EDIT)

  • Tooltip Text: This ID is used by the DQ Processor and must not be changed.

../../../_images/FDEdqpCustomfield1.png

  • Custom field Name: DQ Processor

  • Tooltip Text: [none required]

../../../_images/FDEdqpCustomfield2.png

For Snowflake sources only, create the following additional custom field (this will be added to a document template in a later step):

  • Custom field Name: DMF Location

  • Tooltip Text: Is this DMF part of the Snowflake CORE or is it CUSTOM

../../../_images/FDEdqpCustomfield3.png

Go to Settings > Custom Templates (or Settings > Customize Catalog > Custom Templates using the old UI) and add the above custom fields to the Table and Column templates in your Alation catalog. For example, for the Table template:

../../../_images/FDEdqpCustomfield4.png

(Note: use the Applies to: All Sources setting)

Document Hubs

Go to Settings > Custom Templates (or Settings > Customize Catalog > Custom Templates using the old UI) and scroll down to Document Hubs. Create a Document Hub with the following settings:

  • Hub & Navigation Name: Data Quality Processor

  • Folders are called: DQ Metric List, DQ Metric Lists

  • Documents are called: Data Metric Function, Data Metric Functions

../../../_images/FDEdqpDocHub1.png

Click Save. Click Publish.

Within the Document Hubs section, scroll down and open the Data Quality Processor section. Next to Document Templates, click the button to create a template with the following settings, and add the custom fields you created in the previous steps:

For Databricks sources only:

  • Create a Document Template with name: Lakehouse Monitor Metric

  • Insert the “DQ Processor ID (DO NOT EDIT)” Custom Field.

../../../_images/FDEdqpDocHubLakehouse.png

For Snowflake sources only:

  • Create a Document Template with name: Snowflake DMF

  • Insert the “DQ Processor ID (DO NOT EDIT)” Custom Field.

  • Insert the “DMF Location” Custom Field.

../../../_images/FDEdqpDocHubSnowflake.png

DQ Metric List

Go to the Alation homepage and, in the left navigation, click Data Quality Processor:

../../../_images/FDEdqpMetricList1.png

Click Create DQ Metric List and name your list:

  • For Databricks sources, name it: Lakehouse Monitoring Metrics

  • For Snowflake sources, name it: Snowflake DMFs

For example, for a Snowflake setup:

../../../_images/FDEdqpMetricList2.png

Click Save.

Once DQP runs, it populates these folders with the rules it finds.
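Because the object names above must match exactly, it can help to check them against a list before deploying. The following is a minimal sketch that compares names you have recorded by hand against the names given on this page. The `REQUIRED` table, the `required_for` and `missing_names` helpers, and the category keys are assumptions made for this example; the sketch does not call any Alation API.

```python
# Names copied from this page, grouped by source type and object kind.
REQUIRED = {
    "common": {
        "custom_fields": ["DQ Processor ID (DO NOT EDIT)", "DQ Processor"],
        "document_hub": ["Data Quality Processor"],
    },
    "snowflake": {
        "custom_fields": ["DMF Location"],
        "document_template": ["Snowflake DMF"],
        "metric_list": ["Snowflake DMFs"],
    },
    "databricks": {
        "document_template": ["Lakehouse Monitor Metric"],
        "metric_list": ["Lakehouse Monitoring Metrics"],
    },
}


def required_for(source: str) -> dict[str, list[str]]:
    """Combine the common names with the names specific to one source type."""
    merged = {kind: list(names) for kind, names in REQUIRED["common"].items()}
    for kind, names in REQUIRED[source].items():
        merged.setdefault(kind, []).extend(names)
    return merged


def missing_names(source: str, existing: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return required names that are absent from `existing`, grouped by kind.

    The comparison is exact, so differences in case or spacing count as missing.
    """
    missing = {}
    for kind, names in required_for(source).items():
        have = existing.get(kind, set())
        absent = [name for name in names if name not in have]
        if absent:
            missing[kind] = absent
    return missing


# Hypothetical inventory typed in by hand; "DMF location" has the wrong case.
existing = {
    "custom_fields": {"DQ Processor ID (DO NOT EDIT)", "DQ Processor", "DMF location"},
    "document_hub": {"Data Quality Processor"},
    "document_template": {"Snowflake DMF"},
    "metric_list": {"Snowflake DMFs"},
}
print(missing_names("snowflake", existing))
```

The comparison is deliberately strict, so a name that differs only in capitalization is reported as missing rather than silently accepted.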