DQP Prerequisites¶
Applies to: Alation Cloud Service and customer-managed instances of Alation
Contact the Forward Deployed Engineering team or your Alation Account Team to obtain the Data Quality Processor (DQP) installer. You will receive a download link; download the installer and upload it to your Alation Service Manager (ASM) instance.
Source Systems¶
Data Quality Processor for Snowflake¶
Alation DQP for Snowflake requires data metric functions (DMFs); see the Snowflake DMF documentation for details. Once DQ rules are created in Snowflake as DMFs and executed, Alation’s DQP ingests the results, maps them to the corresponding catalog assets, applies RAG (Red/Amber/Green) thresholds, and overlays the insights on the catalog metadata.
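For orientation, attaching a Snowflake system DMF to a table takes two statements: set a metric schedule on the table, then add the DMF to a column. The sketch below only builds those SQL strings and does not connect to Snowflake. The helper name `dmf_setup_sql`, the `'60 MINUTE'` default, and the example table are illustrative assumptions; check the Snowflake DMF documentation for the schedule forms your account supports.

```python
def dmf_setup_sql(
    table: str,
    column: str,
    dmf: str = "SNOWFLAKE.CORE.NULL_COUNT",
    schedule: str = "60 MINUTE",
) -> list[str]:
    """Return the Snowflake statements that schedule and attach one DMF.

    Hypothetical helper: it only formats SQL text. Identifiers are passed
    through unchanged, so quote them yourself if they need quoting.
    """
    return [
        # A table must have a DMF schedule before its DMFs run.
        f"ALTER TABLE {table} SET DATA_METRIC_SCHEDULE = '{schedule}';",
        # Attach the metric function to the column it should measure.
        f"ALTER TABLE {table} ADD DATA METRIC FUNCTION {dmf} ON ({column});",
    ]


# Hypothetical table and column names.
for stmt in dmf_setup_sql("SALES.PUBLIC.ORDERS", "CUSTOMER_ID"):
    print(stmt)
```

After the schedule fires, Snowflake records DMF results in the `SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS` view.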
Note
DMFs are mandatory for DQP, and Snowflake requires Enterprise Edition to use DMFs. Snowflake users therefore need Snowflake Enterprise Edition.
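After ingesting results, DQP applies RAG thresholds to them. DQP’s actual threshold logic is not described on this page; as a rough illustration only, a RAG mapping might compare a metric’s pass rate against two cut-offs. The function `rag_status` and the 95% / 80% defaults are hypothetical.

```python
from enum import Enum


class Rag(Enum):
    RED = "Red"
    AMBER = "Amber"
    GREEN = "Green"


def rag_status(pass_rate: float, green_at: float = 0.95, amber_at: float = 0.80) -> Rag:
    """Map a pass rate in [0, 1] to a RAG status.

    Hypothetical thresholds: >= green_at is Green, >= amber_at is Amber,
    anything lower is Red.
    """
    if not 0.0 <= pass_rate <= 1.0:
        raise ValueError("pass_rate must be between 0 and 1")
    if amber_at > green_at:
        raise ValueError("amber_at must not exceed green_at")
    if pass_rate >= green_at:
        return Rag.GREEN
    if pass_rate >= amber_at:
        return Rag.AMBER
    return Rag.RED


# Example: a null-count metric found 120 nulls in 1,000 rows (88% pass rate).
print(rag_status(1 - 120 / 1000))  # Rag.AMBER
```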
Data Quality Processor for Databricks¶
Alation DQP for Databricks requires Lakehouse Monitoring, which in turn requires a workspace enabled for Unity Catalog and access to Databricks SQL. See the Databricks Lakehouse Monitoring documentation for details. Once quality monitoring is enabled on the Databricks tables you have chosen, Alation’s DQP ingests the results, maps them to the corresponding catalog assets, applies RAG (Red/Amber/Green) thresholds, and overlays the insights on the catalog metadata.
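When you create a Lakehouse Monitoring monitor, Databricks writes its results to metric tables in an output schema you choose, named after the monitored table with `_profile_metrics` and `_drift_metrics` suffixes. The sketch below derives those names so you can confirm the tables exist before running DQP. The helper is hypothetical, and this page does not say which of these tables DQP reads.

```python
from typing import NamedTuple


class MonitorTables(NamedTuple):
    profile_metrics: str
    drift_metrics: str


def monitor_output_tables(monitored_table: str, output_schema: str) -> MonitorTables:
    """Derive the metric table names a Lakehouse Monitoring monitor writes.

    Assumes a three-level Unity Catalog name (catalog.schema.table) for the
    monitored table and a catalog.schema name for the output schema.
    """
    parts = monitored_table.split(".")
    if len(parts) != 3:
        raise ValueError("expected a three-level name: catalog.schema.table")
    table_name = parts[2]
    return MonitorTables(
        profile_metrics=f"{output_schema}.{table_name}_profile_metrics",
        drift_metrics=f"{output_schema}.{table_name}_drift_metrics",
    )


# Hypothetical catalog, schema, and table names.
print(monitor_output_tables("main.sales.orders", "main.monitoring"))
```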
Alation Service Manager (ASM)¶
ASM must be installed and configured first. DQP is then installed and configured on ASM. Contact the Forward Deployed Engineering team or your Alation Account Team to obtain ASM.
Network Requirements¶
ASM must allow inbound and outbound connections on port 443 for DQP to function, because DQP calls APIs over HTTPS between ASM and your Alation instance.
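A quick way to verify the outbound half of this requirement is to try opening a TCP connection to port 443 from the ASM host. The sketch below uses only Python’s standard `socket` module; the function name is hypothetical, and a successful connection says nothing about inbound rules, which must be checked from the other side.

```python
import socket


def port_reachable(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError
        # subclass), and DNS failures (socket.gaierror).
        return False
```

For example, calling `port_reachable("alation.example.com")` on the ASM host (substituting your Alation instance’s hostname for the placeholder) returns `False` if a firewall blocks outbound traffic on 443.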
Alation Instance¶
Cataloged Data Sources¶
The source system that holds the data covered by your DQ rules must be cataloged as an Alation data source, and metadata extraction (MDE) must have run at least once on it.
Compose¶
You will need to have Alation Compose configured with your data source. Compose is used by DQP to automatically execute the SQL needed to retrieve and publish the data quality results. For example, if Snowflake is the source of your data quality information, you will need to have Compose enabled and configured with the correct connection details and credentials for your Snowflake instance.
Alation Catalog Objects Customization¶
Before deploying DQP, create the Custom Fields, Document Hubs, Document Templates, and Document Hub Folders described below. Their names must exactly match the names specified here.
Custom Fields¶
Go to Settings > Custom Fields (or Settings > Customize Catalog > Custom Fields using the old UI). Scroll down to Rich Texts and create the following rich text custom fields. Note that some of these fields are specific to one data source type.
Custom field Name: DQ Processor ID (DO NOT EDIT)
Tooltip Text: This ID is used by the DQ Processor and must not be changed.

Custom field Name: DQ Processor
Tooltip Text: [none required]

For Snowflake sources only, create the following additional custom field (this will be added to a document template in a later step):
Custom field Name: DMF Location
Tooltip Text: Is this DMF part of the Snowflake CORE or is it CUSTOM?
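The DMF Location field distinguishes Snowflake’s built-in system DMFs, which live in the `SNOWFLAKE.CORE` schema, from user-defined (custom) DMFs. How DQP fills in this field is not described here; the hypothetical helper below only illustrates the distinction the tooltip refers to, assuming fully qualified DMF names.

```python
def dmf_location(dmf_name: str) -> str:
    """Classify a DMF as CORE (Snowflake system DMF) or CUSTOM.

    Hypothetical helper. Assumes a fully qualified name such as
    SNOWFLAKE.CORE.NULL_COUNT; anything outside SNOWFLAKE.CORE is
    treated as user-defined.
    """
    # Drop identifier quotes and normalise case so that
    # '"SNOWFLAKE"."CORE".NULL_COUNT' and 'snowflake.core.null_count' both match.
    # (Simplification: case-sensitive quoted identifiers are not distinguished.)
    normalised = dmf_name.replace('"', "").strip().upper()
    return "CORE" if normalised.startswith("SNOWFLAKE.CORE.") else "CUSTOM"
```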

Go to Settings > Custom Templates (or Settings > Customize Catalog > Custom Templates using the old UI) and add the above custom fields to the Table and Column templates in your Alation catalog, using the Applies to: All Sources setting.
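Because the field names must match exactly, a small check can catch near-misses such as different capitalization before DQP is deployed. In this sketch the required names come from the steps above, while the helper and how you collect the existing names (for example, copying them from the Custom Fields page) are assumptions; no Alation API call is implied.

```python
# Required rich text custom field names, copied from the steps above.
REQUIRED_FIELDS = {
    "all": ["DQ Processor ID (DO NOT EDIT)", "DQ Processor"],
    "snowflake": ["DMF Location"],
}


def missing_fields(existing: set[str], source: str) -> list[str]:
    """Return required custom field names not found verbatim in `existing`.

    The comparison is exact (case- and whitespace-sensitive), matching the
    requirement that the names match exactly.
    """
    required = REQUIRED_FIELDS["all"] + REQUIRED_FIELDS.get(source.lower(), [])
    return [name for name in required if name not in existing]


# A mis-capitalized field and a missing Snowflake-only field are both reported.
print(missing_fields({"DQ Processor ID (Do Not Edit)", "DQ Processor"}, "snowflake"))
```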
Document Hubs¶
Go to Settings > Custom Templates (or Settings > Customize Catalog > Custom Templates using the old UI) and scroll down to Document Hubs. Create a Document Hub with the following settings:
Hub & Navigation Name: Data Quality Processor
Folders are called: DQ Metric List (singular), DQ Metric Lists (plural)
Documents are called: Data Metric Function (singular), Data Metric Functions (plural)

Click Save. Click Publish.
Within the Document Hubs section, scroll down and open the Data Quality Processor hub. Next to Document Templates, click the button to create a template with the following settings, and add the custom fields you created in the previous steps:
For Databricks sources only:
Create a Document Template with Name: Lakehouse Monitor Metric
Insert the “DQ Processor ID (DO NOT EDIT)” Custom Field.

For Snowflake sources only:
Create a Document Template with Name: Snowflake DMF
Insert the “DQ Processor ID (DO NOT EDIT)” Custom Field.
Insert the “DMF Location” Custom Field.

DQ Metric List¶
Go to the Alation homepage and, in the left navigation, click Data Quality Processor.
Click Create DQ Metric List and name your list:
For Databricks sources, name it Lakehouse Monitoring Metrics
For Snowflake sources, name it Snowflake DMFs
Click Save.
Once DQP runs, it populates these folders with the rules it finds.
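The per-source objects created in this section can be summarized as a small lookup table, which some teams find handy as a setup checklist. The names are copied from the steps above; the `SourceSetup` structure and `checklist` function are illustrative only and are not part of DQP.

```python
from dataclasses import dataclass

HUB_NAME = "Data Quality Processor"


@dataclass(frozen=True)
class SourceSetup:
    """Names of the Alation objects created for one source type."""
    document_template: str
    template_fields: tuple[str, ...]
    metric_list: str


# Values copied from the steps above; the structure itself is illustrative.
DQP_SETUP = {
    "databricks": SourceSetup(
        document_template="Lakehouse Monitor Metric",
        template_fields=("DQ Processor ID (DO NOT EDIT)",),
        metric_list="Lakehouse Monitoring Metrics",
    ),
    "snowflake": SourceSetup(
        document_template="Snowflake DMF",
        template_fields=("DQ Processor ID (DO NOT EDIT)", "DMF Location"),
        metric_list="Snowflake DMFs",
    ),
}


def checklist(source: str) -> list[str]:
    """Return a human-readable list of hub objects to create for a source."""
    setup = DQP_SETUP[source.lower()]
    return [
        f"Document Hub: {HUB_NAME}",
        f"Document Template: {setup.document_template}",
        *(f"Template field: {name}" for name in setup.template_fields),
        f"DQ Metric List: {setup.metric_list}",
    ]


for item in checklist("Snowflake"):
    print(item)
```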