Prerequisites
Applies to: Alation Cloud Service and customer-managed instances of Alation.
Before you configure the Databricks Unity Catalog OCF connector, review the following prerequisites and recommendations.
Pre-checks
On your Databricks instance, ensure the following:
Unity Catalog is enabled.
The workspaces are assigned to the Unity Catalog metastore.
A running Unity Catalog-compatible interactive cluster or SQL warehouse is available for Alation to connect to and extract metadata from.
For lineage extraction (beta) and query log ingestion, the system.access system schema is enabled.
Configure Network Connectivity
Open inbound TCP port 443 to the Databricks Unity Catalog server.
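You can confirm that the Alation server reaches the Databricks host on port 443 by running a quick TCP check from the Alation host before you configure the connector. The following is a minimal Python sketch that uses only the standard library. The function name can_reach and the host name in the usage comment are illustrative placeholders. They are not part of Alation or Databricks.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # DNS failures, refused connections, and timeouts all derive from OSError.
        return False


# Usage (hypothetical workspace host name):
# print(can_reach("dbc-xxxxxxxx-xxxx.cloud.databricks.com"))
```

A successful connection only shows that the port is open. It does not validate credentials or permissions.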
Create a Service Account
When configuring the connector in Alation, you will need to provide authentication details. To obtain those:
Create a Databricks account-level user to be used as a service account in Alation.
Assign Alation’s user the USE CATALOG, USE SCHEMA, and SELECT privileges on all catalog, schema, and table objects that you want to catalog in Alation.
Metadata extraction requires additional permissions. See Grant Permissions for Metadata Extraction.
Lineage extraction requires additional permissions. See Grant Permissions for Lineage Extraction.
Query log ingestion (QLI) requires additional permissions. See Grant Permissions for QLI.
Assign Alation’s user to the same workspace(s) as the cluster or SQL warehouse. For instructions, see Manage users, service principals, and groups in Databricks documentation.
Assign Alation’s user the Can Attach To permission on the cluster or the Can Use permission on the SQL warehouse.
Optionally, grant Alation’s user the Can Restart permission on the cluster if your organization’s policy allows auto-starting the cluster.
If you want to use authentication with a personal access token, grant Alation’s user the Can Use permission on personal access tokens.
Grant Permissions for Metadata Extraction
Grant the following permissions. Alation needs them to extract metadata from the Unity Catalog catalogs in the workspace by using the information_schema of the system catalog.
System Catalog Access
GRANT USE CATALOG ON CATALOG system TO <user>;
Schema Extraction
You can grant USE SCHEMA at the catalog level or on specific schemas.
GRANT USE SCHEMA ON CATALOG <catalog_name> TO <user>;
OR
GRANT USE SCHEMA ON SCHEMA <catalog_name>.<schema_name> TO <user>;
Table Extraction
You can grant SELECT at the catalog level, at the schema level, or on specific tables.
GRANT SELECT ON CATALOG <catalog_name> TO <user>;
OR
GRANT SELECT ON SCHEMA <catalog_name>.<schema_name> TO <user>;
OR
GRANT SELECT ON TABLE <catalog_name>.<schema_name>.<table_name> TO <user>;
View Extraction
You can grant SELECT at the catalog level, at the schema level, or on specific views.
GRANT SELECT ON CATALOG <catalog_name> TO <user>;
OR
GRANT SELECT ON SCHEMA <catalog_name>.<schema_name> TO <user>;
OR
GRANT SELECT ON VIEW <catalog_name>.<schema_name>.<view_name> TO <user>;
Catalog Extraction
GRANT USE CATALOG ON CATALOG <catalog_name> TO <user>;
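If you catalog many schemas, writing these statements by hand is error-prone. The following Python sketch generates them for one catalog from the templates above. The names metadata_grants and quote_ident are hypothetical helpers. The output is plain SQL text that you review and then run in Databricks yourself, as a user allowed to grant these privileges. Principals and identifiers are wrapped in backticks so that values such as email addresses are accepted.

```python
from typing import List, Optional


def quote_ident(name: str) -> str:
    """Backtick-quote a Databricks identifier, escaping embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def metadata_grants(principal: str, catalog: str,
                    schemas: Optional[List[str]] = None) -> List[str]:
    """Build GRANT statements for metadata extraction from one catalog.

    With schemas=None, USE SCHEMA and SELECT are granted at the catalog level;
    otherwise they are granted only on the listed schemas of that catalog.
    """
    p = quote_ident(principal)
    c = quote_ident(catalog)
    stmts = [
        f"GRANT USE CATALOG ON CATALOG system TO {p};",  # system catalog access
        f"GRANT USE CATALOG ON CATALOG {c} TO {p};",     # catalog extraction
    ]
    if schemas is None:
        stmts.append(f"GRANT USE SCHEMA ON CATALOG {c} TO {p};")
        stmts.append(f"GRANT SELECT ON CATALOG {c} TO {p};")
    else:
        for schema in schemas:
            fq = f"{c}.{quote_ident(schema)}"
            stmts.append(f"GRANT USE SCHEMA ON SCHEMA {fq} TO {p};")
            stmts.append(f"GRANT SELECT ON SCHEMA {fq} TO {p};")
    return stmts


# Usage with a hypothetical principal, catalog, and schema:
for stmt in metadata_grants("alation-svc@example.com", "main", ["sales"]):
    print(stmt)
```

SELECT granted on a catalog or schema also covers the tables and views inside it. This is why the sketch does not emit per-table or per-view statements.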
Grant Permissions for QLI
The Unity Catalog audit log feature is currently in Public Preview in Databricks and may need to be enabled separately. Contact your Databricks administrator about enabling access to this feature. Query log ingestion (QLI) in Alation uses this functionality and is currently a beta feature.
QLI requires access to the system catalog, the system.access schema, and the system.access.audit table in this schema. Grant the Alation service account these permissions:
USE CATALOG on catalog system
USE SCHEMA on schema system.access
SELECT on table system.access.audit
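The three QLI permissions above translate directly into GRANT statements. The following Python sketch renders them for a given principal. QLI_PRIVILEGES, qli_grants, and QLI_ACCESS_CHECK are hypothetical names. The check query is a simple read that you can run as the service account after applying the grants.

```python
from typing import List

# (privilege, securable type, object) for query log ingestion, as listed above.
QLI_PRIVILEGES = [
    ("USE CATALOG", "CATALOG", "system"),
    ("USE SCHEMA", "SCHEMA", "system.access"),
    ("SELECT", "TABLE", "system.access.audit"),
]


def qli_grants(principal: str) -> List[str]:
    """Return GRANT statements for QLI, with the principal backtick-quoted."""
    p = "`" + principal.replace("`", "``") + "`"
    return [f"GRANT {priv} ON {kind} {obj} TO {p};" for priv, kind, obj in QLI_PRIVILEGES]


# A quick read to run as the service account once the grants are in place:
QLI_ACCESS_CHECK = "SELECT 1 FROM system.access.audit LIMIT 1;"
```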
Grant Permissions for Lineage Extraction
The Unity Catalog lineage feature is currently in Public Preview in Databricks and may need to be enabled separately. Contact your Databricks administrator about enabling access to this feature. Lineage extraction in Alation uses this functionality and is currently a beta feature.
Lineage extraction requires access to the system catalog, the system.access schema, and the lineage tables in this schema. Grant Alation’s user these permissions:
USE CATALOG on catalog system
USE SCHEMA on schema system.access
SELECT on table system.access.table_lineage
SELECT on table system.access.column_lineage
The service account does not need USE CATALOG, USE SCHEMA, or SELECT privileges on the catalogs, schemas, and tables referenced in the lineage records of the system.access lineage tables. All lineage is extracted regardless. Objects that exist in the system.access lineage tables but are not cataloged in Alation are marked as temporary (TMP) on lineage diagrams, unless temporary objects have been disabled.
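When both QLI and lineage extraction are enabled, their grant lists overlap on the system catalog and the system.access schema. The following Python sketch merges the two lists without duplicates. The helper name system_table_grants and the feature keys "qli" and "lineage" are arbitrary, hypothetical labels.

```python
from typing import Dict, Iterable, List, Tuple

Grant = Tuple[str, str, str]  # (privilege, securable type, object)

# Grants shared by QLI and lineage extraction.
SHARED: List[Grant] = [
    ("USE CATALOG", "CATALOG", "system"),
    ("USE SCHEMA", "SCHEMA", "system.access"),
]
# Feature-specific tables that need SELECT, as listed in the QLI and lineage sections.
FEATURE_TABLES: Dict[str, List[str]] = {
    "qli": ["system.access.audit"],
    "lineage": ["system.access.table_lineage", "system.access.column_lineage"],
}


def system_table_grants(principal: str, features: Iterable[str]) -> List[str]:
    """Return de-duplicated GRANT statements for the requested features."""
    p = "`" + principal.replace("`", "``") + "`"
    grants: List[Grant] = list(SHARED)
    for feature in features:
        for table in FEATURE_TABLES[feature]:  # raises KeyError for unknown features
            grant = ("SELECT", "TABLE", table)
            if grant not in grants:
                grants.append(grant)
    return [f"GRANT {priv} ON {kind} {obj} TO {p};" for priv, kind, obj in grants]
```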
Gather the Authentication Details
The connector supports Basic Authentication for Databricks on AWS and Token-Based Authentication for Databricks on AWS, Azure Databricks, and Databricks on GCP. Choose the authentication type that suits your use case.
Basic Authentication
Basic authentication requires the username and password of the user you created for Alation in Databricks (see Create a Service Account).
Token-Based Authentication
Token-based authentication requires a personal access token (PAT) of the user you created for Alation in Databricks. Follow the steps in Databricks personal access token for workspace users in Databricks documentation to generate a PAT.
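Before you enter the PAT in Alation, you may want to confirm that it authenticates as the service account. One way to check is to call the Databricks SCIM current-user endpoint (/api/2.0/preview/scim/v2/Me) with the token as a bearer credential. Then compare the returned userName with the service account. The following sketch does this with the Python standard library. The helper names are hypothetical, so verify the endpoint path against the Databricks REST API reference for your cloud.

```python
import json
import urllib.request


def me_request(workspace_host: str, token: str) -> urllib.request.Request:
    """Build a GET request for the current-user endpoint, authenticated with a PAT."""
    url = f"https://{workspace_host}/api/2.0/preview/scim/v2/Me"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}, method="GET"
    )


def whoami(workspace_host: str, token: str, timeout: float = 10.0) -> str:
    """Return the userName the token authenticates as (requires network access)."""
    with urllib.request.urlopen(me_request(workspace_host, token), timeout=timeout) as resp:
        return json.load(resp)["userName"]


# Usage (hypothetical host; read the token from a secret store, not source code):
# print(whoami("dbc-xxxxxxxx-xxxx.cloud.databricks.com", token))
```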
Build the JDBC URI
The JDBC URI that you provide in Alation depends on the connector version:
Versions 2.0.0 and later use the Databricks JDBC driver.
Versions earlier than 2.0.0 use the JDBC Spark driver.
For instructions on getting the JDBC URI for your Databricks compute resource, see the Databricks documentation:
Databricks on AWS: Get connection details for a Databricks compute resource
Azure Databricks: Get connection details for an Azure Databricks compute resource
Databricks on GCP: Get connection details for a Databricks on GCP compute resource
When you specify the JDBC URI in Alation, remove the jdbc: prefix.
Note
The UseNativeQuery=0 property is required for custom query-based sampling and profiling. If the JDBC URI does not include this property, custom query-based sampling and profiling will fail. If you do not use custom query-based sampling and profiling with this data source, you can omit the property from the JDBC URI. For more information, see ANSI SQL-92 query support in JDBC in Azure Databricks documentation.
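A small Python helper can apply the two adjustments above to a URI copied from Databricks. It removes the jdbc: prefix and, when you use custom query-based sampling or profiling, adds UseNativeQuery=0. This is an illustrative sketch. normalize_jdbc_uri is a hypothetical name, and the case-insensitive check for an existing UseNativeQuery setting is an assumption.

```python
def normalize_jdbc_uri(uri: str, custom_query_sampling: bool = True) -> str:
    """Prepare a copied JDBC URI for the connector settings in Alation."""
    uri = uri.strip()
    # Alation expects the URI without the leading "jdbc:" prefix.
    if uri.lower().startswith("jdbc:"):
        uri = uri[len("jdbc:"):]
    # Add UseNativeQuery=0 only if it is needed and not already set;
    # an existing UseNativeQuery value is left unchanged.
    if custom_query_sampling and "usenativequery=" not in uri.lower():
        if not uri.endswith(";"):
            uri += ";"
        uri += "UseNativeQuery=0;"
    return uri
```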
Connection String for Databricks JDBC Driver
For more information, see Databricks JDBC driver in Databricks documentation.
Format
databricks://<hostname>:443/default;httpPath=<databricks_http_path_prefix>/<databricks_cluster_id>;UseNativeQuery=0;
Depending on your target compute resource (a Databricks cluster or a Databricks SQL warehouse), provide the matching databricks_http_path_prefix and ID (the cluster ID or the SQL warehouse ID) in the httpPath parameter of the JDBC URI.
Examples
Compute Cluster
databricks://dbc-32ak8401-ac16.cloud.databricks.com:443/default;httpPath=sql/protocolv1/o/2479012801311837/0612-093241-z79vbfjk;UseNativeQuery=0;
SQL Warehouse
databricks://dbc-32am8401-ac16.cloud.databricks.com:443/default;httpPath=/sql/1.0/warehouses/9f5d50hhsaeb0k23;UseNativeQuery=0;
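The format above can be assembled from the hostname and HTTP path shown in the connection details of the compute resource. The following Python helpers are a hypothetical sketch. compute_kind only pattern-matches the two HTTP path shapes used in the examples, plus the older /sql/1.0/endpoints/ form for SQL warehouses. It is a convenience check, not a Databricks API.

```python
def build_databricks_uri(hostname: str, http_path: str,
                         use_native_query_off: bool = True) -> str:
    """Assemble a connector 2.0.0+ URI in the format shown above."""
    uri = f"databricks://{hostname}:443/default;httpPath={http_path};"
    if use_native_query_off:
        uri += "UseNativeQuery=0;"
    return uri


def compute_kind(http_path: str) -> str:
    """Guess the compute type from the HTTP path patterns in the examples."""
    path = http_path.lstrip("/")
    if path.startswith("sql/protocolv1/o/"):
        return "cluster"
    # "endpoints" is the older name for SQL warehouses in HTTP paths.
    if path.startswith(("sql/1.0/warehouses/", "sql/1.0/endpoints/")):
        return "warehouse"
    return "unknown"
```

For example, build_databricks_uri("dbc-32am8401-ac16.cloud.databricks.com", "/sql/1.0/warehouses/9f5d50hhsaeb0k23") reproduces the SQL warehouse example above.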
Connection String for Spark JDBC Driver
This format applies to connector versions earlier than 2.0.0. For more information about the legacy Spark driver, see JDBC Spark driver in Databricks documentation.
Note
Connector versions 2.0.0 and later include the Databricks JDBC driver and use the format described in Connection String for Databricks JDBC Driver.
Format
spark://<hostname>:443/default;httpPath=<databricks_http_path_prefix>/<databricks_cluster_id>;
Example
spark://adb-58175503737864.5.azuredatabricks.net:443/default;httpPath=/sql/1.0/endpoints/0f38f55be5cbd786;
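The two formats in this section share the same host, port, database, and httpPath structure. When you upgrade from a connector version earlier than 2.0.0, you can therefore carry an existing spark:// URI over to the databricks:// scheme. The following Python sketch changes only the scheme and keeps the properties as they are. It assumes that the existing httpPath is still valid for your compute resource. You may also need to add UseNativeQuery=0, as described in the note on that property. The helper names are hypothetical.

```python
from typing import Dict, Tuple


def parse_jdbc_uri(uri: str) -> Tuple[str, str, int, str, Dict[str, str]]:
    """Split scheme://host:port/database;key=value;... into its parts."""
    scheme, _, rest = uri.partition("://")
    head, _, props = rest.partition(";")
    hostport, _, database = head.partition("/")
    host, _, port = hostport.partition(":")
    properties: Dict[str, str] = {}
    for item in props.split(";"):
        if item:  # skip the empty item after a trailing semicolon
            key, _, value = item.partition("=")
            properties[key] = value
    return scheme, host, int(port), database, properties


def spark_to_databricks_uri(spark_uri: str) -> str:
    """Rewrite a pre-2.0.0 spark:// URI into the databricks:// format."""
    scheme, host, port, database, props = parse_jdbc_uri(spark_uri)
    if scheme != "spark":
        raise ValueError(f"expected a spark:// URI, got {scheme}://")
    tail = "".join(f"{k}={v};" for k, v in props.items())
    return f"databricks://{host}:{port}/{database};{tail}"
```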
JDBC URI Properties
The connector adds the following JDBC properties to the connection automatically. You do not need to include them in the JDBC URI in Alation.
RowsFetchedPerBlock: limits the number of rows returned in each fetch call. This regulates the connector's memory use and helps prevent out-of-memory (OOM) errors. Set to 500. When debug logging is enabled, the connector logs capture the memory utilization of the metadata extraction (MDE) job.
UserAgentEntry: identifies driver requests from the connector in Databricks. Set to alation+unity_catalog.
Enable Extraction of Complex Data Types
The connector extracts complex data types, such as map, array, and struct. By default, they are represented as flat lists.
You can enable a tree-structured representation of complex data types by setting alation_conf parameters on the Alation server.
Note
Alation Cloud Service customers can request server configuration changes through Alation Support.
To enable the representation of complex data types as a tree structure:
On your Alation instance, set the alation_conf parameter alation.feature_flags.enable_generic_nosql_support to True.
Additionally, use the parameter alation.feature_flags.docstore_tree_table_depth to define the display depth. By default, three levels are displayed.
For information about using alation_conf, refer to Using alation_conf.
Important
After changing the values of these parameters, restart Alation Supervisor from the Alation shell:
alation_supervisor restart all
For more information about Alation actions, see Alation Actions.