Bot Configuration

Alation Cloud Service Applies to Alation Cloud Service instances of Alation

Customer Managed Applies to customer-managed instances of Alation

ASM Settings for Bots

Visit your Alation Services Manager (ASM) settings page to ensure the following are configured:

  1. Alation instance: Configure your Alation instance’s details here. You can choose to configure basic authentication (username and password) or a refresh token.

  2. Alation Analytics: Configure your Alation Analytics database details. Snowflake- and Postgres-hosted Alation Analytics databases are supported.

  3. SMTP: Optional server details for Bot notifications via email, if required.

  4. Slack: Optional Slack app details for Bot notifications via Slack, if required.

Configuring a Bot

We will start with a sample Bot configuration. Below is a “Data Source Ownership” Bot which:

  • Reads all data sources and associated attributes from the rdbms_datasources Alation Analytics table

  • Checks if the Steward OR the Data Owner field for a data source is empty (you can choose between AND/OR operations)

  • If empty then start a conversation to notify the user(s) who have been assigned as stewards to the datasource. If there are none then notify the user #2 who is the default user with subject “Datasource without Steward(s) or Data Owner(s)”

Bot configurations use json format. To create this Bot, go to your Bot Manager in ASM, click the Add Bot button, click Edit, and copy this json into the configuration section:

{
    "bot_name": "Data Source Ownership",
    "description": "Bot to check for data sources without Steward(s) or Data Owner(s)",
    "conn_type": {"sf":
                    {"otype": "DATA",
                    "aa_query": "SELECT * FROM public.rdbms_datasources WHERE deleted = False",
                    "last_etl": "SELECT last_load_time from information_schema.load_history where table_name = 'STG_RDBMS_DATASOURCES' ORDER BY last_load_time DESC LIMIT 1"}
                  },
    "create_policy": True,
    "rules": [
        {
            "fields_to_check": [
                {"Steward":{"condition":"IS NULL", "field_type":  "OBJECT_SET"}},
                {"Data Owner":{"condition":"IS NULL", "field_type":  "OBJECT_SET"}}
            ],
            "fields_to_check_operator":"or",
            "notification": {"type": "conversation",
                             "subject": "Datasource without Steward(s) or Data Owner(s)",
                             "frequency": "always",
                    "to": [{"column": "steward"}]
                             "default": [{"user": 2}]}
        }
    ]
}

The column in the to section can be any valid column in the query response that contains valid user or group id details. In the default column on the last line, a valid Alation user or group id should be given.

Click the Validate button to check the format, then click Save. Your Bot has now been created, and the corresponding Bot Policy will automatically appear in your Alation catalog.

Configuring a Schedule

Bots are run manually by default but you can choose to schedule each Bot individually to run automatically at a frequency of your choosing.

Schedule a Bot in ASM’s Bot Manager screen by enabling its Scheduled toggle and enter a cron string alongside. For example, to run the Bot at midnight every day use cron string 0 0 * * *.

For more examples of cron strings please refer to this external site. To check your cron string enter it here.

Bot Parameters

Here is a complete list of the Bot configuration parameters:

bot_name

The name of the Bot in the Bot Manager

The name of the Bot Violation Policy within Alation

description

This description will be added to the description in the created Bot Violation Policy within Alation

conn_type

How your Bot will interact with Alation Analytics. There are two possible settings here.

sf represents Snowflake, and this value tells your bot to use the Snowflake connector to communicate with Alation Analytics

pg represents Postgres, and this value tells your bot to use the Postgres connector to communicate with Alation Analytics

Example:

"conn_type": {"sf":

{"otype": "DATA",

"aa_query":"",

"last_etl":""}

}

otype

Alation otype that the Bot will be working with. See supported otypes below

custom_template_name

This is required when running a Bot against a BUSINESS_POLICY. The entry needs to be the exact name of the policy template entered into Alation.

aa_query

Data that the Bot will be analyzing. See Alation Analytics example queries below

last_etl

When a Bot runs, it stores the date_time of that run. Before it runs again, it compares the stored date_time to the last run of Alation ETL to see if there is any new data. If there is no new data, the Bot will not run. See last_etl examples below.

create_policy

Bot Violation policies are not always required. Pass true or false to instruct the Bot to create a policy.

There will be times when you just want your Bot to run and perform an action if a condition is met. For example, attach a specific policy to an object if a condition is met. This example does not need a Bot Violation Policy; just have the Bot perform the action.

rules

A collection of rules your bot will run. Rules are waterfall-based, and the first rule condition satisfied wins. Any following rules will not be examined. Bots have a max limit of 4 rules.

fields_to_check

These are the fields that the condition in the rule should check. They are Alation fields and must be typed exactly as they are labeled within Alation. Spaces are acceptable, and the names have to match.

Multiple fields_to_check entries are acceptable. If more than one entry, then the fields_to_check_operator is required.

Example:

"fields_to_check":[

  {"Certified":{"condition":"IS NULL", "field_type": "PICKER"}},

  {"Phase One":{"condition":"IS NULL", "field_type": "PICKER"}},

  {"Phase Two":{"condition":"IS NULL", "field_type": "PICKER"}},

  {"Phase Three":{"condition":"IS NULL", "field_type": "PICKER"}}

  ],

"fields_to_check_operator": "or",

fields_to_check_operator

If more than one fields_to_check entry is added to a rule, a fields_to_check_operator is required. This can be and or or.

properties_to_check

You can create properties to check from your aa_query. These properties are analyzed just like fields_to_check. Multiple properties_to_check are permitted, but if more than one, the properties_to_check_operator is required.

Example:

"aa_query": "SELECT \*, DATEDIFF(day, verified_date::DATE,

CURRENT_DATE) as date_span FROM public.business_policy

WHERE policy_type_name = 'Compliance' AND deleted = False",

{"date_span":{"condition":"GREATER THAN", "value":"120", "field_type": "TEXT"}}

properties_to_check_operator

If more than one properties_to_check entry is added to a rule, a fields_to_check_operator is required. This can be and or or.

condition

Supported conditions are

IS_NULL

IS NOT NULL

CONTAINS

DOES NOT CONTAIN

GREATER THAN

LESS THAN

value

Required if the following condition is used

CONTAINS

DOES NOT CONTAIN

GREATER THAN

LESS THAN

field_type

Support field types are

TEXT

DATE

MULTI_PICKER

OBJECT_SET

PICKER

RICH_TEXT

actions_to_perform

If a condition is satisfied, one or more actions can be performed. An action is altering the data within a field.

Example:

{"Business Policy":{"action":"ADD","field_type": "OBJECT_SET", "values":["business_policy_24"]}}

action

Supported actions are

ADD

REMOVE

field_type

Supported Alation fields are

TEXT

DATE

MULTI_PICKER

OBJECT_SET

PICKER

RICH_TEXT

value

The value that will be passed to the field. Notice that this is a list since a MULTI_PICKER field is supported.

Notifications

These are the parameters used when a notification is required:

type

conversation

email

slack

subject

{string}

frequency

updates - Any time there is a change in the violation count + or -

always - Every time the Bot runs, a notification is sent regardless of the violation count

to

a list of objects that specify which user to notify if the validation fails. Each column in the to section can be any valid column in the query response that contains valid user or group id details (optional).

default

a list of valid user information based on the selected type of notification in the config. (user or group id for conversations; or valid email id for email and slack)

Conversation Example

"notification": {
        "type": "conversation",
        "subject": "Term not curated",
        "frequency": "always",
  "to": [
          {
            "column": "steward"
          }
         ],
        "default": [
          {
            "user": 6
          },
          {
            "user": 2
          }
        ]
      }

A Conversation is then created with the specified Alation user ID - users 6 and 2 in this example.

Email Example

"notification": {
    "type": "email",
    "subject": "Schema Completeness",
    "frequency": "always",
    "to": [
      {
        "column": "steward"
      },
      {
        "column": "data_owner"
      }
    ],
    "default": [
      "[email protected]"
    ]
  }

Slack Example

"notification": {
    "type": "slack",
    "channel": "bot-auto-notification",
    "subject": "Notification Check",
    "frequency": "always",
    "to": [
      {
        "column": "steward"
      }
    ],
    "default": [
      "[email protected]"
    ]
  }

Supported otypes

These are the Alation object types supported and the corresponding AAv2 tables:

otype

AAv2 table

SCHEMA

rdbms_schemas

TABLE

rdbms_tables

ATTRIBUTE

rdbms_columns

DATA

rdbms_datasources

BUSINESS_POLICY

business_policy

“aa_query” Examples

The following example SQL queries can be used in the aa_query parameter:

Data Source

SELECT * FROM public.rdbms_datasources WHERE deleted = False AND EXCLUDED = False

Schema

SELECT * FROM public.rdbms_schemas WHERE deleted = False AND EXCLUDED = False

Table

SELECT * FROM public.rdbms_tables WHERE deleted = False AND EXCLUDED = False

Column (Attribute)

SELECT * FROM public.rdbms_columns WHERE deleted = False AND EXCLUDED = False

Policy

SELECT * FROM public.business_policy WHERE deleted = False AND EXCLUDED = False

More complex queries can be used. The following example forms a flattened list of stewards’ email addresses and data_owners along with associated schemas, titles and descriptions.

WITH tmp_people AS (
    SELECT
        schema_id,
        SPLIT_PART(f.value::STRING, '_', 1) AS people_otype,
        TRY_TO_NUMBER(SPLIT_PART(f.value::STRING, '_', 2)) AS people_oid,
    FROM public.rdbms_schemas,
    LATERAL FLATTEN(input => steward) AS f -- Flatten the steward array
    -- steward_value is the alias for flattened values
    WHERE f.value IS NOT NULL AND f.value != ''
),
-- Aggregate email by schema_id
processed_people_set AS (
    SELECT
        TP.schema_id,
        ARRAY_AGG(USR.user_email) AS steward
    FROM tmp_people TP
    JOIN public.users USR ON TP.people_oid = USR.user_id
    WHERE TP.people_otype = 'user'
    GROUP BY TP.schema_id
)
SELECT
PPS.steward,
RT.data_owner,
RT.schema_id,
RT.title,
RT.description
FROM public.rdbms_schemas RT
LEFT JOIN processed_people_set PPS ON RT.schema_id = PPS.schema_id

“last_etl” examples

For the second SQL query parameter labelled “last_etl”: Each time the Alation Analytics ETL job runs within your Alation Instance it leaves a “last updated” timestamp in the target database for each object it updates. If you make a catalog change to table metadata for example, the last updated stamp for the rdbms_table table will be updated to that time. When a Bot runs it will check to see if it has a stored timestamp from the last time the Bot ran. If it does, it will compare that timestamp to the one in the data object that the Bot is running against. If the time stamp for the data object is newer then the one stored, it will run the bot because there must be some new data. Otherwise the Bot will not run. If there is no timestamp stored because this is the first time the Bot has been run, it will ignore timestamps and just run. This efficiency feature eliminates unnecessary Bot executions on data that has not changed.

Data Source

SELECT last_load_time from information_schema.load_history where table_name = ‘STG_RDBMS_DATASOURCES’ ORDER BY last_load_time DESC LIMIT 1

Schema

SELECT last_load_time from information_schema.load_history where table_name = ‘STG_RDBMS_SCHEMAS’ ORDER BY last_load_time DESC LIMIT 1

Table

SELECT last_load_time from information_schema.load_history where table_name = ‘STG_RDBMS_TABLES’ ORDER BY last_load_time DESC LIMIT 1

Column (Attribute)

SELECT last_load_time from information_schema.load_history where table_name = ‘STG_RDBMS_COLUMNS’ ORDER BY last_load_time DESC LIMIT 1

Policy

SELECT last_load_time from information_schema.load_history where table_name = ‘STG_BUSINESS_POLICY’ ORDER BY last_load_time DESC LIMIT 1

Further Bot Examples

It is often useful to create Bots with rules based on elapsed time. This type of Bot can be used for managing content that should be checked on a regular basis, or finding content that may have expired for example.

This is achieved by creating a date diff property in the aa_query being run. The date diff will return the number of days between two dates. So in the query we take the current date and the date stored in a custom field and return the difference in days. Then in the Bot rule we test that number and choose to do something if the number exceeds some threshold.

The following example Bot checks that Compliance type policies are verified no later than every 120 days. Aspects including the notification, policy type, and time range can be adjusted to meet your needs:

{
      "bot_name": "Bot Sample - Verified Policy",
      "description": "Bot to check if policy has been verified within a time range",
      "conn_type": {
                      "sf":{
                          "otype": "BUSINESS_POLICY",
                          "custom_template_name": "Compliance",
                          "aa_query": "SELECT *, DATEDIFF(day, verified_date::DATE, CURRENT_DATE) as date_span FROM public.business_policy WHERE policy_type_name = 'Compliance' AND  deleted = False",
                          "last_etl": "SELECT last_load_time from information_schema.load_history WHERE table_name = 'STG_BUSINESS_POLICY' ORDER BY last_load_time DESC LIMIT 1"
                      }
                },
    "create_policy": true,
    "rules": [
      {
          "properties_to_check": [
              {"date_span":{"condition":"GREATER THAN", "value":"120", "field_type":  "TEXT"}}
          ],
          "notification": {"type": "conversation",
                           "subject": "Policy verification has exceeded the 120 day limit. Please re-verify.",
                           "frequency": "always",
                    "to": [
          {
            "column": "steward"
          }
         ],
                           "default": [{"user": 2}]}
      }
    ]
}

Sample Bots on Github

A wide variety of sample Bots can be found on our public GitHub page.