Understanding Data Quality Checks

A data quality check is a rule applied to a table or column that evaluates whether the data meets an expected condition within a defined threshold.

A threshold is the expected value or range of values that a data quality metric must satisfy during a check. If the observed data meets the threshold, the check passes; if it violates the threshold, the check fails.

Comparison Operators

Operator       Meaning
-----------    ----------------------------------
=              Equal to
<              Less than
>              Greater than
<=             Less than or equal to
>=             Greater than or equal to
!=, <>         Not equal to
between        Value is within a specified range
not between    Value is outside a specified range
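
As a rough illustration, the comparison operators can be read as predicates that take an observed metric and a threshold. The sketch below is not the product's implementation: the names OPERATORS and meets_threshold are hypothetical, and the bounds of between are assumed to be inclusive, which the text above does not specify.

```python
import operator

# Hypothetical mapping from operator symbols to Python predicates.
OPERATORS = {
    "=": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
    "!=": operator.ne,
    "<>": operator.ne,  # alternative spelling of "not equal to"
}


def meets_threshold(observed, op, threshold):
    """Return True if the observed metric satisfies the threshold.

    For "between" and "not between", threshold is a (low, high) pair.
    Assumption: the range bounds are inclusive.
    """
    if op == "between":
        low, high = threshold
        return low <= observed <= high
    if op == "not between":
        low, high = threshold
        return not (low <= observed <= high)
    return OPERATORS[op](observed, threshold)


# Example: a null count of 0 satisfies "= 0"; a row count of 11
# falls outside the range 1 to 10.
print(meets_threshold(0, "=", 0))
print(meets_threshold(11, "not between", (1, 10)))
```

Keeping the operators in a lookup table means the range operators are the only special cases, since they are the only ones that take two threshold values.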

Each check is designed to detect specific types of data quality issues such as missing values, invalid formats, duplicate records, or outdated timestamps.

Result    Meaning                                                         Contribution to Score
------    ------------------------------------------------------------    ---------------------
Pass      The data meets the check’s condition                            Positive
Fail      The data violates the check’s condition                         Negative
Error     The check failed to execute due to a syntax or runtime issue    Negative
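
The three results can be pictured as the outcomes of running a check: the metric is computed, then compared with the condition, and any failure to execute is reported separately from a failed condition. The following is a minimal sketch under that reading; run_check, compute_metric, and condition are hypothetical names, and how the score itself is calculated is not shown because the text above only states the direction of each contribution.

```python
# Direction of each result's contribution to the score, as described above.
CONTRIBUTION = {"Pass": "Positive", "Fail": "Negative", "Error": "Negative"}


def run_check(compute_metric, condition):
    """Run a check and return "Pass", "Fail", or "Error".

    compute_metric: hypothetical callable that returns the observed value.
    condition: hypothetical callable that returns True when the value is acceptable.
    """
    try:
        observed = compute_metric()
    except Exception:
        # The check could not execute (for example, a syntax or runtime issue).
        return "Error"
    return "Pass" if condition(observed) else "Fail"


# Example: a completeness check that expects zero missing values.
result = run_check(lambda: 0, lambda missing: missing == 0)
print(result, CONTRIBUTION[result])
```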

The data quality checks are grouped into two main categories:

  • Table-level checks: Include numerical and custom SQL query checks.

  • Column-level checks: Include numerical, uniqueness, completeness, validity, and custom (common table expression and SQL query) checks.