Understanding Data Quality Checks
A data quality check is a rule applied to a table or column that evaluates whether the data meets an expected condition within a defined threshold.
A threshold is the expected value or range of values that a data quality metric must satisfy during a check. If the observed data meets the threshold, the check passes; if it violates the threshold, the check fails.
Comparison Operators
| Operator | Meaning |
|---|---|
| = | Equal to |
| < | Less than |
| > | Greater than |
| <= | Less than or equal to |
| >= | Greater than or equal to |
| !=, <> | Not equal to |
| between | Value is within a specified range |
| not between | Value is outside a specified range |
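To make the operator table concrete, the following is a minimal sketch of how an observed metric might be compared against a threshold. The function name `meets_threshold` and the `OPERATORS` mapping are hypothetical and not part of any product API. Treating the `between` bounds as inclusive is also an assumption, since the table does not say whether the range endpoints are included.

```python
import operator

# Hypothetical mapping from the operators in the table to Python functions.
OPERATORS = {
    "=": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
    "!=": operator.ne,
    "<>": operator.ne,
}


def meets_threshold(observed, op, threshold):
    """Return True if the observed metric satisfies the threshold.

    For "between" and "not between", threshold is a (low, high) pair.
    Bounds are treated as inclusive here, which is an assumption.
    """
    if op in ("between", "not between"):
        low, high = threshold
        inside = low <= observed <= high
        return inside if op == "between" else not inside
    return OPERATORS[op](observed, threshold)
```

For example, `meets_threshold(0, "=", 0)` evaluates to `True` (the check would pass), while `meets_threshold(12, "between", (1, 10))` evaluates to `False` (the check would fail).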
Each check is designed to detect a specific type of data quality issue, such as missing values, invalid formats, duplicate records, or outdated timestamps. Each check run produces one of the following results:
| Result | Meaning | Contribution to Score |
|---|---|---|
| Pass | The data meets the check’s condition | Positive |
| Fail | The data violates the check’s condition | Negative |
| Error | The check failed to execute due to a syntax or runtime issue | Negative |
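The difference between Fail and Error can be sketched as follows: a Fail means the check ran and the data violated the condition, while an Error means the check itself could not run. The function `run_check` and its two callable parameters are hypothetical names used only for illustration. How each result is weighted in the score is not specified here, so the sketch does not attempt to compute one.

```python
def run_check(compute_metric, condition):
    """Classify one check as "Pass", "Fail", or "Error".

    compute_metric: zero-argument callable returning the observed value
    condition: callable taking that value and returning True or False
    """
    try:
        observed = compute_metric()
        passed = condition(observed)
    except Exception:
        # The check itself could not run (for example, a malformed
        # query or a missing column), so it is neither a pass nor a fail.
        return "Error"
    return "Pass" if passed else "Fail"
```

A check whose metric computation raises an exception, such as `run_check(lambda: 1 / 0, lambda v: v == 0)`, is classified as `"Error"` rather than `"Fail"`.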
The data quality checks are grouped into two main categories:
Table-level checks: include numerical and custom SQL query checks.
Column-level checks: include numerical, uniqueness, completeness, validity, and custom (common table expression and SQL query) checks.
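As an illustration of two of the column-level categories, the sketch below computes a completeness metric and a uniqueness metric over a column held as a plain Python list. The helper names `null_count` and `duplicate_count`, the sample data, and the zero-tolerance thresholds are all assumptions made for the example. A real check would typically run against the table itself.

```python
def null_count(values):
    """Completeness metric: number of missing (None) values in a column."""
    return sum(1 for v in values if v is None)


def duplicate_count(values):
    """Uniqueness metric: number of non-null values repeating an earlier one."""
    seen = set()
    duplicates = 0
    for v in values:
        if v is None:
            continue  # missing values are counted by the completeness metric
        if v in seen:
            duplicates += 1
        else:
            seen.add(v)
    return duplicates


# Hypothetical column with one missing value and one repeated value.
emails = ["a@example.com", None, "b@example.com", "a@example.com"]

# Each check compares its metric to a threshold of 0 using the "=" operator.
completeness_passes = null_count(emails) == 0
uniqueness_passes = duplicate_count(emails) == 0
```

With this sample column, both checks fail: the column contains one missing value and one duplicate.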