By Jason Lim
Published on May 24, 2022
With its latest release, 2022.2, generally available on May 24, Alation introduces the Open Data Quality Initiative for the modern data stack. The initiative gives customers the freedom to choose the data quality vendor that’s best for them, with the added confidence that those tools will integrate seamlessly with Alation’s Data Catalog and Data Governance application. The release also includes a set of key features that accelerate data governance and simplify the security of sensitive metadata.
The perennial problem of using data to make decisions is summed up by the phrase “garbage in, garbage out,” coined by George Fuechsel: if the data is of poor quality, so too will be the analyses or insights gleaned from it. Data governance helps guide data users to correctly use high-quality data that can deliver value to their organization. Data quality is a critical aspect of data governance, supporting better analysis, consistent regulatory compliance, and improved data management overall.
The data quality category, like many in the data ecosystem, is fast evolving and expanding. Consequently, there is no single approach to data quality. To some it means profiling statistics, like min, max, median, and null counts; to others it means an aggregate quality score that can be interpreted as good or bad. The processes that support these features are likewise divergent and proliferating.
Some vendors take a sampling-based approach, while others monitor how different workloads are processed and report on the success or failure of each process. Some vendors use machine learning to build rules, while others rely on manually declared rules. These solutions exist because different industries, or different departments within an organization, may require different types of data quality.
This has produced many different data quality tools and offerings in the market today, and we’re thrilled to see the innovation. People need high-quality data to trust information and make decisions. Data quality has become one dimension of a larger category called data observability, which is about diagnosing the internal state of the entire data value chain by observing its outputs, with the purpose of rectifying issues. Yet a swivel-seat problem arises: data consumers must move from one tool to another to assemble the whole picture of data quality as they try to understand and use data.
Alation has been leading the evolution of the data catalog into a platform for data intelligence. Higher data intelligence drives higher confidence in everything related to analytics and AI/ML. Being a true platform for data intelligence means, first, putting the right data into the right hands at the right time, so people can make decisions based on trustworthy data. Second, it means providing openness and extensibility to interoperate with other tools in the data ecosystem. Every organization has a choice about which tool is right for its own environment and culture. Alation empowers customers to seamlessly connect the data quality tools of their choice, so that Alation becomes the single system of reference for data. With today’s launch of our Open Data Quality Initiative, we are accelerating our commitment on both fronts.
We have always strived to put high-quality data in the right hands. Prime examples of this in the data catalog include:
Trust Flags — Allow the data community to endorse, warn, and deprecate data to signal whether data can or can’t be used
Data Profiling — Statistics such as min, max, mean, and null count can be computed for columns to understand their shape
SmartSuggestions — In Compose, Alation’s SQL editor, AI-powered suggestions actively show query writers relevant data to use as they query
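To make “shape” concrete, here is a minimal sketch of the kind of column profile described above, written in plain Python. It is not Alation’s profiling implementation; the function name `profile_column` and the convention that `None` stands in for a SQL NULL are assumptions for illustration.

```python
from statistics import mean
from typing import Optional, Sequence


def profile_column(values: Sequence[Optional[float]]) -> dict:
    """Compute simple profiling statistics for one numeric column.

    Hypothetical helper: None represents a SQL NULL, and min/max/mean
    are computed over the non-null values only.
    """
    non_null = [v for v in values if v is not None]
    return {
        "count": len(values),
        "null_count": len(values) - len(non_null),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "mean": mean(non_null) if non_null else None,
    }


if __name__ == "__main__":
    # Hypothetical order amounts, two of which are missing.
    amounts = [12.5, None, 40.0, 7.5, None]
    print(profile_column(amounts))
```

Computing statistics only over non-null values, while reporting the null count separately, keeps missing data visible instead of letting it silently skew the mean.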
Meanwhile, new data quality vendors have been making significant advances. New features include:
Rules — Validate that data is correct, for example by checking that U.S. postal codes are five digits to meet standards
Metrics — Profile data to characterize it, such as checking for nulls and blanks to ensure data exists
Alerting — Trigger alerts when a metric rises above or drops below a threshold, so abnormal data is flagged
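The three capabilities above fit together naturally: a rule judges individual values, a metric summarizes a column, and an alert fires when a metric leaves an expected range. The sketch below illustrates that pattern with the postal-code example. It does not reflect any particular vendor’s product; the function names and the 0–5% acceptable range are assumptions chosen for illustration.

```python
import re
from typing import Optional, Sequence

# ASCII digits only; a bare \d would also match non-ASCII digits in Python 3.
FIVE_DIGITS = re.compile(r"[0-9]{5}")


def postal_code_rule(code: Optional[str]) -> bool:
    """Rule (hypothetical): a U.S. postal code passes if it is exactly five digits."""
    return code is not None and FIVE_DIGITS.fullmatch(code) is not None


def null_or_blank_rate(values: Sequence[Optional[str]]) -> float:
    """Metric (hypothetical): share of values that are None or whitespace-only."""
    if not values:
        return 0.0
    missing = sum(1 for v in values if v is None or not v.strip())
    return missing / len(values)


def check_threshold(name: str, value: float, low: float, high: float) -> Optional[str]:
    """Alert (hypothetical): return a message if value falls outside [low, high], else None."""
    if value < low or value > high:
        return f"ALERT: {name}={value:.2f} outside [{low}, {high}]"
    return None


if __name__ == "__main__":
    codes = ["94105", "1234", None, " ", "10001"]
    failures = [c for c in codes if not postal_code_rule(c)]
    rate = null_or_blank_rate(codes)
    print("rule failures:", failures)
    # Assumed policy: at most 5% of postal codes may be null or blank.
    print(check_threshold("null_or_blank_rate", rate, 0.0, 0.05))
```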
Since the data catalog is the place where people discover, judge, and comprehend data, it is the natural place to expose data quality rules, metrics, and alerts. Having that context in one place matters; fracturing that context across multiple tools makes judging an asset more challenging.
Faster data-backed decisions result in better business decisions. We have seen that most of our customers have multiple data quality tools across multiple departments. This is not a space where one size fits all.
A pillar of Alation’s platform strategy is openness and extensibility. To help both partners and customers build on top of Alation, we provide open APIs and SDKs. The Lineage & Dataflow API is a good example: it enables customers to add ETL transformation logic to the lineage graph. The Open Connector Framework SDK enables engineers to build custom connectors to data sources, which Alation then indexes.
With the Open Data Quality Initiative, Alation introduces an Open Data Quality Framework (ODQF), which includes a starter kit for data quality partners. This kit offers an open DQ API, developer documentation, onboarding, integration best practices, and co-marketing support. With the DQ API, partners can seamlessly integrate their specialty data quality information with Alation Data Catalog. This initiative grants customers the freedom to choose the best data quality tool to meet their needs. Ultimately, this ensures important data context and quality are placed in the hands of every data consumer.
By augmenting rich human data curation in Alation with purpose-built data quality from partners, customers will have a complete view into the trustworthiness of data. Data consumers, such as data analysts and data scientists, benefit from having data quality context in their workflow in Alation. For instance, through lineage, analysts can see whether upstream data dependencies have reliable data quality. This helps to operationalize data, so people can see data quality and act on it.
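The lineage check described above amounts to a graph walk: starting from the asset an analyst is looking at, follow lineage edges upstream and collect any dependency whose data quality status is not passing. The sketch below shows that idea as a breadth-first traversal. It is not Alation’s lineage API; the dictionary-based lineage graph, the `"pass"` status value, and the table names are all hypothetical, and objects with no reported status are treated as suspect by assumption.

```python
from collections import deque
from typing import Dict, List, Set


def failing_upstream(
    target: str,
    upstream: Dict[str, List[str]],
    dq_status: Dict[str, str],
) -> Set[str]:
    """Walk lineage upstream from target (breadth-first) and return every
    dependency whose data quality status is not "pass".

    Hypothetical model: upstream maps each object to its direct sources;
    objects missing from dq_status count as "unknown" and are flagged.
    """
    seen: Set[str] = set()
    failing: Set[str] = set()
    queue = deque(upstream.get(target, []))
    while queue:
        node = queue.popleft()
        if node in seen:  # guards against cycles and shared ancestors
            continue
        seen.add(node)
        if dq_status.get(node, "unknown") != "pass":
            failing.add(node)
        queue.extend(upstream.get(node, []))
    return failing


if __name__ == "__main__":
    lineage = {
        "sales_report": ["sales_clean"],
        "sales_clean": ["raw_sales", "stores"],
    }
    status = {"sales_clean": "pass", "raw_sales": "fail", "stores": "pass"}
    print(failing_upstream("sales_report", lineage, status))
```

Flagging unknown statuses rather than ignoring them is a deliberately conservative choice: a dependency nobody has checked is not evidence of good quality.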
“Our research shows that data quality is the most impactful data governance issue organizations face, affecting 89% of survey participants,” said Dave Menninger, SVP and Research Director, Ventana Research. “Alation’s Open Data Quality Initiative will help organizations more easily establish and maintain quality and trust in their data.”
How will new DQ features appear in the data catalog? Within Alation, a new “Data Quality” tab provides a framework for storing data quality rules, status, and descriptions from an external data quality tool. This data quality information can then automatically trigger trust flags: endorsements, warnings, or deprecations. Alerts can be configured to immediately inform people when there is a data quality issue to be addressed.
Let’s walk through an example of how this could work in practice. For a retail sales transaction table, a data quality rule states that if a transaction’s date is missing, the transaction is void and a critical alert is triggered. Because 70% of transactions in the table are missing a transaction date, the table is automatically deprecated in Alation, actively telling data consumers they can’t trust it. All of these rules, metrics, and conformance results, defined in the external data quality tool, are displayed in Alation, giving data consumers a trusted data point for self-service BI consumption.
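The retail example boils down to two steps: measure how often the rule fails, then map that failure rate to a trust flag. The sketch below illustrates the mapping. The article does not say which thresholds trigger a warning or a deprecation, so the 5% and 50% cut-offs here are assumptions, as are the function names; in practice the rules and thresholds would live in the external data quality tool.

```python
from typing import Optional, Sequence


def missing_rate(dates: Sequence[Optional[str]]) -> float:
    """Share of transactions whose date is missing (None or empty string).

    Under the example rule, each such transaction is void.
    """
    if not dates:
        return 0.0
    return sum(1 for d in dates if d is None or d == "") / len(dates)


def trust_flag(failure_rate: float, warn_at: float = 0.05, deprecate_at: float = 0.5) -> str:
    """Map a rule's failure rate to a trust flag.

    The thresholds are illustrative assumptions, not Alation defaults.
    """
    if failure_rate >= deprecate_at:
        return "deprecation"
    if failure_rate >= warn_at:
        return "warning"
    return "endorsement"


if __name__ == "__main__":
    # Hypothetical table: 10 transactions, 7 of them missing a date (70%).
    dates = ["2022-05-01"] * 3 + [None] * 7
    rate = missing_rate(dates)
    print(f"{rate:.0%} missing -> {trust_flag(rate)}")
```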
“At Kestra Financial, we need confidence that we’re delivering trustworthy, reliable data to everyone making data-driven decisions,” said Justin Mikhalevsky, Vice President of Data Governance & Analytics, Kestra Financial. “We are excited about Alation’s partnership with Bigeye as it gives all users, not just data engineers with technical expertise, a way to easily determine whether they can trust the data and use it right away.”
Data quality partners, including Acceldata, Anomalo, Bigeye, Experian, FirstEigen, Lightup, and Soda, have already signed on, committing to integrate with Alation through our new Open DQ API.
We are excited to launch the Open Data Quality Initiative and welcome more fantastic data observability and data quality partners into our ecosystem. We believe there is tremendous value in offering our customers the choice to pick the best and most appropriate tools from across the modern data stack to address their organizations’ specific requirements.
If you are also excited and ready to be part of the Open Data Quality Initiative, we welcome your involvement:
Data quality vendors: register to become a partner here
Customers: find more information on Alation Community
To learn more about Alation’s Open Data Quality Initiative, you can read the interview with our Open Data Quality Initiative Product Manager, Peter Wang.
Strengthening the foundation of data in Alation with the right data quality information lets organizations govern data more effectively. Better-governed data, in turn, leads to better data intelligence and realized business value. To harness the relationship between data quality and data governance, Alation is investing in accelerating governance capabilities and simplifying the security of sensitive metadata.
A business glossary is critical to aligning an organization around the definitions of business terms. Without an easily accessible glossary, people can misunderstand, and therefore miscalculate, important metrics such as revenue or customers. Robust data governance starts with understanding the definition of data, and a unified glossary in the data catalog gives people a way to do exactly that.
In 2022.2, Alation elevates the glossary to its own object type. The glossary experience is fundamentally enhanced through an improved UI and better discoverability of glossaries and related business terms. Glossaries also support layout customization and improved version history, so users can see how a glossary has changed over time. Related data objects, such as tables, BI objects, and related terms, can be linked directly for easier discovery and context.
The stewardship workbench within the data governance app empowers data stewards to bulk curate data using search and filters. In 2022.2, new bulk actions, including assigning and removing stewards and updating custom fields, have been added. The ability to quickly drill down to relevant data and make bulk changes saves stewards the time and headache of doing it manually, one by one.
For example, a data steward can filter for all “endorsed” data in a Snowflake data warehouse tagged with “bank account.” Hundreds of results appear, and the steward can immediately select a custom field called “Contains PII” and set it to “Yes” across all of them. Such a simple yet powerful metadata change mechanism accelerates governance, especially for compliance and auditing requirements.
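Conceptually, the bulk action above is “filter, then update every match.” The sketch below models that over an in-memory list of catalog objects. It is not the stewardship workbench or any Alation API; the `CatalogObject` fields, the `bulk_set_field` helper, and the sample object names are hypothetical stand-ins for the filters (data source, trust flag, tag) and the custom field from the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class CatalogObject:
    """Hypothetical, simplified catalog entry."""
    name: str
    source: str
    trust_flag: str  # e.g. "endorsed", "warning", "deprecated", or "" for none
    tags: Set[str] = field(default_factory=set)
    custom_fields: Dict[str, str] = field(default_factory=dict)


def bulk_set_field(
    objects: Iterable[CatalogObject],
    *,
    source: str,
    trust_flag: str,
    tag: str,
    field_name: str,
    value: str,
) -> int:
    """Set one custom field on every object matching all filters.

    Returns the number of objects updated.
    """
    updated = 0
    for obj in objects:
        if obj.source == source and obj.trust_flag == trust_flag and tag in obj.tags:
            obj.custom_fields[field_name] = value
            updated += 1
    return updated


if __name__ == "__main__":
    catalog = [
        CatalogObject("accounts", "snowflake", "endorsed", {"bank account"}),
        CatalogObject("txn_log", "snowflake", "endorsed", {"payments"}),
        CatalogObject("legacy_accounts", "snowflake", "deprecated", {"bank account"}),
    ]
    n = bulk_set_field(catalog, source="snowflake", trust_flag="endorsed",
                       tag="bank account", field_name="Contains PII", value="Yes")
    print(n, [o.name for o in catalog if o.custom_fields.get("Contains PII") == "Yes"])
```

Keyword-only arguments make each filter explicit at the call site, which matters when a single call can touch hundreds of objects.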
Lineage helps everyone see relationships between data, from source to target, and diagnose critical problems. In 2022.2, Alation extends automated column-level lineage to SQL Server, with support for MySQL, Oracle, PostgreSQL, and DB2 in beta. This builds on existing column-level lineage support for Amazon Redshift, Google BigQuery, and Snowflake.
In Alation, lineage offers the added ability to include data flow objects, such as ETL transformations, perform impact analysis, and manually edit lineage. Ultimately, this provides granular visibility into data dependencies and relationships for enhanced data governance.
Data quality reinforces data governance. Combining critical data quality information with well-governed data puts data intelligence into action. With this latest initiative, Alation puts vital data quality information into the hands of more of the people who need it.
And this is just one of many advancements that are part of the 2022.2 release. Learn more about the Open Data Quality Initiative by exploring the resources below.
See the data dialog: Why an Effective Data Quality Program Includes a Data Catalog
Read the press release
Book a demo today.