New Data Quality Solutions with Alation and Databricks

By Ben Ng

Published on June 10, 2024

Assessing data quality within the workflow just got easier. Alation is excited to announce a deeper integration with Databricks that delivers critical data health information to more data users. Our new Data Quality Processor for Databricks gives business users data quality (DQ) metrics for lakehouse data in Databricks, presented in a single, consistent view within Alation. This deeper partnership is made possible through Alation's Open Data Quality Initiative and builds on Databricks' recent Lakehouse Monitoring capabilities. Read on to learn more.

The value of data health in the data catalog

Anyone new to a data asset must first ask: Can I trust this? This new integration with Databricks gives newcomers a quick answer, which builds trust, streamlines decision-making, and improves data reliability. Now a wider community of data users can see the latest quality metrics from Databricks within the Alation platform.

This integration builds on Databricks' new Lakehouse Monitoring feature, which helps Databricks users answer questions like:

  • What is our data integrity, and how is it changing?

  • How are the inputs for ML models evolving?

  • What does the drift of a subset of data look like?

Technical users can review the detailed profiling results behind these questions within Databricks. With this integration, data leaders can now bring summaries of those results into Alation, where business users can consume and act on them. Because the results are highlighted in the Data Health tab in Alation, users can more quickly assess whether data is fit for use. Here's an example of what it looks like:

Mockup showing how Alation displays data quality health metrics from Databricks in its UI

Using the Lakehouse Monitoring results, data leaders can assign rules that calculate quality ratings and expose them to end users in Alation. These ratings point users toward trusted data and warn them away from data that is likely to be erroneous.

As a result, business users benefit from greater trust and confidence in data from the outset, as they can more quickly assess its fitness for use. Data scientists, engineers, and AI/ML experts also benefit from improved data lineage visibility so they may more confidently leverage high-quality data for AI initiatives.

Curious how to get started? Any IT leader who has Databricks and Alation and wants to give data users health indicators based on profiling metrics can implement this without writing a script. Simply pull in the data, apply the rules specific to your organization, and populate the Data Health tab in Alation.

How does the Data Quality Processor for Databricks work?

With the recent launch of Lakehouse Monitoring, Databricks users can easily enable monitoring of key statistical properties of their data and track the performance of ML models. Alation will offer a solution that takes the outputs of that monitoring and shows the results to end users in the Data Health tab in Alation. As a result, users searching for data can quickly access (and assess) it with critical health metrics in view. Leaders can assign profiling results to specific buckets, which signal the relative health of an asset to downstream users at the point of use.

Graphic showing how Lakehouse Monitoring metrics inform DQP rules in Alation and how they are displayed to the end user
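The step of assigning profiling results to buckets can be pictured as a small threshold check. The sketch below is purely illustrative and is not the Data Quality Processor's implementation or any Alation or Databricks API. It makes several assumptions. First, a monitor's profile has already been read into a dictionary of metric values. Second, `percent_null` serves as an example metric name. Third, `GOOD`, `WARNING`, and `ALERT` are hypothetical bucket labels. The `Rule` class and `rate` function are also hypothetical.

```python
from dataclasses import dataclass

# Hypothetical bucket labels, ordered from healthiest to least healthy.
SEVERITY = {"GOOD": 0, "WARNING": 1, "ALERT": 2}


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: thresholds applied to one profiling metric."""
    metric: str         # assumed metric name, e.g. "percent_null"
    warn_above: float   # values above this land in the WARNING bucket
    alert_above: float  # values above this land in the ALERT bucket


def rate(metrics: dict[str, float], rules: list[Rule]) -> str:
    """Return the worst bucket produced by any rule (GOOD if no rule fires)."""
    worst = "GOOD"
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is None:
            continue  # metric absent from this profile, so the rule is skipped
        if value > rule.alert_above:
            bucket = "ALERT"
        elif value > rule.warn_above:
            bucket = "WARNING"
        else:
            bucket = "GOOD"
        if SEVERITY[bucket] > SEVERITY[worst]:
            worst = bucket
    return worst


if __name__ == "__main__":
    # Illustrative rule: warn above 5% nulls, alert above 20% nulls.
    rules = [Rule("percent_null", warn_above=5.0, alert_above=20.0)]
    print(rate({"percent_null": 12.5}, rules))  # WARNING
```

Reporting the worst bucket across all rules is one conservative design choice. An organization could instead weight rules differently or show each rule's result separately.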

Conclusion

Thousands of data experts and newcomers around the world use the Alation platform to find trusted data and innovate with it. They come to Alation to find, understand, and trust data, and they use it to drive meaningful business outcomes. They know that data can't be trusted unless it's high quality. Now, with this profile-driven information in the Data Health tab, business users have signals at the "gate" for lakehouse data that indicate trustworthiness and inform smarter usage.

Databricks customers can now expand the reach of Lakehouse Monitoring within Alation and expose trusted data to a broader audience, democratizing access and data-driven decision-making.

Learn more about how customers leverage Alation and Databricks to modernize data management in this case study, RaceTrac Fuels Real-Time Data Insights for Decision Making with Alation and Databricks.
