Paola is the former CTO of the State of Hawaii and has served as a CIO, CSO, Global Privacy Officer, and VP at multi-billion-dollar organizations around the world. Paola is the recipient of numerous awards, including a government award commending her cybersecurity efforts. She is passionate about bringing digital ethics into the workplace, and currently serves as an adjunct professor at Georgetown University and the Pan-American Business School.
As the Co-founder and CEO of Alation, Satyen lives his passion of empowering a curious and rational world by fundamentally improving the way data consumers, creators, and stewards find, understand, and trust data. Industry insiders call him a visionary entrepreneur. Those who meet him call him warm and down-to-earth. His kids call him “Dad.”
Space: it’s the final frontier. It’s the Wild Wild West. And what do we all want to do in a state of disorder? Create rules and laws, of course. In fact, organizations are planning for space governance right now. In its user agreement, satellite internet provider Starlink is already thinking about Wi-Fi on Mars. Here’s some language from the contract.
For services provided on Mars or in transit to Mars via Starship or other colonization spacecraft, the parties recognize Mars as a free planet, and that no Earth-based government has authority or sovereignty over Martian activities.
Now, eventually we’re going to get to Mars. And if we want to do a good job of preparing for our future, we’ve got to know who’s going to build the roads. We’ll need to know how we’re going to collect the taxes. And the Mars homeowner association will definitely need to decide which four colors your spaceship garage can be.
And all these potential rules remind me quite a bit of data governance. It might be a bit contrived, but stay with me for a bit. The field is less than 50 years old. The ones and zeros that we deal with are just as disorganized as the red soils of Mars. And in a profession that advocates for every business to have a glossary of terms, we’re still debating exactly what data governance even means.

In essence, we’re constantly developing rules on unstable ground. And even while we’re figuring out the basics, things are only getting more complicated. We have to ask ourselves how automation and AI will transform our field. We need to consider ethical questions that are only hypotheses today.
We are exploring a strange new world, but one day maybe it won’t be so new or so strange. If we want our field to be everything it can be, we need to think like we’re going to Mars. So, on this episode of Data Radicals, we’re looking at what data governance is, and how to make it better now and in the future.
I speak with Paola Saibene, a data governance expert and digital ethicist, who is a principal at Teknion Data Solutions. She also teaches at multiple universities, enlightening tomorrow’s data radicals on digital ethics, global data privacy, information security laws, and more. And today she’s going to be doing the same for us. So, strap on your space suit and prepare for takeoff. It’s time to bring order to this strange, disheveled world of data.
Producer Read: Welcome to Data Radicals, a show about the people who use data to see things that nobody else can. This episode features an interview with Paola Saibene, principal consultant at Teknion Data Solutions, award-winning cybersecurity professional, and C-suite executive at multi-billion-dollar organizations.
On this episode, she joins your host, Satyen Sangani, to discuss data governance, how mishandling your data can cost you millions, protecting the people behind the data, and much more.
Data Radicals is brought to you by the generous support of Alation, the data catalog and data governance platform that combines data intelligence with human brilliance. Learn more at Alation.com.
Satyen Sangani: It’s an understatement to say that navigating a new frontier can be daunting. Our clients can sometimes feel the same way about data governance. Paola and her team have designed a methodology that helps their clients face data challenges head on.
Paola Saibene: We spend a lot of time using a methodology we call “governance by design.” And that includes, of course, the traditional data governance and all of the appropriate DAMA–DMBoK (DAMA’s Data Management Body of Knowledge, a comprehensive guide to international data management standards and practices) framework. And that’s about a third of what we do.
The other third is effectively blending all of the risk, privacy, cyber, procurement, and contractual vendor management components that actually affect the access, quality, and rights to that data, but doing it in such a way that all of those frameworks are really, really blended. And you can execute in unison as opposed to requiring different parties to do independent work on the data, or to check or audit that data in a way that is just exhausting, and people give up.
So, having this comprehensive way of doing it quickly — because they’re all integrated — is our focus right now. We’re seeing great results when all of a sudden you have cybersecurity people say, “Wait, some of the heavy lifting can be done by the data people?” Yes. And they actually want to do the heavy lifting, because cyber wouldn’t know:
How to rank this properly
The valuation
The purpose of that data
The final purpose can be dictated by those stewards. They really know it the best, as opposed to cyber, who’s going to try to protect everything.
With the advances in data privacy regulation, you see a lot of different parties, like compliance, cyber, or risk, inheriting that privacy function. And they may not be the best candidates in organizations without a privacy office or a privacy program.
Data governance, the data people — however it is called in the organization: management or governance or a combination — can also come to the rescue and help with a number of things, from the GDPR to understanding how the architecture ought to be set up, so that IT governance people can say whether this is well done or not.
Satyen Sangani: When do people come to you? You mentioned a lot of complicated things like cybersecurity, privacy, and risk, all of which are often developed in silos. So when do your clients come to you and say, “I need something more robust or rationalized”? Or when do they have that realization?
Paola Saibene: There are different doorways to that. One could be: “We don’t trust the data.” So, you have to hunt for what that is: where does the ball drop, if you will, and at which level of those functions, to be able to assist them there? Others could say: “We’re just not sure whether we are complying or not.” Or: “Can we do more with our data?”
There are many questions related to monetization. How do we monetize this property without incurring liability — reducing the risk? And monetization, as we know, doesn’t always mean selling the data, it could just be sharing it for enrichment purposes, but that is a great place to start.
And then when they start that journey, a number of things may need to be cleaned up to just enable that function. Doing that properly actually accelerates the final step of execution and delivery, and increases the value in that partnership as they engage in monetization.
So, it’s part of several different journeys, from those who aren’t sure whether their data is any good or in good health, to those who say, “We think our stuff is pretty good, and it’s very unique. Help enable us to share it, and get something out of it.”
Satyen Sangani: So, the essence is they’re trying to do something, broadly to make money, as businesses want to do. And then they want to make sure they don’t run afoul of any of the rules in effect.
Do they call it data governance when they first meet you, or is that something that’s learned by them? They say, “I don’t trust the data,” but do they say, “I need data governance,” or what do they say?
Paola Saibene: They don’t. Most of them don’t, and that’s quite all right, because the definition of data governance is actually quite nebulous. Depending on the industry or sector — depending on who you ask — you’ll have different folks referring to different frameworks. So, we just think of it differently.
We just look at the maturity of the data:
If it fits a full-on data governance engagement, great.
If it’s more of a quality engagement, great.
If it’s more of a cataloging engagement, then that is the right spot for them to start and mature.
We don’t always call it data governance, but we do explain that this governance by design — thinking about putting the right rules in place so that you can expedite whatever you need to do — is super important. The design of that takes a bit of skill and experience. Instead of multiplying that effort, you can have it all consolidated, and just engage quickly.
Satyen Sangani: As much as it pains me to say, we’re still struggling to define what data governance means. But one of Paola’s clients gave her a great definition that some of us may be able to recycle.
Paola Saibene: I heard it best just a week ago from one of our clients: “I just want certified data.” That resonates with a lot of people. It may not be the definition for everyone, but I think it works for many people. Is there a stamp of approval for data to run in my business analytics, in my automation, in my AI work? Because then I’ll know the data is good. That certification process comes with quite a bit of work in the background. But I thought that that simple two-word phrase was perfect.
Satyen Sangani: The concept of certified data speaks to the importance of being transparent about what it means for data to be good. When it comes to defining our data, it’s not just important that we get things right, but that we all agree upon what right means. I asked Paola to share an example of why that’s so essential.
Paola Saibene: I’ll give you a couple of examples. I think one was very telling because it had to do with financial data, and it had to do with just definitions, in a place that had a lot of turnover. And then, of course, the definitions had not been documented properly, resulting in millions of dollars in gaps in reports and other complications. The calculations had been derived from macros; they were keeping things in Excel, and so on. It led to a myriad of issues that are common to many places.
Satyen Sangani: You’re saying that not defining data leads to millions of dollars in losses — explain that.
Paola Saibene: Several parties in the financial unit were determining what needed to be amortized differently. And they were also recording the invoices, and recording the AR and AP dates, in a slightly different way. As far as when data happened, some of them would close it differently, some of them would open it differently.
The portion that was intriguing was the amortizing. There was a very loose definition, depending on the senior individuals in the organization recording it in a particular way, because it wasn’t a regular asset that was software-related. And then those who were trained 20 or 30 years later had different definitions, and then they had to merge data in a company that had already acquired many other companies.
Those numbers were already fragmented and had to be unified. It was a struggle. It was not beautifully finalized or arranged in the end; there had to be a compromise. The definition had to be arbitrarily decided moving forward, knowing there would potentially be some things that would not be correct retrospectively in the data. It’s an unfortunate situation, but so many organizations go through that.
Satyen Sangani: So, you had one team, and one set of individuals from this group doing things one way, recording things one way. You had another recording things in quite a different way. And all of a sudden, you had to say, “We’re going to do it this way going forward.” And, of course, one system therefore had history, but the other system would not have had that same history.
That’s such an interesting example, because in some ways, the way in which you measure something then drives things. There was a feedback effect. You think about measurement as being this totally independent thing within an organization: you just observe the world and you measure it. But in this case, it seems as if the measurement actually drove the process, and how you measured drove how the organization worked.
Paola Saibene: It always does. Measurement needs to be monitored, which is often overlooked. It isn’t an extra task or a burden to just keep track, at least on a yearly basis: “Are we still okay with this definition?” Make sure that all of the stakeholders involved say yes or no.
I think this is changing, where we’re making sure that metadata is added appropriately. You have to keep up with it, and it’s actually more important than other processes later on. Business analytics and business intelligence often end up displayed on beautiful dashboards, with sophisticated calculations, but based on data that hasn’t really been consistently defined. I’m not even talking about quality, and I’m not talking about the cleanliness of the data, or whether the validity is there — just simple definitions, the most basic of all the pieces. And it is incredible how that can propel things one way or the other.
Satyen Sangani: If we want to create a future as radical as we think it can be, then we’re going to have to advocate for it. Paola has some thoughts on how we can do exactly that.
Paola Saibene: I’ve worked in many places — government, commercial, nonprofit, education — and many folks at the top level would say, “Yes, we’re supporting it.” But from saying it to actually doing it, there’s quite a bit of a gap sometimes. I think with a good design and a strategy in place, a bottom-up approach tends to work better than top-down.
By that I mean having educational gamification plus exercises for folks in lower management. Give people performance indicators tied to improving the health of the data, or find ways of actually increasing literacy without having to watch another compliance webinar or the usual data-related classes that put you to sleep if you’re not in that space or don’t have a fascination for it.
Make it as fun and engaging as possible. Go straight to exactly what that business unit should be concerned about. And then what each individual needs to know, so they understand their special role in the treatment of this data — that they’re so important because nobody else can do it quite like they do.
Their task is generally unique, so their contribution to this health is enormous. And when they latch onto that concept, and when they see that they are part of this landscape of evolving and improving the data, I think it makes them look for more. I know I’ve experienced it. It makes them look for more gaps, and they’re the ones that bring it up into the stewardship council, or into the data governance council.
And then you see this force that is constantly monitoring and controlling and improving the quality, and it’s a beautiful thing. And then at that point, top leadership begins to support it more, fund it more, and see it as one of the top indicators of organizational health.
If you can also embed it in enterprise risk management (ERM), and have it be something they keep an eye on — especially the ERM unit which is more advanced and more engaged with the business — then that’s a great formula for success.
Satyen Sangani: If those are the formulas for success, what are the modes of failure? What’s the counterfactual? What do people tend to do when they don’t do that? What do they do when they fail?
Paola Saibene: I would say there are a couple of things I wouldn’t touch, because I think they fail almost always. One is not tying things to top leadership KPIs, not just business-unit KPIs. If you don’t align completely with the business strategy and what is important at that moment, then you’re going to lose momentum slowly but surely.
It is true that in many cases the data is suffering at a level that doesn’t map to a KPI, but you have to decide which way you go. If you need the program to succeed, then you have to blend in enough of what is important to top leadership, to the board, and to the C-suite, so you can then continue to do more work on the other gaps that you might have.
For instance, not many people worry about retention schedules, and they’re such an important part of the data life cycle. But you may want to ask: if IT is facing a budget cut, and storage has been used excessively, how much of the data that should have been purged is sitting in storage it need not be, increasing the cost of operations (normally OPEX)?
Just combine those areas and look for ways of understanding how the effort in data governance and data maturity comes close to addressing one of the KPIs of the organization. That goes a long way.
The second one is that, even while you’re making sure that you’re relevant, you don’t have enough marketing and communication about the program. Data governance is not something that makes most people say, “Wow, I really want to do data governance today. I’m so excited about it.” It doesn’t quite happen that way. Make sure you are conveying and providing clarity as to why it is propelling the organization forward.
Satyen Sangani: So, your clients are now data radicals, but how do you help them decipher which data points are the most valuable for their organization?
Paola Saibene: You decide by a number of formulas. One of them is: where does that data land normally? How unique is it? Some data will help you, but in terms of the monetization piece, there are four general buckets, and one of them has to do with the uniqueness of that data.
And if you have data in your organization — it could even be IoT data, it could be sensor-related data — that is unique enough, then it has to be treated differently from data that is not going to give you any rewards or any revenue back.
It is a business conversation. You don’t decide that at the data governance level; you let the business get heavily involved in deciding it, helping you prioritize, and helping you map how the data governance and data maturity effort evolves based on what they have said is important to them.
Satyen Sangani: So, you’re saying talk to the potential consumers of the data in order to get their perspective on whether or not it could be useful?
Paola Saibene: Absolutely. Yes.
Satyen Sangani: And just like space is constantly expanding and changing, we also have to be ready for our own field to evolve. What does Paola think the next few years will look like in the world of data?
Paola Saibene: I see a lot of automation happening in this space more and more at various layers. So, I think that a lot of the manual work that we do right now is just going to go away. But I think what we’re going to be seeing is the role of how data is treated — I’m going to call it “data governance” for the sake of just using the words — in the algorithmic calculations and the AI designs, and the platforms and the software. And how we are going to play a role foundationally in any flaws that happen subsequently.
We want to be able to have that separation of, “Well, it’s the data, that’s why AI is failing, or the algorithms are off,” as opposed to being able to dissect it very clearly, and understand that, no, the data was perfectly fine up until the point of those algorithmic calculations, post business intelligence. And those models and those designs that have been created (and that software, then) have to be looked at further.
It’s almost like an audit lens that we need to be very sharp on, and it is going to require that you segment. Just look at, for instance, the FTC: the questions they’re asking of companies and organizations, the actual expectations, the declared expectations, the liability associated with AI, the sampling issues, and so on. You have to have a defensible posture in the organization when you’re going to engage more and more, especially with PII, PHI, etc.
Anything that is related to a person, personally identifiable directly or indirectly. And that defensible posture will need to come not just with frameworks and policies in place, but with some vetted practices, to be able to say, “Yeah, up until this point it is very good.” That then reduces the scope of analysis, and you can segment it accordingly.
Satyen Sangani: As the relationship between data and business grows over the coming years, we have to make sure that we use data in an ethical and safe manner. Paola tells us what matters in trying to achieve these goals.
Paola Saibene: Explainability and transparency. That push for explainable AI is there. And I would say there is a lack of understanding of the amplified effect you can have on data that is personally identifiable as a result of the use of AI.
It is not as if the data is all of a sudden more dangerous because it is part of an AI lane; it is actually the effect of that: the speed, the scope, the range. It can predict in a wrong way, it can assign labels incorrectly, or it can make decisions. It is now more damaging by virtue of volume.
That is what we’re always trying to home in on: protect the people inside of the data. And I don’t mean in a cyber sense only, but protect the people by having that care put into how it is being treated all the way post-capture.
Many times that makes folks look at algorithms differently and say, “I understand, I would actually not combine those elements if I had known that there were purpose limitations to the data, or that it had this kind of metadata in it that does not serve my purpose on the AI side.” That’s why the maturity on the governance side is vital, so that then AI can be done much better.
Satyen Sangani: Do you feel like the people who are developing these algorithms and doing this work are super aware of these issues? Not aware enough?
Paola Saibene: They’re not aware enough. They’re not aware. We’re still talking about the basics of governance in AI. And we still have to remind a lot of folks who are very excited in that space that, wait a second, it is no different from DevSecOps. We are still talking about DevSecOps a whole lot more because that Dev space is still learning how to do things securely. And only last year did the SSDF, the Secure Software Development Framework, come out just to make sure folks are implementing it. We’ve had this issue for 30 years.
Satyen Sangani: What do you think it’ll take to make people more aware of these explainability challenges, and these challenges of ensuring that these algorithms are doing the right thing?
Paola Saibene: A billion-dollar question. I think a combination of factors, such as more regulation or more fines. But that’s not always deterrent enough. It will be important that the organization values it.
If you are going to be monetizing the data, or monetizing processes, is your product, this data, going to be of higher value because you’ve done it well all along? If that’s the case, that’s a great incentive. So, think of other physical products around us: do we go for quality, or do we go for quantity? It’s similar thinking.
Satyen Sangani: I might be biased, but exploring our new frontier of data governance might be even more exciting than colonizing Mars, mostly because the benefits can be realized here and now.
This is Satyen Sangani, cofounder and CEO of Alation. Thank you for listening.
Season 2 Episode 26
Edo Liberty, CEO and founder of Pinecone, introduces the impact of vector databases on AI, likening them to Esperanto for algorithms—a universally understandable language that transforms intricate data into an easily interpretable format for AI systems. Unlike traditional databases' clunky, one-size-fits-all approach, they make AI smarter, faster, and infinitely more useful. As the fabric of AI's cognitive processes, vector databases are the hidden engine behind the Generative AI revolution.
Season 2 Episode 23
In a world of Netflix queues and doom-scrolling, Ars Technica’s Nate Anderson consults an unlikely source: Friedrich Nietzsche. The author of In Emergency, Break Glass: What Nietzsche Can Teach Us about Joyful Living in a Tech-Saturated World prescribes an “information diet” and adapts the German philosopher’s passionate quest for meaning to a world overwhelmed by “content.”
Season 2 Episode 14
If you know technology marketing, you know Dave Kellogg. The Kellblog author is an expert in tech marketing, sales, and how to evolve with your industry. Yet with all he’s learned, his advice for data leaders boils down to “Keep it simple.” In this chat, Dave offers his insights to simplify, from building frameworks to identifying your “crux challenges.”