Sanjeev Mohan is a leading expert in data products, modern data architectures, and AI. Formerly VP of Big Data & Advanced Analytics at Gartner, he now advises enterprises as Principal at SanjMo. He’s also the author of Data Products for Dummies and the host of the podcast It Depends.
As the Co-founder and CEO of Alation, Satyen lives his passion of empowering a curious and rational world by fundamentally improving the way data consumers, creators, and stewards find, understand, and trust data. Industry insiders call him a visionary entrepreneur. Those who meet him call him warm and down-to-earth. His kids call him “Dad.”
0:00:04.5 Satyen Sangani: Welcome back to Data Radicals. What if data management wasn't just about governance and compliance, but a powerful driver of growth, innovation and competitive edge? As data products and AI agents mature, they're not just helping us manage information, they're unlocking its full potential. We're entering a new era, one where intelligent autonomous systems supercharge productivity and deliver real business value faster than ever before.
Today I'm joined by Sanjeev Mohan, a renowned expert in data products, governance and modern data architectures. In this episode, we unpack what data products really are and why they're critical to the modern enterprise. We also dive into generative AI and AI agents, how they operate, where they're headed, and why they could transform how we work. Stay tuned for a fascinating conversation about turning data into a true competitive advantage.
0:01:00.5 Producer: This podcast is brought to you by Alation, a platform that delivers trusted data. AI creators know you can't have trusted AI without trusted data. Today, our customers use Alation to build game-changing AI solutions that streamline productivity and improve the customer experience. Learn more about Alation at A-L-A-T-I-O-N.com.
0:01:25.1 Satyen Sangani: Today on Data Radicals, I'm thrilled to welcome Sanjeev Mohan, a leading expert in data products, governance and modern data architectures. As the former Vice President of Data and Analytics at Gartner Research, Sanjeev has deep expertise in analytics, AI, and the evolving data landscape. Now, as a principal at SanjMo, he advises companies on data products, data operations, observability and generative AI, helping them turn data into real business value. He's also the author of Data Products for Dummies, making him the perfect guest to break down this critical topic. Hey Sanjeev, it's great to see you again.
0:02:00.7 Sanjeev Mohan: Thank you Satyen. And it's such an honor to be on your podcast.
0:02:04.5 Satyen Sangani: Oh, well, that's amazing. I'm glad you feel that way. And absolutely, we're both incredibly honored. I would love to talk and dive right into a topic that I think a lot of people talk about, but maybe don't have a lot of understanding of perhaps or content around or maybe common definition around which is data products. You wrote the book Data Products for Dummies, so maybe we'll just start off on the basic stuff. What is a data product?
0:02:28.4 Sanjeev Mohan: Data products are kind of mired in a little bit of confusion and the definition is a bit of a slippery slope because we tend to define a data product based on where we come from. Business people have their own definition. Technical people have their own definition.
So, let me start from a business point of view. The first thing that comes to my mind is that a data product is a reusable, standardized data asset that delivers some measurable value. Most important are the two goals a data product serves. One, it builds trust in data. And the second is that it offers a much better user experience than having to write SQL code and join tables yourself. The business owns a data product, so there is some accountability. See, today what happens is that if something breaks down, let's say my dashboard, and the numbers are not correct, I have no idea who to ask what went wrong, because there are so many people and tools involved in the entire end-to-end life cycle of building my dashboard. So, a data product has a designated data product owner. It follows product management best practices. Extremely important.
0:03:47.1 Sanjeev Mohan: In the past when we built, say, a data warehouse, it was a project: we would spend a year or two, build it, roll off the team, and go do something else. But a product has a life cycle, all the way up to retiring it. So, there has to be somebody who's responsible for it, making sure all the features are well documented: How do I access it? Does it have an API? Is it SQL? That's something we term a data contract. How do I discover it? Is it available in a data catalog or a data marketplace? So, there's a self-service angle. Very long-winded answer, but I'm just trying to give the essence of what a data product is.
From a technical point of view, it could be a table or a view like in the past, or it could be a dashboard or a machine learning model. I'm even expanding the definition of data products to include AI outcomes like agents, assistants, and RAG pipelines. As long as we are developing it with product management principles, making access easy, making it discoverable, and building trust, to me it's a data product.
0:05:04.4 Satyen Sangani: Which is a pretty broad definition.
0:05:06.7 Sanjeev Mohan: Yeah.
0:05:07.4 Satyen Sangani: And you mentioned the idea of reusability in your definition, and the idea of ongoing, non-ephemeral ownership. So, it's not something that's just a one-time thing. You also mentioned the idea of trust and guarantees around trust, and you used the term data contract as well. Are those the four elements, or are there others? Are those the four minimum elements of these things, or are even some of them perhaps discardable from the definition?
0:05:38.4 Sanjeev Mohan: So, I would say those are the core ones, trust being important because we didn't mention the word security or governance even once. To me, all of that falls under trust: the quality is good, it's secure, it's versioned. And just like Apple comes out with a new iPhone every year, my data product can get versioned, you see. So Satyen, this is actually a very interesting concept. We always buy software with a version number, right? But when was the last time we ever talked about data having a version number? A data product is actually a way of bringing that versioning. So, I can trust you; I can say this is the data that marketing produced, this is my customer master, for example. But tomorrow, if I buy, let's say, HubSpot and I get some new pieces of data, I may have a new data product, a new version number that is backward compatible with the previous one. So, this is how we are rethinking the entire space of data.
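To make the contract and versioning ideas Sanjeev describes a bit more concrete, here is a minimal sketch in Python of what a data product descriptor could look like. The field names (owner, version, access, quality checks) and the example values are illustrative assumptions, not a formal standard or anything from the conversation.

```python
from dataclasses import dataclass, field

# Illustrative sketch of a data product "contract". The shape of this
# class and every value below are hypothetical, chosen only to echo the
# points made in the episode: an accountable owner, documented access,
# published quality expectations, and an explicit version number.
@dataclass
class DataProductContract:
    name: str                                          # e.g. "customer_master"
    version: str                                       # bumped when the product changes
    owner: str                                         # designated data product owner
    description: str                                   # business meaning, for catalog discovery
    access: dict = field(default_factory=dict)         # how consumers reach it (SQL view, API)
    quality_checks: list = field(default_factory=list) # guarantees the owner stands behind

customer_master_v2 = DataProductContract(
    name="customer_master",
    version="2.0.0",  # new version after adding new source fields, backward compatible with 1.x
    owner="marketing-data-team@example.com",
    description="Golden record of customers, curated by the marketing team.",
    access={"sql_view": "analytics.customer_master_v2",
            "api": "/data-products/customer-master"},
    quality_checks=["customer_id is unique", "email is populated for active customers"],
)
```

A descriptor like this is what a catalog or marketplace could index so consumers can discover the product, see who owns it, and know which version they are reading.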
0:06:43.6 Satyen Sangani: And what are the problems that this data product thinking solves? You mentioned this idea of ongoing ownership and knowing where to go when something breaks, and even trust. Although one could argue that trust could have been developed by other means without having to have a product in place. Like, why should I do this product thing?
0:07:06.2 Sanjeev Mohan: Okay, so great question. Data products actually became very popular. By the way, I think the first person who came out with this term was DJ Patil in 2012. So, a long time ago...
0:07:19.7 Satyen Sangani: Oh, wow.
0:07:20.2 Sanjeev Mohan: But it really... Yeah, he's the one who coined it, but I don't think his definition was anything close to what we are saying, because he was coming from a different space. The definition of data product that we are talking about came from Zhamak Dehghani when she was at Thoughtworks and came out with the whole concept of data mesh.
So, the problem we were trying to solve was that as our data landscape has become more and more complex and the demand for data has gone up, centralized data teams are getting squeezed; they're getting demands from different places. "Hey, I'm launching a campaign. I need this information by tomorrow, otherwise I can't do my campaign," or "this thing went wrong," or "I need this information." To break that bottleneck, she came up with this whole notion of decentralizing, of giving the responsibility for data to the business teams or the lines of business. With a data product, what we are saying is: who understands the data the best? It's the business team.
0:08:23.3 Sanjeev Mohan: Like, for example, let's say I'm a hotshot data engineer and somebody tells me, here's your clinical data, and I want you to take this clinical data and derive these reports for me. I don't know what those clinical codes mean at all. I'm not a trained clinician. But if you give that responsibility to the business team and tell them: you are responsible, it's your data, you curate it, because you understand the meaning, and produce a well-defined, reusable asset, then that is the problem we are trying to solve with a data product.
0:09:00.1 Satyen Sangani: And in her case, at least in the original paper, there was a pretty rigorous definition of a data product. It had to meet certain interoperability standards. It was always a physical thing, almost always a set of tables or a particular standalone artifact. You mentioned things like models, which might be, for example, more consistent with the original, expansive idea that DJ would have had. I think one of the original data products in the LinkedIn context was “people you may know,” but in her case it was more narrow. You seem to be buying into a more expansive definition. Why is that? And as you thought about writing the book, did you consider which way to go, and why was it best to go with the broader notion?
0:09:45.4 Sanjeev Mohan: So, I didn't want to define it too narrowly. I don't want people to come away thinking a data product is something brand new that's never been done before, because if we go down that rabbit hole, then people will have even more resistance than they already do. So, my goal was not to change things from a technical point of view, but to change them from a process point of view. In other words, I'm saying that the artifact may be what you have been used to, but the way it is developed, the methodology that you use, the process, is different. That is why I've diverged slightly from where she was.
Also, one of the issues with the whole data mesh space was that there was never any methodology for how you build it. It was always a concept. Data mesh is a concept, it's a theory. My customers are not interested in theory; they want practicality. How do I deploy a data product? So, I've had to diverge a bit from the purist definition, if you will.
0:10:56.1 Satyen Sangani: Have you seen anybody that has adhered to the purist definition in a strict way and gotten to success?
0:11:04.7 Sanjeev Mohan: I don't think so. One other thing, by the way, in the book, is that a data product is immutable. And very quickly people said, "No, no, that's not true. Data products can change." So, I think the reason it has taken so long for data products to gain prominence is that we never really arrived at an ironclad definition: this is a data product, and this is not. To be honest, we've never done that in IT. We've never said this is big data and this is not, this is NoSQL and this is not. Right? I mean, even with the data catalog, we struggle to define things.
In fact, this is what I was known for at Gartner. A lot of my clients used to try to pin me down and say, define this, define that. And I always had this view right from the beginning: I can spend a whole inquiry, one hour, just talking about definitions, or I can help solve your problem. What do you want? And almost always business users said, "Okay, let's not worry about definitions. Here is a problem. How can we overcome this issue? Because we've been trying to do it for a long time." Like you mentioned, trust in data can be achieved in many other ways. But we struggle. For 40 years we've not been able to get a handle on data quality, in my opinion.
0:12:30.4 Satyen Sangani: Yeah, it's funny you should say that, because this idea of "I can spend an hour in an inquiry trying to define things, or I can actually solve your problem" seems to be the tension that exists with literally every data team ever. There's a lot of wisdom in that statement.
So, did you leave the book thinking everybody should be adopting data products, or at least the thinking that goes along with them? When you came out with the book, is that the conclusion you reached, or do you feel like maybe, maybe not, it depends?
0:12:56.3 Sanjeev Mohan: So no. By the way, I have a podcast called It Depends. It's the ultimate consultant answer. I absolutely don't want to give the impression that a data product is the right way to go for every problem you're facing. Absolutely not. There are systems that are running really well in large organizations; why would they rip those out and go down the path of data products? Because the thing with data products is that it's also a cultural issue, it's a mindset shift, and you have to think from a completely different, long-term point of view. And in IT we are so used to, somebody gives me a problem, I'm like, "Yes, I got it, I'll solve it for you," and then you move on to the next problem. With data products, it's a mindset shift. So, I would not recommend data products as the solution for every problem.
0:13:55.5 Satyen Sangani: Yeah, I think that's right. It's funny, because we've obviously looked at the topic quite a bit as well. If you think about data teams, one thing they should be doing is answering questions and solving problems. Maybe they're building reports, maybe they're building AI models, maybe they're building some recommender engine; whatever it might be, they're building something. And a lot of the work should be basically: just solve my problem. I think this question of reuse, value orientation, and abstraction is where a lot of data teams and architecture teams spend their time, because everybody wants to do things better, do things efficiently, do things in the most modern ways, do things with the best tooling. And data's so confusing, so why don't we spend 20 hours a day thinking about definitions? That's not sarcasm, and it isn't meant to be dismissive of that idea. You do have to think about doing things better, doing things differently, doing things with value and reuse in mind, because otherwise you're just going to keep being inefficient in your work. Thinking about that life cycle, that value, that reuse is the fundamental thing you should be thinking about.
0:15:00.2 Satyen Sangani: And those are the problems that you're trying to solve. How reasonable is this? How valuable is this? How much should I be abstracting? How much trust am I building in this overall? And I should be solving many problems with one thing in the abstract. You mentioned this idea of getting to practice. How do people actually do this? Where do I start when I'm building data products?
0:15:21.0 Sanjeev Mohan: The good thing is there are a lot of companies, product vendors, that are helping simplify the process of building, testing, and deploying. There's a whole space of data ops that has now come up, which is: how do I automate my CI/CD, my testing, managing environments, moving from one environment to another and then into production? So, that's one angle people are taking. Then there are these products that are trying to unify the metadata. So, Satyen, one of the problems we are seeing, maybe because of the modern data stack, is that we had an explosion of point solutions. So, for example, humor me here. I have an ERP system and I want to run some reports on it. What do I do today? I have my ERP system, I have some data ingestion and transformation tool that does change data capture or something and lands the data into an object store like S3, or into a lakehouse or a cloud data warehouse. Then I put a data transformation product on top. I have a data catalog, maybe I have a specialized lineage tool, I have a data prep tool, I have data observability, I have a data access governance tool, I have data quality; the list is endless.
0:16:49.0 Sanjeev Mohan: For each of the areas I mentioned, and on purpose I did not want to mention any vendor names, there are at least 20 different companies that I'm aware of. So, now the poor dashboard gets created at the very end. Each of these products has its own metadata catalog, right? They don't talk to each other; they have their own standards. So, we never actually had a common standard for metadata, unfortunately. We've got TCP/IP for networking, and they all speak TCP/IP, but what about metadata?
So, I'm being a bit philosophical here, but the companies that are in the data product space are trying to abstract this madness, this really fragmented tool chain that we have. That, to me, is one way of going through it. I have even seen evidence these days of some of these products using AI. I knew gen AI had to come into our conversation at some point, so why not now? They're using some foundation model beneath these products, and they're even trying to abstract it and make it conversational. So, in natural language, it generates the code that is required to develop my data products.
0:18:20.0 Satyen Sangani: Yeah, no question. I mean, we're recording this before an announcement that we're about to make here at Alation. There are multiple ideas behind it, but, for example, along those lines there's the idea that a data product would be exposed via what's called an MCP server, a server that essentially allows you, or an LLM, to talk to Alation, and more fundamentally to the products that are encapsulated inside of Alation. And so, this idea of using data products as a mechanism to be read by AI, to be read by other applications, to be read by dashboards and people, to be read by any other arbitrary SQL database for transformation, I think is actually quite nice, because it means that the thing can come along with the metadata that describes it. And I think the point that you're making is really important, which is: why haven't we had a standard for metadata? Because the metadata you might want depends on the use case you have. And in theory, if you think about all possible use cases for all possible data, the amount of metadata, or the types of metadata, could be infinite.
0:19:27.0 Satyen Sangani: But your use case really determines what products you build and what you need to do within the product. And so, to your point, I actually disagree with the idea that it's immutable. It should be constantly changing and evolving depending on the use cases that are built on top of it. Yeah.
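One way to picture the pattern Satyen describes, metadata traveling with the product so an LLM or an application can consume both, is a small service that returns a data product together with its descriptive metadata. This is only a hedged sketch, not Alation's product or an actual MCP server; the endpoint shape, field names, and sample rows are assumptions made up for illustration, using FastAPI as a generic HTTP framework.

```python
# Minimal sketch: serve a data product alongside the metadata that describes it,
# so any consumer (an LLM, a dashboard, another pipeline) gets the context with
# the rows. All names and values here are illustrative assumptions.
from fastapi import FastAPI

app = FastAPI()

CUSTOMER_MASTER = {
    "metadata": {
        "name": "customer_master",
        "version": "2.0.0",
        "owner": "marketing-data-team@example.com",
        "semantics": {
            "customer_id": "Unique customer identifier",
            "lifetime_value": "Total revenue per customer, in USD",
        },
    },
    "rows": [
        {"customer_id": 101, "lifetime_value": 4200.0},
        {"customer_id": 102, "lifetime_value": 1350.5},
    ],
}

@app.get("/data-products/customer-master")
def get_customer_master():
    # Metadata travels with the data, giving the consumer the context
    # it needs to interpret the rows correctly.
    return CUSTOMER_MASTER
```

Run with any ASGI server (for example `uvicorn module:app`); the key design point is simply that the payload carries its own semantics rather than relying on a separate, disconnected catalog lookup.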
0:19:42.8 Sanjeev Mohan: By the way, I don't know what you are launching, but everything you said is music to my ears. And this is why I do what I do: because I get to see some of my ideas start getting adopted. For example, the idea of having a data product be the layer between raw data and my AI initiative or AI product; to me that's the right way to go. And I was talking about this a year and a half ago, because my whole thinking was that at that time a lot of models were hallucinating like crazy. Even now they do. My problem was that if I put AI directly on data... we went through this whole era of SQL injection; what if there is an LLM injection? What if it returns some sort of output that is undesirable?
0:20:32.9 Sanjeev Mohan: A data product is already giving me the trust layer. It's curated, the metadata is published, the APIs are published, there's a data contract. What if I put my agents, my assistants, my chatbots on top of a data product? Then I can improve reliability and trust in my outcome.
0:20:52.9 Satyen Sangani: No question. It also is the case that the data product can then travel with its own semantics and descriptions that would give the LLM confidence and context through which to talk to it.
I mean, to your point, people have talked a lot about this idea of SQL injection, but even in the LLM context there's a lot of confusion in turning a natural language question into a query against an arbitrary schema. And as the schemas get more complicated, the LLMs get lower and lower fidelity in understanding how to talk to that underlying data structure. So, the data product can present things in natural language terms and give the LLM enough context to have a higher-fidelity conversation, even if it's just for human interrogation.
So yeah, I think there are a lot of interesting use cases to make it tangible and real. And you can then version the data products as you see what questions are being asked of that particular product, if you're talking about that LLM-oriented use case. So, I think it's super exciting. There's been a lot of talk about this idea of text-to-SQL, but it's a really hard problem, because fundamentally SQL is about precision and these LLMs operate probabilistically, so you have to be very deliberate about narrowing the context window enough that the answers don't come back completely incorrect.
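One way to read that point: instead of handing the model the whole warehouse schema, you hand it a single data product's contract and ask for SQL only against that. Below is a rough sketch of the idea; `ask_llm` is a hypothetical placeholder for whatever model client you use, and the contract keys mirror the illustrative descriptor shown earlier rather than any real API.

```python
# Hedged sketch of text-to-SQL scoped to a single data product.
# ask_llm is a stand-in for a real model call (OpenAI, Anthropic, etc.);
# it is not a real library function.
def ask_llm(prompt: str) -> str:
    raise NotImplementedError("wire up your LLM provider here")

def question_to_sql(question: str, contract: dict) -> str:
    # Narrow the context window: one curated view, with column meanings,
    # instead of the entire warehouse schema.
    prompt = (
        "You answer questions using ONLY this data product.\n"
        f"View: {contract['access']['sql_view']}\n"
        f"Columns and meanings: {contract['semantics']}\n"
        "Return a single SQL SELECT statement and nothing else.\n"
        f"Question: {question}"
    )
    return ask_llm(prompt)

# Example usage (customer_master_contract is the hypothetical descriptor above):
# sql = question_to_sql("Which customers have the highest lifetime value?",
#                       customer_master_contract)
```

The constrained prompt is the whole trick: the model has far fewer ways to guess wrong when the only tables and columns it can see are the ones the product owner has curated and described.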
0:22:07.2 Satyen Sangani: I've talked to customers who are like, "Look, we have 40 data products and that's all we're ever going to need." And I've talked to customers on the flip side saying, "We have thousands of data products and we're producing more every single day." Does either of those sound right to you?
0:22:22.1 Sanjeev Mohan: No. If you have thousands of data products and you're producing more every day, then to me those cannot all be data products. That may be a table. You're creating a new view, and because of the way I defined it, you can say, "Well, you said a view can be a data product, so here you go." This is the problem we had with data mesh washing, because there was no prescribed definition. And having 40 data products and saying we're done? That's a very static business that's about to go out of business. I'm sorry to break it to them.
[laughter]
0:23:00.4 Satyen Sangani: What's amazing is that these are... One was a large financial services institution, a top-end bank in one region of the world. The other one was in a different region of the world, but also a top-end bank. So, very interesting. Both are almost exactly the same scale, in slightly different markets but with close enough overlap; literally two very complicated, large institutions, and you get two very, very different perspectives. And I expect even within those institutions you might find different perspectives as well. So, it's quite different and interesting.
0:23:29.2 Sanjeev Mohan: And Satyen, one of the things I mentioned earlier was that a data product has a lifecycle. We are monitoring it, and if we see that it's not being used, then we retire it. Just like an iPhone, it goes end of life. So, if you have thousands of data products and you're producing more every day, they're not retiring. And if you're not retiring them, then you're doing it the way we did traditional stuff: when we moved from on-prem to the cloud, we took all the temporary tables and all the crap we had collected over the years and just moved it into the cloud. And then we wondered, wow, the cloud is more expensive. Yes, it is more expensive, because you took the junk with you. If thousands of data products exist, then a lot of them are junk, in my opinion; they're never being used, and there's no data product owner actively cleaning out those old data products. That's just my opinion.
0:24:24.5 Satyen Sangani: No, no, that's deeply my opinion as well. I think that's true for data. Obviously, data products are bringing software development thinking to the management of data. And it's true of data that you have to throw away the things that no longer serve you, just as it's true of software. I mean, at Alation we've built software over 10 years. Some stuff is widely used, some stuff is not used at all. You basically have to be thoughtful about pruning the tree: throwing away the things you tried that didn't work, the things that worked for a while but no longer do, and making better the things that continue to work. And that, I think, is a very clear parallel in thinking through what you do.
Now that you've published the for Dummies book, how much of your work and your practice do data products represent? Is it a lot of your work? A majority? A plurality?
0:25:15.0 Sanjeev Mohan: No, actually, unfortunately it's not a lot of my work. And the reason is that gen AI has taken up so much of my bandwidth. I'm even embarrassed to say this: at the beginning of the year, in my welcome-to-2025 newsletter in January, I made a bold statement that I would write a book on AI agents, which I have not yet started. I'm now so deeply entrenched in the AI space that I have not really been paying a lot of attention to data products, unfortunately. And it just comes in waves. I know at some point the AI agent story may actually temper, because we may have overhyped it at this point, and then it'll be back to data products.
So Satyen, if I may say this: data products, data management, is foundational. Right now DeepSeek just came out, and so we're all into, "Oh my God, what's happening? LLMs are hot again." We keep getting pulled in different directions, but the data will never go away. We all come back to data once the hype is over and the attention goes back to data. It's the only differentiator. Models are not differentiators, agents are not differentiators. The only differentiator, the only moat a company has, is its data.
0:26:44.6 Satyen Sangani: I think that's right. I mean, the models themselves can produce new data, so to that extent there's always some circularity there. But I think that's right, and obviously, being in the business of data, I am definitely somebody who's drunk that Kool-Aid.
But let's talk about agents, because if we're going to talk about something that's relevant, let's talk about the topic of the hour. I can't believe we've gotten 30 minutes through a podcast and haven't said the words AI and agents quickly enough; something bad is going to happen if we don't get there. So let's get there. Tell us: what is an agent, and what are people doing with them? You're talking a lot about generative AI, so why agents? What's going on there?
0:27:19.5 Sanjeev Mohan: Okay, so an agent is the next generation of robotic process automation, to some extent. RPA on steroids. And the purpose of an agent is to... By the way, let me also say, I'm just trying to draw parallels here, because if I say it's a brand new thing, then people will be like, "Wait, yet another brand new thing. When will you IT people stop and be more practical?" Microservices architecture came out many years ago and microservices became the thing. So to me, what microservices are to software engineering, data products are to data, and an agent is akin to that. It's autonomous, which means it's operating on its own. It's taking action on my behalf. It's like an agent, like a travel agent, for example. But the difference is that an agent is built on top of a foundation model, like a large language model. The best way to explain it is through an example. There are four parts to an agent. It has to sense the source, or sense my environment; reason about what to do with this; come up with a series of steps; and then act upon it. So, there's sense, reason, plan, and act.
0:28:42.5 Sanjeev Mohan: So, I wrote a frequently asked questions piece, and this is how I defined it, just to make it easy for myself. Now, people are saying there's yet another piece, a fifth piece, which is self-learning, but let's keep that aside and look at an example. Let's say I have an agent that is reading my email, and because it's based on a large language model, it can decipher the sentiment of my email. It can decipher the intent; it can understand or infer who the email came from. So, if the email came from my boss and the sentiment is negative, then it just goes up in priority, and it sends me a text message and says, "Your boss is mad at you, you better do this." If the email is from my prospect and the prospect says, "I'm sorry, we cannot meet this Friday, I have a funeral to go to, can we postpone the demo to Monday?" then maybe it's going to read that, understand the intent, go look up my calendar, and move the meeting to Monday. So, that's one example of a personal agent.
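Sanjeev's four parts map naturally onto a small loop. The sketch below is a toy illustration of sense, reason, plan, and act for the email example; `classify_email` and `reschedule_meeting` are hypothetical stand-ins for an LLM call and a calendar API, not real library functions.

```python
# Toy sense -> reason -> plan -> act loop for the email example.
# classify_email and reschedule_meeting are hypothetical stand-ins for an
# LLM call and a calendar integration; everything here is illustrative.

def classify_email(body: str, sender: str) -> dict:
    # Sense: in a real agent, an LLM would infer sentiment and intent here.
    return {"sender": sender, "sentiment": "neutral",
            "intent": "reschedule", "new_day": "Monday"}

def reschedule_meeting(day: str) -> None:
    print(f"Meeting moved to {day}")

def handle_email(body: str, sender: str) -> None:
    signal = classify_email(body, sender)                 # sense the environment
    if signal["sender"] == "boss" and signal["sentiment"] == "negative":
        plan = ["send_urgent_text"]                       # reason + plan: escalate
    elif signal["intent"] == "reschedule":
        plan = ["move_meeting"]                           # reason + plan: adjust calendar
    else:
        plan = []
    for step in plan:                                     # act on the chosen steps
        if step == "move_meeting":
            reschedule_meeting(signal["new_day"])
        elif step == "send_urgent_text":
            print("Texting you: your boss needs a reply now.")

handle_email("Can we postpone the demo to Monday?", sender="prospect")
```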
Personal agents are based on a role. Like, what does a DBA, a database administrator, do every day? What does a supply chain analyst do every day? Can I automate that piece of work? Let me give you another example, from marketing.
0:30:07.0 Sanjeev Mohan: We have tools that go crawl the Internet and look at our rivals', our competitors', pages and see what they're doing. But how do I know what I should crawl? So, let's say I'm in the competitive analysis business and I have a research agent. What this research agent does is, every morning before I come to work or log in, it looks for similar companies in my space. Maybe there's some new content, so automatically, on its own, it is scanning the web, crawling it, synthesizing that data. Because it's based on a large language model, it can summarize that data, and maybe, if it's advanced enough, it can do a SWOT analysis. So, I come to work and now I know that the companies in this segment did this yesterday, published that. And so, it helps me come up with the next best action. I still have to do the work, but all of a sudden hours and hours of research time have been taken off my plate. I still don't know, did it do a good summarization? Because this is all unstructured content, right? So, there are no quality metrics.
0:31:23.4 Sanjeev Mohan: How do I know it did a good job? So, there is a little bit of that hallucination and unreliability which is inherent in a probabilistic model, like you mentioned earlier. But it's not replacing my job. It is making me so much more productive. It is as if I, an individual contributor, am now a team leader and I have 10 subject matter experts who did the work for me and provided me with their summaries.
0:31:54.9 Satyen Sangani: So, do you think that these agents will replace jobs?
0:31:58.7 Sanjeev Mohan: Absolutely not. Even today, there are some CRM companies that have gone on this journey to build SDR agents. So, what is going to happen? And I have full conviction on this: it is going to transform every single job in the world. No question about it. People say, "Well, only thinking jobs will be affected, not physical labor," but even those will be affected, through LLMs embedded in robots, and it's already happening. Every job will change, but new jobs will be created. And what will those jobs be? It sounds like science fiction. What new jobs? Well, go back in history. In the 1800s, 90% of US employment was in farming. Those farmers had no idea that in a hundred years they'd be building cars, and that another hundred years later, by 2000, only 3% of the US workforce would be farming and the rest would be doing all these jobs that never existed. We don't have people sitting in a telephone exchange as switchboard operators connecting those cables we used to see in movies from way back. We don't have a whole army of tellers dispensing cash, because we use an ATM. So, there will be absolute chaos to some extent. But new jobs are going to be created.
0:33:28.7 Sanjeev Mohan: It has always happened. AI is not going to just say overnight, all of you people, you're no longer needed because now AI is doing it. Take coding, for example. Coding is a perfect example, because the coding assistants have become really good. But you still need coders; you cannot rely on the assistants alone. Maybe they have taken documentation of code away from me. Thank God, I never liked writing code documentation. Maybe they've taken away writing test cases. But I'm still needed to do that human reinforcement and make sure it's correct. So, we could go on forever, but...
0:34:10.6 Satyen Sangani: No, but I think what you're saying is that some jobs will go away, which I think is right, and other jobs will be created. And I think that's true. I mean, there was the printing press, which meant that you didn't have to write as much on papyrus or other written media. Certainly, jobs went away; the switchboard operator job went away, but cell phone retail customer associates came into play. And maybe one day those jobs go away and something else comes into play. I do think that what's exciting is that your imagination gets you a lot farther and your intentions get you a lot farther.
And as I think about the work that we do, the work of data management, much of this work is work that nobody really wants to do or really should want to do. I mean, nobody should wake up every day and be like, "Oh, let's go steward some data this morning." It's not that it's a bad way to spend time, and it's not a useless way to spend time.
0:35:08.0 Satyen Sangani: But most of the people that could be doing that work could be spending their time in higher-order ways. And so, I personally believe that it's going to change not just the jobs, but also the shape of many of the markets that we operate in, including the ones we're in. You know, 10 years ago we built this thing called the data catalog, and I think the way in which these things are going to be used is very different; it's going to change materially.
0:35:33.2 Sanjeev Mohan: These times may be head spinning, but they have never been so exciting in our entire career. This is the golden age of data.
0:35:43.1 Satyen Sangani: It's funny, because I say this as well. I think that when we launched this company, the state of the art for sharing knowledge was the idea that the best way we could do it was to put it on a static page on the open Internet. Somebody could write it, a search engine could index it, and if we were lucky, somebody would come to it, read it, and understand it. The iteration on that was things like social media: people would ask questions, people would get answers. But now you can literally get the knowledge in the moment where you're actually doing something, and you can have the knowledge summarized for you based upon a thought. Like, yesterday I took a set of emails and an outline that I'd written and I was like, "Hey, why don't you write an essay based upon all of these emails and the outline that I've written?" It organized it perfectly. I had to edit a few things. Some of it was completely wrong, but it still got me way farther along than I otherwise would have gotten.
0:36:39.8 Satyen Sangani: And that, I think, is pretty exciting. And I think it's going to change all of these things. To your point, you talked about things like quality and master data management and data products. These are all intermediary artifacts that people ideally would never even have to think about, because what they really want to do is just reduce customer churn or increase their retention or grow new products and build them. And so, yeah, I think it's super, super exciting.
Given the intersection of these two disciplines, how quickly do you see the change happening? Where does the world of data go in this new age of agents and LLMs and generative models? What excites you? How quickly do you see change occurring?
0:37:21.6 Sanjeev Mohan: That's a great question. And I'm glad you're asking these questions, because it's making me think: what has formed my perspective? What did I do in the past that makes me think like this? I think, after having been in this industry for so many years and having seen the dot-com boom and bust, to me the same story is repeating. I'm not saying there's going to be an AI bust, because I think companies have a lot more money than they did during the dot-com boom. But what happened during the World Wide Web heyday? Things came crashing down and years went by. There was a company called Webvan during my dot-com boom days. It was a celebrated success of the Internet, and then it was the most spectacular disaster. But now we rely on Instacart, which does exactly the same thing.
So, when the dot-com bust happened, people's attention moved away from the Internet and they were like, "Oh my God, that was such a bad dream. We've moved on to other interesting things." But the technology never stopped, innovation never stopped. And we came back with something that could handle my entire taxes, the entire US tax code. I mean, it was just fascinating to see all the changes, whether with mobile or with social media. The same thing is going to happen with AI.
0:38:46.0 Sanjeev Mohan: We may have detractors, we may criticize it, we may point out its deficiencies like hallucination, but it is going to get better. Actually, the trajectory is even faster than anything in the past. So, I think next year when we are talking again, if we are getting ready to meet up at a Gartner Data and Analytics Summit somewhere in the world, you and I will come with our agents. That's how fast it'll be next year. That's my prediction. And I think Gartner will have Magic Quadrants.
There is a concept I laid out in my trends for 2025 where I said that to build and deploy these agents, you need an agent management system. So, the prediction I've made is that there will be a Magic Quadrant in 2026 on agent management systems. As for AGI, I don't go that far, and I don't even care too much about AGI. I've delegated that to Elon Musk to talk about; maybe that'll take longer, and I don't really care. But AI is going to get smarter and better faster than we know. And we will have these small agents. Maybe we won't have super complicated workflows in an agent, but smart personal agents? Absolutely, I see that next year.
0:40:06.7 Satyen Sangani: Yeah, yeah, I see that too. What I don't know is whether they're going to be shaped like personal agents. I mean, I think there are lots of different ways these things can be shaped, and generally speaking they're going to be everywhere, and they're going to be disparate, and they might be conflicting and even confused at times. But there are going to be lots of different ways we use these things, and some of the most useful will probably be totally unpredictable. I do think that people have to use them. What's been remarkable to me is actually how much some people don't use them at all, and yet other people use them all the time. So, there is a little bit of a learning curve and a little bit of an acceptance curve, it seems, around how tech-forward you are and how much you're willing to lean into some of these things.
0:40:47.6 Sanjeev Mohan: I was just going to say, you know, some organizations may not even have a choice. Because if your competition is doing it, then why would you wait? You'd be forced into this space, in my opinion.
0:41:00.8 Satyen Sangani: Yeah, totally. It's a really fun world that we live in, because I think a lot of human labor that was oriented around knowledge work is going to take a very different shape. And the skills you'll need for tomorrow are going to be very different. So... you obviously analyze a lot of things, talk to lots of people, and your job is to see and look at different problems. Which problems are you most excited about getting solved in the next year? Where do you see the most innovation happening? Where are people talking about what could be done?
0:41:30.3 Sanjeev Mohan: So, I've been very intrigued by all the work that's going on in Apache Iceberg and other table formats. I'm super excited to see that we are finally going to use unstructured data and structured data together. So far, all my career has been structured data. But now, thanks to these embedding models, I can read unstructured data and put it into maybe a vector database, or any database for that matter. Or maybe not even put it into a database, just read the documents as these PDFs come and go. To me, the stack that's needed for this should be unified. I don't want to live in a world where, "Oh, for this use case I have this stack, and for AI it's this, and for data it's that." No, it's all data, in different formats. So, with this disaggregation of analytical tools, I think metadata and semantic layers become extremely critical, because to me they unify everything. There's so much emphasis these days on catalogs sitting on top of metastores that are sitting on top of Iceberg-like table formats. So, from a technology point of view, I'm very, very interested in seeing how structured and unstructured data come together, and how we simplify and unify the stack, with governance on top and with any analytical tool I want to bring.
0:43:03.9 Sanjeev Mohan: And by the way, that analytical tool doesn't have to be Spark or pandas or Snowflake or Databricks or Dremio or Starburst. It could even be my LLM. My AI agent or assistant is a compute engine that goes through this unified architecture.
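As one concrete flavor of the disaggregated stack Sanjeev describes, the hedged sketch below reads an Iceberg table directly with the pyiceberg library, independent of any particular warehouse engine, and then treats unstructured text as just another input via an embedding step. The catalog settings, table name, and the `embed` helper are illustrative assumptions, and the exact pyiceberg call signatures may differ across versions.

```python
# Hedged sketch: engine-agnostic read of an Iceberg table, plus an embedding
# step for unstructured documents. Catalog URI, table names, filters, and the
# embed() helper are all illustrative assumptions.
from pyiceberg.catalog import load_catalog

catalog = load_catalog("default", **{"uri": "http://localhost:8181"})  # assumed REST catalog
orders = catalog.load_table("sales.orders")

# Structured side: scan the open table format without committing to one warehouse engine.
df = orders.scan(row_filter="order_date >= '2024-01-01'").to_pandas()

# Unstructured side: fold documents into the same workflow via embeddings.
def embed(text: str) -> list:
    # Stand-in for whatever embedding model you use; not a real library call.
    raise NotImplementedError("plug in an embedding model here")

# vectors = [embed(doc) for doc in ["contract text...", "support ticket text..."]]
```

The design point is the one from the conversation: the table format and the metadata sit underneath, and the compute on top, whether a SQL engine, pandas, or an LLM-driven agent, is interchangeable.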
0:43:25.0 Satyen Sangani: So, table formats and the unification of structured and unstructured data. What else is of interest?
0:43:31.7 Sanjeev Mohan: Governance, which is going to be very interesting, because when I say governance, I look at it in two ways. One is AI assisting governance: better inference of relationships, for example, better data quality, discovery, and things like that. And then there is governing AI itself. So, there are two separate things. One is brownfield: I have a governance tool, how do I supercharge it with AI? That's brownfield, because the tool already exists. And the other is greenfield, which is how do I get a handle on all of my AI experimentation, what value did I get, and then put it into production and monitor what I'm doing with my AI for cost, for hallucinations, and things like that.
0:44:25.3 Satyen Sangani: It's interesting, you described the idea of infusing AI into governance tools as brownfield, and I think I might take the exact opposite perspective: I think it'll reorient the work and change it in a way that may even change the shape of the tools themselves.
0:44:44.9 Sanjeev Mohan: And you could be absolutely right. Sorry, I didn't want to interrupt here, but I just had this idea. The disruption can be so big. Just the way you are saying it, it's actually greenfield, it's not brownfield. I spent my entire life doing ERD diagrams; data modeling was one of my favorite things, firing up Erwin and just modeling. Why do I need to do third normal form data modeling? Why can I not leave the data as it is and just use a foundation model to infer the relationships?
0:45:20.9 Satyen Sangani: There's that and then there's also this question of, "Well, let's even say that I had to do some form of... " I mean, all of that work is work to essentially allow systems to talk to other systems more efficiently or people to talk to systems more efficiently.
0:45:35.1 Sanjeev Mohan: Correct.
0:45:35.5 Satyen Sangani: And ultimately this point of like, "Well, I just, tell me, like, I just want to do something. I just want to know how many transactions this customer had. And I don't want to know that I have a transaction table and a customer table. Then I have to join them on, you know, as of date and customer ID in order to be able to get to something that actually makes some sense. Like, can you just give me the answer?"
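For contrast, this is the kind of manual step Satyen is describing: the join the user would rather not have to know about. A small sketch with pandas, using made-up table and column names; a real version would also join on an as-of date as he mentions.

```python
# The manual work being abstracted away: joining transactions to customers on
# customer ID to count transactions per customer. Names and values are made up.
import pandas as pd

customers = pd.DataFrame({"customer_id": [101, 102], "name": ["Acme", "Globex"]})
transactions = pd.DataFrame({"customer_id": [101, 101, 102], "amount": [50, 75, 20]})

answer = (
    transactions.groupby("customer_id")["amount"]
    .agg(["count", "sum"])
    .rename(columns={"count": "transaction_count", "sum": "total_amount"})
    .reset_index()
    .merge(customers, on="customer_id", how="left")
)
print(answer)  # transactions per customer, the thing the user actually asked for
```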
0:45:53.3 Sanjeev Mohan: Yeah, exactly.
0:45:54.4 Satyen Sangani: This, I think, is the interesting thing about these models: they can abstract so much of what you otherwise would have to do. So, no, it's super exciting. Sanjeev, any worries or anything that you're concerned about? You seem very optimistic about this new age; any concerns about all this stuff that's happening?
0:46:13.2 Sanjeev Mohan: So, I left Gartner three and a half, four years ago because I wanted to be on the cutting edge. At Gartner, I had to be very careful not to steer my customers into what might be a harebrained rabbit hole. Now I get to do exactly that. So, that's my concern sometimes: that I'm letting my wild ideas run free and I'm pushing the envelope. Sometimes I'm ripping it apart. I see there's so much interest in AI agents, and I sometimes fear that I've added fuel to the fire and created this rosy picture of how things will be, and it turns out that we overestimated the benefits and have to take a pause. So basically, I guess what I'm saying is that I don't want my enthusiasm to get in front of what's practical, because my goal is to provide practical advice. But I don't want my customers to hold back. I want them to go out and experiment and find out for themselves. Maybe a data product is not the right answer for you, and maybe an AI agent is not the right answer. But unless you experiment, how do you know? In fact, with all my customers, when I'm with them, I'm just taking them on a journey to be bolder and faster.
0:47:40.1 Satyen Sangani: Well, it sounds like we found the episode title: unless you experiment, how do you know? So, with that pertinent question, I want to say thank you, Sanjeev. Always a delight to speak with you. Your enthusiasm and wisdom both shine through. Great to see you, and I look forward to seeing you in person in a couple of days.
0:47:55.0 Sanjeev Mohan: Yes, thank you. Take care.
0:47:56.7 Satyen Sangani: Sanjeev lays out a bold vision for the future of data and AI, one where data products aren't just assets, but engines of trust, value, and reuse. By shifting to a product-oriented mindset, organizations can turn data from a technical byproduct into a strategic capability. With the rise of generative AI and autonomous agents, we're seeing productivity redefined across the enterprise. To thrive in this fast-changing landscape, Sanjeev reminds us, organizations must foster a culture of experimentation. That mindset will be essential in this age of transformation. I'm Satyen Sangani, CEO of Alation. Thanks for listening to Data Radicals. Keep learning and sharing. Until next time.
0:48:41.1 Producer: This podcast is brought to you by Alation. Your boss may be AI ready, but is your data? Learn how to prepare your data for a range of AI use cases. This whitepaper will show you how to build an AI success strategy and avoid common pitfalls. Visit alation.com/ai-ready. That's, alation.com/ai-ready.