By Radha Selvaraj
Published on April 21, 2025
In my years working with organizations on their data journeys, I've observed a persistent pattern: companies pour millions into data lakes, AI initiatives, and analytics platforms, yet many still struggle to realize the promised returns on these investments. The reason isn't technical—it's linguistic. Beneath most failed data initiatives lies a fundamental language problem that's rarely addressed: semantic inconsistency.
When I speak with data leaders about their biggest challenges, I typically hear about technical integration problems, platform limitations, or skills gaps. But I believe we're collectively misdiagnosing the problem. Most data initiatives focus on data wrangling without solving for semantic consistency. Solving for it demands resolving tough questions: Is this data meaningful? Do we understand it?
Unifying data language across an enterprise is fundamentally a sociotechnical challenge—it's about 70% human and organizational, and only 30% technical. Many organizations make the mistake of approaching it as primarily a technical exercise, which is why so many data governance and semantic unification efforts fail.
This 70/30 split explains why organizations continue to struggle despite sophisticated technical solutions. They're investing in solving the wrong problem.
Consider what happens in large organizations daily: Finance defines a "customer" as an entity with a billing relationship, while sales defines a "customer" as anyone in the sales pipeline. Support defines a "customer" as anyone with an open ticket, and marketing might consider anonymous website visitors as potential "customers."
These aren't merely semantic quibbles—they fundamentally shape how departments operate, measure success, and make decisions.
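To make the divergence concrete, here is a minimal sketch that applies three of the departmental definitions above to the same small set of records. The `Account` fields, the sample records, and the rule names are all hypothetical and invented for illustration; no real system is being described.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    """A hypothetical account record shared by every department."""
    name: str
    has_billing: bool   # finance cares about this
    in_pipeline: bool   # sales cares about this
    open_tickets: int   # support cares about this


# Invented sample data, for illustration only.
ACCOUNTS = [
    Account("Acme", has_billing=True, in_pipeline=False, open_tickets=0),
    Account("Globex", has_billing=True, in_pipeline=True, open_tickets=2),
    Account("Initech", has_billing=False, in_pipeline=True, open_tickets=0),
    Account("Umbrella", has_billing=False, in_pipeline=True, open_tickets=0),
]

# Each department's definition of "customer", expressed as a predicate.
DEFINITIONS = {
    "finance": lambda a: a.has_billing,
    "sales": lambda a: a.in_pipeline,
    "support": lambda a: a.open_tickets > 0,
}


def count_customers(accounts, definitions):
    """Count how many accounts qualify as a 'customer' per department."""
    return {
        dept: sum(1 for a in accounts if rule(a))
        for dept, rule in definitions.items()
    }


counts = count_customers(ACCOUNTS, DEFINITIONS)

# Accounts that every department would agree to call a customer.
agreed = [a.name for a in ACCOUNTS if all(r(a) for r in DEFINITIONS.values())]

print(counts)
print(agreed)
```

With this sample data the same four records yield three different "customer" counts, and only one account satisfies every definition at once. Each number is correct under its own definition, which is why the disagreement cannot be settled by better integration alone.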
Each team has incentive structures tied to its performance. When you define a KPI, the team measured by it wants more control over what it actually means, because that is their performance metric. That's where the politics comes in.
These political dimensions of data semantics remain largely unacknowledged in most organizations. Definitions aren't neutral technical artifacts—they're expressions of power and control that directly impact how performance is measured and rewarded.
In my work, I've found that most organizations fundamentally misunderstand what semantics really means in a business context:
1. Semantics isn't just definitions—it's context.
Organizations typically treat semantics as a glossary of terms, and usually as an afterthought. It gets addressed only after all the data has been generated, at the point of consumption. Where a governance program and a dedicated team exist, it becomes a governance afterthought: "Let me unify the definitions or let me create these metrics and glossaries."
A "conversion" means something completely different in marketing, religious studies, and chemical engineering. Even within a single department, the same term can have different operational implications depending on the user's role and intent.
2. Semantics evolves—it's not a one-time project.
People change, technology changes, and markets change, and definitions change with them. There's no reason for definitions to be rigid either.
In the energy sector, for example, governments and regulators publish glossaries, terms, and policies, so there is a lot of terminology to track. These evolve from a regulatory standpoint alone, not to mention all the other dimensions of change that organizations go through. Unless you actively work to bridge that gap, your definitions drift, and that leads to reconciliation challenges down the road.
3. Semantics is relational—not a technical imposition.
Perhaps the most fundamental misunderstanding is treating semantics as something that can be imposed through technical architecture alone. In reality, semantic understanding emerges through conversation, negotiation, and shared experience.
It is about arriving at the meaning collaboratively. It is less about a governance team saying, "We have decided this is how it should be. I'm going to publish these definitions and everybody follows." It is more about arriving at a shared meaning of what a customer is, what a metric is, what KPIs mean, what active users mean, et cetera.
Yes, there are going to be trade-offs because some teams might want to define things differently—marketing might want to do it in a certain way, product will think of "active" one way, while sales will think of "active" in different ways—but you need to unify terms and have a way for people or teams to come together to collaborate and build that meaning together.
In my experience, enterprises vastly underestimate how profoundly language impacts their ability to extract value from data, often with costly consequences:
Decision paralysis emerges. Leadership meetings devolve into debates about whose numbers are correct rather than focusing on strategic decisions. I've seen executive teams spend entire quarters arguing about metrics discrepancies rather than acting on insights.
Trust in data collapses. When teams encounter conflicting definitions, they frequently retreat to their own shadow analytics, creating a proliferation of duplicate spreadsheets and local databases that further fragment understanding.
Analytics ROI plummets. Organizations invest millions in data platforms only to find adoption stalls because users can't relate the platform's language to their daily work context. What good is democratizing data access if no one speaks the same language?
Innovation stagnates. Cross-functional initiatives that could drive breakthrough value—like unified customer experience or integrated supply chains—stall because they require coordinated action across domains with incompatible terminology.
As organizations rush to implement AI and machine learning, semantic challenges become both more critical and more complex.
An AI product is not any different from any other data product in the sense that if you don't start from the right foundational meaning, anything that you build on top of it is on shaky ground.
AI models trained on data with inconsistent semantics will inevitably produce inconsistent, unreliable outputs. This challenge multiplies as organizations develop multiple AI products and services, each potentially operating with different semantic understandings.
If anything, I think Gen AI capability is making a stronger case to contextualize and provide business context awareness to AI models. This will definitely expose the need to unify more. The only question that remains is: Is this going to be shared understanding across the organization or is it going to be a siloed understanding on that particular team that is trying to build that AI product?
Not all sectors struggle equally with semantic challenges. In my experience, highly regulated industries tend to have the strongest semantic foundations:
The ones that are good at it usually got there by understanding the gravity of data, purely because there are regulatory implications. Financial services is our largest industry. The next industry that is really good is actually pharma or manufacturing. In pharma, again, there is a regulatory aspect, and there's a need for more formal ontology.
These industries recognize that semantic alignment isn't just a governance burden—it's core to value creation and risk management. When consistent language is tied directly to compliance requirements or patient safety, organizations find ways to solve the problem.
Another industry that I should mention is Consumer Packaged Goods (CPG). If you oversee a lot of products, the silos are so large that there is a heavy need for a unifying contextual layer. It is not about one product here and one product there. It is about applying the business's various data contexts to make sense of the massive underlying data.
If semantics is primarily an organizational and human challenge rather than a technical one, we need to rethink our approach. I propose that enterprises adopt "Semantic Products" as first-class organizational artifacts with dedicated ownership, lifecycle management, and clear economic incentives.
Rather than treating semantics as an infrastructure-level concern or governance afterthought, organizations would define specific "Semantic Products" that encapsulate key business concepts across domains. These would have:
Dedicated product teams responsible for their development and evolution
Clear value propositions tied to business outcomes
Internal "customers" who subscribe to them
Economic incentives where teams receive budget/resources based on adoption metrics
Continuous integration with actual data products to ensure practical alignment
This approach would shift semantics from being perceived as technical overhead to being viewed as strategic assets with measurable ROI. The radical part isn't the technology—it's restructuring organizational incentives to reward semantic clarity.
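One way to picture a "Semantic Product" is as a small, versioned record with an owner, subscribers, and an adoption metric. The sketch below is only an illustration of the attributes listed above; the class name, its fields, and the adoption formula (subscribing teams divided by all teams) are assumptions made for this example, not a prescribed design.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SemanticProduct:
    """Hypothetical model of one business concept managed as a product."""
    concept: str
    definition: str
    owner_team: str                      # dedicated product team
    version: int = 1                     # lifecycle: bumped on every revision
    subscribers: set[str] = field(default_factory=set)  # internal "customers"

    def subscribe(self, team: str) -> None:
        """Register a team as a consumer of this definition."""
        self.subscribers.add(team)

    def revise(self, new_definition: str) -> None:
        """Evolve the definition and record that it changed."""
        self.definition = new_definition
        self.version += 1

    def adoption_rate(self, all_teams: set[str]) -> float:
        """Share of teams that subscribe (an assumed adoption metric)."""
        if not all_teams:
            return 0.0
        return len(self.subscribers & all_teams) / len(all_teams)


teams = {"finance", "sales", "support", "marketing"}

customer = SemanticProduct(
    concept="customer",
    definition="An entity with a billing relationship",
    owner_team="finance-data",
)
customer.subscribe("finance")
customer.subscribe("sales")
rate = customer.adoption_rate(teams)

customer.revise("An entity with a billing relationship or an active contract")

print(rate, customer.version)
```

In this sketch, an adoption figure like `rate` is what a budget or resourcing decision could be tied to, and the version counter reflects that the definition is expected to evolve rather than be fixed once.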
At the heart of it, this unification should be part of your value creation. It is not about adding processes and bottlenecks and semantics for the sake of semantics. This is tied to your value creation.
Data can be technically integrated, but really the problem is: Is it conceptually coherent? That's the root of the problem that semantics is trying to address. It's not about just integrating data. It is about: Is this coherent? Is it meaningful? Do we understand it?
As data continues to grow in volume and importance, the organizations that master semantic consistency will hold a significant competitive advantage. They'll make faster decisions, build more reliable AI systems, and operate with greater efficiency across organizational boundaries.
The path forward isn't primarily about better technology—it's about recognizing the human and organizational dimensions of the semantic challenge. It's about elevating semantics from a governance afterthought to a core strategic capability.
In a data-driven world, shared language isn't just a nice-to-have—it's the foundation upon which all other data investments depend. Perhaps it's time we started treating it that way.