Ontology engineering: what it is, why it's back, and why agents need it
Adrian Brudaru,
Co-Founder & CDO
A small community has been quietly arguing about ontology engineering for twenty-five years. You may have assumed it was academic furniture and looked the other way. It wasn't; it was load-bearing for AI.
Humans were making decisions on tribal knowledge all along, and no one noticed. Now we're pulling the humans out and replacing them with agents, and suddenly what you call “hallucination” is a bug.

But what we naively call “hallucination” is actually a lack of definition. Definitions are infrastructure, and the relationships between things carry load.
What ontology engineering actually is
Forget the tooling for a second. Ontology engineering is the practice of writing down what exists in a domain, what it's called, and how it's related, precisely enough that something other than a human can act on it without asking.
It answers three questions, and no other questions matter until these three are answered (a small sketch follows the list):
- What things exist here? (Entities. Nouns. "Campaign", "Customer", "Shipment".)
- How are they related? (Verbs. "A Campaign contains Ad Groups.")
- What can we say about them? (Attributes. Rules. Constraints.)
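Written down for an agent, the answers don't need special syntax. A minimal sketch of the three answers for one entity, in plain markdown with invented details:

```markdown
# Campaign
A paid marketing effort with a budget and a date range.

## Relationships
- A Campaign contains one or more Ad Groups.
- A Campaign belongs to exactly one Customer account.

## Attributes and rules
- status: one of draft, active, paused, completed.
- budget is always positive; spend may not exceed budget without an approval record.
```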
Why ontology engineering is back
Four things changed simultaneously, and the convergence is the whole story.
First, LLMs can read text but can't read between the lines. Hand a language model a messy schema with twenty tables named after the engineer who built them, and it will happily generate SQL. It will happily generate wrong SQL. It will explain at length why the wrong SQL is right. The BIRD benchmark made this concrete: on real-world schemas with actual ambiguity, frontier models hover in the mid-50s for execution accuracy. Give them a clean schema with column descriptions and domain context and the same models jump into the 70s. The gap isn't the model, it's the semantic map. Without that map, every agent is a tourist with a confident voice and no sense of direction.
Second, agents are removing the human in the loop. In a BI tool, a human reads the chart and applies common sense. In an agent, there is no chart and no human. Every ambiguity in your data model becomes an ambiguity in the agent's behavior, and that behavior isn't confined to a screen: it's writing tickets, sending emails, moving budgets, updating records. Ambiguity that used to cost you a mildly confused analyst can now cost you a message sent to the wrong customer.
Third, AI people are arriving at ontology engineering through practice, without calling it that. Karpathy published an LLM Wiki pattern that is, essentially, an ontology built from documents with humans in the curation loop, a curated ground truth for a domain. No OWL. No SPARQL. Markdown files in a filesystem. And it works.
Fourth, the workflow math broke. Workflows don't scale with a human in every decision. If every step is "LLM does the slop part, human does the thinking part," that's a recipe for burnout. The way out is for the human to capture their problem and decision ontology in an agent-readable format: a skill, a markdown file, a graph. That way the human's reasoning scales beyond them. Ontology becomes a forced convergence point for the teams trying to solve the LLM hallucination problem at workflow scale.
The reason markdown in a filesystem works as well as OWL for most agentic use cases is that the content of the ontology is doing the work, not the syntax. A well-written markdown definition of "Customer", what it means, what it excludes, which systems are authoritative, which fields are canonical, will outperform a formally correct OWL ontology that describes the same thing badly. The rigor of description logic mattered when the consumer was a DL reasoner. The consumer is now a language model. The language model reads… language. Prose is the native format of the modern consumer, and the ontology's value has always been in the semantic commitments, not the format.
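A sketch of what such a definition can look like; the systems and fields named here are invented for illustration:

```markdown
# Customer
A company with at least one signed contract and an active billing account.
Excludes: free-trial accounts, internal test accounts, prospects still in the CRM pipeline.

Authoritative systems: the billing system owns customer status; the CRM owns contact data.
Canonical fields: customer_id (from billing), legal_name, contract_start_date.
```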
The map and the path
The cleanest way to explain why this matters is the peanut butter and jelly exercise. You know the one: a parent asks a kid to write instructions for making a PB&J, then follows them literally. "Put the peanut butter on the bread." The parent places the closed jar on the loaf. Chaos. The kid is furious.
Here's what's happening. The parent and kid both know how to make a sandwich. The problem isn't skill. The instructions don't carry the kid's ontology: what "bread" means (sliced, in a bag, open the bag), what "spread" means (knife, thin layer), what a "sandwich" even is.

Without ontology, the parent "plays dumb" and follows the instruction path literally to humorous effect. Now look at how most people write AI prompts and skills:
- Step 1: Open the file.
- Step 2: Parse the headers.
- Step 3: Format as markdown.
- Step 4: Add a summary.
These are paths without a map. Rigid, brittle, and they work right up until the context stops implicitly recreating the mental map the author had.
A skill is really three things (a sketch follows the list):
- Intent: what needs to happen.
- Procedure: the path, how to navigate from A to B. The LLM is great at pathfinding, though, and can manage deviations just fine, as long as it has a map.
- Ontology: the map of the terrain. The LLM almost never has this.
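A sketch of a skill that carries its own map; the task and the specifics are invented:

```markdown
# Skill: weekly revenue summary

## Intent
Produce the finance team's weekly revenue summary as a markdown doc in /reports.

## Procedure
Pull last week's billed revenue, compare it to the prior week, flag changes over 10%.

## Ontology
- "Billed revenue" means invoices with status paid or issued; credit notes subtract.
- Weeks run Monday to Sunday, in UTC.
- The summary follows /docs/style.md and opens with the headline number.
```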
The gap is almost always ontology. The LLM fails to find its path because its map of your domain is wrong or missing. If it hallucinates an API method, it's not because it's psychotic, but because its ontology of that API is incomplete. If it formats your doc wrong, it's not because it can't write, but because it doesn't know your style guide exists as a thing.
The missing half: agentic
Here's the piece I think is undersold, and it's where the field has the most left to build.

Ontologies were built as comprehension layers. You model the domain, you query it, you reason over it, you understand the system. That's what OWL is for, that's what OBDA is for, that's what every knowledge graph product on the market is for. Beautiful for understanding what's true across fifteen systems. Beautiful for audit trails and provenance. Beautiful for the view from above.
The reason they were built that way is that the only general-purpose decider available was a human. You'd model the domain rigorously, the human would read it, the human would decide what to do, the human would act. The decision and the action lived in the human's head and hands, not in the ontology. That wasn't a limitation anyone was upset about. It was the division of labor. Ontologies were the map; humans were the drivers.
There were always rule engines bolted on the side (Drools, SHACL, business rule systems) and they could fire on facts and trigger effects. But they were a separate component, parallel to the ontology, not part of it. The ontology supplied facts; the rule engine decided. And the rules they encoded were narrow: validate this row, approve this claim, flag this transaction. Not "given everything we know about this customer's situation, draft the email."
LLMs change which side of that line is the bottleneck. An agent can now read the same ontology a human would, reason over it, decide what to do, and execute the action, including writing its own code to do so. The thing that used to require a human can now run on the LLM. The decider got cheap.
But for that to work, the decision logic (what counts as a qualified lead, when to escalate, what "billed customer" means, what threshold triggers what action) has to actually be in the ontology. Otherwise the agent is reasoning over terrain with no rules, and you get a tourist with a confident voice and no sense of direction. The full agentic loop (READ → REASON → DECIDE → ACT/WRITE) needs the ontology to carry weight it was never asked to carry before.
This is the missing half. The decision layer that historically lived in human heads needs to move into the ontology. Not because ontologies couldn't have held it before. They could have, awkwardly, with rule engines next door. But because there was no point. There was no reader who could use it that way. Now there is.
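A sketch of decision logic written into the ontology rather than left in someone's head; the thresholds and routing are invented:

```markdown
# Qualified lead
A lead with a verified company email, a stated budget, and at least one product page visit.

## Decisions
- Stated budget above 10,000 and more than 50 employees: route to sales.
- Any mention of contracts, compliance, or legal terms: escalate to a human before replying.
- Otherwise: draft a follow-up email and hold it for review; never send directly.
```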
The other piece, which barely has tooling at all, is write-back. Updating the CRM, sending the message, closing the ticket, and recording the action in the ontology so the next agent reasoning over the state sees what just happened. People do it. They do it in glue code, off to the side of whatever knowledge graph happens to be around. Nobody has built it as a coherent layer.
Today's human-in-the-loop is exactly this gap: the human is the decision-and-action layer, running on a generic substrate (their own brain). The ontology version is the same logic on a domain-specific substrate that every agent in the org can read. Both work. Only the second one scales across domains without rewriting the agent.
Put differently: ontology pushes more of the decisions to the LLM by giving it the decision key. The "ask a human" threshold stops being a generic heuristic ("is the model unsure?") and becomes a property of the domain, written down once, reused by every agent touching it.
Where we started
The question that started it: we can load the data, now how do we model it with the LLM?
The first try was a scaffold that walked an agent through Raw → Canonical → Dimensional, with "20-Questions to answer from this data" as the simplest way to keep it on goal. It worked, and it pointed somewhere bigger: if an ontology (bootstrapped via questions or otherwise) could drive the modeling, a data stack could eventually be managed at two layers instead of ten: ontology (the semantics) and precision (still code, but clean code an LLM writes against a well-defined model, not a pile of bespoke SQL). Next we looked at how an ontology differs from a semantic layer. Then we explored using the ontology for retrieval, as a thinking substrate. That demo breaks down when the LLM doesn't use the ontology while thinking over the data.
By the time we shipped the toolkit, the workflow had a clear shape: split taxonomy (what entities are) from ontology (how they behave), generate a Canonical Data Model from both, and let the star schema sit downstream as a performance layer. Minimum Viable Context was the blog we wrote about the bootstrap process: how little context the model actually needs to do this well (less than you think, more than you want to admit). Long term, the loop closes on itself: the ontology feeds the modeling agent, the agent ships code, the code's behavior refines the ontology. Using the ontology as a chat-BI semantic layer covers the semantic layer role and adds a "truth" and "rule" layer that stops agents from hallucinating and helps them reason better over data. But high semantics (ontology) and high precision (code) don't meet on their own. For the foreseeable future, the seam between them, where a clean ontological decision becomes code you'd stake the numbers on, is where a data person still sits.
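A sketch of that split, with invented content: the taxonomy says what an entity is and where its boundary sits; the ontology says how it behaves and relates.

```markdown
# Taxonomy: Subscription
A recurring contract between a Customer and us for one product tier.
In scope: monthly and annual plans. Out of scope: one-off services and hardware.

# Ontology: Subscription
- A Subscription belongs to exactly one Customer and one Product.
- Revenue is recognized monthly over the term, not at the invoice date.
- A cancellation takes effect at the end of the paid period; the Subscription stays active until then.
```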
The ontology isn't just for building the data model. It also helps an AI agent answer questions about the data.
Does it work?

We ran a test on a generated dataset based on a client's schema (their data model, dummy data): 20 Finance questions, run with gpt-5, against three setups:
- Just the Canonical Data Model. The LLM is pretty good at retrieval from such a clean model, but in our experiments, if we ask it for something that doesn't exist, it may hallucinate it (5% hallucination rate).
- Data model + taxonomy (what's in, what's out, and why). The taxonomy raises the verification bar. On the same trick question, the LLM uses the taxonomy to notice the rule isn't defined and refuses with a specific reason instead of fabricating one. The same happens for out-of-scope questions about tables that do not exist in the model: it refuses instead of hallucinating something plausible.
- Data model + taxonomy + ontology. On the trickier questions, the answers are higher quality and reflect the intended Finance semantics, not just the first plausible SQL. Concretely, an example of how ARPU was calculated (a SQL sketch follows the list):
- Without the ontology, the AI computes total monthly revenue ÷ total monthly active users, globally. Looks reasonable. But you've smeared two unrelated populations: the customers who got billed this month, and the headcount across all customers regardless of who paid. And since billing happens at the end of the cycle, the ratio also pulls in new users who won't be paying this month, which makes the metric misleading.
- With the ontology, the AI follows the documented relationship: match revenue to billed headcount in the same company in the same month, then aggregate. You get a number per month that actually means something.
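A sketch of that difference in SQL, with invented table and column names:

```sql
-- Naive ARPU: total revenue divided by everyone active that month,
-- whether or not they were billed.
WITH revenue AS (
    SELECT month, SUM(amount) AS total_revenue
    FROM monthly_revenue
    GROUP BY month
),
actives AS (
    SELECT month, COUNT(DISTINCT user_id) AS active_users
    FROM user_activity
    GROUP BY month
)
SELECT r.month, r.total_revenue / a.active_users AS arpu_naive
FROM revenue r
JOIN actives a ON a.month = r.month;

-- Ontology-guided ARPU: revenue matched to the billed headcount
-- of the same company in the same month, then aggregated.
SELECT r.month,
       SUM(r.amount) / SUM(b.user_count) AS arpu
FROM monthly_revenue r
JOIN billed_users b
  ON b.company_id = r.company_id AND b.month = r.month
GROUP BY r.month;
```

The second query is only possible because the relationship between revenue and billed headcount is written down somewhere the agent can read it.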
If you want to try the toolkit on your own domain, the Agentic Data Engineering course covers it end-to-end.
The future worth building
For twenty years, ontology engineering was a discipline that mostly cost money to manage risk. It built useful and important things for regulated industries where the cost of ambiguity was a life or a lawsuit. And so it remained niche.
That equation is breaking. In an agentic world, the cost of ambiguity is automation you don't get, so unmodeled knowledge becomes the most expensive thing in a data stack.
Ontology enables improved agentic workflows across the board, from better self-service to agentic data products. More importantly, when used in combination with a classic data stack, it doesn't require heavy technology. A small text file is often enough, which makes experimentation and iteration fast and accessible to everyone.
The missing agentic half will get built. The question is whether it gets built by the people who understand what an ontology is, or by people who reinvent it badly.
Every team deploying an agent into production is wondering why it keeps putting the jar on the bread.
Give the agent a good map. It already knows how to drive.
Try it
The Agentic Data Engineering course covers the full pipeline from raw API to production using the dltHub AI Workbench. Ontology comes in during the transformation section, where you use it to guide modeling decisions.
The full workbench includes toolkits for REST API ingestion, data exploration, and production deployment, so you can go end-to-end without leaving your editor.
The ontology-driven data modeling toolkit is part of the dltHub AI Workbench, available in dltHub Pro, due for release in Q2 and currently in the design partnership stage. It is already being leveraged by commercial data engineering agencies that benefit from standardization and acceleration. If you're interested, apply for the design partnership!
For more background on the toolkit: Ontology Toolkit Preview