Everyone is making structured graphs now

Let's look at how and why graph-y databases are validating structure.

Joshua Send


For a long time, the graph database world was selling a pretty fundamental assumption: your database shouldn’t enforce the structure of your data – it should be maximally flexible.

You had databases like Neo4j championing schema-optional design. The pitch was ease – throw your nodes and edges in, connect things however you like, and evolve as you go.

On the other hand, databases like TypeDB have always championed having a type system for your data. We’ve always believed that if your data has expected structure (and 99% of the time, it does), the database should know about it and enforce it.

Now it seems the focus is shifting, and mostly in one direction.

Neo4j recently previewed a feature called GRAPH TYPE, a mechanism for declarative shape enforcement on property graphs. In the RDF world, SHACL has been adopted by a number of major vendors and is now the subject of serious academic work on validation under graph updates. Across the board, databases and data models that built their reputation on flexibility are now investing in ways to constrain it.

The industry is starting to echo what we’ve been insisting on for years. So, why does structuring graph data make so much sense?

Schema-optional was never actually schema-free

In the real world, a database powers an application. A team picks a schema-optional graph because they like the flexibility. It starts out great – you’re exploring, iterating, the data model is taking shape. However, at some point, more people start writing data, more services start consuming it, and the data starts to lose coherence.

It can look like nodes showing up with missing properties, or relationships connecting entity types that shouldn’t be connected, or duplicate concepts creeping in. Likely, no one notices until a query returns something wrong – or worse, something that looks right but isn’t.

So what does every team do? They start building validation: application-layer checks, ETL scripts that reject malformed inserts, and documentation that writes down what the data should look like. In effect, these are repeated, hand-rolled attempts at schema enforcement. Not only is that duplicate effort, it’s also scattered across application code and completely invisible to the database.
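As a minimal sketch, here’s the kind of hand-rolled, application-layer validation teams end up writing when the database enforces nothing. All names and rules here are hypothetical:

```python
# Hypothetical, hand-rolled schema enforcement living in application code.
# Every service writing to the graph has to remember to call these checks --
# the database itself never sees any of this.

REQUIRED_PROPS = {"Person": {"name", "email"}, "Company": {"name"}}
ALLOWED_EDGES = {("Person", "WORKS_AT", "Company")}

def validate_node(label: str, props: dict) -> list[str]:
    """Return a list of problems with a node; empty if it looks fine."""
    missing = REQUIRED_PROPS.get(label, set()) - props.keys()
    return [f"{label} missing property '{p}'" for p in sorted(missing)]

def validate_edge(src_label: str, edge_type: str, dst_label: str) -> list[str]:
    """Reject relationships connecting types that shouldn't be connected."""
    if (src_label, edge_type, dst_label) not in ALLOWED_EDGES:
        return [f"({src_label})-[:{edge_type}]->({dst_label}) is not allowed"]
    return []
```

This is exactly the duplicate effort described above: a schema in all but name, invisible to the query planner and easy for any one writer to skip.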

So, the semantics of your data always exist somewhere. The lower you can push them, the more components can leverage them. If they are all the way down in the database, they can be enforced, optimized against, and reasoned about by all components in your stack, including the database itself.

In the OWL/RDF world, this is exactly what SHACL initiated in the late 2010s. SHACL introduced a way to define shapes that data should conform to on top of RDF’s basic subject-predicate-object triples. It’s now supported by Cambridge Semantics, Ontotext, Stardog, TopQuadrant, and others, and has become a typical part of a semantic web stack.
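For a flavor of what that looks like, here’s a sketch of a SHACL shape in Turtle; the `ex:` classes and properties are illustrative:

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/> .

# Every ex:Person must have exactly one string ex:name,
# and may only ex:worksFor an ex:Company.
ex:PersonShape
    a sh:NodeShape ;
    sh:targetClass ex:Person ;
    sh:property [
        sh:path ex:name ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
    ] ;
    sh:property [
        sh:path ex:worksFor ;
        sh:class ex:Company ;
    ] .
```

A SHACL validator checks the data graph against shapes like this and reports every violation, rather than each application re-checking by hand.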

Neo4j’s GRAPH TYPE follows the same trajectory for property graphs. Where you previously needed to scatter dozens of individual constraints across your database – existence constraints, uniqueness constraints, etc – GRAPH TYPE is a single declarative definition of shape. It’s useful – and it’s a pretty significant admission that the schema-optional model, on its own, isn’t going to work in this day and age.
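For contrast, the scattered-constraint status quo looks roughly like this in Cypher (Neo4j 5 syntax; labels and property names are illustrative). GRAPH TYPE’s pitch is to roll rules like these into one declaration of shape:

```cypher
// Shape rules expressed as individual, independently managed constraints.
CREATE CONSTRAINT person_name_exists IF NOT EXISTS
FOR (p:Person) REQUIRE p.name IS NOT NULL;

CREATE CONSTRAINT person_email_unique IF NOT EXISTS
FOR (p:Person) REQUIRE p.email IS UNIQUE;

CREATE CONSTRAINT works_at_since_exists IF NOT EXISTS
FOR ()-[r:WORKS_AT]-() REQUIRE r.since IS NOT NULL;
```

Each constraint is valid on its own, but nothing ties them together into a single, readable description of what the graph should look like.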

The AI stakes

Graph databases have always been used in domains where relationships matter – fraud detection, supply chains, network analysis. But the typical consumer of that graph data used to be a human or an application. A data team that understood the model could mentally compensate for messy data, or could tolerate some inconsistency with extra application code.

That’s not who’s consuming your graph anymore. Knowledge graphs and ontologies have become critical infrastructure for AI. When an LLM retrieves context from a knowledge graph, the quality of that context directly shapes the quality of the answer. When an AI agent reasons over a graph to plan its next action, a malformed relationship translates directly into reasoning errors.

The industry is clearly feeling this. Gartner’s 2026 predictions for data and analytics call universal semantic layers “critical infrastructure” by 2030 — on par with data platforms and cybersecurity — and describe them as “a must-do for D&A leaders either leading or supporting AI.” On the research side, early results are promising: one study in the Journal of Biomedical Informatics found that grounding an LLM in an ontology-based knowledge graph reduced hallucination rates from 63% to under 2% in clinical question answering (Ali et al., 2026). That’s a specific domain, but the pattern it points to is general — structured, typed data gives AI systems something reliable to reason over.

Context engineering is emerging as its own discipline right now. And in gathering context, experience is showing that vector search alone isn’t enough. You need knowledge graphs, ontologies, and semantic structure that actually teaches the AI how your domain works.

Structure and beyond

So the industry is converging on structured graphs. That’s great! But it’s also worth pointing out that this isn’t a new idea – SQL has had enforced schemas for decades. In programming languages, we’ve long been able to choose between weakly typed languages and strongly, statically typed ones whose compilers catch whole classes of errors for us.

But structure on its own only gets you so far. A schema that says “this node must have these properties” or “this relationship can only connect these labels” gives you data integrity. But it doesn’t give a machine — or a person unfamiliar with the domain — any deeper understanding of what the data actually represents. For that, you still need extensive documentation, tribal knowledge, or a lot of time spent reading the data and guessing.

This is where TypeDB takes a different approach. Our type system isn’t a constraint layer on top of a flexible graph — it’s the foundation of the data model. Every entity, relation, and attribute has a type. Every role in a relation specifies what can play it. Inheritance is native, so both human and company are kinds of legal-entity and queries resolve through that hierarchy without you flattening things into labels. Functions encode reusable domain knowledge. The schema simultaneously constrains and describes the domain. An LLM reading it gets a coherent picture of how concepts relate, not just a list of what’s allowed.
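As a sketch, the legal-entity hierarchy mentioned above might be modeled like this in TypeQL (2.x syntax; the types are illustrative):

```typeql
define

# An abstract supertype: both humans and companies are legal entities.
legal-entity sub entity, abstract, owns name;
human sub legal-entity;
company sub legal-entity;

name sub attribute, value string;

# Roles state exactly which types can participate, and how.
employment sub relation, relates employee, relates employer;
human plays employment:employee;
company plays employment:employer;
```

A query matching `$x isa legal-entity` then returns humans and companies alike, resolving through the hierarchy – and the schema doubles as a readable description of the domain.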

We didn’t build this because we anticipated the AI use case. We built it because we think knowledge – the kind that organizations depend on for critical decisions – has inherent meaning, and that meaning belongs in the database, not scattered across documentation and application code.

When your knowledge graph is the foundation that agents reason over, structure alone isn’t enough. The machine needs semantics – or at the very least, a lot of documentation. And even then, documentation drifts, goes stale, and lives outside the system that’s actually serving the data. TypeDB builds data modeling, semantics, and structural validation into the fundamentals of how you represent data.

Structured graphs

It’s now a question of data model: do you want to use a flexible graph, and add a validation system afterwards, effectively learning two languages – or do you use a system that’s built from the ground up for human and machine understandability, with one unified language serving both?

We know what we’d choose!
