SHACL vs. a closed-world graph: what TypeDB enforces natively

In a strongly-typed, closed-world database like TypeDB, SHACL's core checks become the schema itself.

Joshua Send


If you work with RDF, you can clearly imagine this scenario: someone defines ex:Employee. Someone else asserts ex:Spot rdf:type ex:Dog and ex:Spot ex:isEmployerOf ex:AcmeCorp. The triple store accepts it without validation. Under the Open World Assumption, the store doesn’t know a dog can’t be an employer – it simply hasn’t been told otherwise.

SHACL is the standard fix. You write shapes describing what well-formed data looks like, run a validator, and get back a report listing the violations. The W3C Recommendation is mature, implementations across Jena, RDF4J, GraphDB, TopBraid and pySHACL agree on the test suite, and the whole ecosystem understands SHACL reports.

So here’s a question: what would change if your database natively rejected “Spot the dog is an employer” at insert time, before the data ever landed?

That’s what TypeDB does. This post walks through SHACL’s core constraints and shows what each one becomes when your storage substrate is strongly typed and closed-world by construction, rather than open-world with a closed-world overlay bolted on.

Structural constraints: where the type system makes SHACL unnecessary

Most of SHACL Core (the ~30 constraint components every conformant validator must implement) covers structural and value-level checks. Here’s the mapping to TypeDB 3:

SHACL | TypeDB 3
sh:targetClass + sh:class on a property | Type system + plays / owns
sh:datatype | Native value types (integer, decimal, datetime-tz, string, boolean, date, duration)
sh:minCount / sh:maxCount | @card(N..M)
sh:minInclusive / sh:maxInclusive | @range(N..M)
sh:pattern | @regex
sh:in, sh:hasValue | @values(...)
sh:nodeKind | Structural: entity / relation / attribute is a category, not a constraint
sh:closed | Implicit: TypeDB schemas are closed by construction

Take the dog-as-employer rejection case. In SHACL:

ex:EmploymentShape a sh:NodeShape ;
  sh:targetClass ex:Employment ;
  sh:property [
    sh:path ex:employer ;
    sh:class ex:Company ;     # employer must be a Company
    sh:minCount 1 ;
    sh:maxCount 1 ;
] .

In TypeDB:

define
  relation employment,
    relates employer @card(1),
    relates employee @card(1);
  entity company, plays employment:employer;
  entity person,  plays employment:employee;

Insert a dog as employer and the transaction fails automatically because dogs don’t play employment:employer. The schema is the enforcement.
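To make the rejection concrete, here is a minimal sketch of the offending write. It assumes a hypothetical entity type dog has been defined and plays no role in employment; the variable names are illustrative only.

```typeql
# Assumes the schema above plus a hypothetical "entity dog;" that plays no roles.
insert
  $spot isa dog;
  $alice isa person;
  # Expected to be rejected: dog does not play employment:employer.
  $e isa employment, links (employer: $spot, employee: $alice);
```

Swapping $spot for an instance of company should make the same insert acceptable, since company is the only type declared to play employment:employer.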

Datatypes work the same way. sh:datatype xsd:integer exists because RDF will happily accept "twenty-two" as the value of :age. In TypeDB:

define
  attribute age, value integer;
  entity person, owns age @card(1);

"twenty-two" doesn’t type check, never mind commit. And sh:minCount/sh:maxCount, arguably the most-used SHACL constraints, are just the @card annotation we already wrote on the owns.

This pattern repeats for every row in that table. The structural-validation problem SHACL was invented to solve largely doesn’t arise when the database is typed to begin with. The reason it arises in RDF is the Open World Assumption, and TypeDB doesn’t make that assumption.
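As a sketch of the remaining value-level rows in the table, the schema fragment below pairs each SHACL component with its annotation. The attribute names, the regex, the range bounds and the allowed values are all illustrative assumptions, not part of the schema used elsewhere in this post.

```typeql
define
  # sh:pattern -> @regex (hypothetical attribute and pattern)
  attribute email, value string @regex("^[^@]+@[^@]+$");
  # sh:minInclusive / sh:maxInclusive -> @range (hypothetical bounds)
  attribute years-of-service, value integer @range(0..80);
  # sh:in -> @values (hypothetical allowed set)
  attribute status, value string @values("active", "on-leave", "terminated");
  entity person,
    owns email @card(0..1),
    owns years-of-service @card(0..1),
    owns status @card(1);
```

With a schema like this, a write that attaches a status outside the listed values, or a years-of-service outside the range, should be refused in the same way as the mistyped age above.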

Cross-property validation: functions take over

The SHACL constraints that don’t collapse into the schema are the ones that span more than one property:

  • “A project’s end-date must not precede its start-date”
  • “An order’s total must equal the sum of its line items”

In SHACL these reach for sh:lessThanOrEquals or sh:sparql. In TypeDB 3 you can do this with functions. A function encodes a TypeQL query and lets you return a stream of violators:

define
fun invalid_project_dates() -> { project }:
  match
    $p isa project, has start-date $start, has end-date $end;
    $end < $start;
  return { $p };

Running it:

match let $bad in invalid_project_dates();
fetch { $bad.* };

The sum-of-line-items check is the same shape, with a reduce:

define
fun invalid_order_totals() -> { order, decimal, decimal }:
  match
    $o isa order;
    $line isa line-item, links (order: $o, item: $item), has amount $a;
  reduce $sum = sum($a) groupby $o;
  match
    $o has total $declared;
    $sum != $declared;
  return { $o, $declared, $sum };

However, unlike the structural type, value and cardinality validations, these more complex validators do not run as ACID-compliant transactional checks.

So, using a TypeDB function this way means querying for violations; it is not a commit-time gate. This is essentially identical to running pySHACL over a corpus. In TypeDB, however, these functions are also type checked, committed into the database schema, and evolve with your data model in one place.

If you do want to generate a failure on commit, you can run the validators before committing each transaction and abort if they return any results. However, this does not automatically validate concurrent transactional writes to the database**.

And if you prefer SHACL’s habit of materialising failures as inspectable records rather than just returning them, a function plus a match-insert pipeline gives you exactly that:

define
  attribute error-message, value string;
  relation validation-error,
    relates invalid-item,
    owns error-message;
  project plays validation-error:invalid-item;

match
  let $p in invalid_project_dates();
insert
  $e isa validation-error,
    links (invalid-item: $p),
    has error-message "End date precedes start date";

Now your “validation report” is queryable data, searchable with everything else in the graph. You can easily add dates, lineage information, or a traceable history of validation reports that serves as an audit log.
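As one possible way to add a date, the sketch below extends the validation-error relation with a timestamp. The attribute name detected-at and the literal timestamp are assumptions for illustration; in practice the value would be supplied by the application at write time.

```typeql
# Schema query: let validation errors carry a timestamp (hypothetical attribute).
define
  attribute detected-at, value datetime;
  validation-error owns detected-at;
```

```typeql
# Data query: record each violator together with the time it was detected.
match
  let $p in invalid_project_dates();
insert
  $e isa validation-error,
    links (invalid-item: $p),
    has error-message "End date precedes start date",
    has detected-at 2025-01-15T09:30:00;
```

The two queries are run separately: the first in a schema transaction, the second in a write transaction.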

**Forcing on-commit failures

The option available today is to force a collision across transactions: have each concurrent transaction replace a known key with an updated value, so that one of them is forced to abort. In the future, TypeDB could implement proper on-write rejection based on full queries, giving ACID transactionality in an optimized way.

What TypeDB does not give you

There are several things that SHACL is designed to do that TypeDB isn’t.

Validation reports and severity levels. SHACL distinguishes sh:Violation / sh:Warning / sh:Info, and emits a standardised sh:ValidationReport that downstream tools consume. For a whole class of business-rule validation, the severity ladder and the audit report are the point, not the structural checks. TypeDB can reproduce those operations (severity tiers, queryable audit trails, generic report processing) using functions and modelled data. What it can’t do is emit a report in the standard SHACL vocabulary that a third party’s off-the-shelf validator understands. If cross-vendor report interop is your requirement, stay on RDF + SHACL.
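A minimal sketch of how severity tiers might be modelled, building on the validation-error relation defined earlier. The severity attribute and its three allowed values are assumptions chosen to mirror sh:Violation / sh:Warning / sh:Info; they are not a built-in TypeDB feature.

```typeql
# Schema query: a hypothetical severity attribute restricted to three tiers.
define
  attribute severity, value string @values("violation", "warning", "info");
  validation-error owns severity;
```

```typeql
# Read query: list the messages of recorded errors at the highest tier.
match
  $e isa validation-error, has severity "violation", has error-message $msg;
fetch { "message": $msg };
```

This reproduces the tiering inside the database, but the result is still TypeDB data rather than an sh:ValidationReport.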

Shapes as RDF, federable across publishers. A DCAT-AP shape lives in a .ttl file in a public repo; any conformant validator anywhere can fetch and apply it. That is as much about the ecosystem as it is about the technology. When RDF is the substrate for distributing schemas, equally shareable validation compounds in value across publishers. TypeDB schemas are designed to be non-federated, i.e. built for specific purposes and closed databases.

If those are core to your use case, RDF & SHACL are absolutely the right tool! 

When to reach for TypeDB

SHACL is a graph of interlocking standards: Core, the SPARQL extensions, Advanced Features, the rules layer, the report vocabulary, the interop. The last of these, interoperability with the RDF/SHACL ecosystem, is what you would give up most by using TypeDB.

However, if the reason you reached for SHACL was mostly for the constraints – the type checks, the cardinalities, the regexes, the value ranges, the cross-property business rules – TypeDB gives you all of that without an overlay language. The structural ones are enforced by the schema at commit. The cross-property ones are implementable in functions you query when you want to know whether your data is clean. 

Ultimately, if you reached for SHACL because RDF’s Open World Assumption was letting bad data into your store and you wanted a closed-world overlay to keep it out, consider using a closed-world graph database with an expressive type system instead.

“Spot the dog is an employer” doesn’t need to be a violation in a report: it can just be something your database prevents you from saying.
