

The Strong Type System of TypeDB


The backbone of TypeDB is its strong type system, which powers its declarative polymorphic queries and semantic data validation. The type system is managed by the type-inference engine, which resolves every query against the schema to determine the possible return types. In this article, we’ll take a deep dive into the type system to learn how the type-inference engine works. The first half breaks down some example queries to show how declarative polymorphic queries are resolved against the schema, allowing results to be retrieved without enumerating the possible return types. In the second half, we see how queries are checked for semantic validity, preventing nonsensical queries from being executed.

This article uses an example data model and queries for a simple filesystem based on a discretionary access control (DAC) permission system. In the DAC framework, all objects (for instance files, directories, and user groups) have owners, and permissions on an object are granted to other users by its owner. The schema, dataset, and queries can all be found on GitHub.

Query resolution

Inheritance polymorphism

In this query, we retrieve the ID and event timestamps associated with each resource. This query makes use of inheritance polymorphism, which allows us to declaratively query a supertype to retrieve instances of it and all its subtypes.

match
$resource isa resource;
fetch
$resource: id, event-timestamp;

To begin building the set of return types for this query, we must consider the constraints it comprises. For this demonstration, it is useful to imagine a TYPE($x) function that returns the possible set of types of the variable $x. This is not actually TypeQL syntax! The pattern in the match clause tells us that $resource has the type resource, indicating that TYPE($resource) must be either resource itself or one of its subtypes.

TYPE($resource) = { resource, file, directory, repository }

We can then exclude resource itself, as it is abstract and so cannot be returned.

TYPE($resource) = { file, directory, repository }
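The two steps above, expanding to subtypes and then dropping abstract types, can be sketched in Python as a stand-in for the imaginary TYPE() function applied to a single isa constraint. The schema fragment is hand-written and hypothetical. We assume file, directory, and repository are all direct subtypes of resource. The real schema is in the GitHub repository, and this is an illustration of the reasoning, not TypeDB's implementation.

```python
from __future__ import annotations

# Hypothetical schema fragment: each type maps to its direct supertype.
# Assumption: file, directory, and repository all subtype resource directly.
SUPERTYPE = {
    "file": "resource",
    "directory": "resource",
    "repository": "resource",
}
ABSTRACT = {"resource"}


def subtypes(t: str) -> set[str]:
    """Return t together with all of its direct and transitive subtypes."""
    result = {t}
    changed = True
    while changed:
        changed = False
        for sub, sup in SUPERTYPE.items():
            if sup in result and sub not in result:
                result.add(sub)
                changed = True
    return result


def type_of_isa(t: str) -> set[str]:
    """TYPE($x) for the constraint `$x isa t`: t and its subtypes, minus abstract types."""
    return subtypes(t) - ABSTRACT


print(sorted(subtypes("resource")))     # ['directory', 'file', 'repository', 'resource']
print(sorted(type_of_isa("resource")))  # ['directory', 'file', 'repository']
```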

So file, directory, and repository are the possible return types for $resource. The fetch clause then tells us that we would like to retrieve the id and event-timestamp attributes of $resource. The return types of both attributes depend on TYPE($resource). We’ll denote these dependencies using a style similar to generic parameters in an OOP language.

TYPE($resource: id)<TYPE($resource)> = { id, email, name, path, hash }
TYPE($resource: event-timestamp)<TYPE($resource)> = { event-timestamp, created-timestamp, modified-timestamp, login-timestamp }

Again, we can immediately exclude the abstract types id and event-timestamp.

TYPE($resource: id)<TYPE($resource)> = { email, name, path, hash }
TYPE($resource: event-timestamp)<TYPE($resource)> = { created-timestamp, modified-timestamp, login-timestamp }

In order to work out the actual return types, we can now substitute in the possible return types of $resource, and identify the return types of the attributes based on the attribute ownerships in the schema.

TYPE($resource: id)<file> = { path }
TYPE($resource: id)<directory> = { path }
TYPE($resource: id)<repository> = { name }
TYPE($resource: event-timestamp)<file> = { created-timestamp, modified-timestamp }
TYPE($resource: event-timestamp)<directory> = { created-timestamp, modified-timestamp }
TYPE($resource: event-timestamp)<repository> = { created-timestamp, modified-timestamp }
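The substitution step can be sketched as a set intersection. For each candidate owner type, we keep the concrete subtypes of the fetched attribute that the owner type actually owns. The attribute hierarchy and the ownership table below are a hypothetical fragment reconstructed from the results above, not the published schema.

```python
from __future__ import annotations

# Hypothetical attribute hierarchy: attribute type -> direct supertype.
ATTR_SUPERTYPE = {
    "email": "id", "name": "id", "path": "id", "hash": "id",
    "created-timestamp": "event-timestamp",
    "modified-timestamp": "event-timestamp",
    "login-timestamp": "event-timestamp",
}
ABSTRACT = {"id", "event-timestamp"}

# Hypothetical ownerships of the three concrete resource types.
OWNS = {
    "file": {"path", "created-timestamp", "modified-timestamp"},
    "directory": {"path", "created-timestamp", "modified-timestamp"},
    "repository": {"name", "created-timestamp", "modified-timestamp"},
}


def concrete_attribute_types(attr: str) -> set[str]:
    """attr plus its direct subtypes (the hierarchy here is one level deep), minus abstract types."""
    return ({attr} | {a for a, sup in ATTR_SUPERTYPE.items() if sup == attr}) - ABSTRACT


def fetch_types(owner_type: str, attr: str) -> set[str]:
    """TYPE($x: attr)<owner_type>: concrete subtypes of attr that owner_type owns."""
    return concrete_attribute_types(attr) & OWNS[owner_type]


for owner in ("file", "directory", "repository"):
    for attr in ("id", "event-timestamp"):
        print(owner, attr, sorted(fetch_types(owner, attr)))
```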

We have now identified the possible return types for every variable in the query, so this concludes the type-inference process. The possible combinations of return types, given their co-dependencies, can be enumerated as rows in the following table.

| TYPE($resource) | TYPE($resource: id)<TYPE($resource)> | TYPE($resource: event-timestamp)<TYPE($resource)> |
|---|---|---|
| file | path | created-timestamp |
| file | path | modified-timestamp |
| directory | path | created-timestamp |
| directory | path | modified-timestamp |
| repository | name | created-timestamp |
| repository | name | modified-timestamp |
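Enumerating these rows amounts to taking, for each candidate type of $resource, the Cartesian product of the attribute types that depend on it. Here is a minimal sketch, with the per-type results of the substitution step hard-coded:

```python
from __future__ import annotations

from itertools import product

# Results of the substitution step, keyed by the type of $resource.
ID_TYPES = {"file": ["path"], "directory": ["path"], "repository": ["name"]}
TIMESTAMP_TYPES = {
    "file": ["created-timestamp", "modified-timestamp"],
    "directory": ["created-timestamp", "modified-timestamp"],
    "repository": ["created-timestamp", "modified-timestamp"],
}


def enumerate_rows() -> list[tuple[str, str, str]]:
    """One row per consistent combination of return types."""
    rows = []
    for resource_type in ID_TYPES:
        # Both attribute columns depend only on this row's resource type.
        for id_type, ts_type in product(ID_TYPES[resource_type], TIMESTAMP_TYPES[resource_type]):
            rows.append((resource_type, id_type, ts_type))
    return rows


for row in enumerate_rows():
    print(*row)
```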

This is roughly how TypeDB’s type-inference engine works to resolve the possible return types of a query. The return types are then supplied to the query planner, which searches for instances of those types in the data.

Interface polymorphism

Let’s try another, more complicated query. It retrieves the ID of every resource and that of its owner. This query makes use of interface polymorphism, which allows us to declaratively query an interface (in this case the roles of resource-ownership) to retrieve instances of all types that implement it.

match
(resource: $resource, resource-owner: $owner) isa resource-ownership;
fetch
$resource: id;
$owner: id;

This time we have two variables in the match clause, $resource and $owner, and we don’t know their types from the query. We only know that those types need to play the resource and resource-owner roles in resource-ownership, which is enough information to look through our schema and determine the possible types.

TYPE($resource) = { resource }
TYPE($owner) = { user, user-group }
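This lookup can be sketched as a search over the schema's role-playing declarations. In this Python sketch, the schema is reduced to a hypothetical mapping from each type to the roles it directly declares that it plays. Roles are written as relation:role, in the style of TypeQL's scoped role names. Only the types relevant to resource-ownership are included.

```python
from __future__ import annotations

# Hypothetical fragment: type -> roles it declares directly (inherited roles are not listed).
PLAYS = {
    "resource": {"resource-ownership:resource"},
    "user": {"resource-ownership:resource-owner"},
    "user-group": {"resource-ownership:resource-owner"},
}


def players_of(role: str) -> set[str]:
    """Types that directly declare that they play the given scoped role."""
    return {t for t, roles in PLAYS.items() if role in roles}


print(sorted(players_of("resource-ownership:resource")))        # ['resource']
print(sorted(players_of("resource-ownership:resource-owner")))  # ['user', 'user-group']
```

Subtypes such as admin are deliberately absent from the mapping. They inherit the role from user and are only added when the sets are expanded to include subtypes.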

We can then expand these sets of types to include subtypes.

TYPE($resource) = { resource, file, directory, repository }
TYPE($owner) = { user, admin, user-group }

Once again, we exclude abstract types.

TYPE($resource) = { file, directory, repository }
TYPE($owner) = { user, admin, user-group }

Next, we determine types for variables in the fetch clause.

TYPE($resource: id)<TYPE($resource)> = { email, name, path, hash }
TYPE($owner: id)<TYPE($owner)> = { email, name, path, hash }

Finally, we substitute in the possible types of $resource and $owner based on their dependencies.

TYPE($resource: id)<file> = { path }
TYPE($resource: id)<directory> = { path }
TYPE($resource: id)<repository> = { name }
TYPE($owner: id)<user> = { email }
TYPE($owner: id)<admin> = { email }
TYPE($owner: id)<user-group> = { name }

We can now enumerate the possible return types and their co-dependencies.

| TYPE($resource) | TYPE($resource: id)<TYPE($resource)> | TYPE($owner) | TYPE($owner: id)<TYPE($owner)> |
|---|---|---|---|
| file | path | user | email |
| file | path | admin | email |
| file | path | user-group | name |
| directory | path | user | email |
| directory | path | admin | email |
| directory | path | user-group | name |
| repository | name | user | email |
| repository | name | admin | email |
| repository | name | user-group | name |
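Nothing in this query ties the type of $resource to the type of $owner, so the two halves of each row are independent. The table is therefore the Cartesian product of the (type, ID type) pairs for each variable, giving 3 × 3 = 9 rows. Here is a short sketch using the substitution results above:

```python
from itertools import product

# (type, id attribute type) pairs taken from the substitution step.
RESOURCE_PAIRS = [("file", "path"), ("directory", "path"), ("repository", "name")]
OWNER_PAIRS = [("user", "email"), ("admin", "email"), ("user-group", "name")]

# Each row joins one resource option with one owner option.
rows = [r + o for r, o in product(RESOURCE_PAIRS, OWNER_PAIRS)]

for row in rows:
    print(*row)
```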

Type inference is performed every time a query is executed and the type-inference engine always uses the most up-to-date type definitions available in the schema. If we update the schema with new types that play roles in resource-ownership and own subtypes of id, then this list of return types would expand if the query is run again!
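Because inference reads the schema each time, a schema change alters the result without any change to the query. In this sketch, the schema is a simplified model in which inherited declarations are already flattened onto each concrete type. `service-account` is a purely hypothetical new type, not part of the article's schema, that plays resource-owner and owns name.

```python
from __future__ import annotations

# Simplified, hypothetical schema: roles each concrete type plays and attributes it owns
# (inherited declarations, e.g. admin's from user, are already flattened in).
schema = {
    "user": {"plays": {"resource-owner"}, "owns": {"email"}},
    "admin": {"plays": {"resource-owner"}, "owns": {"email"}},
    "user-group": {"plays": {"resource-owner"}, "owns": {"name"}},
}
ID_SUBTYPES = {"email", "name", "path", "hash"}


def owner_return_types(schema: dict) -> set[tuple[str, str]]:
    """(type of $owner, type of $owner: id) pairs, recomputed from the current schema."""
    return {
        (t, a)
        for t, decl in schema.items()
        if "resource-owner" in decl["plays"]
        for a in decl["owns"] & ID_SUBTYPES
    }


before = owner_return_types(schema)
# Hypothetical schema change: a new type that plays resource-owner and owns name.
schema["service-account"] = {"plays": {"resource-owner"}, "owns": {"name"}}
after = owner_return_types(schema)

print(sorted(after - before))  # [('service-account', 'name')]
```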

Semantic validation

The type-inference engine also checks queries for semantic validity. As we will see, the process behind semantic validation is the same as the one used to resolve return types! In the next examples, we’ll examine what happens when we try to resolve the return types of a nonsensical query.

match
$permission (subject: $subject, object: $object, access: $access) isa permission;
fetch
$permission: id;
$subject: id;
$object: id;
$access: id;

In this query, we’re trying to retrieve the IDs of the subject, object, and access in every permission along with the ID of the permission itself. However, permissions do not have IDs. To see how TypeDB interprets this query, we’ll begin by inferring the types of the variables in the match clause.

TYPE($permission) = { permission }
TYPE($subject) = { user, user-group }
TYPE($object) = { user-group, resource }
TYPE($access) = { access }

As before, we expand the sets to include subtypes and then remove abstract types.

TYPE($permission) = { permission }
TYPE($subject) = { user, admin, user-group }
TYPE($object) = { user-group, file, directory, repository }
TYPE($access) = { access }

Next, we infer the types of the variables in the fetch clause.

TYPE($permission: id)<TYPE($permission)> = { email, name, path, hash }
TYPE($subject: id)<TYPE($subject)> = { email, name, path, hash }
TYPE($object: id)<TYPE($object)> = { email, name, path, hash }
TYPE($access: id)<TYPE($access)> = { email, name, path, hash }

But we then run into a problem when substituting in the type dependencies.

TYPE($permission: id)<permission> = { }
TYPE($subject: id)<user> = { email }
TYPE($subject: id)<admin> = { email }
TYPE($subject: id)<user-group> = { name }
TYPE($object: id)<user-group> = { name }
TYPE($object: id)<file> = { path }
TYPE($object: id)<directory> = { path }
TYPE($object: id)<repository> = { name }
TYPE($access: id)<access> = { name }
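The emptiness check that rules this query out can be sketched in a few lines. For each variable, we keep only the candidate types that own at least one subtype of id. If a variable is left with no candidates, no result row can exist and the query is rejected. The ownership table is reconstructed from the substitution results above. `TypeInferenceError` is a hypothetical stand-in for TypeDB's own error, not its API.

```python
from __future__ import annotations

# Hypothetical ownership of id subtypes, reconstructed from the substitution results.
OWNS_ID = {
    "permission": set(),
    "user": {"email"}, "admin": {"email"}, "user-group": {"name"},
    "file": {"path"}, "directory": {"path"}, "repository": {"name"},
    "access": {"name"},
}


class TypeInferenceError(Exception):
    """Hypothetical error type standing in for TypeDB's own error."""


def check_fetch_id(variables: dict[str, set[str]]) -> dict[str, dict[str, set[str]]]:
    """Map each variable's candidate types to their possible id types.

    Candidates with no possible id type are pruned; if a variable is left
    with no candidates, no result row can exist, so the query is rejected.
    """
    resolved = {}
    for var, candidates in variables.items():
        options = {t: OWNS_ID[t] for t in candidates if OWNS_ID[t]}
        if not options:
            raise TypeInferenceError(f"no possible return type for {var}: id")
        resolved[var] = options
    return resolved


query_vars = {
    "$permission": {"permission"},
    "$subject": {"user", "admin", "user-group"},
    "$object": {"user-group", "file", "directory", "repository"},
    "$access": {"access"},
}

try:
    check_fetch_id(query_vars)
except TypeInferenceError as err:
    print(err)  # no possible return type for $permission: id
```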

The set of types for the permission ID is empty, as the only possible type of $permission is permission, which doesn’t own email, name, path, or hash. As a result, there is no possible return type for $permission: id. Every result of a query must include each variable exactly once, so there can be no results for this query. This indicates that the query is not semantically valid and TypeDB throws an error! Let’s try another example of a semantically invalid query.

match
(group: $group, group-owner: $owner) isa group-ownership;
$login (subject: $group) isa login-event;
fetch
$group: name;
$login: login-timestamp, success;

We’re trying to retrieve the list of login events for the owner of each user group. For each login event, we want to know when it happened and whether it was successful or not. However, we’ve made a mistake: we’ve asked for login events associated with $group itself rather than $owner. From the first line in the match pattern, we can tell that the types of $group and $owner must play the roles of group and group-owner in group-ownership.

TYPE($group) = { user-group }
TYPE($owner) = { admin }

From the second line, we can tell that $group must be of a type that plays subject in login-event.

TYPE($group) = { user }

We then expand to include subtypes.

TYPE($group) = { user, admin }

By default, all the constraints in a match clause must be satisfied simultaneously. In order to determine the possible return types of $group, we must take the intersection of the sets for each constraint.

TYPE($group) = { user-group } ∩ { user, admin }
TYPE($owner) = { admin }

This resolves to an empty set.

TYPE($group) = { }
TYPE($owner) = { admin }

For this query, we don’t even have to look at the fetch clause to determine it’s invalid. It is clear at this stage there are no possible sets of return types, and so TypeDB will throw an error.
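Combining constraints is a per-variable set intersection. The sketch below encodes each constraint in the match clause as a hypothetical (variable, allowed types) pair, with subtypes already expanded. It rejects the query if any variable's intersection is empty. As before, `TypeInferenceError` is a hypothetical stand-in, not TypeDB's API.

```python
from __future__ import annotations


class TypeInferenceError(Exception):
    """Hypothetical error type standing in for TypeDB's own error."""


def intersect_constraints(constraints: list[tuple[str, set[str]]]) -> dict[str, set[str]]:
    """Intersect the allowed types contributed by every constraint on each variable."""
    types: dict[str, set[str]] = {}
    for var, allowed in constraints:
        types[var] = types[var] & allowed if var in types else set(allowed)
    for var, allowed in types.items():
        if not allowed:
            raise TypeInferenceError(f"{var} has no possible type")
    return types


constraints = [
    ("$group", {"user-group"}),     # plays group in group-ownership
    ("$owner", {"admin"}),          # plays group-owner in group-ownership
    ("$group", {"user", "admin"}),  # plays subject in login-event (subtypes expanded)
]

try:
    intersect_constraints(constraints)
except TypeInferenceError as err:
    print(err)  # $group has no possible type
```

Moving the login-event constraint onto $owner, as the query intended, gives { admin } ∩ { user, admin } = { admin }, and the check passes.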

Summary

TypeDB is designed with polymorphism as a central feature. By combining a conceptual data model with a type-theoretic query language, the type-inference engine is able to resolve declarative polymorphic queries and identify semantically invalid ones. In other database systems, query resolution and validation of this kind are not possible, because they lack the syntax to describe polymorphic constraints between types.
