
TypeDB: the Polymorphic Database


TypeDB is a polymorphic database with a conceptual data model, a strong subtyping system, a symbolic reasoning engine, and a type-theoretic language: TypeQL. This article breaks this down and explores these features in detail, discussing the impact on database engineering.

To begin with, a schema for a simple filesystem is presented. It illustrates how data domains can be modeled naturally by using the conceptual PERA data model, including the occurrence of polymorphism and complex relation types. The expressive power and elegance of TypeQL are then demonstrated by presenting several declarative queries on that schema, featuring full polymorphic querying capabilities and type variablization. The strong type system that enables type inference and semantic validation is then examined to understand how declarative polymorphic querying is achieved. Finally, one of the most useful outcomes of the type system is showcased: TypeDB’s built-in symbolic reasoner.

The examples shown in this article are designed so that they can be followed along. If you would like to do so, ensure you have set up TypeDB. The examples can be copied directly from the code blocks in this article or found on GitHub.

Conceptual data model

TypeDB uses the polymorphic entity-relation-attribute (PERA) model for schemas and data. It is an extension of the entity-relationship (ER) model (Chen, 1976), which is the most widely used tool for designing conceptual data models due to its elegant simplicity and expressive power. With most database paradigms, the conceptual model is only a starting point, and it must be translated into a logical model that meets the modeling capabilities of the database system. As TypeDB uses the PERA model, any ER model can be directly implemented without translation. Likewise, any logical data model derived from an ER model can also be directly implemented, enabling trivial migration of data from other database paradigms.

Like the ER model, the PERA model consists of entities, relations, and attributes, but additionally introduces polymorphism in the form of inheritance and interfaces. A TypeDB schema is composed of type definitions. Every type inherits from one of the three built-in root types (entity, relation, and attribute) or from a previously defined user type. Interfaces can be defined between types in the form of attribute ownerships and relation roles. All types inherit the interfaces of their supertypes, in addition to the general capabilities defined for their root types.

Entities represent single concepts with independent existences. They might practically require other concepts to exist, such as a car that cannot exist without its parts, but can be conceptualized without reference to those other concepts: a car can be imagined without considering its parts. Entity types are able to own attribute types and play roles in relation types.

Relations represent single concepts with existences that depend on at least one other concept. They cannot be conceptualized without reference to those concepts: it is impossible to imagine a marriage without considering its spouses. Like entity types, relation types are able to own attribute types and play roles in other relation types.

Attributes represent fixed values in a given domain. They are properties of their domains rather than of their owners, and so are not uniquely defined by any concepts that might own them, nor do they require any owners to exist. Attribute types have specific value types, like boolean, double, and string, and can be owned by both entity and relation types.

To demonstrate the PERA model, this article will use a data model for a simple filesystem based on a discretionary access control (DAC) permission system. In the DAC framework, all objects (for instance files, directories, and user groups) have owners, and permissions on an object are granted to other users by its owner. The model is described using TypeQL, and familiarity is assumed with all basic syntax. Even without knowledge of the relevant keywords, it is easily understandable due to its near-natural language design.

Defining entity types

To begin with, we’ll define some entity types along with the attribute types they own with a define query:

define

user sub entity,
    owns email,
    owns password-hash,
    owns created-timestamp,
    owns active;

admin sub user;

user-group sub entity,
    owns name,
    owns created-timestamp;

resource sub entity,
    abstract,
    owns id,
    owns created-timestamp,
    owns modified-timestamp;

file sub resource,
    owns path as id;

directory sub resource,
    owns path as id;

access sub entity,
    owns name;

id sub attribute, abstract, value string;
email sub id;
name sub id;
path sub id;
password-hash sub attribute, value string;
event-timestamp sub attribute, abstract, value datetime;
created-timestamp sub event-timestamp;
modified-timestamp sub event-timestamp;
active sub attribute, value boolean;

It is important to remember that TypeQL is fully declarative, so the statements in the query can be placed in whatever order we like. There are a number of powerful features we’ve made use of here, some examples being:

  • admin is a subtype of user, so it inherits all of the supertype’s capabilities. Currently, they are ownership of email, password-hash, created-timestamp, and active, but when we give user more capabilities later on, they’ll be inherited as well.
  • Both user-group and resource own created-timestamp. This is possible because the ownership of attribute types is an interface that can be independently implemented by different entity types with no common supertype. Because of this, if two entities each have an attribute with the same type and value, they actually own exactly the same attribute. This is contrary to most other database paradigms, where two records containing the same value in the same field store that value twice.
  • resource is abstract. It can’t be instantiated, so we’ll have to define at least one non-abstract subtype in order to be able to insert resources into the database. Here, we’ve defined the subtypes file and directory.
  • id is also abstract. As it is nonsensical for concrete entity types to own abstract attribute types, only abstract entity types are permitted to own them, like resource in this case. Subtypes of resource would inherit the ability to own id, and so concrete subtypes must override the capability by instead owning a subtype of id. In this case, file and directory both own path.
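The inheritance and override behaviour described in the points above can be sketched outside the database. The following is a toy Python model, not TypeDB's implementation: the dictionaries mirror the define query above, and the function name owned_attribute_types is a hypothetical helper introduced only for illustration.

```python
# Toy model of the entity types defined above; not TypeDB's implementation.
SUPERTYPE = {
    "user": "entity",
    "admin": "user",
    "user-group": "entity",
    "resource": "entity",
    "file": "resource",
    "directory": "resource",
    "access": "entity",
}

# Ownerships declared directly on each type, as a mapping from the owned
# attribute type to the ownership it overrides (None when nothing is
# overridden, "id" for "owns path as id").
DECLARED_OWNS = {
    "user": {"email": None, "password-hash": None,
             "created-timestamp": None, "active": None},
    "user-group": {"name": None, "created-timestamp": None},
    "resource": {"id": None, "created-timestamp": None,
                 "modified-timestamp": None},
    "file": {"path": "id"},
    "directory": {"path": "id"},
    "access": {"name": None},
}


def owned_attribute_types(type_label):
    """Return the attribute types a type can own, including inherited ones."""
    # Collect the chain of types from the given type up to the root.
    lineage = []
    current = type_label
    while current != "entity":
        lineage.append(current)
        current = SUPERTYPE[current]
    # Apply declarations from the topmost supertype downwards, so that a
    # subtype's override replaces the ownership it inherited.
    owned = set()
    for label in reversed(lineage):
        for attribute, overridden in DECLARED_OWNS.get(label, {}).items():
            owned.discard(overridden)
            owned.add(attribute)
    return owned


print(sorted(owned_attribute_types("admin")))
print(sorted(owned_attribute_types("file")))
```

In this model, admin ends up with the four ownerships declared on user, while file owns path in place of the abstract id it would otherwise inherit from resource.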

Defining relation types

With the entity types defined, we can now start thinking about relation types:

define

ownership sub relation,
    abstract,
    relates owned,
    relates owner;

group-ownership sub ownership,
    relates group as owned,
    relates group-owner as owner;

resource-ownership sub ownership,
    relates resource as owned,
    relates resource-owner as owner;

membership sub relation,
    abstract,
    relates parent,
    relates member;

group-membership sub membership,
    relates group as parent,
    relates group-member as member;

directory-membership sub membership,
    relates directory as parent,
    relates directory-member as member;

permission sub relation,
    relates subject,
    relates object,
    relates access;

login-event sub relation,
    relates subject,
    owns login-timestamp,
    owns success;

login-timestamp sub event-timestamp;
success sub attribute, value boolean;

Each relation type is defined along with its role interfaces. Like ownership of attribute types, these interfaces can be independently implemented by different entity (or relation) types, as we’ll see shortly. We’ve also used some of the same features that we used for the entity types, in addition to some new ones:

  • Like with the entity types, some relation types are abstract, for instance ownership. All roles are concrete, so it would be valid for group-ownership to be defined without roles, as owned and owner would be inherited, but in this case we’ve chosen to override them with group and group-owner in order to specialize them. The new roles are functionally subtypes of the overridden ones.
  • Like the entity types, the relation types are able to own attribute types. In this case, we’ve only got one that does: login-event, which owns login-timestamp and success.
  • login-timestamp is a subtype of the previously defined event-timestamp. Once a type is defined, we can extend it with subtypes at any time in the future.
  • The number of roles a relation type has is fully configurable. Most of the relation types have two roles, but permission has three and login-event has only one. When we get to inserting data, we’ll see how this lets us create unary, binary, and ternary relations. This is possible because TypeDB implements n-ary relations, in contrast to graph databases, in which all relations are necessarily binary.
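To make the idea of n-ary relations concrete, here is a minimal Python sketch that represents a relation as a mapping from role names to roleplayers. It is an illustration only, not how TypeDB stores relations: make_relation and arity are hypothetical helpers, and the player strings are placeholders rather than real data instances. The sketch only checks that the roles used are ones the relation type relates.

```python
# Roles related by three of the relation types defined above.
RELATES = {
    "login-event": {"subject"},
    "group-ownership": {"group", "group-owner"},
    "permission": {"subject", "object", "access"},
}


def make_relation(relation_type, roleplayers):
    """Build a relation as a (type, role -> player) pair, rejecting unknown roles."""
    unknown = set(roleplayers) - RELATES[relation_type]
    if unknown:
        raise ValueError(f"{relation_type} does not relate {sorted(unknown)}")
    return relation_type, dict(roleplayers)


def arity(relation):
    """Number of roleplayers in a relation."""
    return len(relation[1])


# Placeholder players: the strings are hypothetical stand-ins for data instances.
login = make_relation("login-event", {"subject": "a user"})
ownership = make_relation(
    "group-ownership", {"group": "a group", "group-owner": "an admin"}
)
permission = make_relation(
    "permission", {"subject": "a group", "object": "a file", "access": "write"}
)

print([arity(r) for r in (login, ownership, permission)])
```

The same structure covers unary, binary, and ternary relations without any special cases, which is the property the last point above relies on.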

Defining roleplayers

Now we can define which types play which roles:

define

user plays resource-ownership:resource-owner,
    plays group-membership:group-member,
    plays permission:subject,
    plays login-event:subject;

admin plays group-ownership:group-owner;

user-group plays group-ownership:group,
    plays resource-ownership:resource-owner,
    plays group-membership:group,
    plays permission:subject,
    plays permission:object;

resource plays resource-ownership:resource,
    plays permission:object;

file plays directory-membership:directory-member;

directory plays directory-membership:directory,
    plays directory-membership:directory-member;

access plays permission:access;

As with ownership of attribute types, there are cases where we have different types independently playing the same role, for instance user and user-group, which both play the role resource-owner in resource-ownership. Role-playing capabilities are also inherited in the type hierarchy, so admin can play all of the roles that user can. While we don’t use the feature for this data model, it is of course possible to define that a relation type plays a role in the same way as we do for entity types, allowing the instantiation of relations that play roles in other relations (nested relations). Finally, we can see that some types are defined to play multiple roles in the same relation, like user-group playing both subject and object in permission, a form of variadic relation.

Inserting data

With the schema in place, we can begin inserting data. If you are following along with the examples, make sure to insert the full initial dataset found here, after which you will not need to insert any additional data in this section. Let’s take a closer look at some insert queries, starting with some filesystem users.

insert
$rhonda isa user,
    has email "rhonda@vaticle.com",
    has active true,
    has created-timestamp 2023-11-10T15:19:43,
    has password-hash "6f1127d4b8ee9bd64df9b0ae3f8a7f58";

insert
$cedric isa admin,
    has email "cedric@vaticle.com",
    has active true,
    has created-timestamp 2023-01-01T00:00:00,
    has password-hash "e0d29e328f65b8074d7df218c73b1726";

In each of these two insert queries we create a user, namely Rhonda and Cedric, and give them each several attributes. Cedric is in fact an admin, and therefore also a user by extension. Because admin is a subtype of user, it inherits all its behaviours, particularly the ability to own the four inserted attributes. In the next query, we create a login-event relation for Rhonda.

match
$rhonda isa user, has email "rhonda@vaticle.com";
insert
(subject: $rhonda) isa login-event,
    has login-timestamp 2023-12-29T13:38:51,
    has success true;

This insertion query starts with a match clause, which allows us to bind existing data instances and reference them in the inserted data. The login event is a unary relation: it has only a single roleplayer playing the subject role. This is because it conceptually depends on the subject of the login (and nothing else). Without a subject to log themselves in, a login event cannot exist. We also assign the event some attributes. In the next query, we’ll once again match an existing user, Cedric, and then insert a new user group Engineers with him as the owner via a binary group-ownership relation.

match
$cedric isa admin, has email "cedric@vaticle.com";
insert
$engineers isa user-group,
    has name "engineers",
    has created-timestamp 2023-09-07T17:47:57;
(group: $engineers, group-owner: $cedric) isa group-ownership;

This time, the tuple representation of the relation has two elements, one for each of the two roleplayers. This design extends trivially to represent relations with any number of roleplayers, as we’ll see shortly with a ternary relation. In the next query, we’ll create a file and see how we can give it multiple modified-timestamp attributes.

insert
$release-notes isa file,
    has path "/vaticle/engineering/projects/typedb-3.0/release-notes.md",
    has created-timestamp 2023-03-18T20:24:08,
    has modified-timestamp 2023-12-17T22:31:07,
    has modified-timestamp 2023-06-16T04:07:35,
    has modified-timestamp 2023-08-29T17:35:51,
    has modified-timestamp 2023-10-16T17:47:04;

In this case, the modified timestamp is effectively a multivalued attribute. TypeDB allows easy insertion and querying of multivalued attributes in this way without having to consider normalization. If we were to query for modified timestamps of this particular file, we would get back four results. Finally, we’ll create some access entities then insert a permission relation.

insert
$read isa access, has name "read";
$write isa access, has name "write";
$delete isa access, has name "delete";

match
$engineers isa user-group, has name "engineers";
$release-notes isa file, has path "/vaticle/engineering/projects/typedb-3.0/release-notes.md";
$write isa access, has name "write";
insert
(subject: $engineers, object: $release-notes, access: $write) isa permission;

Here we’ve used a ternary relation to represent the following concept:

The Engineers user group has write permissions on the release notes file.

It’s important to note that there are other ways this could be modelled. We could instead, for instance, have a binary write-permission relation between a subject and object, but this approach gives us a bit more extensibility to create many different kinds of access that might exist in our system without having to modify the schema to do so. Modelling in TypeDB is a conceptual exercise, and apart from the overruling principles laid out earlier that define the differences between entities, relations, and attributes, there are very few restrictions on what can and can’t be done. We never have to modify the conceptual model to implement it, but there are many ways in which we could validly model the same concepts according to the business logic of the data domain.

One of the biggest advantages of the conceptual schema is in data validation. Much like relational schemas, conceptual schemas can be validated against to ensure integrity of the data. But unlike relational schemas, which can only enforce referential integrity, validation against conceptual schemas can enforce semantic integrity as well. For example, if we tried to insert the following nonsensical data, validation would fail and the query would be rejected by the database.

insert
$omar isa user,
    has path "/vaticle/omar",
    has success true;
## Error> [CXN05] The transaction is closed because of the error(s):
[THW03] Invalid Write: Attribute of type 'success' is not defined to be owned by type 'user'.
## Terminated

This data is obviously unsound as users do not have path or success attributes. Semantic validation also serves to block insertions of apparently semantically sound data in which a mistake has been made. In the following example, we attempt to insert a user as the owner of a user group, which seems reasonable but is not possible as only admins can own user groups.

insert
$omar isa user, has email "omar@vaticle.com";
$researchers isa user-group, has name "researchers";
(group: $researchers, group-owner: $omar) isa group-ownership;
## Error> [CXN05] The transaction is closed because of the error(s):
[THW08] Invalid Write: The type 'user' does not play the role type 'group-ownership:group-owner'.
## Terminated

What is and isn’t semantically sound is determined by the schema, which serves as an encoding of the data domain and provides semantic context for all queries. All write queries are validated against the schema to ensure the data is in a sound state on completion of the transaction. Similarly, read queries are also validated to ensure correctness, as will be examined later in this article. The next section will explore TypeDB’s polymorphic querying capabilities using the schema defined so far.
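The two rejected inserts shown in this section can be mimicked with a small Python sketch of schema-driven validation. This is only an approximation under simplifying assumptions: it covers a fragment of the schema, the names InvalidWrite, capabilities, check_has, check_plays, and is_valid are hypothetical, and the messages are not TypeDB's actual error output.

```python
class InvalidWrite(Exception):
    """Raised when an insert does not conform to the toy schema."""


# A fragment of the schema defined earlier: supertypes, ownerships, and roles.
SUPERTYPE = {"user": None, "admin": "user", "user-group": None}
OWNS = {
    "user": {"email", "password-hash", "created-timestamp", "active"},
    "user-group": {"name", "created-timestamp"},
}
PLAYS = {
    "user": {"resource-ownership:resource-owner", "permission:subject"},
    "admin": {"group-ownership:group-owner"},
    "user-group": {"group-ownership:group", "permission:subject"},
}


def capabilities(declared, type_label):
    """Collect capabilities declared on a type and on all of its supertypes."""
    collected = set()
    while type_label is not None:
        collected |= declared.get(type_label, set())
        type_label = SUPERTYPE[type_label]
    return collected


def check_has(owner_type, attribute_type):
    """Reject an attribute ownership that the schema does not define."""
    if attribute_type not in capabilities(OWNS, owner_type):
        raise InvalidWrite(f"{owner_type} does not own {attribute_type}")


def check_plays(player_type, role):
    """Reject a roleplayer whose type does not play the given role."""
    if role not in capabilities(PLAYS, player_type):
        raise InvalidWrite(f"{player_type} does not play {role}")


def is_valid(check, *args):
    """Run a check and report whether it passed."""
    try:
        check(*args)
    except InvalidWrite:
        return False
    return True


print(is_valid(check_has, "user", "success"))                         # rejected
print(is_valid(check_plays, "user", "group-ownership:group-owner"))   # rejected
print(is_valid(check_plays, "admin", "group-ownership:group-owner"))  # accepted
print(is_valid(check_has, "admin", "email"))                          # accepted via inheritance
```

The point of the sketch is that every check is answered by consulting the schema alone, which is why a user cannot own success or play group-owner while an admin can do the latter.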

Type-theoretic language

TypeDB’s query language is the type-theoretic language TypeQL. One of the most important design principles of TypeQL is:

Everything has a type, and so everything can be a variable.

This allows for highly expressive querying power beyond the capabilities of other contemporary database paradigms. In particular, it allows for declarative polymorphic querying, which is not currently possible in any other paradigm. This section goes through several examples of polymorphic querying:

  • Querying with inheritance polymorphism.
  • Querying with interface polymorphism.
  • Querying with parametric polymorphism.
  • Querying the schema.
  • Mixed schema-data querying.

The declarative nature of these queries is then illustrated with an example of a schema extension, after which the queries do not need to be modified to return instances of the newly defined types.

Querying with inheritance polymorphism

In this query, we ask for a list of users, retrieving the email and active status of each one.

match
$user isa user;
fetch
$user: email, active;
{
    "user": {
        "active": [ { "value": true, "value_type": "boolean", "type": { "label": "active", "root": "attribute" } } ],
        "email": [ { "value": "cedric@vaticle.com", "value_type": "string", "type": { "label": "email", "root": "attribute" } } ],
        "type": { "label": "admin", "root": "entity" }
    }
}
{
    "user": {
        "active": [ { "value": true, "value_type": "boolean", "type": { "label": "active", "root": "attribute" } } ],
        "email": [ { "value": "reginald@vaticle.com", "value_type": "string", "type": { "label": "email", "root": "attribute" } } ],
        "type": { "label": "user", "root": "entity" }
    }
}
{
    "user": {
        "active": [ { "value": true, "value_type": "boolean", "type": { "label": "active", "root": "attribute" } } ],
        "email": [ { "value": "tommy@vaticle.com", "value_type": "string", "type": { "label": "email", "root": "attribute" } } ],
        "type": { "label": "user", "root": "entity" }
    }
}
{
    "user": {
        "active": [ { "value": false, "value_type": "boolean", "type": { "label": "active", "root": "attribute" } } ],
        "email": [ { "value": "jimmy@vaticle.com", "value_type": "string", "type": { "label": "email", "root": "attribute" } } ],
        "type": { "label": "user", "root": "entity" }
    }
}
{
    "user": {
        "active": [ { "value": true, "value_type": "boolean", "type": { "label": "active", "root": "attribute" } } ],
        "email": [ { "value": "kima@vaticle.com", "value_type": "string", "type": { "label": "email", "root": "attribute" } } ],
        "type": { "label": "user", "root": "entity" }
    }
}
{
    "user": {
        "active": [ { "value": true, "value_type": "boolean", "type": { "label": "active", "root": "attribute" } } ],
        "email": [ { "value": "rhonda@vaticle.com", "value_type": "string", "type": { "label": "email", "root": "attribute" } } ],
        "type": { "label": "user", "root": "entity" }
    }
}

Each result object represents a user, containing any email and active attributes they have in addition to the user’s entity type. Examining the type field for the results indicates that one of the users, Cedric, is an admin rather than a user. Despite only asking for users, TypeDB has returned an admin because admin is a subtype of user and therefore admins are users themselves. This query leverages inheritance polymorphism: the ability to query a type and retrieve instances of it and all of its subtypes. If we chose to, we could instead get back only users and not admins by using the isa! keyword instead of isa in our query, which indicates that an exact type match should be performed. This keyword will be demonstrated in a later query. In this next query, we use inheritance polymorphism to get a list of all resources.

match
$resource isa resource;
fetch
$resource: id, event-timestamp;
{
    "resource": {
        "event-timestamp": [
            { "value": "2023-08-08T11:57:54.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
            { "value": "2023-08-13T13:16:10.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
            { "value": "2023-10-10T13:22:37.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
            { "value": "2023-10-10T13:39:19.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
            { "value": "2023-11-29T09:17:47.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
            { "value": "2023-06-14T22:44:22.000", "value_type": "datetime", "type": { "label": "created-timestamp", "root": "attribute" } }
        ],
        "id": [ { "value": "/vaticle/research/prototypes/nlp-query-generator.py", "value_type": "string", "type": { "label": "path", "root": "attribute" } } ],
        "type": { "label": "file", "root": "entity" }
    }
}
{
    "resource": {
        "event-timestamp": [
            { "value": "2023-06-22T11:36:20.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
            { "value": "2023-06-29T12:16:33.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
            { "value": "2023-07-26T12:33:37.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
            { "value": "2023-01-31T19:39:32.000", "value_type": "datetime", "type": { "label": "created-timestamp", "root": "attribute" } }
        ],
        "id": [ { "value": "/vaticle/research/prototypes/root-cause-analyzer.py", "value_type": "string", "type": { "label": "path", "root": "attribute" } } ],
        "type": { "label": "file", "root": "entity" }
    }
}
{
    "resource": {
        "event-timestamp": [
            { "value": "2023-02-15T09:13:55.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
            { "value": "2023-03-25T12:11:15.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
            { "value": "2023-06-02T06:17:54.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
            { "value": "2023-12-04T23:07:09.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
            { "value": "2023-01-28T09:16:24.000", "value_type": "datetime", "type": { "label": "created-timestamp", "root": "attribute" } }
        ],
        "id": [ { "value": "/vaticle/engineering/tools/performance-profiler.rs", "value_type": "string", "type": { "label": "path", "root": "attribute" } } ],
        "type": { "label": "file", "root": "entity" }
    }
}
{
    "resource": {
        "event-timestamp": [
            { "value": "2023-11-23T07:05:18.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
            { "value": "2023-10-01T01:46:23.000", "value_type": "datetime", "type": { "label": "created-timestamp", "root": "attribute" } }
        ],
        "id": [ { "value": "/vaticle/engineering/projects/typedb-3.0", "value_type": "string", "type": { "label": "path", "root": "attribute" } } ],
        "type": { "label": "directory", "root": "entity" }
    }
}
{
    "resource": {
        "event-timestamp": [
            { "value": "2023-10-20T13:24:19.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
            { "value": "2023-10-22T12:22:51.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
            { "value": "2023-12-01T03:58:22.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
            { "value": "2023-12-28T02:00:04.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
            { "value": "2023-03-13T14:25:07.000", "value_type": "datetime", "type": { "label": "created-timestamp", "root": "attribute" } }
        ],
        "id": [ { "value": "/vaticle/engineering/projects/typedb-cloud-beta", "value_type": "string", "type": { "label": "path", "root": "attribute" } } ],
        "type": { "label": "directory", "root": "entity" }
    }
}

There are a lot of results for this query, so the result set shown above has been trimmed down to five selected results. This will be the case for many of the queries shown in this article. If you would like to see the full result set for any query, you can find the dataset and queries on GitHub.

Much like the previous query, we are retrieving files and directories, which are subtypes of resource. In fact, we will not get any results of type resource as the type is abstract. However, inheritance polymorphism also affects the attributes returned. Each attribute retrieved has a type field nested in it, and we can see that all of the IDs retrieved are of type path. We can also see that each resource has event timestamps of two different types: created-timestamp and modified-timestamp. These attributes are returned because path is a subtype of id and the two timestamp types are subtypes of event-timestamp. In this way, inheritance polymorphism can be used to retrieve results that are polymorphic in entity types, attribute types, relation types (as will be seen later), or any combination of them.
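The difference between isa and isa! matching can be sketched in a few lines of Python. This is a toy model rather than TypeDB's query engine: the function names is_subtype and match_isa are hypothetical, and the three instances are a hand-picked subset of the results shown in this article.

```python
# Direct supertypes for part of the schema (None marks a child of the root).
SUPERTYPE = {
    "user": None,
    "admin": "user",
    "resource": None,
    "file": "resource",
    "directory": "resource",
}


def is_subtype(type_label, ancestor):
    """True if type_label is ancestor itself or one of its (transitive) subtypes."""
    while type_label is not None:
        if type_label == ancestor:
            return True
        type_label = SUPERTYPE[type_label]
    return False


def match_isa(instances, type_label, exact=False):
    """Model `isa` (exact=False) and `isa!` (exact=True) over (id, type) pairs."""
    if exact:
        return [i for i, t in instances if t == type_label]
    return [i for i, t in instances if is_subtype(t, type_label)]


instances = [
    ("cedric@vaticle.com", "admin"),
    ("rhonda@vaticle.com", "user"),
    ("/vaticle", "directory"),
]

print(match_isa(instances, "user"))                  # admins are users too
print(match_isa(instances, "user", exact=True))      # exact type match only
print(match_isa(instances, "resource"))              # subtypes of the abstract type
print(match_isa(instances, "resource", exact=True))  # abstract types have no direct instances
```

Matching on user returns both Cedric and Rhonda, while the exact match returns only Rhonda. Matching exactly on resource returns nothing, because every resource is an instance of a concrete subtype.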

Querying with interface polymorphism

The second kind of polymorphism we will see exhibited is interface polymorphism. In this query, we ask for things that have created timestamps, without specifying the type of the things we’re looking for, here denoted with the generic $x variable binding.

match
$x has created-timestamp $created;
fetch
$x: id, created-timestamp;
{
    "x": {
        "created-timestamp": [ { "value": "2023-02-03T12:35:44.000", "value_type": "datetime", "type": { "label": "created-timestamp", "root": "attribute" } } ],
        "id": [ { "value": "/vaticle/engineering/projects", "value_type": "string", "type": { "label": "path", "root": "attribute" } } ],
        "type": { "label": "directory", "root": "entity" }
    }
}
{
    "x": {
        "created-timestamp": [ { "value": "2023-01-31T19:39:32.000", "value_type": "datetime", "type": { "label": "created-timestamp", "root": "attribute" } } ],
        "id": [ { "value": "/vaticle/research/prototypes/root-cause-analyzer.py", "value_type": "string", "type": { "label": "path", "root": "attribute" } } ],
        "type": { "label": "file", "root": "entity" }
    }
}
{
    "x": {
        "created-timestamp": [ { "value": "2023-03-11T02:30:39.000", "value_type": "datetime", "type": { "label": "created-timestamp", "root": "attribute" } } ],
        "id": [ { "value": "reginald@vaticle.com", "value_type": "string", "type": { "label": "email", "root": "attribute" } } ],
        "type": { "label": "user", "root": "entity" }
    }
}
{
    "x": {
        "created-timestamp": [ { "value": "2023-09-07T17:47:57.000", "value_type": "datetime", "type": { "label": "created-timestamp", "root": "attribute" } } ],
        "id": [ { "value": "engineers", "value_type": "string", "type": { "label": "name", "root": "attribute" } } ],
        "type": { "label": "user-group", "root": "entity" }
    }
}
{
    "x": {
        "created-timestamp": [ { "value": "2023-01-01T00:00:00.000", "value_type": "datetime", "type": { "label": "created-timestamp", "root": "attribute" } } ],
        "id": [ { "value": "cedric@vaticle.com", "value_type": "string", "type": { "label": "email", "root": "attribute" } } ],
        "type": { "label": "admin", "root": "entity" }
    }
}

In this selection of results, we can see that five different types have been returned for $x: directory, file, user, user-group, and admin. This is because instances of all of those types have a created timestamp, as we earlier defined that all of those types own created-timestamp. Here the attribute ownership is acting as an interface that we are querying against. We can also see inheritance polymorphism at play in this query, as we have again queried the id type and retrieved results of multiple subtypes. Most natural-language questions include elements of different polymorphism types, but this is often not apparent until we attempt to solve them programmatically. In the next query, we leverage interface polymorphism on a relation’s roles instead of an attribute ownership.

match
(resource: $resource, resource-owner: $owner) isa resource-ownership;
fetch
$resource: id;
$owner: id;
{
    "owner": {
        "id": [ { "value": "rhonda@vaticle.com", "value_type": "string", "type": { "label": "email", "root": "attribute" } } ],
        "type": { "label": "user", "root": "entity" }
    },
    "resource": {
        "id": [ { "value": "/vaticle/research/prototypes/root-cause-analyzer.py", "value_type": "string", "type": { "label": "path", "root": "attribute" } } ],
        "type": { "label": "file", "root": "entity" }
    }
}
{
    "owner": {
        "id": [ { "value": "kima@vaticle.com", "value_type": "string", "type": { "label": "email", "root": "attribute" } } ],
        "type": { "label": "user", "root": "entity" }
    },
    "resource": {
        "id": [ { "value": "/vaticle/engineering/projects/typedb-cloud-beta", "value_type": "string", "type": { "label": "path", "root": "attribute" } } ],
        "type": { "label": "directory", "root": "entity" }
    }
}
{
    "owner": {
        "id": [ { "value": "engineers", "value_type": "string", "type": { "label": "name", "root": "attribute" } } ],
        "type": { "label": "user-group", "root": "entity" }
    },
    "resource": {
        "id": [ { "value": "/vaticle/engineering", "value_type": "string", "type": { "label": "path", "root": "attribute" } } ],
        "type": { "label": "directory", "root": "entity" }
    }
}
{
    "owner": {
        "id": [ { "value": "cedric@vaticle.com", "value_type": "string", "type": { "label": "email", "root": "attribute" } } ],
        "type": { "label": "admin", "root": "entity" }
    },
    "resource": {
        "id": [ { "value": "/vaticle", "value_type": "string", "type": { "label": "path", "root": "attribute" } } ],
        "type": { "label": "directory", "root": "entity" }
    }
}
{
    "owner": {
        "id": [ { "value": "cedric@vaticle.com", "value_type": "string", "type": { "label": "email", "root": "attribute" } } ],
        "type": { "label": "admin", "root": "entity" }
    },
    "resource": {
        "id": [ { "value": "/vaticle/engineering/projects/typedb-3.0/traversal-engine.rs", "value_type": "string", "type": { "label": "path", "root": "attribute" } } ],
        "type": { "label": "file", "root": "entity" }
    }
}

This query involves two interfaces: the resource and resource-owner roles of the resource-ownership relation type. As a result, the range of types that can be found in the results is combinatorial. As there are two concrete types that implement resource, and three that implement resource-owner, there are six possible pairs of types that can be returned as answers:

| resource | resource-owner |
| --- | --- |
| file | user |
| file | admin |
| file | user-group |
| directory | user |
| directory | admin |
| directory | user-group |

In this selection of results, we can see five different pairs. Because of TypeDB’s declarative polymorphic querying, we can retrieve results without enumerating the possible return types in the query.
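The combinatorial nature of the answer space can be illustrated with a short Python sketch. It is not how TypeDB computes anything internally; it only enumerates the Cartesian product of the concrete types named above. The constant and function names are hypothetical.

```python
from itertools import product

# Concrete types implementing each role, as listed in the text above.
RESOURCE_PLAYERS = ["file", "directory"]
RESOURCE_OWNER_PLAYERS = ["user", "admin", "user-group"]


def possible_pairs(resources, owners):
    """Return every (resource type, owner type) pair the query could yield."""
    return list(product(resources, owners))


pairs = possible_pairs(RESOURCE_PLAYERS, RESOURCE_OWNER_PLAYERS)
for resource_type, owner_type in pairs:
    print(f"{resource_type:<10} {owner_type}")
```

Adding one more implementing type to either list grows the number of pairs multiplicatively, which is why enumerating return types by hand quickly becomes impractical.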

Querying with parametric polymorphism

The final type of polymorphism is parametric polymorphism. In this query, we’ve variablized the attribute type as $attribute-type. By making this type a parameter, we can perform a parametric query over it and return its value in the response. In fact, the match clause of this query does not involve a single explicitly specified type or role. To break down what it does: we’re looking for something $x that has two attributes, $attribute-1 and $attribute-2. Both attributes must be of exactly the same type $attribute-type, and they cannot be the same attribute. In effect, this query searches for multivalued attributes of any type. The query then returns the exact type of those attributes, along with the exact type and ID of their owners (if they have IDs).

match
$x has $attribute-1;
$x has $attribute-2;
$attribute-1 isa! $attribute-type;
$attribute-2 isa! $attribute-type;
not { $attribute-1 is $attribute-2; };
fetch
$attribute-type;
"x-type": { match $x isa! $t; fetch $t; };
"x-id": { match $x has id $id; fetch $id; };
{
    "attribute-type": { "label": "modified-timestamp", "root": "attribute" },
    "x-id": [ { "id": { "value": "/vaticle/engineering/tools/performance-profiler.rs", "value_type": "string", "type": { "label": "path", "root": "attribute" } } } ],
    "x-type": [ { "t": { "label": "file", "root": "entity" } } ]
}
{
    "attribute-type": { "label": "modified-timestamp", "root": "attribute" },
    "x-id": [ { "id": { "value": "/vaticle/research/prototypes/root-cause-analyzer.py", "value_type": "string", "type": { "label": "path", "root": "attribute" } } } ],
    "x-type": [ { "t": { "label": "file", "root": "entity" } } ]
}
{
    "attribute-type": { "label": "modified-timestamp", "root": "attribute" },
    "x-id": [ { "id": { "value": "/vaticle/research/papers", "value_type": "string", "type": { "label": "path", "root": "attribute" } } } ],
    "x-type": [ { "t": { "label": "directory", "root": "entity" } } ]
}
{
    "attribute-type": { "label": "modified-timestamp", "root": "attribute" },
    "x-id": [ { "id": { "value": "/vaticle/engineering/tools", "value_type": "string", "type": { "label": "path", "root": "attribute" } } } ],
    "x-type": [ { "t": { "label": "directory", "root": "entity" } } ]
}
{
    "attribute-type": { "label": "modified-timestamp", "root": "attribute" },
    "x-id": [ { "id": { "value": "/vaticle/engineering/projects/typedb-cloud-beta/user-guide.md", "value_type": "string", "type": { "label": "path", "root": "attribute" } } } ],
    "x-type": [ { "t": { "label": "file", "root": "entity" } } ]
}

We can see from the selected results that the only attribute type returned is modified-timestamp. In this particular model, it is the only multivalued attribute type, but this query would return a list of such types and their owners for any schema. The pattern in the match clause of this query contains no specific type labels and so could be run against any database independent of the schema, a feature common to all parametric queries.
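The logic of this parametric query can be mimicked over a small in-memory dataset. The following Python sketch is only an analogy: the triples are hypothetical and not taken from the article's dataset, and the function name is invented for illustration. Because an attribute is identified by its type and value, two distinct attributes of the same type are modeled here as two distinct values.

```python
from collections import defaultdict

# Hypothetical (owner, exact attribute type, value) triples standing in
# for stored data.
OWNERSHIPS = [
    ("/docs", "path", "/docs"),
    ("/docs", "created-timestamp", "2023-01-01T09:00:00"),
    ("/docs", "modified-timestamp", "2023-02-01T09:00:00"),
    ("/docs", "modified-timestamp", "2023-03-01T09:00:00"),
    ("/notes.md", "path", "/notes.md"),
    ("/notes.md", "modified-timestamp", "2023-04-01T09:00:00"),
]


def multivalued_attribute_types(ownerships):
    """Map each attribute type to the owners that hold two or more
    distinct attributes of exactly that type."""
    values = defaultdict(set)
    for owner, attribute_type, value in ownerships:
        values[(owner, attribute_type)].add(value)
    result = defaultdict(set)
    for (owner, attribute_type), distinct in values.items():
        if len(distinct) >= 2:
            result[attribute_type].add(owner)
    return dict(result)


multivalued = multivalued_attribute_types(OWNERSHIPS)
print(multivalued)
```

As in the TypeQL query, no attribute type is named in the function itself, so the same code works for any set of types.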

Querying the schema

In the previous query, we variablized a type. In the following queries, we use the same capability to query the schema itself, returning only types rather than data instances, and using the same keywords that appear in the schema definition.

match
file owns $attribute-type;
fetch
$attribute-type;
{ "attribute-type": { "label": "modified-timestamp", "root": "attribute" } }
{ "attribute-type": { "label": "path", "root": "attribute" } }
{ "attribute-type": { "label": "created-timestamp", "root": "attribute" } }

This query simply retrieves the list of attribute types that files can own. The results contain created-timestamp and modified-timestamp, which are owned by resource and inherited by file, in addition to path, which is owned directly by file. In the next query, we query for any concrete types that play a role in an ownership relation.

match
$ownership-type sub ownership;
$ownership-type relates $role;
$player plays $role;
not { $player abstract; };
fetch
$role;
$player;
{
    "player": { "label": "user-group", "root": "entity" },
    "role": { "label": "group-ownership:group", "root": "relation:role" }
}
{
    "player": { "label": "admin", "root": "entity" },
    "role": { "label": "group-ownership:group-owner", "root": "relation:role" }
}
{
    "player": { "label": "file", "root": "entity" },
    "role": { "label": "resource-ownership:resource", "root": "relation:role" }
}
{
    "player": { "label": "directory", "root": "entity" },
    "role": { "label": "resource-ownership:resource", "root": "relation:role" }
}
{
    "player": { "label": "admin", "root": "entity" },
    "role": { "label": "resource-ownership:resource-owner", "root": "relation:role" }
}
{
    "player": { "label": "user", "root": "entity" },
    "role": { "label": "resource-ownership:resource-owner", "root": "relation:role" }
}
{
    "player": { "label": "user-group", "root": "entity" },
    "role": { "label": "resource-ownership:resource-owner", "root": "relation:role" }
}

While no types play roles in the abstract ownership relation, we see that we retrieve roleplayer types for its subtypes: group-ownership and resource-ownership. These types of queries against schemas (or data structures in schemaless databases) are not possible in most contemporary database paradigms, as their query languages do not allow for variablization of types: tables and columns in relational databases, collections and fields in document databases, and labels and properties in graph databases.
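To make the schema query above concrete, here is a Python sketch that evaluates the same pattern over a hand-written model of the schema. The dictionaries are a simplification reconstructed from the results shown above (roles of the abstract ownership relation are omitted, since no types play them), and all names are hypothetical rather than part of any TypeDB API.

```python
# Simplified schema model: child relation type -> parent relation type.
SUPERTYPE = {
    "ownership": None,
    "group-ownership": "ownership",
    "resource-ownership": "ownership",
}
# Roles related by each relation type.
RELATES = {
    "group-ownership": ["group", "group-owner"],
    "resource-ownership": ["resource", "resource-owner"],
}
# Types playing each scoped role, with inherited players listed explicitly.
PLAYS = {
    "group-ownership:group": ["user-group"],
    "group-ownership:group-owner": ["admin"],
    "resource-ownership:resource": ["resource", "file", "directory"],
    "resource-ownership:resource-owner": ["user", "admin", "user-group"],
}
ABSTRACT = {"ownership", "resource"}


def subtypes(label):
    """Return the label itself plus all of its transitive subtypes."""
    found = [label]
    for child, parent in SUPERTYPE.items():
        if parent == label:
            found.extend(subtypes(child))
    return found


def concrete_role_players(root):
    """Collect (role, player) pairs for root and its subtypes,
    skipping abstract player types."""
    answers = []
    for relation in subtypes(root):
        for role in RELATES.get(relation, []):
            scoped = f"{relation}:{role}"
            for player in PLAYS.get(scoped, []):
                if player not in ABSTRACT:
                    answers.append((scoped, player))
    return answers


answers = concrete_role_players("ownership")
for role, player in answers:
    print(role, player)
```

The sketch yields the same seven role and player pairs as the query results, with the abstract resource type filtered out.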

Mixed schema-data querying

Querying the schema is a powerful and unique feature of a type-theoretic query language, but the most effective way to utilise it is in mixed schema-data queries. For example, here we ask for all instances of any type that can play the role of resource-owner in a resource-ownership, even if some of those particular instances do not currently do so.

match
$type plays resource-ownership:resource-owner;
$x isa $type;
fetch
$x: id;
{
    "x": {
        "id": [ { "value": "cedric@vaticle.com", "value_type": "string", "type": { "label": "email", "root": "attribute" } } ],
        "type": { "label": "admin", "root": "entity" }
    }
}
{
    "x": {
        "id": [ { "value": "reginald@vaticle.com", "value_type": "string", "type": { "label": "email", "root": "attribute" } } ],
        "type": { "label": "user", "root": "entity" }
    }
}
{
    "x": {
        "id": [ { "value": "tommy@vaticle.com", "value_type": "string", "type": { "label": "email", "root": "attribute" } } ],
        "type": { "label": "user", "root": "entity" }
    }
}
{
    "x": {
        "id": [ { "value": "jimmy@vaticle.com", "value_type": "string", "type": { "label": "email", "root": "attribute" } } ],
        "type": { "label": "user", "root": "entity" }
    }
}
{
    "x": {
        "id": [ { "value": "kima@vaticle.com", "value_type": "string", "type": { "label": "email", "root": "attribute" } } ],
        "type": { "label": "user", "root": "entity" }
    }
}
{
    "x": {
        "id": [ { "value": "rhonda@vaticle.com", "value_type": "string", "type": { "label": "email", "root": "attribute" } } ],
        "type": { "label": "user", "root": "entity" }
    }
}
{
    "x": {
        "id": [ { "value": "researchers", "value_type": "string", "type": { "label": "name", "root": "attribute" } } ],
        "type": { "label": "user-group", "root": "entity" }
    }
}
{
    "x": {
        "id": [ { "value": "engineers", "value_type": "string", "type": { "label": "name", "root": "attribute" } } ],
        "type": { "label": "user-group", "root": "entity" }
    }
}

In the results, we see we’ve returned all users (including admins) and user groups, which are the filesystem entities that can own resources. Simple schema-data queries first use schema-querying syntax to bind a set of types (in this case user and user-group) and then return instances of those types. Like pure schema queries, they are only possible in query languages that allow type variablization.

Extending the schema

Now that we’ve seen how TypeQL enables construction of declarative polymorphic queries, we’ll see how this allows for trivial data model extensions and flexible application architectures. To illustrate this, we’ll begin by adding a new repository resource type to the schema, and a commit relation relating a repository and an author. Repositories are identified by their name attribute, and commits by a new hash attribute. Then we’ll insert some new data.

define

repository sub resource,
    owns name as id,
    plays commit:repository;

commit sub relation,
    relates repository,
    relates author,
    owns hash,
    owns created-timestamp;

user plays commit:author;

hash sub id;

If we re-run two of the previous queries, we can see that their results have updated to reflect the newly defined types and inserted data, now including results relating to repositories and commits.

match
$resource isa resource;
fetch
$resource: id, event-timestamp;
{
    "resource": {
        "event-timestamp": [ { "value": "2023-05-20T19:12:11.000", "value_type": "datetime", "type": { "label": "created-timestamp", "root": "attribute" } } ],
        "id": [ { "value": "/vaticle/engineering/projects/typedb-3.0/query-parser.rs", "value_type": "string", "type": { "label": "path", "root": "attribute" } } ],
        "type": { "label": "file", "root": "entity" }
    }
}
{
    "resource": {
        "event-timestamp": [
            { "value": "2023-11-23T07:05:18.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
            { "value": "2023-10-01T01:46:23.000", "value_type": "datetime", "type": { "label": "created-timestamp", "root": "attribute" } }
        ],
        "id": [ { "value": "/vaticle/engineering/projects/typedb-3.0", "value_type": "string", "type": { "label": "path", "root": "attribute" } } ],
        "type": { "label": "directory", "root": "entity" }
    }
}
{
    "resource": {
        "event-timestamp": [
            { "value": "2023-10-20T13:24:19.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
            { "value": "2023-10-22T12:22:51.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
            { "value": "2023-12-01T03:58:22.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
            { "value": "2023-12-28T02:00:04.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
            { "value": "2023-03-13T14:25:07.000", "value_type": "datetime", "type": { "label": "created-timestamp", "root": "attribute" } }
        ],
        "id": [ { "value": "/vaticle/engineering/projects/typedb-cloud-beta", "value_type": "string", "type": { "label": "path", "root": "attribute" } } ],
        "type": { "label": "directory", "root": "entity" }
    }
}
{
    "resource": {
        "event-timestamp": [ { "value": "2023-05-07T11:30:32.000", "value_type": "datetime", "type": { "label": "created-timestamp", "root": "attribute" } } ],
        "id": [ { "value": "typedb", "value_type": "string", "type": { "label": "name", "root": "attribute" } } ],
        "type": { "label": "repository", "root": "entity" }
    }
}
{
    "resource": {
        "event-timestamp": [ { "value": "2023-06-05T14:13:23.000", "value_type": "datetime", "type": { "label": "created-timestamp", "root": "attribute" } } ],
        "id": [ { "value": "typedb-cloud", "value_type": "string", "type": { "label": "name", "root": "attribute" } } ],
        "type": { "label": "repository", "root": "entity" }
    }
}

match
$x has created-timestamp $created;
fetch
$x: id, created-timestamp;
{
    "x": {
        "created-timestamp": [ { "value": "2023-06-05T14:13:23.000", "value_type": "datetime", "type": { "label": "created-timestamp", "root": "attribute" } } ],
        "id": [ { "value": "typedb-cloud", "value_type": "string", "type": { "label": "name", "root": "attribute" } } ],
        "type": { "label": "repository", "root": "entity" }
    }
}
{
    "x": {
        "created-timestamp": [ { "value": "2023-06-14T22:44:22.000", "value_type": "datetime", "type": { "label": "created-timestamp", "root": "attribute" } } ],
        "id": [ { "value": "/vaticle/research/prototypes/nlp-query-generator.py", "value_type": "string", "type": { "label": "path", "root": "attribute" } } ],
        "type": { "label": "file", "root": "entity" }
    }
}
{
    "x": {
        "created-timestamp": [ { "value": "2023-02-15T07:47:57.000", "value_type": "datetime", "type": { "label": "created-timestamp", "root": "attribute" } } ],
        "id": [ { "value": "/vaticle", "value_type": "string", "type": { "label": "path", "root": "attribute" } } ],
        "type": { "label": "directory", "root": "entity" }
    }
}
{
    "x": {
        "created-timestamp": [ { "value": "2023-06-30T04:37:33.000", "value_type": "datetime", "type": { "label": "created-timestamp", "root": "attribute" } } ],
        "id": [ { "value": "52b1142dabd766ec2e467bc74913de41", "value_type": "string", "type": { "label": "hash", "root": "attribute" } } ],
        "type": { "label": "commit", "root": "relation" }
    }
}
{
    "x": {
        "created-timestamp": [ { "value": "2023-06-28T17:00:14.000", "value_type": "datetime", "type": { "label": "created-timestamp", "root": "attribute" } } ],
        "id": [ { "value": "/vaticle/engineering/projects/typedb-3.0/technical-specification.md", "value_type": "string", "type": { "label": "path", "root": "attribute" } } ],
        "type": { "label": "file", "root": "entity" }
    }
}

This is because polymorphic queries written in TypeQL are fully declarative. As long as we write our queries in such a way that reflects the high-level question being asked, we do not have to update them when the schema is extended. When we leverage inheritance polymorphism, we do not have to enumerate the subtypes of the supertype being queried. Similarly for interface polymorphism, we do not have to enumerate the types that implement the interface being queried. And for parametric polymorphism, we do not have to enumerate the complete list of types to operate on. As a result, because types are never explicitly listed in the queries, the queries do not have to be modified to include newly added types or exclude previously removed types. This is very different to other database paradigms where such enumeration would normally be necessary.
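The effect can be sketched in Python with a toy matcher that resolves a supertype against a schema at query time. This is an illustration under simplifying assumptions (a schema reduced to a child-to-parent map, instances reduced to identifier and type pairs); the function names are hypothetical.

```python
def subtypes(schema, label):
    """Return the label plus all transitive subtypes under the given schema."""
    found = {label}
    for child, parent in schema.items():
        if parent == label:
            found |= subtypes(schema, child)
    return found


def match_isa(schema, instances, label):
    """Return identifiers of instances whose type is label or a subtype of it."""
    allowed = subtypes(schema, label)
    return [identifier for identifier, exact_type in instances
            if exact_type in allowed]


schema = {"file": "resource", "directory": "resource"}
instances = [("/vaticle", "directory")]
before = match_isa(schema, instances, "resource")

# Extend the schema and add data; the query call itself is unchanged.
schema["repository"] = "resource"
instances.append(("typedb", "repository"))
after = match_isa(schema, instances, "resource")

print(before, after)
```

The query names only resource, so the new repository instance appears in the second result without any change to the query.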

Strong type system

TypeDB’s declarative polymorphic querying capabilities are the result of its built-in type inference engine, which resolves queries against the conceptual data model in the schema to determine the set of possible return types. The first part of this section breaks down how the type inference engine works by following through some simple examples, illustrating the procedure that might be followed by type inference. In the second part, we then see how this can ensure our queries are semantically validated, preventing nonsensical queries from being run.

Type inference

Let’s start with one of the queries from earlier:

match
$resource isa resource;
fetch
$resource: id, event-timestamp;

This query is simple to follow as there is only one variable in the match clause: $resource. We will now consider each line of the query to begin building the set of return types. For this procedure, it is useful to imagine a type($x) function that returns the type of the variable $x. The pattern in the match clause tells us that $resource has the type resource, indicating that type($resource) must be either resource itself or one of its subtypes:

type($resource) = { resource, file, directory, repository }

We can then immediately exclude resource as it is abstract so cannot be returned:

type($resource) = { file, directory, repository }

Thus, file, directory, and repository are the possible return types for $resource. The fetch clause then tells us that we would like to retrieve the id and event-timestamp attributes of $resource. These could resolve to a number of return types:

type($resource: id)<type($resource)> = { id, email, name, path, hash }
type($resource: event-timestamp)<type($resource)> = { event-timestamp, created-timestamp, modified-timestamp, login-timestamp }

Both depend on type($resource), which we’ve denoted here using a style similar to OOP language generics. Again, we can immediately exclude the abstract types id and event-timestamp:

type($resource: id)<type($resource)> = { email, name, path, hash }
type($resource: event-timestamp)<type($resource)> = { created-timestamp, modified-timestamp, login-timestamp }

As we previously worked out the possible return types of $resource, we can now go ahead and identify the return types of the attributes based on the attribute ownerships in the schema:

type($resource: id)<file> = { path }
type($resource: id)<directory> = { path }
type($resource: id)<repository> = { name }
type($resource: event-timestamp)<file> = { created-timestamp, modified-timestamp }
type($resource: event-timestamp)<directory> = { created-timestamp, modified-timestamp }
type($resource: event-timestamp)<repository> = { created-timestamp, modified-timestamp }

This completes the type inference procedure, as we have now identified the possible return types of each variable and can easily enumerate them:

| type($resource) | type($resource: id)<type($resource)> | type($resource: event-timestamp)<type($resource)> |
| --- | --- | --- |
| file | path | created-timestamp |
| file | path | modified-timestamp |
| directory | path | created-timestamp |
| directory | path | modified-timestamp |
| repository | name | created-timestamp |
| repository | name | modified-timestamp |
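The steps above can be reproduced with a small Python sketch. It is a toy model, not TypeDB's actual inference engine: the dictionaries hand-encode only the types mentioned in this walkthrough, and the function names are hypothetical.

```python
# Toy schema model: child type -> parent type.
SUPERTYPE = {
    "file": "resource", "directory": "resource", "repository": "resource",
    "email": "id", "name": "id", "path": "id", "hash": "id",
    "created-timestamp": "event-timestamp",
    "modified-timestamp": "event-timestamp",
    "login-timestamp": "event-timestamp",
}
ABSTRACT = {"resource", "id", "event-timestamp"}
# Attribute types owned by each concrete resource type (including inherited).
OWNS = {
    "file": {"path", "created-timestamp", "modified-timestamp"},
    "directory": {"path", "created-timestamp", "modified-timestamp"},
    "repository": {"name", "created-timestamp", "modified-timestamp"},
}


def concrete_subtypes(label):
    """Expand a type to itself plus its subtypes, then drop abstract types."""
    found = {label}
    for child, parent in SUPERTYPE.items():
        if parent == label:
            found |= concrete_subtypes(child)
    return found - ABSTRACT


def infer_fetch(owner_types, attribute_label):
    """For each owner type, keep the attribute subtypes it actually owns."""
    candidates = concrete_subtypes(attribute_label)
    return {owner: sorted(candidates & OWNS[owner])
            for owner in sorted(owner_types)}


resource_types = concrete_subtypes("resource")
id_types = infer_fetch(resource_types, "id")
timestamp_types = infer_fetch(resource_types, "event-timestamp")

# Enumerate every combination of return types.
rows = [(owner, id_type, ts_type)
        for owner in sorted(resource_types)
        for id_type in id_types[owner]
        for ts_type in timestamp_types[owner]]
for row in rows:
    print(row)
```

The six printed rows correspond to the six rows of the table, with login-timestamp eliminated because no resource type owns it.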

This is roughly how TypeDB’s type inference engine works to identify the return types. These are then supplied to the query planner, which searches for data instances of the correct types. Let’s try another query from earlier, this time more complicated:

match
(resource: $resource, resource-owner: $owner) isa resource-ownership;
fetch
$resource: id;
$owner: id;

Now we have two variables in the match clause, $resource and $owner, and we know nothing about what their supertypes might be, only that they need to play the resource and resource-owner roles in resource-ownership. That is still enough information to look through our schema and determine a list of return types:

type($resource) = { resource }
type($owner) = { user, user-group }

We can then expand these sets of types to include subtypes:

type($resource) = { resource, file, directory, repository }
type($owner) = { user, admin, user-group }

Once again, we exclude abstract types:

type($resource) = { file, directory, repository }
type($owner) = { user, admin, user-group }

Next, we determine types for variables in the fetch clause:

type($resource: id)<type($resource)> = { email, name, path, hash }
type($owner: id)<type($owner)> = { email, name, path, hash }

And finally, substitute the possible types of $resource and $owner in:

type($resource: id)<file> = { path }
type($resource: id)<directory> = { path }
type($resource: id)<repository> = { name }
type($owner: id)<user> = { email }
type($owner: id)<admin> = { email }
type($owner: id)<user-group> = { name }

With these, we arrive at the final enumeration of return types:

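| type($resource) | type($resource: id)<type($resource)> | type($owner) | type($owner: id)<type($owner)> |
| --- | --- | --- | --- |
| file | path | user | email |
| file | path | admin | email |
| file | path | user-group | name |
| directory | path | user | email |
| directory | path | admin | email |
| directory | path | user-group | name |
| repository | name | user | email |
| repository | name | admin | email |
| repository | name | user-group | name |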

By comparing this to the table in the previous section, we can see how the search space for this query has been expanded from six possible sets of return types to nine possible sets by the introduction of the new repository type.
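The growth from six to nine combinations can be checked with a short Python sketch. As before, this is a toy model under stated assumptions: the schema is reduced to a child-to-parent map, the ID type of each concrete type is taken from the substitutions listed above, and the function names are hypothetical.

```python
from itertools import product

ABSTRACT = {"resource"}
# ID attribute type owned by each concrete type, per the substitutions above.
ID_TYPE = {
    "file": "path", "directory": "path", "repository": "name",
    "user": "email", "admin": "email", "user-group": "name",
}


def expand(declared, supertype):
    """Expand declared role players with all subtypes, then drop abstract types."""
    result = set()
    frontier = list(declared)
    while frontier:
        label = frontier.pop()
        if label in result:
            continue
        result.add(label)
        frontier.extend(child for child, parent in supertype.items()
                        if parent == label)
    return result - ABSTRACT


def return_type_rows(supertype):
    """Enumerate (resource, resource id, owner, owner id) return types."""
    resources = expand({"resource"}, supertype)
    owners = expand({"user", "user-group"}, supertype)
    return sorted((r, ID_TYPE[r], o, ID_TYPE[o])
                  for r, o in product(resources, owners))


original_schema = {"file": "resource", "directory": "resource", "admin": "user"}
extended_schema = dict(original_schema, repository="resource")

before = return_type_rows(original_schema)
after = return_type_rows(extended_schema)
print(len(before), len(after))
```

Only the schema map changes between the two calls; the inference steps are identical.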

Semantic validation

Type inference also serves as a way to perform semantic validation of queries. In the next examples, we’ll examine what happens when we try to execute a nonsensical query.

match
$permission (subject: $subject, object: $object, access: $access) isa permission;
fetch
$permission: id;
$subject: id;
$object: id;
$access: id;

In this query, we’re trying to retrieve the IDs of the subject, object, and access in every permission along with the ID of the permission itself. This is going to be a problem as permissions don’t have IDs. To see how TypeDB interprets this query, we’ll follow through the process of type inference, beginning by inferring the types of the variables in the match clause:

type($permission) = { permission }
type($subject) = { user, user-group }
type($object) = { user-group, resource }
type($access) = { access }

As before, we expand these sets to include subtypes, then remove abstract types:

type($permission) = { permission }
type($subject) = { user, admin, user-group }
type($object) = { user-group, file, directory, repository }
type($access) = { access }

Next, we infer the types of the variables in the fetch clause:

type($permission: id)<type($permission)> = { email, name, path, hash }
type($subject: id)<type($subject)> = { email, name, path, hash }
type($object: id)<type($object)> = { email, name, path, hash }
type($access: id)<type($access)> = { email, name, path, hash }

But we then run into a problem when substituting in the type dependencies:

type($permission: id)<permission> = { }
type($subject: id)<user> = { email }
type($subject: id)<admin> = { email }
type($subject: id)<user-group> = { name }
type($object: id)<user-group> = { name }
type($object: id)<file> = { path }
type($object: id)<directory> = { path }
type($object: id)<repository> = { name }
type($access: id)<access> = { name }

The set of types for $permission: id is empty, as the only possible type of $permission is permission, which doesn’t own email, name, path, or hash. As a result, there is no possible return type for $permission: id. As every result of a query must include each variable exactly once, there can be no results for this query. This indicates that the query is not semantically valid and TypeDB throws an error. This is different to the case where TypeDB infers at least one set of return types but does not find any instances of those types. In the first case, the schema indicates that results are not possible because the query is badly constructed, while in the second case, the query is correctly constructed but no results exist, so TypeDB would simply return an empty result set. Let’s try another example of a semantically invalid query.

match
(group: $group, group-owner: $owner) isa group-ownership;
$login (subject: $group) isa login-event;
fetch
$group: name;
$login: login-timestamp, success;

Here, we’re trying to retrieve the list of login events for the owner of each user group, and for each login event we want to know when it happened and whether it was successful or not. However, we’ve made a mistake: we’ve asked for login events associated with the group itself rather than its owner. This is not immediately obvious as the type of $group is never explicitly stated, but if we examine the entire query then it becomes apparent that it doesn’t make sense. From the first line in the match pattern, we can tell that the types of $group and $owner must play the roles of group and group-owner in group-ownership:

type($group) = { user-group }
type($owner) = { admin }

From the second line, we can tell that $group must be of a type that plays subject in login-event:

type($group) = { user }

We then expand to include subtypes:

type($group) = { user, admin }

By default, all lines in a match clause are combined in a conjunction. When performing type inference on a conjunction, the inferred type sets must be combined by taking the intersection:

type($group) = { user-group } ∩ { user, admin }
type($owner) = { admin }

This resolves to an empty set:

type($group) = { }
type($owner) = { admin }

For this query, we don’t even have to look at the fetch clause to determine it’s invalid. It is clear at this stage there are no possible sets of return types, and so TypeDB will throw an error. This is possible because of the strong type system, which enables TypeDB to perform type inference. In other database systems, this kind of validation is not possible, and poorly constructed queries would simply return an empty result set. This makes it tricky to tell if a query incorporates a bug, or if there is indeed no matching data. With TypeDB, we can guard against nonsensical queries by relying on semantic validation.
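The intersection step and the resulting error can be sketched in Python. This is a minimal illustration rather than TypeDB's implementation: each pattern line is reduced to the type sets worked out above, and `SemanticError` and `infer_conjunction` are hypothetical names.

```python
class SemanticError(Exception):
    """Raised when a variable has no possible type (hypothetical name)."""


def infer_conjunction(constraints):
    """Intersect the type sets that each pattern line allows per variable,
    and fail if any variable ends up with no possible type."""
    inferred = {}
    for constraint in constraints:
        for variable, types in constraint.items():
            if variable in inferred:
                inferred[variable] = inferred[variable] & types
            else:
                inferred[variable] = set(types)
    for variable, types in inferred.items():
        if not types:
            raise SemanticError(f"no possible type for ${variable}")
    return inferred


# Line 1: (group: $group, group-owner: $owner) isa group-ownership;
line_1 = {"group": {"user-group"}, "owner": {"admin"}}
# Line 2: $login (subject: $group) isa login-event;
line_2 = {"group": {"user", "admin"}, "login": {"login-event"}}

try:
    infer_conjunction([line_1, line_2])
    outcome = "valid"
except SemanticError as error:
    outcome = str(error)
print(outcome)
```

The important distinction is that the failure is raised from the schema-level type sets alone, before any data is consulted, which is what separates an invalid query from a valid query with no results.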

Symbolic reasoning

TypeDB’s strong type system allows it to perform symbolic reasoning, generating new facts from existing data by resolving user-defined rules. Reasoning has a number of powerful use cases, ranging from creating abbreviations for patterns used in many queries, to capturing powerful business logic from the data domain. We’ll begin with a simple example of a rule, then see how we can combine them in creative ways.

Simple rules

In the following query, we retrieve the most recent modified timestamp for each resource.

match
$resource isa resource, has modified-timestamp $t;
not {
    $resource has modified-timestamp $t-2;
    $t-2 > $t;
};
fetch
$resource: id;
$t;
{
    "resource": {
        "id": [ { "value": "/vaticle/engineering/projects/typedb-3.0/traversal-engine.rs", "value_type": "string", "type": { "label": "path", "root": "attribute" } } ],
        "type": { "label": "file", "root": "entity" }
    },
    "t": { "value": "2023-12-09T14:36:26.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } }
}
{
    "resource": {
        "id": [ { "value": "/vaticle/engineering/projects/typedb-cloud-beta/user-guide.md", "value_type": "string", "type": { "label": "path", "root": "attribute" } } ],
        "type": { "label": "file", "root": "entity" }
    },
    "t": { "value": "2023-12-06T07:15:09.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } }
}
{
    "resource": {
        "id": [ { "value": "/vaticle/engineering/projects/typedb-3.0/technical-specification.md", "value_type": "string", "type": { "label": "path", "root": "attribute" } } ],
        "type": { "label": "file", "root": "entity" }
    },
    "t": { "value": "2023-11-24T14:06:36.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } }
}
{
    "resource": {
        "id": [ { "value": "/vaticle/engineering/projects/typedb-3.0/release-notes.md", "value_type": "string", "type": { "label": "path", "root": "attribute" } } ],
        "type": { "label": "file", "root": "entity" }
    },
    "t": { "value": "2023-12-17T22:31:07.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } }
}
{
    "resource": {
        "id": [ { "value": "/vaticle/engineering/projects/typedb-cloud-beta/user-manager.rs", "value_type": "string", "type": { "label": "path", "root": "attribute" } } ],
        "type": { "label": "file", "root": "entity" }
    },
    "t": { "value": "2023-12-12T03:45:35.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } }
}

It’s not a complex query, but it’s fairly cumbersome, and we’ll likely need to query last modified timestamps as part of a lot of larger queries, so this pattern will need to be repeated many times across our application code. We could separately insert a special last-modified timestamp on every resource, but then we’d need to manually update it whenever we add a new modified-timestamp, which could lead to inconsistent data if we’re not careful. This is a perfect use case for reasoning, which can ensure our data stays consistent. To begin with, we go ahead and define the new attribute.

define

last-modified sub attribute, value datetime;
resource owns last-modified;

Next, we write a rule using the pattern from our original query, and add it to our database with a define query, because rules, like types, are a core part of the schema.

define

rule resource-last-modified:
    when {
        # The original query pattern
        $resource isa resource, has modified-timestamp $t;
        not {
            $resource has modified-timestamp $t-2;
            $t-2 > $t;
        };
        # Copy the value of $t
        ?last-t = $t;
    } then {
        # Generate a new attribute with the same value
        $resource has last-modified ?last-t;
    };

As rules use the same patterns as queries, they make use of type inference. We can see this here: the rule is applied to all subtypes of resource as type inference resolves the inheritance polymorphism present in the pattern. This also means that semantic validation can be applied to rules. In order for a new rule to be successfully committed to the schema, it must pass validation; otherwise, it is rejected by the database. This ensures that the results of reasoning are always semantically correct. Now with the rule defined, we can replace our original query by querying for the newly created attribute. The results are identical to those before, except that the type field of $t has changed from modified-timestamp to last-modified.

match
$resource isa resource, has last-modified $t;
fetch
$resource: id;
$t;
{
    "resource": {
        "id": [ { "value": "/vaticle/engineering/projects/typedb-3.0/traversal-engine.rs", "value_type": "string", "type": { "label": "path", "root": "attribute" } } ],
        "type": { "label": "file", "root": "entity" }
    },
    "t": { "value": "2023-12-09T14:36:26.000", "value_type": "datetime", "type": { "label": "last-modified", "root": "attribute" } }
}
{
    "resource": {
        "id": [ { "value": "/vaticle/engineering/projects/typedb-cloud-beta/user-guide.md", "value_type": "string", "type": { "label": "path", "root": "attribute" } } ],
        "type": { "label": "file", "root": "entity" }
    },
    "t": { "value": "2023-12-06T07:15:09.000", "value_type": "datetime", "type": { "label": "last-modified", "root": "attribute" } }
}
{
    "resource": {
        "id": [ { "value": "/vaticle/engineering/projects/typedb-3.0/technical-specification.md", "value_type": "string", "type": { "label": "path", "root": "attribute" } } ],
        "type": { "label": "file", "root": "entity" }
    },
    "t": { "value": "2023-11-24T14:06:36.000", "value_type": "datetime", "type": { "label": "last-modified", "root": "attribute" } }
}
{
    "resource": {
        "id": [ { "value": "/vaticle/engineering/projects/typedb-3.0/release-notes.md", "value_type": "string", "type": { "label": "path", "root": "attribute" } } ],
        "type": { "label": "file", "root": "entity" }
    },
    "t": { "value": "2023-12-17T22:31:07.000", "value_type": "datetime", "type": { "label": "last-modified", "root": "attribute" } }
}
{
    "resource": {
        "id": [ { "value": "/vaticle/engineering/projects/typedb-cloud-beta/user-manager.rs", "value_type": "string", "type": { "label": "path", "root": "attribute" } } ],
        "type": { "label": "file", "root": "entity" }
    },
    "t": { "value": "2023-12-12T03:45:35.000", "value_type": "datetime", "type": { "label": "last-modified", "root": "attribute" } }
}

This new attribute type can be used in any other query too, meaning we don’t have to repeat the full pattern from the original query anywhere outside of the rule. An important factor to be aware of when designing rules is that they don’t actually insert new data into the database. The data created by rules is generated at query-time and held in memory. This means that it will always reflect the most recent state of the data when the query is run, so we don’t have to worry about staleness or inconsistencies, nor does it take up any disk space. Once we close the transaction, the reasoning cache is cleared and the resources are released. If we utilise reasoning wherever we need dependent or redundant data, we can ensure that the entire dataset remains consistent and up-to-date.
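The difference between a stored value and a value derived at query time can be sketched in Python. This is only an analogy for the behaviour described above, using hypothetical data and a hypothetical function name; it does not model TypeDB's reasoner or its cache.

```python
from datetime import datetime

# Hypothetical stored data: resource id -> modified timestamps.
modified = {
    "/docs/readme.md": [
        datetime(2023, 11, 1, 9, 0),
        datetime(2023, 12, 1, 9, 0),
    ],
}


def last_modified(resource_id):
    """Derive the value on demand, mirroring the rule: pick the modified
    timestamp for which no later one exists."""
    timestamps = modified.get(resource_id, [])
    return max(timestamps) if timestamps else None


first = last_modified("/docs/readme.md")

# New data arrives; nothing derived was stored, so nothing can go stale.
modified["/docs/readme.md"].append(datetime(2023, 12, 20, 9, 0))
second = last_modified("/docs/readme.md")

print(first, second)
```

Because the derived value is recomputed each time it is requested, the second call reflects the newly added timestamp without any manual update.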

Rule chaining and branching

When used together, multiple rules can emulate very powerful logic consistent with the way we naturally model data. In the following rule, we check for resources that don’t have any modified timestamps, and we generate last-modified attributes equal to the created timestamps instead. This generates the same conclusion as the previous rule, but as their conditions are mutually exclusive, only one last-modified attribute will be generated per resource.

define

rule implicit-resource-last-modified:
    when {
        $resource isa resource, has created-timestamp $t;
        not { $resource has modified-timestamp $t-2; };
        ?last-t = $t;
    } then {
        $resource has last-modified ?last-t;
    };

In this next rule, we generate modified timestamps for repositories equal to the created timestamps of the commits on those repositories.

define

rule repository-modified-timestamps:
    when {
        $repository isa repository;
        (repository: $repository) isa commit, has created-timestamp $t;
        ?new-t = $t;
    } then {
        $repository has modified-timestamp ?new-t;
    };

TypeDB’s reasoner uses backward chaining to resolve rules, meaning that this new rule can be combined with the two previous ones. If we query for the last modified timestamps of repositories, we now get the timestamps of the most recent commits on them.

match
$repository isa repository, has last-modified $t;
fetch
$repository: id;
$t;
{
    "repository": {
        "id": [ { "value": "typedb-cloud", "value_type": "string", "type": { "label": "name", "root": "attribute" } } ],
        "type": { "label": "repository", "root": "entity" }
    },
    "t": { "value": "2023-12-13T01:05:03.000", "value_type": "datetime", "type": { "label": "last-modified", "root": "attribute" } }
}
{
    "repository": {
        "id": [ { "value": "typedb", "value_type": "string", "type": { "label": "name", "root": "attribute" } } ],
        "type": { "label": "repository", "root": "entity" }
    },
    "t": { "value": "2023-08-19T02:08:58.000", "value_type": "datetime", "type": { "label": "last-modified", "root": "attribute" } }
}

In order to generate these results, TypeDB roughly follows this procedure:

  • Check if the repository has any last-modified attributes.
    • Trigger rule resource-last-modified: check if the repository has any modified-timestamp attributes.
      • Trigger rule repository-modified-timestamps: check if the repository is in any commit relations that have a created-timestamp attribute and assign their values as modified-timestamp attributes to the repository.
      • If the repository has any modified-timestamp attributes, assign the value of the most recent one as a last-modified attribute to the repository.
    • Trigger rule implicit-resource-last-modified: check if the repository has any modified-timestamp attributes.
      • Trigger rule repository-modified-timestamps: check if the repository is in any commit relations that have a created-timestamp attribute and assign their values as modified-timestamp attributes to the repository.
      • If the repository has no modified-timestamp attributes, check if the repository has a created-timestamp and assign its value as a last-modified attribute to the repository.
  • Return any last-modified attributes the repository has.

This description is purely illustrative, intended to model the overarching logic of backward chaining for these rules. TypeDB’s reasoner minimizes the search space by employing optimization algorithms and caching strategies similar to those of the query planner, and so may not execute these steps in the order given.
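The logic of the steps above can be sketched in ordinary application code. The following Python is only a model of how the three rules combine, not of how TypeDB evaluates them. The repository names, timestamps, and helper functions are hypothetical and not taken from the sample dataset.

```python
from datetime import datetime

# Hypothetical toy data, not the article's dataset.
created = {
    "repo-a": datetime(2023, 1, 1),
    "repo-b": datetime(2023, 2, 1),
}
stored_modified = {}  # no modified-timestamp attributes are stored directly
commits = {
    "repo-a": [datetime(2023, 3, 1), datetime(2023, 5, 1)],  # commit created-timestamps
}

def modified_timestamps(resource):
    """Stored modified-timestamps plus those derived from commits.

    The derived part models the rule repository-modified-timestamps.
    """
    stored = set(stored_modified.get(resource, []))
    derived = set(commits.get(resource, []))
    return stored | derived

def last_modified(resource):
    """Model the two mutually exclusive branches that conclude last-modified."""
    modified = modified_timestamps(resource)  # chaining: evaluated on demand
    if modified:
        # Branch modelling resource-last-modified: most recent modified timestamp.
        return max(modified)
    # Branch modelling implicit-resource-last-modified: fall back to created-timestamp.
    return created.get(resource)
```

Here `last_modified("repo-a")` takes the first branch and returns the most recent commit timestamp, while `last_modified("repo-b")` has no commits and falls back to its created timestamp. Nothing is stored: each value is recomputed from the current data on every call, which mirrors how rule conclusions are generated at query time.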

This query showcases two core features of TypeDB’s reasoner: rule chaining and rule branching. The rules resource-last-modified and implicit-resource-last-modified act as parallel branches of reasoning (in this case mutually exclusive), and both chain onto repository-modified-timestamps. Writing rules in this way can create complex outcomes from individually simple rules, and can be used to capture the true business logic of natural data domains. Next, we’ll examine a special case of rule chaining: recursion.

Rule recursion

For this example, we need to move away from modified timestamps and focus on another part of the data model. Let’s consider the file type-theory.tex, and query the directories it is in.

match
$paper isa file, has path "/vaticle/research/papers/type-theory.tex";
$directory isa directory, has path $path;
(directory: $directory, directory-member: $paper) isa directory-membership;
fetch
$path;
{ "path": { "value": "/vaticle/research/papers", "value_type": "string", "type": { "label": "path", "root": "attribute" } } }

We only get one result, though logically the file is also in the directories /vaticle/research and /vaticle. We can see this by querying further up the directory structure.

match
$papers isa directory, has path "/vaticle/research/papers";
$directory isa directory, has path $path;
(directory: $directory, directory-member: $papers) isa directory-membership;
fetch
$path;
{ "path": { "value": "/vaticle/research", "value_type": "string", "type": { "label": "path", "root": "attribute" } } }

match
$research isa directory, has path "/vaticle/research";
$directory isa directory, has path $path;
(directory: $directory, directory-member: $research) isa directory-membership;
fetch
$path;
{ "path": { "value": "/vaticle", "value_type": "string", "type": { "label": "path", "root": "attribute" } } }

In application code, this would be an excellent use case for recursion, which would allow us to write a function that easily retrieves all of a file’s parent directories at once. We can do the same with rule chaining, by defining a rule that has a conclusion that matches its own condition.

define

indirect-directory-membership sub directory-membership;

rule transitive-directory-memberships:
    when {
        (directory: $directory-1, directory-member: $directory-2) isa directory-membership;
        (directory: $directory-2, directory-member: $x) isa directory-membership;
    } then {
        (directory: $directory-1, directory-member: $x) isa indirect-directory-membership;
    };

Because indirect-directory-membership is a subtype of directory-membership, the rule can be triggered recursively by backward chaining, and we can query all directory memberships using inheritance polymorphism. If we run the query again after adding this rule to the schema, we can see that the expected directory tree is correctly returned.

match
$paper isa file, has path "/vaticle/research/papers/type-theory.tex";
$directory isa directory, has path $path;
(directory: $directory, directory-member: $paper) isa directory-membership;
fetch
$path;
{ "path": { "value": "/vaticle/research/papers", "value_type": "string", "type": { "label": "path", "root": "attribute" } } }
{ "path": { "value": "/vaticle/research", "value_type": "string", "type": { "label": "path", "root": "attribute" } } }
{ "path": { "value": "/vaticle", "value_type": "string", "type": { "label": "path", "root": "attribute" } } }

While the rule doesn’t define an explicit base case, it terminates because the reasoner will only generate each given concept once. This means that reasoning is guaranteed to terminate except in a few niche cases, such as a recursive rule that uses arithmetic to generate numeric attributes in an unbounded sequence. Those cases do not arise in practical database design, but they are a consequence of TypeQL’s Turing completeness.
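The termination argument can be illustrated with a small Python sketch. It uses a naive bottom-up fixed-point loop rather than TypeDB’s backward chaining, so it models only the "each fact is generated once" property, not the reasoner’s actual strategy. The function names are hypothetical; the paths are the ones used in the queries above.

```python
# Direct directory-membership facts as (directory, member) pairs.
direct = {
    ("/vaticle", "/vaticle/research"),
    ("/vaticle/research", "/vaticle/research/papers"),
    ("/vaticle/research/papers", "/vaticle/research/papers/type-theory.tex"),
}

def transitive_memberships(facts):
    """Apply the transitive rule repeatedly until no new facts appear."""
    known = set(facts)
    while True:
        # Rule body: (d1 contains d2) and (d2 contains x) implies (d1 contains x).
        derived = {
            (d1, x)
            for (d1, d2) in known
            for (parent, x) in known
            if d2 == parent
        }
        new = derived - known  # facts already known are never generated again
        if not new:
            return known       # fixed point reached, so the loop stops
        known |= new

def directories_of(member, facts):
    """All directories that contain the member, directly or indirectly."""
    return sorted(d for (d, m) in transitive_memberships(facts) if m == member)
```

Because the set of possible (directory, member) pairs over a finite dataset is finite and no pair is added twice, the loop must stop. A rule that keeps minting new values, such as ever-larger numbers, would have no such bound.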

These rules demonstrate a few simple ways in which rules can be used, but they only scratch the surface of what is possible with symbolic reasoning. Some additional rules are included in the sample code for the interested reader, but a full treatment will be the topic of a separate article.

Conclusion

TypeDB’s unique features as a polymorphic database allow it to intuitively describe the polymorphism naturally present in complex data domains. This is possible because of its core features, which are designed with polymorphism foremost in mind:

  • The conceptual data model enables the complete elimination of mismatch with object models while enforcing the semantic integrity of inserted data.
  • The type-theoretic query language permits the construction of declarative polymorphic queries, encompassing inheritance, interfaces, and parametricity.
  • The strong type system allows the execution of declarative polymorphic queries by validating and resolving those queries against the schema.
  • The symbolic reasoner captures the logic of the data domain through rules, which generate new facts in a consistent and up-to-date manner.

In addition to these core elements, TypeDB incorporates many other features required of modern databases, such as a robust API and resilient clustering. In future articles, we’ll examine particular impacts TypeDB has on database engineering, with deep dives into topics like solving object model mismatch, continuous data model extensions, and ensuring real-time data consistency.
