TypeDB Fundamentals
TypeDB: a New Kind of Database
TypeDB is a new kind of database with a polymorphic conceptual data model, a strong subtyping system, a symbolic reasoning engine, and a type-theoretic language: TypeQL. These are features not normally associated with a database, and TypeDB is a novel technology in the database space, solving several key challenges.
This article series breaks the above statement down and explores each of these core features in detail. To begin with, a schema for a simple filesystem is presented. It illustrates how data domains can be modeled naturally by using the conceptual PERA model, including the occurrence of polymorphism and complex relation types. The expressive power and elegance of the type-theoretic query language TypeQL is then demonstrated by presenting several declarative queries on that schema, featuring complete polymorphic querying capabilities and type variablization. The strong type system that enables type inference and semantic validation is then examined to understand how declarative polymorphic querying is achieved. Finally, one of the most useful outcomes of the type system is showcased: TypeDB’s built-in symbolic reasoner.
Conceptual data model
TypeDB uses the polymorphic entity-relation-attribute (PERA) model (Dorn, Pribadi, 2024) for schemas and data. It is an extension of the entity-relationship (ER) model (Chen, 1976), which is the most widely used tool for designing conceptual data models due to its elegant simplicity and expressive power. Normally, the conceptual model is only a starting point, and it must be translated into a logical model that fits the modeling capabilities of the database system. As TypeDB uses the PERA model, any ER model can be directly implemented without translation. Likewise, any logical data model derived from an ER model can also be directly implemented, enabling easy migration of data from other database paradigms.
The PERA model allows us to make use of powerful polymorphic features. Examples in the schema excerpt below include:
- Inheritance: admin is a subtype of user, so it inherits all of the supertype’s capabilities: any attributes it owns and any roles it plays. When we give user more capabilities later on, they will be inherited as well.
- Interfaces: Both user-group and resource own created-timestamp. This is possible because attribute ownerships and relation roles are interfaces that can be implemented independently by types that share no supertype declaring them.
- Abstraction: resource is abstract. It can only be instantiated through one of its non-abstract subtypes, file and directory.
- Overriding: file and directory both own path, a subtype of id. This specific implementation overrides the ownership of id by their supertype resource.
define
user sub entity,
owns email,
owns password-hash,
owns created-timestamp,
owns active,
plays resource-ownership:resource-owner;
admin sub user,
plays group-ownership:group-owner;
user-group sub entity,
owns name,
owns created-timestamp,
plays group-ownership:group,
plays resource-ownership:resource-owner;
resource sub entity,
abstract,
owns id,
owns created-timestamp,
owns modified-timestamp,
plays resource-ownership:resource;
file sub resource,
owns path as id;
directory sub resource,
owns path as id;
ownership sub relation,
abstract,
relates owned,
relates owner;
group-ownership sub ownership,
relates group as owned,
relates group-owner as owner;
resource-ownership sub ownership,
relates resource as owned,
relates resource-owner as owner;
id sub attribute, abstract, value string;
email sub id;
name sub id;
path sub id;
password-hash sub attribute, value string;
event-timestamp sub attribute, abstract, value datetime;
created-timestamp sub event-timestamp;
modified-timestamp sub event-timestamp;
active sub attribute, value boolean;
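To make the four features above concrete, the following Python sketch encodes the schema excerpt as plain data and computes what each entity type effectively owns and plays. It is a toy model under our own assumptions: the TypeDef class and the helper functions are hypothetical and not part of any TypeDB API, relation types appear only as role names, and the built-in root types (entity, relation, attribute) are ignored. Inheritance is modeled by walking the supertype chain, and overriding by letting owns path as id remove id from the inherited set.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class TypeDef:
    name: str
    supertype: str | None = None
    abstract: bool = False
    # owned attribute type -> the inherited attribute type it overrides (or None)
    owns: dict[str, str | None] = field(default_factory=dict)
    # roles this type can play, written as "relation:role"
    plays: set[str] = field(default_factory=set)

SCHEMA = {
    "user": TypeDef("user", owns={"email": None, "password-hash": None,
                                  "created-timestamp": None, "active": None},
                    plays={"resource-ownership:resource-owner"}),
    "admin": TypeDef("admin", "user", plays={"group-ownership:group-owner"}),
    "user-group": TypeDef("user-group", owns={"name": None, "created-timestamp": None},
                          plays={"group-ownership:group", "resource-ownership:resource-owner"}),
    "resource": TypeDef("resource", abstract=True,
                        owns={"id": None, "created-timestamp": None, "modified-timestamp": None},
                        plays={"resource-ownership:resource"}),
    "file": TypeDef("file", "resource", owns={"path": "id"}),  # owns path as id
    "directory": TypeDef("directory", "resource", owns={"path": "id"}),
}

def ancestry(label: str) -> list[str]:
    """The chain [top-level supertype, ..., label] of user-defined types."""
    chain = []
    current = label
    while current is not None:
        chain.append(current)
        current = SCHEMA[current].supertype
    return chain[::-1]

def effective_owns(label: str) -> set[str]:
    """Inheritance: collect ownerships top-down; overriding: drop the overridden type."""
    owned: set[str] = set()
    for t in ancestry(label):
        for attribute, overridden in SCHEMA[t].owns.items():
            if overridden is not None:
                owned.discard(overridden)
            owned.add(attribute)
    return owned

def effective_plays(label: str) -> set[str]:
    """Inheritance of roles: a subtype can play every role its supertypes play."""
    roles: set[str] = set()
    for t in ancestry(label):
        roles |= SCHEMA[t].plays
    return roles

def can_instantiate(label: str) -> bool:
    """Abstraction: abstract types have no direct instances."""
    return not SCHEMA[label].abstract

for label in SCHEMA:
    print(label, sorted(effective_owns(label)), sorted(effective_plays(label)))
```

In this model, file ends up owning path, created-timestamp, and modified-timestamp but not id, and admin can play both resource-ownership:resource-owner and group-ownership:group-owner.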
A database serves as a source of truth for an organisation’s data. For this reason, it is essential that the database model represents the desired business logic as closely as possible. Normally, the conceptual data model must be translated into the database’s logical model. This translation results in a mismatch with the application’s object model, and exposes data to silent corruption through loss of semantic integrity. Because TypeDB directly implements the conceptual PERA model as its logical model, there is no mismatch with application models. This allows polymorphic constraints between types to be accurately expressed, ensuring the semantic integrity of data.
You can read the full article to learn more about the conceptual data model of TypeDB.
Type-theoretic query language
TypeQL is the type-theoretic query language of TypeDB. One of the central design principles of the language is:
This allows for highly expressive querying power beyond the capabilities of other modern database paradigms. In particular, it allows for declarative polymorphic querying, which is not possible in non-polymorphic databases. We can use three fundamental types of polymorphism in our queries: inheritance polymorphism, interface polymorphism, and parametric polymorphism.
In the following example query, we use inheritance polymorphism to list the ID and event timestamps of all resources. Inheritance polymorphism allows us to declaratively query a type and retrieve instances of that type and all of its subtypes, so here we retrieve instances of the subtypes of resource: files and directories. In fact, we will not get any results that are specifically of type resource, as the type is abstract. Inheritance polymorphism also affects the attributes returned. The IDs retrieved are paths because path is a subtype of id, and the event timestamps retrieved are creation and modification timestamps because created-timestamp and modified-timestamp are subtypes of event-timestamp. Here, the polymorphic constraints on the entity and its attributes have been combined to produce the result set, without having to use any special syntax to do so!
match
$resource isa resource;
fetch
$resource: id, event-timestamp;
{
"resource": {
"event-timestamp": [
{ "value": "2023-08-08T11:57:54.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
{ "value": "2023-08-13T13:16:10.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
{ "value": "2023-10-10T13:22:37.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
{ "value": "2023-10-10T13:39:19.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
{ "value": "2023-11-29T09:17:47.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
{ "value": "2023-06-14T22:44:22.000", "value_type": "datetime", "type": { "label": "created-timestamp", "root": "attribute" } }
],
"id": [ { "value": "/vaticle/research/prototypes/nlp-query-generator.py", "value_type": "string", "type": { "label": "path", "root": "attribute" } } ],
"type": { "label": "file", "root": "entity" }
}
}
{
"resource": {
"event-timestamp": [
{ "value": "2023-06-22T11:36:20.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
{ "value": "2023-06-29T12:16:33.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
{ "value": "2023-07-26T12:33:37.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
{ "value": "2023-01-31T19:39:32.000", "value_type": "datetime", "type": { "label": "created-timestamp", "root": "attribute" } }
],
"id": [ { "value": "/vaticle/research/prototypes/root-cause-analyzer.py", "value_type": "string", "type": { "label": "path", "root": "attribute" } } ],
"type": { "label": "file", "root": "entity" }
}
}
{
"resource": {
"event-timestamp": [
{ "value": "2023-02-15T09:13:55.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
{ "value": "2023-03-25T12:11:15.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
{ "value": "2023-06-02T06:17:54.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
{ "value": "2023-12-04T23:07:09.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
{ "value": "2023-01-28T09:16:24.000", "value_type": "datetime", "type": { "label": "created-timestamp", "root": "attribute" } }
],
"id": [ { "value": "/vaticle/engineering/tools/performance-profiler.rs", "value_type": "string", "type": { "label": "path", "root": "attribute" } } ],
"type": { "label": "file", "root": "entity" }
}
}
{
"resource": {
"event-timestamp": [
{ "value": "2023-11-23T07:05:18.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
{ "value": "2023-10-01T01:46:23.000", "value_type": "datetime", "type": { "label": "created-timestamp", "root": "attribute" } }
],
"id": [ { "value": "/vaticle/engineering/projects/typedb-3.0", "value_type": "string", "type": { "label": "path", "root": "attribute" } } ],
"type": { "label": "directory", "root": "entity" }
}
}
{
"resource": {
"event-timestamp": [
{ "value": "2023-10-20T13:24:19.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
{ "value": "2023-10-22T12:22:51.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
{ "value": "2023-12-01T03:58:22.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
{ "value": "2023-12-28T02:00:04.000", "value_type": "datetime", "type": { "label": "modified-timestamp", "root": "attribute" } },
{ "value": "2023-03-13T14:25:07.000", "value_type": "datetime", "type": { "label": "created-timestamp", "root": "attribute" } }
],
"id": [ { "value": "/vaticle/engineering/projects/typedb-cloud-beta", "value_type": "string", "type": { "label": "path", "root": "attribute" } } ],
"type": { "label": "directory", "root": "entity" }
}
}
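The following Python sketch is a simplified, hypothetical model of how the query above matches instances and attributes through subtyping; it is not how TypeDB evaluates queries. The SUPERTYPE map, the DATA list, and the match_isa and fetch functions are our own illustrative names, and the two instances reuse a subset of the values from the results above.

```python
# Direct supertype of each type in the schema excerpt (None marks a top-level type).
SUPERTYPE = {
    "resource": None, "file": "resource", "directory": "resource",
    "id": None, "email": "id", "name": "id", "path": "id",
    "event-timestamp": None,
    "created-timestamp": "event-timestamp",
    "modified-timestamp": "event-timestamp",
}

def is_subtype(label, supertype):
    """True if label is supertype or inherits from it, directly or transitively."""
    while label is not None:
        if label == supertype:
            return True
        label = SUPERTYPE[label]
    return False

# Each instance: (its exact type, list of (attribute type, value) pairs it owns).
DATA = [
    ("file", [("path", "/vaticle/research/prototypes/root-cause-analyzer.py"),
              ("created-timestamp", "2023-01-31T19:39:32"),
              ("modified-timestamp", "2023-07-26T12:33:37")]),
    ("directory", [("path", "/vaticle/engineering/projects/typedb-3.0"),
                   ("created-timestamp", "2023-10-01T01:46:23"),
                   ("modified-timestamp", "2023-11-23T07:05:18")]),
]

def match_isa(type_label):
    """Like `match $x isa <type_label>;`: instances of the type or any subtype."""
    return [inst for inst in DATA if is_subtype(inst[0], type_label)]

def fetch(instance, *attribute_labels):
    """Like `fetch $x: a, b;`: attributes whose type is each label or a subtype of it."""
    inst_type, attributes = instance
    result = {"type": inst_type}
    for label in attribute_labels:
        result[label] = [(a, v) for a, v in attributes if is_subtype(a, label)]
    return result

results = [fetch(r, "id", "event-timestamp") for r in match_isa("resource")]
for row in results:
    print(row)
```

Asking for resource returns the file and the directory, and asking for id returns their path attributes, because both checks go through the same subtype test.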
TypeQL’s declarative polymorphic queries allow us to easily extend data models without refactoring queries. When we leverage inheritance polymorphism, we do not have to enumerate the subtypes of the supertype being queried. Because the types are never explicitly listed, queries do not have to be modified to include newly added types or exclude previously removed types. This is very different from other database paradigms, where such enumeration would normally be necessary.
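As a rough illustration of this point, the Python sketch below compares a polymorphic filter with one that enumerates subtypes by hand. It is a hypothetical model, not TypeDB code: repository stands in for a newly added resource subtype, and its instance and ID are made-up example data.

```python
SUPERTYPE = {"resource": None, "file": "resource", "directory": "resource"}

def is_subtype(label, supertype):
    """True if label is supertype or inherits from it."""
    while label is not None:
        if label == supertype:
            return True
        label = SUPERTYPE[label]
    return False

def polymorphic_query(instances):
    """Like `match $r isa resource;`: no subtype is named explicitly."""
    return [i for i in instances if is_subtype(i["type"], "resource")]

def enumerated_query(instances):
    """A non-polymorphic equivalent that hard-codes today's subtypes."""
    return [i for i in instances if i["type"] in ("file", "directory")]

instances = [
    {"type": "file", "id": "/vaticle/engineering/tools/performance-profiler.rs"},
    {"type": "directory", "id": "/vaticle/engineering/projects/typedb-3.0"},
]
before = (len(polymorphic_query(instances)), len(enumerated_query(instances)))

# Extend the model: a hypothetical new resource subtype and one instance of it.
SUPERTYPE["repository"] = "resource"
instances.append({"type": "repository", "id": "typedb"})  # made-up example data
after = (len(polymorphic_query(instances)), len(enumerated_query(instances)))

print(before, after)  # prints (2, 2) (3, 2)
```

Only the schema changed; the polymorphic query picks up the new subtype automatically, while the enumerated one silently misses it until someone edits its type list.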
You can read the full article to learn more about the type-theoretic query language of TypeDB.
Strong type system
The backbone of TypeDB is its strong type system, which powers its declarative polymorphic queries and semantic data validation. The type system is managed by the type-inference engine, which resolves every query against the schema to determine the possible return types. It also performs semantic validation, preventing nonsensical queries from being executed.
To build the set of return types for the above query, TypeDB identifies the constraints it comprises. The pattern in the match clause tells us that $resource is of type resource, indicating that the type of $resource must be either resource itself or one of its subtypes. The type resource itself is abstract, and so is excluded. The fetch clause then tells us that we would like to retrieve the id and event-timestamp attributes of $resource. The return types of both attributes depend on the resolved type of $resource, so we can identify the return types of the attributes based on the attribute ownerships in the schema. The actual return types of the query can be enumerated as rows in the following table.
$resource | $resource: id | $resource: event-timestamp |
---|---|---|
file | path | created-timestamp |
file | path | modified-timestamp |
directory | path | created-timestamp |
directory | path | modified-timestamp |
repository | name | created-timestamp |
repository | name | modified-timestamp |
This is roughly how TypeDB’s type-inference engine resolves the possible return types of a query. The return types are then supplied to the query planner, which searches for instances of those types in the data. If a query has no possible sets of return types, then it is not permitted by the schema and is thus semantically invalid! TypeDB is designed with polymorphism as a central feature. By combining a conceptual data model with a type-theoretic query language, the type-inference engine is able to resolve declarative polymorphic queries and identify semantically invalid ones.
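The resolution steps described above can be sketched as a small Python function. This is a toy model of the idea, not TypeDB’s type-inference engine: the SUPERTYPE, ABSTRACT, and OWNS tables are our own encoding of the schema excerpt (with ownerships already resolved for inheritance and the path-as-id override), and the function names are hypothetical. Because only the excerpt’s types are encoded, the repository rows of the table do not appear. A query whose constraints leave no combination of types is reported as semantically invalid.

```python
from itertools import product

# Direct supertypes, restricted to the types this query touches.
SUPERTYPE = {
    "resource": None, "file": "resource", "directory": "resource",
    "id": None, "email": "id", "name": "id", "path": "id",
    "event-timestamp": None,
    "created-timestamp": "event-timestamp",
    "modified-timestamp": "event-timestamp",
}
ABSTRACT = {"resource", "id", "event-timestamp"}
# Effective ownerships after inheritance and `owns path as id` overriding.
OWNS = {
    "file": {"path", "created-timestamp", "modified-timestamp"},
    "directory": {"path", "created-timestamp", "modified-timestamp"},
}

def is_subtype(label, supertype):
    while label is not None:
        if label == supertype:
            return True
        label = SUPERTYPE[label]
    return False

def concrete_subtypes(label):
    """Non-abstract types that satisfy `isa <label>`."""
    return sorted(t for t in SUPERTYPE if is_subtype(t, label) and t not in ABSTRACT)

def infer_fetch_types(owner_label, attribute_labels):
    """Enumerate (owner type, attribute type, ...) rows for `match $x isa owner; fetch $x: attrs;`."""
    rows = []
    for owner in concrete_subtypes(owner_label):
        options = [
            sorted(a for a in OWNS.get(owner, set()) if is_subtype(a, label))
            for label in attribute_labels
        ]
        rows.extend((owner,) + combo for combo in product(*options))
    if not rows:
        raise ValueError(f"no possible return types: {owner_label} with {attribute_labels}")
    return rows

def is_semantically_valid(owner_label, attribute_labels):
    try:
        infer_fetch_types(owner_label, attribute_labels)
        return True
    except ValueError:
        return False

for row in infer_fetch_types("resource", ["id", "event-timestamp"]):
    print(row)
print(is_semantically_valid("resource", ["email"]))  # False: no resource type owns email
```

The four printed rows correspond to the file and directory rows of the table, and fetching email from a resource is rejected because no combination of types satisfies it.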
You can read the full article to learn more about the strong type system of TypeDB.
Symbolic reasoning engine
In TypeDB, symbolic reasoning takes the form of rule inference, which allows the database to generate new facts based on existing data and user-defined rules. Resolution of rules is managed by the rule-inference engine. The data created by rules is generated at query time and held in memory. This means that it always reflects the most recent state of the schema and data when the query is run, so we don’t have to worry about stale or inconsistent data, and no disk space is used to store it. Once we close the transaction, the rule-inference cache is cleared and the resources are released. Rule inference has a number of powerful use cases, ranging from creating convenient abstractions for patterns used across multiple queries to capturing complex business logic from the data domain.
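The lifecycle described above can be illustrated with a deliberately simplified Python model: inferred facts are computed lazily on the first query in a transaction, kept only in that transaction’s memory, and discarded when it closes. The Database and Transaction classes, the last_modified_rule function, and the file-1 label are hypothetical stand-ins, not the TypeDB driver API, and real rule resolution is far more sophisticated than re-running a Python function.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Fact = Tuple[str, str, str]  # (owner, attribute type, value)
Rule = Callable[[List[Fact]], Iterable[Fact]]

class Database:
    """Holds only stored (asserted) facts; inferred facts are never written here."""
    def __init__(self) -> None:
        self.facts: List[Fact] = []

class Transaction:
    def __init__(self, db: Database, rules: List[Rule]) -> None:
        self._db = db
        self._rules = rules
        self._cache: Optional[List[Fact]] = None  # in-memory inferred facts
        self.is_open = True

    def query(self, attribute_type: str) -> List[Fact]:
        if not self.is_open:
            raise RuntimeError("transaction is closed")
        if self._cache is None:
            # Inference happens at query time, against the data as it is now.
            self._cache = [f for rule in self._rules for f in rule(self._db.facts)]
        return [f for f in self._db.facts + self._cache if f[1] == attribute_type]

    def close(self) -> None:
        # Closing the transaction releases the inferred facts.
        self._cache = None
        self.is_open = False

def last_modified_rule(facts: List[Fact]) -> List[Fact]:
    """Hypothetical rule: latest modified-timestamp per owner (same-format ISO strings sort chronologically)."""
    latest = {}
    for owner, attr, value in facts:
        if attr == "modified-timestamp" and (owner not in latest or value > latest[owner]):
            latest[owner] = value
    return [(owner, "last-modified", value) for owner, value in latest.items()]

db = Database()
db.facts += [("file-1", "modified-timestamp", "2023-06-22T11:36:20"),
             ("file-1", "modified-timestamp", "2023-07-26T12:33:37")]

tx1 = Transaction(db, [last_modified_rule])
first = tx1.query("last-modified")
tx1.close()

db.facts.append(("file-1", "modified-timestamp", "2023-12-01T03:58:22"))
tx2 = Transaction(db, [last_modified_rule])
second = tx2.query("last-modified")
print(first, second)
```

The second transaction sees the newer timestamp without any stored last-modified value having to be updated, because nothing inferred was ever persisted.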
In the above data model, we’ll often need to query for the last modification timestamp of resources. Rather than searching through the modification timestamps each time, we can generate a last-modified attribute for each resource using the following rule. Rules use the same pattern syntax as queries, so they can make use of polymorphic patterns. In this rule, we use inheritance polymorphism to assign last-modified timestamps to all resources via their supertype resource. Rules also undergo semantic validation to ensure the integrity of generated data.
define
last-modified sub attribute, value datetime;
resource owns last-modified;
rule resource-last-modified:
when {
$resource isa resource, has modified-timestamp $last-modified;
not {
$resource has modified-timestamp $other-modified;
$other-modified > $last-modified;
};
# Copy the value of $last-modified
?timestamp = $last-modified;
} then {
# Generate a new attribute with the same value
$resource has last-modified ?timestamp;
};
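To check what the when block computes, the following Python sketch evaluates the same condition directly: a modified-timestamp value is kept only if no other modified-timestamp of the same resource is later, which makes it the maximum. The function name is hypothetical, the first two resources reuse timestamps from the example results earlier in this article, and the never-modified path is made up. As with the rule, a resource with no modification timestamps receives no last-modified value.

```python
from datetime import datetime

def infer_last_modified(modified_by_resource):
    """Apply the rule's condition to {resource id: [modified-timestamp values]}."""
    inferred = {}
    for resource, modified in modified_by_resource.items():
        for candidate in modified:
            # not { $resource has modified-timestamp $other; $other > $candidate; }
            if not any(other > candidate for other in modified):
                # then { $resource has last-modified ?timestamp; }
                inferred[resource] = candidate
    return inferred

ts = datetime.fromisoformat
resources = {
    "/vaticle/research/prototypes/nlp-query-generator.py": [
        ts("2023-08-08T11:57:54"), ts("2023-08-13T13:16:10"), ts("2023-10-10T13:22:37"),
        ts("2023-10-10T13:39:19"), ts("2023-11-29T09:17:47"),
    ],
    "/vaticle/engineering/projects/typedb-3.0": [ts("2023-11-23T07:05:18")],
    "/hypothetical/never-modified": [],  # made-up resource with no modifications
}

last_modified = infer_last_modified(resources)
for resource, value in last_modified.items():
    print(resource, value.isoformat())
```

Expressing "latest" as the absence of a later value lets the rule be written purely as a pattern, with no aggregation step.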
With the rule defined, we can easily query the new attributes. We do not have to specify which rules to use, as the rule-inference engine will find and apply all applicable rules in the schema.
match
$resource isa resource, has last-modified $last-modified;
fetch
$resource: id;
$last-modified;
The new attribute type can be used in any other query too, so we don’t have to repeat the full pattern anywhere outside of the rule. This also means that if we want to change how the last modification timestamp is determined, we only have to change it in the rule, which serves as a single source of truth.
You can read the full article to learn more about the symbolic reasoning engine of TypeDB.
Summary
TypeDB’s unique features as a polymorphic database allow it to elegantly describe the polymorphism inherent in most application models. This is possible because of its core features, which are designed with polymorphism foremost in mind:
- The conceptual data model enables the complete elimination of mismatch with object models while enforcing the semantic integrity of inserted data.
- The type-theoretic query language permits the construction of declarative queries encompassing inheritance, interface, and parametric polymorphism.
- The strong type system allows the execution of declarative polymorphic queries by validating and resolving those queries against the schema.
- The symbolic reasoner captures the logic of the data domain through rules, which generate new facts in a consistent and up-to-date manner.
In addition to these core elements, TypeDB incorporates many other features required of modern databases, such as asynchronous query execution and resilient clustering. A complete list of TypeDB’s features would be too long for an article, but a high-level summary can be found on the features page.