TypeDB Fundamentals
TypeDB: a New Kind of Database
TypeDB is a new kind of database with a polymorphic conceptual data model, a strong subtyping system, a symbolic reasoning engine, and a type-theoretic language: TypeQL. These are features not normally associated with a database, and TypeDB is a novel technology in the database space, solving several key challenges.
This article series breaks that statement down and explores each of these core features in detail. To begin with, a schema for a simple filesystem is presented. It illustrates how data domains can be modeled naturally by using the conceptual PERA model, including the occurrence of polymorphism and complex relation types. The expressive power and elegance of the type-theoretic query language TypeQL is then demonstrated by presenting several declarative queries on that schema, featuring complete polymorphic querying capabilities and type variablization. The strong type system that enables type inference and semantic validation is then examined to understand how declarative polymorphic querying is achieved. Finally, one of the most useful outcomes of the type system is showcased: TypeDB’s built-in symbolic reasoner.
Conceptual data model
TypeDB uses the polymorphic entity-relation-attribute (PERA) model (Dorn, Pribadi, 2024) for schemas and data. It is an extension of the entity-relationship (ER) model (Chen, 1976), which is the most widely used tool for designing conceptual data models due to its elegant simplicity and expressive power. Normally, the conceptual model is only a starting point, and it must be translated into a logical model that meets the modeling capabilities of the database system. As TypeDB uses the PERA model, any ER model can be directly implemented without translation. Likewise, any logical data model derived from an ER model can also be directly implemented, enabling easy migration of data from other database paradigms.
The PERA model allows us to make use of powerful polymorphic features. Some examples from the schema excerpt below are:
- Inheritance: `admin` is a subtype of `user`, so it inherits all of the supertype’s capabilities: any attributes it owns and any roles it plays. When we give `user` more capabilities later on, they will be inherited as well.
- Interfaces: Both `user-group` and `resource` own `created-timestamp`. This is possible because attribute ownerships and relation roles are interfaces that can be independently implemented by types with no common supertype.
- Abstraction: `resource` is abstract. It can only be instantiated through one of its non-abstract subtypes, `file` and `directory`.
- Overriding: `file` and `directory` both own `path`, a subtype of `id`. This specific implementation overrides the ownership of `id` by their supertype `resource`.
```typeql
define
  entity user,
    owns email,
    owns password-hash,
    owns created-timestamp,
    owns active,
    plays resource-ownership:resource-owner;
  entity admin sub user,
    plays group-ownership:group-owner;
  entity user-group,
    owns name,
    owns created-timestamp,
    plays group-ownership:group,
    plays resource-ownership:resource-owner;
  entity resource @abstract,
    owns id,
    owns created-timestamp,
    owns modified-timestamp,
    plays resource-ownership:resource;
  entity file sub resource,
    owns path;
  entity directory sub resource,
    owns path;
  relation ownership @abstract,
    relates owned,
    relates owner;
  relation group-ownership sub ownership,
    relates group as owned,
    relates group-owner as owner;
  relation resource-ownership sub ownership,
    relates resource as owned,
    relates resource-owner as owner;
  attribute id @abstract, value string;
  attribute email sub id;
  attribute name sub id;
  attribute path sub id;
  attribute password-hash, value string;
  attribute event-timestamp @abstract, value datetime;
  attribute created-timestamp sub event-timestamp;
  attribute modified-timestamp sub event-timestamp;
  attribute active, value boolean;
```
A database serves as a source-of-truth for an organisation’s data. For this reason, it is essential that the database model represents the desired business logic as closely as possible. Normally, the conceptual data model must be translated into the database’s logical model. This results in object model mismatch, and exposes data to silent corruption by semantic integrity loss. Because TypeDB directly implements the conceptual PERA model as its logical model, there is no mismatch with application models. This allows polymorphic constraints between types to be accurately expressed, ensuring semantic integrity of data.
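As a minimal sketch of what this integrity checking could look like in practice, the write queries below target the schema excerpt above. The path and timestamp values are invented for illustration. The rejected cases are shown only as comments, and the expectation that they fail is an assumption drawn from the schema constraints described here (`resource` is abstract, and `user` does not own `path`), not a transcript of actual server output.

```typeql
# Should be accepted: file is a concrete subtype of resource, owns path,
# and inherits the ownership of created-timestamp from resource.
insert
  $file isa file,
    has path "/typedb/example/notes.md",
    has created-timestamp 2024-01-15T09:30:00;

# Expected to be rejected: resource is abstract and cannot be instantiated directly.
# insert $resource isa resource;

# Expected to be rejected: user does not own path.
# insert $user isa user, has path "/typedb/example/notes.md";
```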
You can read the full article to learn more about the conceptual data model of TypeDB.
Type-theoretic query language
TypeQL is the type-theoretic query language of TypeDB. One of the central design principles of the language is:
This allows for highly expressive querying power beyond the capabilities of other modern database paradigms. In particular, it allows for declarative polymorphic querying, which is not possible in non-polymorphic databases. We can use three fundamental types of polymorphism in our queries: inheritance polymorphism, interface polymorphism, and parametric polymorphism.
In the following example query, we use inheritance polymorphism to list the IDs and event timestamps of all resources. Inheritance polymorphism allows us to declaratively query a type and retrieve instances of that type and all of its subtypes, so here we retrieve instances of the subtypes of `resource`: files and directories. In fact, we will not get any results that are specifically of type `resource`, as the type is abstract. Inheritance polymorphism also affects the attributes returned. The IDs retrieved are paths because `path` is a subtype of `id`, and the event timestamps retrieved are creation and modification timestamps because `created-timestamp` and `modified-timestamp` are subtypes of `event-timestamp`. Here, the polymorphic constraints on the entity and its attributes have been combined to produce the result set, without having to use any special syntax to do so!
```typeql
match
  $resource isa resource;
fetch {
  "resource": {
    "id": $resource.id,
    "event-timestamp": [ $resource.event-timestamp ],
  }
};
```
```json
{
  "resource": {
    "event-timestamp": [
      "2023-08-08T11:57:54.000",
      "2023-08-13T13:16:10.000",
      "2023-10-10T13:22:37.000",
      "2023-10-10T13:39:19.000",
      "2023-11-29T09:17:47.000",
      "2023-06-14T22:44:22.000"
    ],
    "id": "/typedb/research/prototypes/nlp-query-generator.py"
  }
}
{
  "resource": {
    "event-timestamp": [
      "2023-06-22T11:36:20.000",
      "2023-06-29T12:16:33.000",
      "2023-07-26T12:33:37.000",
      "2023-01-31T19:39:32.000"
    ],
    "id": "/typedb/research/prototypes/root-cause-analyzer.py"
  }
}
{
  "resource": {
    "event-timestamp": [
      "2023-02-15T09:13:55.000",
      "2023-03-25T12:11:15.000",
      "2023-06-02T06:17:54.000",
      "2023-12-04T23:07:09.000",
      "2023-01-28T09:16:24.000"
    ],
    "id": "/typedb/engineering/tools/performance-profiler.rs"
  }
}
{
  "resource": {
    "event-timestamp": [
      "2023-11-23T07:05:18.000",
      "2023-10-01T01:46:23.000"
    ],
    "id": "/typedb/engineering/projects/typedb-3.0"
  }
}
{
  "resource": {
    "event-timestamp": [
      "2023-10-20T13:24:19.000",
      "2023-10-22T12:22:51.000",
      "2023-12-01T03:58:22.000",
      "2023-12-28T02:00:04.000",
      "2023-03-13T14:25:07.000"
    ],
    "id": "/typedb/engineering/projects/typedb-cloud-beta"
  }
}
```
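Interface polymorphism can be queried in the same declarative style. The following is a sketch only, assuming the schema excerpt from the data model section: it matches anything that owns a `created-timestamp`, without naming an owner type. The variable names and the output key are illustrative choices.

```typeql
# Match every instance that has a created-timestamp, whatever its type.
match
  $owner has created-timestamp $created;
fetch {
  "created-timestamp": $created
};
```

Under that schema, `$owner` could resolve to users, admins, user groups, files, and directories, since all of them own `created-timestamp` either directly or by inheritance.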
TypeQL’s declarative polymorphic queries allow us to easily extend data models without refactoring queries. When we leverage inheritance polymorphism, we do not have to enumerate the subtypes of the supertype being queried. Because the types are never explicitly listed, queries do not have to be modified to include newly added types or exclude previously removed types. This is very different to other database paradigms where such enumeration would normally be necessary.
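To illustrate, suppose a new resource subtype is added later. The `symlink` type below is hypothetical and not part of the original schema; it is defined in the same way as `file` and `directory`.

```typeql
# Hypothetical schema extension: a new concrete subtype of resource.
define
  entity symlink sub resource,
    owns path;
```

With this definition in place, the `match $resource isa resource;` query shown earlier should also return symlinks without being edited, because it never lists the subtypes of `resource` explicitly.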
You can read the full article to learn more about the type-theoretic query language of TypeDB.
Strong type system
The backbone of TypeDB is its strong type system, which powers its declarative polymorphic queries and semantic data validation. The type system is managed by the type-inference engine, which resolves every query against the schema to determine the possible return types. It also performs semantic validation, preventing nonsensical queries from being executed.
To build the set of return types for the above query, TypeDB identifies the constraints it comprises. The pattern in the `match` clause tells us that `$resource` has the type `resource`, indicating that the type of `$resource` must be either `resource` itself or one of its subtypes. The type `resource` itself is abstract, and so is excluded. The `fetch` clause then tells us that we would like to retrieve the `id` and `event-timestamp` attributes of `$resource`. The return types of both attributes depend on the resolved type of `$resource`, so we can identify the return types of the attributes based on the attribute ownerships in the schema. The actual return types of the query can be enumerated as rows in the following table.
$resource | $resource: id | $resource: event-timestamp |
---|---|---|
file | path | created-timestamp |
file | path | modified-timestamp |
directory | path | created-timestamp |
directory | path | modified-timestamp |
repository | name | created-timestamp |
repository | name | modified-timestamp |
This is roughly how TypeDB’s type-inference engine works to resolve the possible return types of a query. The return types are then supplied to the query planner, which searches for instances of those types in the data. If a query has no possible sets of return types, then it is not permitted by the schema and thus semantically invalid! TypeDB is designed with polymorphism as a central feature. By combining a conceptual data model with a type-theoretic query language, the type-inference engine is able to resolve declarative polymorphic queries and identify semantically invalid ones.
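The same schema-level relationships can be explored by hand with type variables. The query below is a rough sketch, assuming that `sub` and `owns` constraints accept variables in a `match` clause; the variable names are illustrative. It asks which subtypes of `resource` own which subtypes of `id`, which approximates the first two columns of the table above. It is a way to inspect the schema, not a description of how the type-inference engine is implemented.

```typeql
# Enumerate (owner type, attribute type) pairs permitted by the schema.
match
  $type sub resource;
  $attribute sub id;
  $type owns $attribute;
```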
You can read the full article to learn more about the strong type system of TypeDB.
Summary
TypeDB’s unique features as a polymorphic database allow it to elegantly describe the polymorphism inherent in most application models. This is possible because of its core features, which are designed with polymorphism foremost in mind:
- The conceptual data model enables the complete elimination of mismatch with object models while enforcing the semantic integrity of inserted data.
- The type-theoretic query language permits the construction of declarative queries encompassing inheritance, interface, and parametric polymorphism.
- The strong type system allows the execution of declarative polymorphic queries by validating and resolving those queries against the schema.
In addition to these core elements, TypeDB incorporates many other features required of modern databases, such as asynchronous query execution and resilient clustering. A complete list of TypeDB’s features would be too long for an article, but a high-level summary can be found on the features page.