TypeDB Fundamentals
The Conceptual Data Model of TypeDB
TypeDB uses the polymorphic entity-relation-attribute (PERA) model (Dorn, Pribadi, 2024) for schemas and data. It is an extension of the entity-relationship (ER) model (Chen, 1976), which is the most widely used tool for designing conceptual data models due to its elegant simplicity and expressive power. Normally, the conceptual model is only a starting point, and it must be translated into a logical model that fits the modeling capabilities of the database system. As TypeDB uses the PERA model, any ER model can be directly implemented without translation. Likewise, any logical data model derived from an ER model can also be directly implemented, enabling easy migration of data from other database paradigms.
This article demonstrates the key features of the PERA model using an example data model for a simple filesystem based on a discretionary access control (DAC) permission system. In the DAC framework, all objects (for instance files, directories, and user groups) have owners, and the owner of an object grants permissions on it to other users. The model is written in TypeQL, TypeDB’s query language. Its near-natural-language design makes it easy to understand even without knowing the relevant keywords.
The PERA model
Like the ER model, the PERA model consists of entities, relations, and attributes, but it also introduces polymorphism in the form of inheritance and interfaces. A TypeDB schema is composed of type definitions. Every type is a subtype either of one of the three built-in root types (entity, relation, and attribute) or of a previously user-defined type. Interfaces are defined between types in the form of ownerships of attributes and roles in relations. All types inherit the interfaces of their supertypes, in addition to the general capabilities defined for their root types.
Entity types represent classes of independent objects. An entity might practically require other concepts to exist, such as a car that cannot exist without its parts, but can be conceptualized without reference to them: a car can be imagined without considering its parts.
Relation types represent classes of objects dependent on other objects, their roleplayers. Every relation must depend on at least one other concept, and cannot be conceptualized without those dependencies: it is impossible to imagine a marriage without considering its spouses.
Attribute types represent classes of values dependent on objects, their owners. They represent properties of those objects, such as names of people, dates of marriages, and license plates of cars.
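The minimal Python sketch below makes the relationship between root types, subtypes, and inherited interfaces concrete. The PeraType class and its methods are hypothetical illustrations and not part of TypeDB or its drivers. The class rejects any non-root type that has no supertype, and it collects a type’s attribute ownerships and roles together with those inherited from its supertypes.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical, simplified model of PERA types for illustration only;
# this is not how TypeDB represents its schema internally.
ROOT_TYPES = {"entity", "relation", "attribute"}


@dataclass
class PeraType:
    name: str
    supertype: Optional["PeraType"] = None
    owns: set = field(default_factory=set)   # declared attribute ownerships
    plays: set = field(default_factory=set)  # declared roles played

    def __post_init__(self):
        # Only the three built-in root types may exist without a supertype.
        if self.supertype is None and self.name not in ROOT_TYPES:
            raise ValueError(f"'{self.name}' must subtype an existing type")

    def root(self) -> str:
        # Follow the chain of supertypes up to entity, relation, or attribute.
        t = self
        while t.supertype is not None:
            t = t.supertype
        return t.name

    def interfaces(self) -> set:
        # Declared interfaces plus everything inherited from supertypes.
        inherited = self.supertype.interfaces() if self.supertype else set()
        declared = {("owns", a) for a in self.owns} | {("plays", r) for r in self.plays}
        return declared | inherited


entity = PeraType("entity")
user = PeraType("user", entity, owns={"email", "active"})
admin = PeraType("admin", user)  # declares no interfaces of its own
print(admin.root())  # entity
```

In this sketch admin declares nothing itself, yet admin.interfaces() contains both of user’s ownerships, because interfaces are collected up the type hierarchy.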
Defining entity types
To begin, we’ll define some entity types, along with the attribute types they own, using a Define query, which adds new types to the schema.
define
user sub entity,
    owns email,
    owns password-hash,
    owns created-timestamp,
    owns active;
admin sub user;
user-group sub entity,
    owns name,
    owns created-timestamp;
resource sub entity,
    abstract,
    owns id,
    owns created-timestamp,
    owns modified-timestamp;
file sub resource,
    owns path as id;
directory sub resource,
    owns path as id;
access sub entity,
    owns name;
id sub attribute, abstract, value string;
email sub id;
name sub id;
path sub id;
password-hash sub attribute, value string;
event-timestamp sub attribute, abstract, value datetime;
created-timestamp sub event-timestamp;
modified-timestamp sub event-timestamp;
active sub attribute, value boolean;
TypeQL is fully declarative, so the statements in the query can be placed in any order. This query already makes use of several powerful polymorphic features, including:
- Inheritance: admin is a subtype of user, so it inherits all of the supertype’s capabilities. Currently, these are ownership of email, password-hash, created-timestamp, and active, but when we give user more capabilities later on, admin will inherit them as well.
- Interfaces: Both user-group and resource own created-timestamp. This is possible because attribute ownership is an interface that can be implemented independently by types that share no user-defined supertype. This is contrary to most other database paradigms, where an attribute belongs to only a single type.
- Abstraction: resource is abstract. It can only be instantiated through one of its non-abstract subtypes, file and directory.
- Overriding: file and directory both own path, a subtype of id. This more specific ownership overrides the ownership of id declared by their supertype resource.
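The next sketch shows how abstraction and overriding affect this schema. It is a hypothetical Python model over a subset of the types above, not TypeDB’s implementation. Each declared ownership records which inherited attribute it overrides, as in owns path as id, and an overridden attribute is no longer available to the subtype.

```python
# Hypothetical sketch of abstraction and overriding for a subset of the schema.
# Each ownership maps to the inherited attribute it overrides, or None.
SUPERTYPE = {"admin": "user", "file": "resource", "directory": "resource"}
ABSTRACT = {"resource"}
OWNS = {
    "user": {"email": None, "password-hash": None,
             "created-timestamp": None, "active": None},
    "resource": {"id": None, "created-timestamp": None, "modified-timestamp": None},
    "file": {"path": "id"},       # owns path as id
    "directory": {"path": "id"},  # owns path as id
}


def lineage(type_name):
    # The type itself followed by its supertypes, most specific first.
    while type_name is not None:
        yield type_name
        type_name = SUPERTYPE.get(type_name)


def effective_owns(type_name):
    # Collect ownerships up the hierarchy, skipping any that a subtype overrode.
    owned, overridden = set(), set()
    for t in lineage(type_name):
        for attr, replaces in OWNS.get(t, {}).items():
            if attr not in overridden:
                owned.add(attr)
            if replaces is not None:
                overridden.add(replaces)
    return owned


def can_instantiate(type_name):
    # Abstract types only gain instances through their concrete subtypes.
    return type_name not in ABSTRACT
```

With this model, effective_owns("file") yields path, created-timestamp, and modified-timestamp but not id, and admin inherits all four of user’s ownerships.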
Defining relation types
With the entity types defined, we can now start thinking about relation types.
define
ownership sub relation,
    abstract,
    relates owned,
    relates owner;
group-ownership sub ownership,
    relates group as owned,
    relates group-owner as owner;
resource-ownership sub ownership,
    relates resource as owned,
    relates resource-owner as owner;
membership sub relation,
    abstract,
    relates parent,
    relates member;
group-membership sub membership,
    relates group as parent,
    relates group-member as member;
directory-membership sub membership,
    relates directory as parent,
    relates directory-member as member;
permission sub relation,
    relates subject,
    relates object,
    relates access;
login-event sub relation,
    relates subject,
    owns login-timestamp,
    owns success;
login-timestamp sub event-timestamp;
success sub attribute, value boolean;
Each relation type is defined along with its roles using relates statements. Like ownerships of attribute types, roles can be played independently by different types, as we’ll see shortly. In the PERA model, relation types are first-class citizens and can do anything that entity types can. We’ve made use of all the same polymorphic features (inheritance, interfaces, abstraction, and overriding), in addition to some new ones:
- N-ary relations: The number of roles in a relation type is fully configurable. Most relation types are binary and have two roles, but permission has three and login-event has only one.
- Schema extensions: login-timestamp is a subtype of the previously defined event-timestamp. Once a type is defined, we can extend it with subtypes at any time.
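Roles follow the same overriding rule as attribute ownerships. The hypothetical Python sketch below mirrors the relation types above. A role declared with relates x as y replaces the supertype’s role y, and a relation type’s arity is the number of roles it ends up with. The sketch only illustrates these semantics and is not a TypeDB API.

```python
# Hypothetical model of relation types: role -> the inherited role it overrides.
SUPERTYPE = {
    "group-ownership": "ownership", "resource-ownership": "ownership",
    "group-membership": "membership", "directory-membership": "membership",
}
RELATES = {
    "ownership": {"owned": None, "owner": None},
    "group-ownership": {"group": "owned", "group-owner": "owner"},
    "resource-ownership": {"resource": "owned", "resource-owner": "owner"},
    "membership": {"parent": None, "member": None},
    "group-membership": {"group": "parent", "group-member": "member"},
    "directory-membership": {"directory": "parent", "directory-member": "member"},
    "permission": {"subject": None, "object": None, "access": None},
    "login-event": {"subject": None},
}


def roles(relation_type):
    # Walk from the most specific type upwards, skipping overridden roles.
    found, overridden = set(), set()
    t = relation_type
    while t is not None:
        for role, replaces in RELATES[t].items():
            if role not in overridden:
                found.add(role)
            if replaces is not None:
                overridden.add(replaces)
        t = SUPERTYPE.get(t)
    return found


def arity(relation_type):
    # Binary, ternary, unary, ...: simply the number of effective roles.
    return len(roles(relation_type))
```

Here group-ownership ends up with the roles group and group-owner rather than owned and owner. Meanwhile, permission has arity 3 and login-event has arity 1.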
Defining roleplayers
Finally, we will define which entity types play which roles using plays statements.
define
user plays resource-ownership:resource-owner,
    plays group-membership:group-member,
    plays permission:subject,
    plays login-event:subject;
admin plays group-ownership:group-owner;
user-group plays group-ownership:group,
    plays resource-ownership:resource-owner,
    plays group-membership:group,
    plays permission:subject,
    plays permission:object;
resource plays resource-ownership:resource,
    plays permission:object;
file plays directory-membership:directory-member;
directory plays directory-membership:directory,
    plays directory-membership:directory-member;
access plays permission:access;
As with ownerships, different types can independently play the same role. For instance, user and user-group both play resource-ownership:resource-owner. Role-playing capabilities are also inherited in the type hierarchy, so admin can play all of the roles that user can. Though we don’t do so in this model, it is even possible to have relations playing roles in other relations!
Inserting data
With the schema in place, we can begin inserting data. Let’s examine a few Insert queries from the initial dataset, starting with some filesystem users.
insert
$rhonda isa user,
    has email "rhonda@vaticle.com",
    has active true,
    has created-timestamp 2023-11-10T15:19:43,
    has password-hash "6f1127d4b8ee9bd64df9b0ae3f8a7f58";
insert
$cedric isa admin,
    has email "cedric@vaticle.com",
    has active true,
    has created-timestamp 2023-01-01T00:00:00,
    has password-hash "e0d29e328f65b8074d7df218c73b1726";
In each of these two Insert queries, we create a user and give them several attributes. Cedric is in fact an admin, and therefore also a user by extension. Because admin is a subtype of user, it inherits the ability to own the four inserted attributes. In the next query, we’ll insert a new user group and assign Cedric as the owner using a group-ownership relation.
match
$cedric isa admin, has email "cedric@vaticle.com";
insert
$engineers isa user-group,
    has name "engineers",
    has created-timestamp 2023-09-07T17:47:57;
(group: $engineers, group-owner: $cedric) isa group-ownership;
This Insert query starts with a match clause, which lets the inserted data reference pre-existing data. The group ownership is represented with a binary relation, with Cedric and the user group as roleplayers. In the next query, we create a login-event relation for Rhonda.
match
$rhonda isa user, has email "rhonda@vaticle.com";
insert
(subject: $rhonda) isa login-event,
    has login-timestamp 2023-12-29T13:38:51,
    has success true;
This time, the relation tuple has only one element, as it is a unary relation. Relation tuples can represent relations with any number of roleplayers, as we’ll see shortly with a ternary relation. We also assign the login event some attributes. In the next query, we’ll create a file and see how we can give it multiple modified-timestamp attributes.
insert
$release-notes isa file,
    has path "/vaticle/engineering/projects/typedb-3.0/release-notes.md",
    has created-timestamp 2023-03-18T20:24:08,
    has modified-timestamp 2023-12-17T22:31:07,
    has modified-timestamp 2023-06-16T04:07:35,
    has modified-timestamp 2023-08-29T17:35:51,
    has modified-timestamp 2023-10-16T17:47:04;
modified-timestamp is a multivalued attribute: this file owns four of them. TypeDB allows easy insertion and querying of multivalued attributes without having to consider normalization. Finally, we’ll create some access entities, then insert a permission relation.
insert
$read isa access, has name "read";
$write isa access, has name "write";
$delete isa access, has name "delete";
match
$engineers isa user-group, has name "engineers";
$release-notes isa file, has path "/vaticle/engineering/projects/typedb-3.0/release-notes.md";
$write isa access, has name "write";
insert
(subject: $engineers, object: $release-notes, access: $write) isa permission;
Here we’ve used a ternary relation to represent the following:
The Engineers user group has write permissions on the release notes file.
There are other ways this could be modelled. For instance, we could instead have a binary write-permission relation between a subject and an object. The ternary approach is more extensible, because a new kind of access can be added by inserting another access entity rather than by defining a new relation type. Modelling in TypeDB is a conceptual exercise, and there are very few restrictions on what can and can’t be done.
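To see why the ternary design is more extensible, the following hypothetical Python sketch stores permissions as (subject, object, access) triples, mirroring the permission relation. The Permission class and the access_levels function are illustrative only.

```python
from typing import NamedTuple


# Hypothetical sketch: one triple per permission, like the ternary relation.
class Permission(NamedTuple):
    subject: str
    obj: str  # the roleplayer of the "object" role
    access: str


permissions = {Permission("engineers", "release-notes.md", "write")}


def access_levels(subject, obj):
    # Every access level the subject holds on the object.
    return {p.access for p in permissions if p.subject == subject and p.obj == obj}


# A new access level is just another triple; the structure is unchanged.
permissions.add(Permission("engineers", "release-notes.md", "delete"))
```

Under the binary alternative, granting delete would instead need a second relation type alongside write-permission (for example, a hypothetical delete-permission) and a matching schema change.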
Data validation
One of the biggest advantages of a conceptual schema is semantic data validation. As in a relational database, inserted data is validated against the schema to ensure integrity. But whereas relational schemas mainly enforce structural and referential integrity, validation against a conceptual schema can enforce semantic integrity as well. If we tried to insert the following nonsensical data, validation would fail and the database would reject the query.
insert
$omar isa user,
    has path "/vaticle/omar",
    has success true;
## Error> [CXN05] The transaction is closed because of the error(s):
[THW03] Invalid Write: Attribute of type 'success' is not defined to be owned by type 'user'.
## Terminated
This data is obviously incorrect, as users do not have these properties. Semantic validation also prevents the insertion of data that appears semantically sound but contains a mistake. In the following example, we attempt to insert a user as the owner of a user group. This seems reasonable, but only admins can own user groups.
insert
$omar isa user, has email "omar@vaticle.com";
$researchers isa user-group, has name "researchers";
(group: $researchers, group-owner: $omar) isa group-ownership;
## Error> [CXN05] The transaction is closed because of the error(s):
[THW08] Invalid Write: The type 'user' does not play the role type 'group-ownership:group-owner'.
## Terminated
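In both cases the failed check closes the transaction, so none of the query’s data is written. The hypothetical Python sketch below illustrates this validate-then-commit behaviour. Every statement is checked against the ownership and role-playing rules, including inherited ones, before anything is stored. It is a simplification that does not reproduce TypeDB’s internal checks or error codes.

```python
# Hypothetical validate-then-commit sketch; statements are (kind, type, target).
SUPERTYPE = {"admin": "user"}
SCHEMA = {
    "has": {"user": {"email", "password-hash", "created-timestamp", "active"},
            "user-group": {"name", "created-timestamp"}},
    "plays": {"admin": {"group-ownership:group-owner"},
              "user-group": {"group-ownership:group"}},
}


class InvalidWrite(Exception):
    pass


def allowed(kind, type_name, target):
    # Allowed if the type or any of its supertypes declares the capability.
    t = type_name
    while t is not None:
        if target in SCHEMA[kind].get(t, set()):
            return True
        t = SUPERTYPE.get(t)
    return False


def commit(store, statements):
    # Check every statement first; one failure rejects the whole write.
    for kind, type_name, target in statements:
        if not allowed(kind, type_name, target):
            raise InvalidWrite(f"'{type_name}' cannot {kind} '{target}'")
    store.extend(statements)  # reached only if every statement is valid
```

Submitting [("has", "user", "email"), ("has", "user", "path")] raises InvalidWrite and leaves the store empty, even though the first statement on its own is valid.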
Summary
A database serves as the source of truth for an organisation’s data. For this reason, it is essential that the database model represents the desired business logic as closely as possible. Normally, the conceptual data model must be translated into the database’s logical model. This translation creates a mismatch between the application’s object model and the database model, and exposes data to silent corruption through loss of semantic integrity. Because TypeDB directly implements the conceptual PERA model as its logical model, there is no mismatch with application models. This allows polymorphic constraints between types to be expressed accurately, ensuring the semantic integrity of data.