TypeDB Fundamentals

The PERA Model: an In-Depth Guide

In this article we will take you on a guided journey through the individual components of the polymorphic entity-relation-attribute (PERA) model—the data model underlying our database TypeDB. On the practical side of things, we will also learn how these components can be efficiently and naturally accessed through TypeQL!

Key ingredients

We have already discussed many of the key ideas underlying the PERA model in related articles. To get us started, let’s give a brief overview of the terms that we will discuss in more detail in this article.

Root types in the PERA type system

In the PERA model, we distinguish three kinds of types. First, we speak of entity types to mean independent object types, i.e. types whose instances are objects and which do not depend on any other types. Complementing this, we speak of relation types when referring to dependent object types. Recall from our discussion of interface polymorphism that we generally consider type dependencies on “abstract interface types”, which specific concept types may then be cast into. In the case of relation types, we refer to these abstract interface types as role types (or simply “roles”), and their instances as role players. A relation may depend on one or more role types.

The PERA model also features attribute types: note that, in the context of the PERA model, we will require all our attribute types to be dependent concepts. The abstract interface types of attribute types are referred to as ownership types (or simply “ownerships”). Importantly, the model requires each attribute type to have exactly one such ownership type. In the 2×2 matrix of (independent/dependent)×(object/attribute) types we are now left only with the combination of “independent attribute types”. These global constant types do not feature very prominently in the model, but we will briefly address them later in the special case of PERA attribute types “without specified owner”.

Schema and data

Based on the above, the PERA model comprises two elementary components.

A “database schema” of types, falling into the above categories, and subtyping information about these types, describing inheritance hierarchies and interface implementations.
Appropriate “data” instances in each of these types.

In the following, we discuss in detail how both of these components can be defined. As a definitional language we will use TypeQL, which will allow us to directly translate what we learned into practice!

A brief aside: it is somewhat curious that we arrived at the distinction between entities, relations, and attributes purely from the perspective of concepts (and formally, type theory). Nonetheless, the resulting model structurally resembles the well-known entity-relationship approach to conceptual (a.k.a. semantic) data modeling. We will not go into a detailed comparison with classical semantic data modeling here, but we do take it as a good sign that the model reproduces well-established classical insights from a novel type-theoretic perspective!

Disclaimer: the upcoming release of TypeQL 3.0 will introduce several changes to the language, which will be reflected in a future version of this article. This version of the article is based on TypeQL 2.x syntax. Of course, the fundamental role of the PERA model will remain unchanged.

Entities

Entity types are the independent concepts in our database, and as such, they are the simplest to define in our database schema. In TypeQL a new entity type is specified by a statement of the following form:

define person sub entity;

The statement should be read as defining the type person to be a subtype of an abstract supertype entity, which is the default “root” type for entity types, i.e. all other entity types inherit from entity.

Sub-entities

In general, types may also inherit from other defined entity types. In TypeQL this is specified by writing, for example:

define employee sub person;

Recall, employee being a subtype of person means that when we introduce an instance of employee later on, then it can be cast into a person as well!

Importantly, when defining inheritance hierarchies for any kind of types (i.e. for entity, relation, or attribute types) the PERA model requires each type to have exactly one super-type, which may be the default root type. This condition is also known as single-inheritance, and lets us avoid the diamond problem (which becomes particularly acute when working with dependent types, since dependencies will be inherited from supertypes).

Abstract entities

We also mention that types may be designated as abstract types. For example, for our entity type person defined earlier we may further specify that

define person abstract;

Designating the type person as abstract in this way means that no instance of the type can ever be created directly in person. In other words, instances of person will only be obtainable by casting instances of subtypes of person into a person. The three root types, entity, relation, and attribute, are all examples of abstract types, and any new type can be defined to be abstract.
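As a minimal sketch of what abstractness means in practice, consider the following queries. We assume a fresh schema in which employee is a concrete subtype of the abstract type person, and we assume that the schema query and the two data queries are run separately; the outcomes noted in the comments are what we would expect from the rule above, not recorded output.

```typeql
# schema query: person is abstract, employee is a concrete subtype
define
person sub entity, abstract;
employee sub person;

# data query 1: employee is concrete, so this is expected to succeed;
# the new object can also be cast into a person
insert $e isa employee;

# data query 2: person is abstract, so this is expected to be rejected
insert $p isa person;
```

Note that the insert examples later in this article create person instances directly, and therefore assume that person has not been designated as abstract.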

Relations

Relation types are the dependent analogs of entity types. Since a relation may have multiple roles (recall, these are the abstract interface types that relation types depend on), TypeQL has the additional keyword relates to specify the roles that a specific relation may depend on. For example, we could specify

define 
marriage sub relation,
    relates spouse;

in order to define a marriage relation type, which depends on the role spouse.

Sub-relations

As before, types may be specified to inherit from other types, but in contrast to entity types, this now also requires dealing with the inheritance of roles. We illustrate this with the following example of three different subtypings of the marriage type:

# case 1: role overwriting
define
hetero_marriage sub marriage,
    relates husband as spouse,
    relates wife as spouse;

# case 2: role inheritance
define
religious_marriage sub marriage;

# case 3: role extension
define 
witnessed_marriage sub marriage,
    relates witness;

The first of the above cases specifies a new relation type hetero_marriage which subtypes marriage, but overwrites the spouse role with two (more specific) roles: husband and wife. Note that, in general, we can overwrite a role with one, two, or more roles. In the second case, we specify a relation type religious_marriage which does not overwrite the spouse role and thus inherits it directly: this means that religious_marriage (like marriage) may depend on any instance that can be cast into the spouse role. In the third and final case, we define a relation type witnessed_marriage which, like religious_marriage, inherits the spouse role since it is not overwritten. In addition, it also extends role dependencies to another role, called witness.

All three cases of overwriting, inheriting, and extending roles may be combined when defining subtypes of relation types.
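To illustrate such a combination, here is a hedged sketch of a single subtype that uses all three mechanisms at once. It assumes the witnessed_marriage type from case 3 above; the names officiated_marriage, partner, and officiant are hypothetical and chosen only for illustration.

```typeql
define
officiated_marriage sub witnessed_marriage,
    # overwriting: the inherited spouse role is replaced by partner
    relates partner as spouse,
    # inheritance: witness is not mentioned here, so it is inherited as is
    # extension: officiant is a new role of this subtype
    relates officiant;
```

Under these assumptions, an officiated_marriage would depend on the roles partner, witness, and officiant.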

Attributes

Unlike entity and relation types, attribute types have instances that are literal values of a specific, pre-defined type. In TypeQL this “value type” of an attribute type is indicated using the keyword value. For example, consider the specification

define
name sub attribute, value string;

which defines an attribute type name and states that instances of name will be literal values of type string.

Constraining attributes

We may further restrict the value type to a subtype with appropriate expressions. For example, replacing the above by

define
name sub attribute, value string, regex "^(Ana|Bob)$";

would mean that instances of name have to be among the set of strings {Ana, Bob}.
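As a small sketch of the effect of this constraint, consider the two data queries below. We assume the regex-constrained definition of name above, and we assume that person has been declared to own name (as is done later in this article); the outcomes in the comments are what the constraint implies, not recorded output.

```typeql
# data query 1: "Ana" matches the regex, so this is expected to succeed
insert $p isa person, has name "Ana";

# data query 2: "Carl" does not match the regex, so this is expected to be rejected
insert $q isa person, has name "Carl";
```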

The ownership interface

Note that, unlike relation types, there is no explicit specification of the ownership interface of the attribute type. Indeed, since each attribute type has a single unique ownership interface, in TypeQL we leave this interface implicit!

We remark that requiring each attribute type to have exactly one interface is a very reasonable design choice: in many situations, 𝑛-ary attributes can create ambiguity for data modelers as they may often instead be conceived as unary attributes of 𝑛-ary relations. For example, the binary attribute concept of “distance between x and y” could instead be understood as the unary attribute concept of “length of p” where p is an instance of the binary relation concept of “shortest paths between x and y”.

Sub-attributes

Just like entity and relation types, attribute types may also be organized in inheritance hierarchies. For example, we may replace our earlier TypeQL specification by

define
identifier sub attribute, abstract, value string;
name sub identifier;

which defines the attribute type name as a subtype of some abstract supertype identifier. Note that attribute subtypes inherit the value type of their supertype: e.g., instances of name will be string values like those of identifier. We remark that in TypeQL 2.x, all attribute supertypes must be marked as abstract types: this is meant to avoid confusion when the same literal value is being used at different levels of an attribute type hierarchy.

Interface implementation

Since dependent types depend on abstract interface types, we still need to specify which types implement which interfaces in order to be able to actually instantiate dependent types. In essence, any such specification can be made when defining our database schema, with the simple rule that role players and owners must be objects (“objectified interfaces”). In other words, only object types can implement interfaces. Note that, without this rule, we might end up in situations in which our model loses track of the user’s intention due to the idempotency of literal value creation.

Implementing roles

Let us first consider the implementation of roles in a relation type. In TypeQL, the implementing types are specified using the plays keyword. For example, we could specify:

define
person plays marriage:spouse;

This specifies that we can cast instances of the type person into instances of the abstract interface type spouse of marriage. In natural language, we say that persons can play the role of spouse in marriages. The definition thus allows instantiation of the concept “marriage of spouses x and y” for persons x and y.

Role notation

Note that our plays specification above uses the “scoped” notation marriage:spouse in order to refer to the spouse role; this is because, generally in TypeQL, role identifiers (like spouse) are required to only be unique within the scope of their relation type hierarchy.
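The following sketch shows why the scoped notation is needed. It assumes the person type defined earlier; the relation types court_hearing and contract_signing and their roles are hypothetical names used only for illustration. Both relation types declare a role called witness, and the scope prefix tells the two roles apart.

```typeql
define
court_hearing sub relation,
    relates witness,
    relates judge;
contract_signing sub relation,
    relates witness,
    relates signatory;
# the same role name, disambiguated by the relation type it belongs to
person plays court_hearing:witness;
person plays contract_signing:witness;
```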

Roles played by relations

Importantly, since relation types are object types, they, too, can play roles. For example, we could have:

define
civil_servant sub person;
registry_entry sub relation,
    relates registrar,
    relates event;
civil_servant plays registry_entry:registrar;
marriage plays registry_entry:event;

Implementing ownerships

Next, let’s consider implementing ownerships of attribute types. In TypeQL, implementations are specified using the keyword owns. For example, continuing our previous examples, we could write:

define
person owns name;

This specifies that instances of type person can be cast into the abstract interface type of “name owners” (recall that this interface is implicit and left unnamed in TypeDB). In natural language, persons can own names. As a result of the definition, we can now instantiate the concept of “name(s) of x” for a person x.

Any number of types can be specified to have a given ownership (and similarly, any number of types can play a given role). For example, in addition to the above, we could specify:

define 
city sub entity;
city owns name;

in order to allow cities to own names as well.

Attributes owned by relations

Finally, since all object types (including relations) can own attributes, we could further extend our example as follows:

define
date sub attribute, value datetime;
marriage owns date;

Inheritance of implementations

We remark that if an object type implements a role in relations (or an ownership of attributes) then this will be passed on to all its subtypes in the evident way: type-theoretically, we simply compose the respective casting operations. For example, since employee is a subtype of person, and since person implements name ownership, so does employee. Indeed, any employee instance can be cast into a person instance, and any person instance can be cast into a name owner.
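As a brief sketch of inherited implementations, consider the following insert. It assumes the schema built up in the previous sections: employee and civil_servant are subtypes of person, person owns name, and person plays marriage:spouse. Neither subtype declares owns or plays itself, so both statements rely purely on inheritance.

```typeql
insert
# name ownership is inherited from person
$e isa employee, has name "Ana";
$c isa civil_servant, has name "Bob";
# the spouse role is inherited from person as well
(spouse: $e, spouse: $c) isa marriage;
```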

Inserting data instances

In the previous sections we illustrated the key ingredients of database schemas in the PERA model, comprising entity, relation, and attribute types, as well as their inheritance hierarchies and their interface dependencies. In this section, we briefly describe how data instances can be created in the types specified by the schema in the PERA model.

As before, we will give all our specifications using TypeQL, which provides a simple and intuitive syntax for data creation, revolving around the keywords isa and has.

An example

Continuing with the example schema informally described in the previous sections, consider the following data insert query in TypeQL:

insert
$a isa civil_servant;
$a has name "Ana";
$b1 isa person, has name "Bob";
$b2 isa person, has name "Bob";
$m (spouse: $b1, spouse: $b2) isa marriage, has date 2004-05-17;
(event: $m, registrar: $a) isa registry_entry;

Let’s go through the above example line by line.

The first line creates a new civil_servant object, and assigns that object to the variable $a.
In the second line, we create the value "Ana" in the attribute type “name of $a”.
The third line is similar to the first two, but note that statements with the same subject can be concatenated!
The fourth line creates yet another person object, assigned to $b2. Importantly, note that $b1 and $b2 are not the same person object!
The fifth line creates a new object in the type “marriage of spouses $b1 and $b2”, and assigns the object to the variable $m. It then also creates the date value 2004-05-17 in the type “date of $m”.
Finally, the last line creates a new object in the type “registry_entry of event $m by registrar $a”. Note that the newly created object is not assigned to any variable here: it can be left implicit!

The example, in essence, covers all basic cases of data instantiation in the PERA model. But, of course, there is a little bit of fine print to the topic. We address some of it in the following sections.

Cardinality and variadicity

We saw earlier how to create a new marriage instance between two spouses. However, we never specified that a marriage should have exactly two spouses. A priori, the cardinalities of roles in the PERA model are variadic, meaning any number of role players can be given when instantiating a relation type (as long as at least one role player for the relation type is given). That means, for example, we could also create:

insert
$a isa person, has name "Austin";
(spouse: $a) isa marriage;
# indeed, the other spouse could be unknown!

which would create a marriage with a single spouse. Variadicity is highly useful in order to record partial information, in this case describing the case where one of the spouses in a marriage relation is unknown.

However, without care, variadicity could also allow us to record a marriage with three or more spouses, which might violate the intended semantics of the type. For this reason, in the formal PERA model, cardinalities of role players are bounded (e.g. to indicate that a marriage should have no more than two spouses) and bounds are preserved when inherited. The ability to express precise cardinality constraints is firmly on our roadmap for TypeQL 3.0!
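To make the concern concrete, here is a sketch of an insert that the TypeQL 2.x schema from the previous sections would not prevent. We assume that person owns name and plays marriage:spouse as defined earlier, and that name accepts the values used here.

```typeql
insert
$a isa person, has name "Ana";
$b isa person, has name "Bob";
$c isa person, has name "Bob";
# three role players in the same role: nothing in the schema bounds this
(spouse: $a, spouse: $b, spouse: $c) isa marriage;
```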

Preventing ambiguity

A rather subtle condition that data inserts need to satisfy is that of intentional casting. In brief, the condition ensures that there is no ambiguity in interpreting the user’s intention as to which roles or ownerships an object is cast into when instantiating a dependent type.

An example of ambiguous castings

To illustrate this, we consider an example of how to not satisfy the intentional casting condition, starting with the following database schema:

define
companionship sub relation,
    relates companion;
marriage sub companionship,
    relates spouse as companion;
friendship sub companionship,
    relates friend as companion;
person sub entity;
person plays friendship:friend;
person plays marriage:spouse;

This (purely illustrative) schema describes various forms of companionship, defining that a person can play the role of either a friend in a friendship or a spouse in a marriage. Since we didn’t make companionship an abstract type, we could now attempt to insert data as follows:

insert
$p isa person;
$q isa person;
(companion: $p, companion: $q) isa companionship;

At first glance, this insert looks fine: indeed, $p and $q are objects of the type person and thus can be cast into either friends or spouses, and thus also into the role of companions. Nonetheless, the above would be invalid in the PERA model as it creates unwanted ambiguity. Indeed, the schema of our database tells us that persons can precisely be either a friend or a spouse.

So in the insert clause above, which one of those two are $p and $q? The answer is ambiguous, and the intention of the user is unclear here. The intentional casting condition prevents these and similar ambiguities by requiring that dependent types can only be instantiated with objects whose types are explicitly defined to implement the needed interfaces (using plays or owns specifications). In other words, the above data creation would be valid if we had also specified person plays companionship:companion; in the schema!
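The following sketch shows how the example can be made to satisfy the intentional casting condition. It assumes the companionship schema above and that the schema query and the data query are run separately.

```typeql
# schema query: make the intended casting explicit
define
person plays companionship:companion;

# data query: $p and $q are now unambiguously cast into companions
insert
$p isa person;
$q isa person;
(companion: $p, companion: $q) isa companionship;
```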

Global attributes

When referring to an attribute (as opposed to “attribute type”) we usually don’t mean a bare literal value instance in an attribute type, but rather the tuple of both literal value and attribute type. For example, if some object has a name with value "Austin" (as created by our previous example) then we refer to the serialization name:"Austin" of attribute type identifier and a literal value as an attribute. In particular, if another object is created it may have the same attribute:

insert
$a isa person, has name "Austin";
$c isa city, has name "Austin";

Our terminology here reflects the following design choice: in TypeDB, we do not store duplicates of attributes. That is, if "Austin" is the name of some person and of some city, the database only needs to record one name:"Austin" attribute, with appropriate reference to both the person and the city. This can save both valuable space and time in our database application (note that the choice concerns only the implementation of the PERA model, not the definition of the model itself). We refer to this design choice as working with global attributes.
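One way to observe this sharing is sketched below, assuming the person and the city named "Austin" from the insert above have been committed. The query binds the single attribute name:"Austin" to $n and then asks for all of its owners; we would expect both the person and the city to be returned.

```typeql
match
$n "Austin" isa name;
$x has $n;
get $x;
```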

Independent attributes

The idea of globality for attributes is reflected in another feature: we allow attributes that are independent from their owners, a.k.a. global constants. This leverages the isa keyword as before, but this time used to create instances in attribute types. Consider for example:

insert
$n "Ana" isa name;
1970-01-01 isa date;
$a isa person;
$a has name $n;

Line by line, this data specification does the following:

In the first line, we create the literal value "Ana" in the type “name without owner”, and assign the value to a variable $n.
We then create the value 1970-01-01 in the type “date without owner”.
Reusing variables as before, in the last two lines we create a new person object, and then use our value "Ana" in the variable $n to create a new instance in the type “name of $a”.

Thus, unlike in the case of relation types, where at least one role player needs to be given, we do allow attribute types to be instantiated with no owners. This can be used to create global constants which are not associated with any owner, even though the feature is rarely used.

Summary

In this article we discussed in detail the individual components of TypeDB’s data model: we saw the interaction of entity types, relation types, and attribute types, each playing an important role in the bigger picture of the PERA model. We also saw how these schema components can be addressed and defined via TypeQL’s declarative syntax. We then learned how data can be inserted into a TypeDB database and which conditions data inserts need to satisfy. Of course, this only covers half of what is needed from a database, the other (much larger) half concerning how to query the types and data we inserted. The topic of data reading queries is substantial by itself, and we therefore cover it in separate articles!