TypeDB Fundamentals
The PERA Model: an In-Depth Guide
In this article we will take you on a guided journey through the individual components of the polymorphic entity-relation-attribute (PERA) model—the data model underlying our database TypeDB. On the practical side of things, we will also learn how these components can be efficiently and naturally accessed through TypeQL!
Key ingredients
We have already discussed many of the key ideas underlying the PERA model in related articles. To get us started, let’s give a brief overview of the terms that we will discuss in more detail in this article.
Root types in the PERA type system
In the PERA model, we distinguish three kinds of types. First, we speak of entity types to mean independent object types, i.e. types whose instances are objects and which do not depend on any other types. Complementing this, we speak of relation types when referring to dependent object types. Recall from our discussion of interface polymorphism that we generally consider type dependencies on “abstract interface types”, which specific concept types may then be cast into. In the case of relation types, we refer to these abstract interface types as role types (or simply “roles”), and their instances as role players. A relation may depend on one or more role types.
The PERA model also features attribute types: note that, in the context of the PERA model, we will require all our attribute types to be dependent concepts. The abstract interface types of attribute types are referred to as ownership types (or simply “ownerships”). Importantly, the model requires each attribute type to have exactly one such ownership type. In the 2×2 matrix of (independent/dependent)×(object/attribute) types we are now left only with the combination of “independent attribute types”. These global constant types do not feature very prominently in the model, but we will briefly address them later in the special case of PERA attribute types “without specified owner”.
Schema and data
Based on the above, the PERA model comprises two elementary components.
- A “database schema” of types, falling into the above categories, and subtyping information about these types, describing inheritance hierarchies and interface implementations.
- Appropriate “data” instances in each of these types.
In the following, we discuss in detail how both of these components can be defined. As a definitional language we will use TypeQL, which will allow us to directly translate what we learned into practice!
A brief aside: it is somewhat curious that we arrived at the distinction between entities, relations, and attributes purely from the perspective of concepts (and formally, type theory). Nonetheless, the resulting model structurally resembles the well-known entity-relationship approach to conceptual (a.k.a. semantic) data modeling. We will not go into a detailed comparison with classical semantic data modeling here, but we do take it as a good sign that the model reproduces well-established classical insights from a novel type-theoretic perspective!
Disclaimer: the upcoming release of TypeQL 3.0 will introduce several changes to TypeQL, which will be reflected in a future version of this article. This version of the article is based on TypeQL 2.x syntax. Of course, the fundamental role of the PERA model will remain unchanged.
Entities
Entity types are the independent concepts in our database, and as such, they are the simplest to define in our database schema. In TypeQL a new entity type is specified by a statement of the following form:
define person sub entity;
The statement should be read as defining the type person to be a subtype of an abstract super-type entity, which is the default “root” type for entity types, i.e. all other entity types inherit from entity.
Sub-entities
In general, types may also inherit from other defined entity types. In TypeQL this is specified by writing, for example:
define employee sub person;
Recall that employee being a subtype of person means that when we introduce an instance of employee later on, it can be cast into a person as well!
Importantly, when defining inheritance hierarchies for any kind of types (i.e. for entity, relation, or attribute types) the PERA model requires each type to have exactly one super-type, which may be the default root type. This condition is also known as single-inheritance, and lets us avoid the diamond problem (which becomes particularly acute when working with dependent types, since dependencies will be inherited from supertypes).
Abstract entities
We also mention that types may be designated as abstract types. For example, for our entity type person defined earlier we may further specify that
define person abstract;
Designating the type person as abstract in this way means that no instance can ever be created directly in person. In other words, instances of person will only be obtainable by casting instances of subtypes of person into a person. The three root types, entity, relation, and attribute, are all examples of abstract types, and any new type can be defined to be abstract.
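As a minimal sketch of what abstractness means in practice, assume a schema that contains only the abstract type person and its subtype employee from above. The comments describe the behaviour we would expect from the definitions in this section; they are not quoted server output.

```typeql
# Schema query (assumed minimal schema): person is abstract,
# employee is a concrete subtype.
define
person sub entity, abstract;
employee sub person;

# Separate data query: creating an employee is fine, and the
# new instance can also be cast into a person.
insert $e isa employee;

# By contrast, a direct instance of the abstract type is not allowed,
# so a data query like the following should be rejected:
# insert $p isa person;
```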
Relations
Relation types are the dependent analogs of entity types. Since a relation may have multiple roles (recall, these are the abstract interface types that relation types depend on), TypeQL has the additional keyword relates to specify the roles that a specific relation may depend on. For example, we could specify
define
marriage sub relation,
relates spouse;
in order to define a marriage relation type, which depends on the role spouse.
Sub-relations
As before, types may be specified to inherit from other types, but in contrast to entity types, this now also requires dealing with the inheritance of roles. We illustrate this with the following example of three different subtypings of the marriage type:
# case 1: role overwriting
define
hetero_marriage sub marriage,
relates husband as spouse,
relates wife as spouse;
# case 2: role inheritance
define
religious_marriage sub marriage;
# case 3: role extension
define
witnessed_marriage sub marriage,
relates witness;
The first of the above cases specifies a new relation type hetero_marriage which subtypes marriage, but overwrites the spouse role with two (more specific) roles: husband and wife. Note that, in general, we can overwrite a role with one, two, or more roles. In the second case, we specify a relation type religious_marriage which does not overwrite the spouse role and thus inherits it directly: this means that religious_marriage (like marriage) may depend on any instance that can be cast into the spouse role. In the third and final case, we define a relation type witnessed_marriage which, like religious_marriage, inherits the spouse role since it is not overwritten. In addition, it also extends its role dependencies with another role, called witness.
All three cases of overwriting, inheriting, and extending roles may be combined when defining subtypes of relation types.
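As a sketch of such a combination, the following definition subtypes witnessed_marriage from case 3 above. The type name civil_partnership and the role names partner and officiant are hypothetical and chosen only for illustration.

```typeql
define
civil_partnership sub witnessed_marriage,
    # overwriting: the inherited spouse role is replaced by partner
    relates partner as spouse,
    # extension: a new role that no supertype declares
    relates officiant;
# inheritance: the witness role is not mentioned here, so it is
# inherited unchanged from witnessed_marriage
```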
Attributes
Unlike entity and relation types, attribute types have instances that are literal values of a specific, pre-defined type. In TypeQL this “value type” of an attribute type is indicated using the keyword value. For example, consider the specification
define
name sub attribute, value string;
which defines an attribute type name and states that instances of name will be literal values of type string.
Constraining attributes
We may further restrict the value type to a subtype with appropriate expressions. For example, replacing the above by
define
name sub attribute, value string, regex "^(Ana|Bob)$";
would mean that instances of name have to be among the set of strings {Ana, Bob}.
The ownership interface
Note that, unlike relation types, there is no explicit specification of the ownership interface of the attribute type. Indeed, since each attribute type has a single unique ownership interface, in TypeQL we leave this interface implicit!
We remark that the condition for attribute types to have exactly one interface is a very reasonable design choice: in many situations, 𝑛-ary attributes can create ambiguity for data modelers as they may often instead be conceived as unary attributes of 𝑛-ary relations. For example, the binary attribute concept of “distance between x and y” could instead be understood as the unary attribute concept of “length of p” where p is an instance of the binary relation concept of “shortest paths between x and y”.
Sub-attributes
Just like entity and relation types, attribute types may also be organized in inheritance hierarchies. For example, we may replace our earlier TypeQL specification by
define
identifier sub attribute, abstract, value string;
name sub identifier;
which defines the attribute type name as a subtype of some abstract supertype identifier. Note that attribute subtypes inherit the value type of their supertype: e.g., instances of name will be string values like those of identifier. We remark that in TypeQL 2.x, all attribute supertypes must be marked as abstract types: this is meant to avoid confusion when the same literal value is being used at different levels of an attribute type hierarchy.
Interface implementation
Since dependent types depend on abstract interface types, we still need to specify which types implement which interfaces in order to be able to actually instantiate dependent types. In essence, any such specification can be made when defining our database schema, with the simple rule that role players and owners must be objects (“objectified interfaces”). In other words, only object types can implement interfaces. Note that, without this rule, we might end up in situations in which our model loses track of the user’s intention due to the idempotency of literal value creation.
Implementing roles
Let us first consider the implementation of roles in a relation type. In TypeQL, the implementing types are specified using the plays keyword. For example, we could specify:
define
person plays marriage:spouse;
This specifies that we can cast instances of the type person into instances of the abstract interface type spouse of marriage. In natural language, we say that persons can play the role of spouse in marriages. The definition thus allows instantiation of the concept “marriage of spouses x and y” for persons x and y.
Role notation
Note that our plays specification above uses the “scoped” notation marriage:spouse in order to refer to the spouse role; this is because, generally in TypeQL, role identifiers (like spouse) are required to only be unique within the scope of their relation type hierarchy.
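To see why scoping matters, here is a small sketch in which two unrelated relation types both declare a role called member. The relation types club_membership and team_membership are hypothetical and not part of the running example; person is assumed to be defined as above.

```typeql
define
club_membership sub relation,
    relates member;
team_membership sub relation,
    relates member;

# The scoped notation tells the two member roles apart.
person plays club_membership:member;
person plays team_membership:member;
```

Because the two relation types belong to different hierarchies, reusing the role name member should not cause a clash.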
Roles played by relations
Importantly, since relation types are object types, they, too, can play roles. For example, we could have:
define
civil_servant sub person;
registry_entry sub relation,
relates registrar,
relates event;
civil_servant plays registry_entry:registrar;
marriage plays registry_entry:event;
Implementing ownerships
Next, let’s consider implementing ownerships of attribute types. In TypeQL, implementations are specified using the keyword owns. For example, continuing our previous examples, we could write:
define
person owns name;
This specifies that instances of type person can be cast into the abstract interface type of “name owners” (recall that this interface is implicit and left unnamed in TypeDB). In natural language, persons can own names. As a result of the definition, we can now instantiate the concept of “name(s) of x” for a person x.
Any number of types can be specified to have a given ownership (and similarly, any number of types can play a given role). For example, in addition to the above, we could specify:
define
city sub entity;
city owns name;
in order to allow cities to own names as well.
Attributes owned by relations
Finally, since all object types (including relations) can own attributes, we could further extend our example as follows:
define
date sub attribute, value datetime;
marriage owns date;
Inheritance of implementations
We remark that if an object type implements a role in relations (or an ownership of attributes) then this will be passed on to all its subtypes in the evident way: type-theoretically, we simply compose the respective casting operations. For example, since employee is a subtype of person, and since person implements name ownership, so does employee. Indeed, any employee instance can be cast into a person instance, and any person instance can be cast into a name owner.
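The following data query sketches inherited implementations in action. It assumes the schema built up so far: employee sub person, person owns name, and person plays marriage:spouse. Nothing is declared on employee itself.

```typeql
insert
# employee inherits name ownership from person
$e1 isa employee, has name "Ana";
$e2 isa employee, has name "Bob";
# employee also inherits the ability to play marriage:spouse
(spouse: $e1, spouse: $e2) isa marriage;
```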
Inserting data instances
In the previous sections we illustrated the key ingredients of database schemas in the PERA model, comprising entity, relation, and attribute types, as well as their inheritance hierarchies and their interface dependencies. In this section, we briefly describe how data instances can be created in the types specified by the schema in the PERA model.
As before, we will give all our specifications using TypeQL, which provides a simple and intuitive syntax for data creation, revolving around the keywords isa and has.
An example
Continuing with the example schema informally described in the previous sections, consider the following data insert query in TypeQL:
insert
$a isa civil_servant;
$a has name "Ana";
$b1 isa person, has name "Bob";
$b2 isa person, has name "Bob";
$m (spouse: $b1, spouse: $b2) isa marriage, has date 2004-05-17;
(event: $m, registrar: $a) isa registry_entry;
Let’s go through the above example line by line.
- The first line creates a new civil_servant object, and assigns that object to the variable $a.
- In the second line, we create the value "Ana" in the attribute type “name of $a”.
- The third line is similar to the first two, but note that statements with the same subject can be concatenated!
- The fourth line creates yet another person object, assigned to $b2. Importantly, note that $b1 and $b2 are not the same person object!
- The fifth line creates a new object in the type “marriage of spouses $b1 and $b2”, and assigns the object to the variable $m. It then also creates the date value 2004-05-17 in the type “date of $m”.
- Finally, the last line creates a new object in the type “registry_entry of event $m by registrar $a”. Note that the newly created object is not assigned to any variable here: it can be left implicit!
The example, in essence, covers all basic cases of data instantiation in the PERA model. But, of course, there is a little bit of fine print to the topic. We address some of it in the following sections.
Cardinality and variadicity
We earlier saw how to create a new marriage instance between two spouses. However, we never specified that a marriage should have exactly two spouses. A priori, the cardinalities of roles in the PERA model are variadic, meaning any number of role players can be given when instantiating a relation type (as long as at least one roleplayer for the relation type is given). That means, for example, we could also create:
insert
$a isa person, has name "Austin";
(spouse: $a) isa marriage;
# indeed, the other spouse could be unknown!
which would create a marriage with a single spouse. Variadicity is highly useful in order to record partial information, in this case describing the case where one of the spouses in a marriage relation is unknown.
However, without care, variadicity could also allow us to record a marriage with three or more spouses, which might violate the intended semantics of the type. For this reason, in the formal PERA model, cardinalities of roleplayers are bounded (e.g. to indicate that a marriage should have no more than two spouses) and bounds are preserved when inherited. The ability to express precise cardinality constraints is firmly on our roadmap for TypeQL 3.0!
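To make the concern concrete, here is a sketch of such an insert against the example schema of this article. Since no bound on the spouse role can be stated in the 2.x syntax used here, we would expect a query of this shape to be accepted, even though it is probably not what the schema author intended.

```typeql
insert
$a isa person, has name "Ana";
$b isa person, has name "Bob";
$c isa person, has name "Bob";
# three role players in the same variadic role
(spouse: $a, spouse: $b, spouse: $c) isa marriage;
```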
Preventing ambiguity
A rather subtle condition that data inserts need to satisfy is that of intentional casting. In brief, the condition ensures that there is no ambiguity in interpreting the user’s intention as to which roles or ownerships an object is cast into when instantiating a dependent type.
An example of ambiguous castings
To illustrate this, we consider an example of how to not satisfy the intentional casting condition, starting with the following database schema:
define
companionship sub relation,
relates companion;
marriage sub companionship,
relates spouse as companion;
friendship sub companionship,
relates friend as companion;
person sub entity;
person plays friendship:friend;
person plays marriage:spouse;
This (purely illustrative) schema describes various forms of companionship, defining that a person can play the role of either a friend in a friendship or a spouse in a marriage. Since we didn’t make companionship an abstract type, we could now attempt to insert data as follows:
insert
$p isa person;
$q isa person;
(companion: $p, companion: $q) isa companionship;
At first glance, this insert looks fine: indeed, $p and $q are objects of the type person and thus can be cast into either friends or spouses, and thus also into the role of companions. Nonetheless, the above would be invalid in the PERA model as it creates unwanted ambiguity. Indeed, the schema of our database tells us that persons can precisely be either a friend or a spouse.
So in the insert clause above, which one of those two are $p and $q? The answer is ambiguous, and the intention of the user is unclear here. The intentional casting condition prevents these and similar ambiguities by requiring that dependent types can only be instantiated with objects whose types are explicitly defined to implement the needed interfaces (using plays or owns specifications). That means the above data creation would be valid if we had also specified person plays companionship:companion; in the schema!
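Spelled out as a sketch, the repaired version of the example adds one line to the schema and then repeats the insert unchanged. The two parts are separate queries (a schema query and a data query).

```typeql
# Schema query: make the intended casting explicit.
define
person plays companionship:companion;

# Data query: the same insert as before, now unambiguous, since
# person is explicitly declared to play companionship:companion.
insert
$p isa person;
$q isa person;
(companion: $p, companion: $q) isa companionship;
```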
Global attributes
When referring to an attribute (as opposed to “attribute type”) we usually don’t mean a bare literal value instance in an attribute type, but rather the tuple of both literal value and attribute type. For example, if some object has a name with value "Austin" (as created by our previous example) then we refer to the serialization name:"Austin" of attribute type identifier and a literal value as an attribute. In particular, if another object is created it may have the same attribute:
insert
$a isa person, has name "Austin";
$c isa city, has name "Austin";
Our terminology here reflects the following design choice: in TypeDB, we do not store duplicates of attributes. That is, if "Austin" is the name of some person and of some city, the database only needs to record one name:"Austin" attribute, with appropriate reference to both the person and the city. This can save both valuable space and time in our database application (note that the choice concerns only the implementation of the PERA model, not the definition of the model itself). We refer to this design choice as working with global attributes.
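Although reading queries are the subject of separate articles, a one-line sketch may help to illustrate the idea. Assuming the two inserts above have been committed, the following TypeQL 2.x query asks for every owner of the single name:"Austin" attribute, and we would expect it to return both the person and the city.

```typeql
match
$x has name "Austin";
get $x;
```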
Independent attributes
The idea of globality for attributes is reflected in another feature: we allow attributes that are independent from their owners, a.k.a. global constants. This leverages the isa keyword as before, but this time used to create instances in attribute types. Consider for example:
insert
$n "Ana" isa name;
1970-01-01 isa date;
$a isa person;
$a has name $n;
Line by line, this data specification does the following:
- In the first line, we create the literal value "Ana" in the type “name without owner”, and assign the value to a variable $n.
- We then create the value 1970-01-01 in the type “date without owner”.
- Reusing variables as before, in the last two lines we create a new person object, and then use our value "Ana" in the variable $n to create a new instance in the type “name of $a”.
Thus, unlike in the case of relation types where at least one roleplayer needed to be instantiated, we do allow attribute types to be instantiated with no owners. This can be used to create global constants which are not associated with any owner, even though the feature is rarely used.
Summary
In this article we discussed in detail the individual components of TypeDB’s data model: we saw the interaction of entity types, relation types, and attribute types, each playing an important role in the bigger picture of the PERA model. We also saw how these schema components can be defined via TypeQL’s declarative syntax. We then learned how data can be inserted into a TypeDB database and how we can leverage data constraints. Of course, this only covers half of what is needed from a database, the other (much larger) half concerning how to query the types and data we inserted. The topic of data reading queries is substantial in itself, and we therefore cover it in separate articles!