TypeDB Fundamentals
Streamlining clauses for schema and data migration
When working with highly structured data, “schema migration” and “data migration” are well-known pain points for database engineers. TypeDB 3.0 introduces a range of TypeQL-native syntax that provides powerful tooling to address these pain points.
Extending the clause system
To begin, let us recall the clause system of TypeDB. Clauses represent operations (like retrievals or updates) that the user intends to perform “in one go”.
Clauses can contain variables, and essentially all clauses take inputs and produce outputs in the form of “concept maps”, which are mappings from variable identifiers to concepts (i.e., data instances, values, or types).
The single most powerful clause is the `match` clause, which is responsible for retrieving concepts from the database (and may even include conditions that express computations on, and comparisons between, concepts).
Besides the `match` clause, TypeQL 3.0 features the following schema- and data-level clauses (Table 1). As you may notice, this already shows several novelties over pre-3.0 syntax: `put`, `redefine`, and `update` are all newly introduced. In the rest of the article, we discuss the role of each of these clauses!
Schema-level | Data-level | Description |
---|---|---|
`define` | `insert` / `put` (1) | Creates concepts and behaviors (2) |
`undefine` | `delete` | Removes concepts and behaviors |
`redefine` | `update` | Updates existing concepts and behaviors (3) |
(1) `put` is a conditional version of `insert`: it only inserts data if the given pattern does not already exist. See our pipeline fundamentals for more information.
(2) “Concepts and behaviors” refers, at the schema level, to schema types, subtyping, dependencies, and constraints, and, at the data level, to data and data interdependencies.
(3) This will fail if concepts to be redefined/updated do not exist!
Subjects, verbs, objects in TypeQL statements
Users of TypeDB define how data is categorized using type inheritance hierarchies, and how data interacts through dependencies using typed interfaces. This gives a “typed” perspective on classical RDF subject-verb-object (SVO) triples. Let’s explain this! For example,
(company)—(named)—(“Apple”)
could translate into a TypeQL statement:
```typeql
$company has name "Apple"
```
In TypeDB, the verb “name” becomes an interface type that controls who exactly can “own names”. The type system checks whether it can cast the data instance `$company` into a name owner, while `"Apple"`, the object, must be an instance of the type `name`, owned by `$company`. In this way, TypeDB provides a clean way of “typing” the core concepts of the semantic data model.
When manipulating the database with subjects, verbs, and objects, ambiguity can arise as to what we are manipulating: the subject, the verb, or the object? This is why TypeDB 3.0 departs from strictly SVO-ordered statements and also employs statements that put the object or verb first; we call these O/V statements. The following gives an idea of where in the above clauses these new statement forms are used:
- `define`: Uses SVO.
- `undefine`: Uses O/V (with the special new keyword `from`). As a first brief example, we could write `undefine owns name from person`.
- `redefine`: Uses SVO.
- `insert`/`put`: Uses SVO.
- `delete`: Uses O/V (with the special new keyword `of`). As a first brief example, we could write `delete name of $x;`.
- `update`: Uses SVO.
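To make the contrast concrete, the minimal sketch below pairs SVO statements with their O/V counterparts. It follows the syntax as presented in this article (released TypeQL 3.0 may differ in details) and assumes a schema with an entity type `person` and an attribute type `name`; each commented block is meant as a separate query.

```typeql
# Query 1 (schema, SVO): subject `person`, verb `owns`, object `name`.
define person owns name;

# Query 2 (data, SVO): subject $x, verb `has name`, object "Alice".
insert $x isa person, has name "Alice";

# Query 3 (data, O/V): the object `name` comes first, then `of` the subject.
match $x isa person, has name "Alice";
delete name of $x;

# Query 4 (schema, O/V): the verb/object `owns name` comes first, then `from` the subject.
undefine owns name from person;
```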
Now, let’s see in more detail how (SVO and O/V) statements and clauses fit together.
Fitting statements to clauses
We will discuss key TypeQL statements in order. These include:
- Subtype statements using `sub`, and data-level typing using `isa`.
- Relation dependencies (via role interface types) using `relates`, and data-level dependencies using `links`.
- Role subtypes using `relates ... as ...` (no data-level counterpart).
- Combined attribute dependencies (via ownership interfaces) and ownership implementation using `owns`, as well as data-level dependencies using `has`.
- Ownership implementation overrides using `owns ... as ...` (no data-level counterpart).
- Role implementations using `plays` (no data-level counterpart).
- Role implementation overrides using `plays ... as ...` (no data-level counterpart).
- Various annotation statements at the schema level (which do not have data-level counterparts).
- Type labels!
Observe that, generally, interface implementations and their overrides are statements that do not have data-level counterparts, meaning they will not be relevant to data-level clauses from Table 1.
1. Subtypes and typing (sub/isa)
Typing affects both the data-level and the schema-level. For the former, note that data is always created in a specified type (its canonical type). For the latter, note that types are organized into a subtyping hierarchy. This means data may live in multiple types simultaneously (by casting it from types into super-types).
- `define`: We may define, e.g., `define entity person` for entity types without a supertype (this is a synonym for `define person sub entity` but clarifies that no user-defined supertype is intended). Otherwise, we define `person sub being`, which declares `person` as a subtype of `being`.
- `undefine`: We may undefine types, e.g., by writing `undefine entity person` or simply `undefine person`. Note: no `sub` is needed! We are undefining the type `person` itself.
- `redefine`: We may write `redefine person sub intelligent_being`, which redefines the parent of `person`.
- `insert`/`put`: We may write `insert $a isa person`, which inserts a new data instance with canonical type `person`.
- `delete`: We may delete by writing `delete person $a`, or simply `delete $a`, without specifying the type. Note: if the instance belongs to an attribute type, then this requires that type to be `@independent`; otherwise attributes cannot be globally deleted in this way (to avoid accidental deletion for multiple owners).
- `update`: For now, we cannot update the canonical types of data with an `update` clause, as the operation comes with subtle difficulties. Instead, users may change types with a copy-and-delete pipeline (e.g., `match $old_x isa A...; insert $new_x isa A';`, followed by `match $old_x's data; insert $new_x's data;` and `delete $old_x;`).
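The bullets above can be combined into the following sketch, written in this article's syntax (released TypeQL 3.0 may differ in details). The types `being`, `person`, and `intelligent_being` come from the text; `citizen` is a hypothetical target type for the copy-and-delete migration, assumed to own `name` and not to be a subtype of `person`. The sketch also assumes `person` has no subtypes and every person has exactly one name, so each person yields exactly one copy. Each commented block is a separate query.

```typeql
# Query 1 (schema): `being` has no user-defined supertype; `person` subtypes it.
define
  entity being;
  person sub being;

# Query 2 (schema): reparent `person` (assumes `intelligent_being` already exists).
redefine person sub intelligent_being;

# Query 3 (data): copy each person into a new instance of the hypothetical
# type `citizen`, carrying over its name attribute.
match $old isa person, has name $n;
insert $new isa citizen, has name $n;

# Query 4 (data): delete the old instances once they have been copied.
match $old isa person;
delete $old;
```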
2. Relation dependencies (relates/links)
Relations are dependent data, meaning that in many cases we delete a relation if the objects it links are deleted (for example, an edge should be deleted if the nodes it links are!). The type system of TypeDB controls which data a relation instance can link via so-called role interface types, which other types can implement (see #6). Role interfaces and their members, role players, can be defined, inserted, and manipulated as follows.
- `define`: We may define, e.g., `define marriage relates spouse`, which states that the relation type `marriage` can link role players in the role `spouse`.
- `undefine`: We may, similarly, undefine `relates spouse from marriage`, which removes the role interface from the relation type `marriage`. Note that this uses the new `from` keyword! Recall also that this is an O/V-ordered statement (i.e., the object/verb comes first here: `relates spouse`).
- `redefine`: We cannot redefine the role interfaces of a relation in a meaningful way: whether or not a relation depends on a role interface is a boolean, which cannot be meaningfully redefined. But see #9 for how to change the label of a role interface.
- `insert`/`put`: We may insert, e.g., `insert $m links (spouse: $a)`.
- `delete`: Recall that we delete using O/V order: for example, we can write `delete (spouse: $a) of $m`. This uses the new `of` keyword!
- `update`: We can update role players by writing `update $m links (spouse: $b)`. Warning: this replaces all existing role players of the given role! In our example, `$b` would replace all existing spouses linked by `$m`.
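A minimal sketch of the relation lifecycle described above, in this article's syntax (released TypeQL 3.0 may differ in details). It assumes the types `marriage` and `person` exist, that `person plays marriage:spouse` (see #6), and that persons can be identified by hypothetical names "Alice", "Bob", and "Carol". Each commented block is a separate query.

```typeql
# Query 1 (schema): marriage may link role players via the spouse role interface.
define marriage relates spouse;

# Query 2 (data): create a marriage linking two existing persons.
match
  $a isa person, has name "Alice";
  $b isa person, has name "Bob";
insert $m isa marriage, links (spouse: $a, spouse: $b);

# Query 3 (data): unlink Bob as a spouse (O/V order with `of`).
match
  $b isa person, has name "Bob";
  $m isa marriage, links (spouse: $b);
delete (spouse: $b) of $m;

# Query 4 (data): per the warning above, this replaces ALL spouses of the
# marriage (here, Alice) with Carol.
match
  $a isa person, has name "Alice";
  $m isa marriage, links (spouse: $a);
  $c isa person, has name "Carol";
update $m links (spouse: $c);
```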
3. Role sub-interface (as)
When a relation type subtypes another relation type, we may have its role interfaces subtype the role interfaces of its relation supertype. This (schema-only) behaviour can be addressed as follows.
- `define`: We may define role sub-interface behaviour using the keyword `as`, e.g., by writing `define hetero_marriage relates wife as spouse`.
- `undefine`: If we want to completely free a sub-interface from its parent, we can write `undefine as spouse from hetero_marriage relates wife`.
- `redefine`: If we want to sub-interface a different role of the parent relation, we can write `redefine hetero_marriage relates wife as primary_spouse` (assuming `marriage` was defined with a role interface `primary_spouse`).
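The three operations above can be sequenced as follows. This sketch uses the article's syntax (released TypeQL 3.0 may differ in details) and assumes `marriage relates spouse` and `marriage relates primary_spouse` are already defined; each commented block is a separate query.

```typeql
# Query 1 (schema): hetero_marriage specializes marriage, and its wife role
# sub-interfaces the inherited spouse role.
define
  hetero_marriage sub marriage;
  hetero_marriage relates wife as spouse;

# Query 2 (schema): make wife sub-interface primary_spouse instead.
redefine hetero_marriage relates wife as primary_spouse;

# Query 3 (schema): free wife from its parent role entirely (O/V order with `from`).
undefine as primary_spouse from hetero_marriage relates wife;
```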
4. Attribute dependencies and ownership implementation (owns)
Similarly to relations depending on the objects they link, attributes depend on the objects they are “owned by”; in other words, by default, if an attribute’s (sole) owner gets deleted, then we delete the attribute as well! The type system controls which types of attributes can be owned by which types of objects through ownership interface types that other types can implement.
- `define`: We may define `person owns name`, which means that `person` implements the ownership interface of `name`; simply put, person instances are now valid owners of name instances!
- `undefine`: To remove an ownership implementation, write `undefine owns name from person`. Note again: this uses an O/V statement!
- `redefine`: We cannot redefine interface implementations, since these are boolean truth values, which cannot be meaningfully redefined. But see #9 for how to change the label of an attribute type.
- `insert`/`put`: Inserting attribute instances works as before: `insert $person has name "Marry"`. This makes `$person` the owner object of a name attribute `"Marry"`.
- `delete`: Deleting now uses O/V order: `delete $name of $person;`, which may optionally also carry a type, `delete name $name of $person`, in which case we may even suppress the instance altogether, simply writing `delete name of $person`.
- `update`: Updating a name can now be done using `update $person has name "Merry"`. Note that the type (`name`) is absolutely needed in this case!
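A short sketch of the data-level operations above, in this article's syntax (released TypeQL 3.0 may differ in details), assuming `person owns name` is already defined. Each commented block is a separate query; the alternative `delete` forms are shown as comments.

```typeql
# Query 1 (data): create a person owning the (misspelled) name "Marry".
insert $p isa person, has name "Marry";

# Query 2 (data): correct it; the type `name` tells update which ownership to replace.
match $p isa person, has name "Marry";
update $p has name "Merry";

# Query 3 (data): remove the ownership using O/V order with `of`.
match $p isa person, has name $n;
delete $n of $p;
# Variants described above:
#   delete name $n of $p;   (with the attribute type)
#   delete name of $p;      (instance omitted; $n need not be matched)
```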
5. Ownership implementation overrides (as)
When sub-interfaces are available, overrides of interface implementations in TypeDB are a kind of constraint: they constrain a subtype to only ever use the sub-interface, while the parent type continues to implement the super-interface. OK, this was quite abstract; let’s make it more concrete!
- `define`: We can define an ownership override using `define child owns first_name as name`. This assumes `child sub person` and `person owns name` (and that `first_name` is a subtype of `name`). As a result, `child` objects can only own `first_name`s, i.e., we cannot insert `$some_child has name "John Doe"`.
- `undefine`: We may remove an ownership interface override by writing `undefine as name from child owns first_name`. As a result, `child` objects could now be designated owners of both `name` and `first_name` instances (note, though, that super-attributes are required to be abstract).
- `redefine`: We cannot redefine interface overrides, as they are boolean truth values.
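The override example above, written out as a sequence of queries in the article's syntax (released TypeQL 3.0 may differ in details). The first query sets up the assumed starting point; the rejected insert is shown only as a comment.

```typeql
# Query 1 (schema): assumed starting point.
define
  person owns name;
  child sub person;
  first_name sub name;

# Query 2 (schema): child objects may only own first_name, not plain name.
define child owns first_name as name;

# After Query 2, the text says an insert like this is rejected:
#   match $c isa child; insert $c has name "John Doe";

# Query 3 (schema): lift the override again (O/V order with `from`).
undefine as name from child owns first_name;
```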
6. Role implementation (plays)
Similar to ownership interface implementations, object types (i.e., entity and relation types) may implement role interfaces.
- `define`: We can write `define person plays marriage:spouse` to ensure that `person` objects are valid players of the `spouse` role.
- `undefine`: Using O/V order, we may write `undefine plays marriage:spouse from person`.
- `redefine`: We cannot redefine interface implementations, since these are boolean truth values, which cannot be meaningfully redefined. But see #9 for how to change the label of a role interface.
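A minimal sketch of adding and removing a role implementation, in the article's syntax (released TypeQL 3.0 may differ in details), assuming `marriage relates spouse` is defined and using hypothetical person names. Each commented block is a separate query.

```typeql
# Query 1 (schema): person objects may now play the spouse role of marriage.
define person plays marriage:spouse;

# Query 2 (data): only valid because of Query 1.
match
  $a isa person, has name "Alice";
  $b isa person, has name "Bob";
insert $m isa marriage, links (spouse: $a, spouse: $b);

# Query 3 (schema): remove the role implementation (O/V order with `from`).
undefine plays marriage:spouse from person;
```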
7. Role implementation overrides (as)
Again in analogy with ownership implementation overrides, role implementations may be overridden. The effect is the same: overrides constrain a subtype not to make use of the interface inherited from its parent type.
- `define`: We can use `define hetero_man plays hetero_marriage:husband as marriage:spouse` to define a role implementation override. This assumes `hetero_man sub person` and `person plays marriage:spouse` (and that `hetero_marriage relates husband as spouse`).
- `undefine`: As before, we can use `undefine as marriage:spouse from hetero_man plays hetero_marriage:husband` to remove the override. As a result, `hetero_man` objects may now also be inserted as `marriage:spouse` role players.
- `redefine`: We cannot redefine interface overrides, as they are boolean truth values.
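The role override example, sketched end to end in the article's syntax (released TypeQL 3.0 may differ in details). The first query sets up the starting point assumed in the bullets above.

```typeql
# Query 1 (schema): assumed starting point.
define
  hetero_man sub person;
  hetero_marriage sub marriage;
  hetero_marriage relates husband as spouse;
  person plays marriage:spouse;

# Query 2 (schema): hetero_man objects may only play husband, not plain spouse.
define hetero_man plays hetero_marriage:husband as marriage:spouse;

# Query 3 (schema): remove the override; hetero_man may play marriage:spouse again.
undefine as marriage:spouse from hetero_man plays hetero_marriage:husband;
```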
8. Annotations
Besides overrides, TypeQL 3.0 brings a large variety of new annotations. Importantly, all these annotations follow the same principles. First, each statement can have at most one annotation of a given kind (e.g., we disallow `person owns name @card(1..2) @card(1..3)`). Second, annotations can be managed using the following definition language.
- `define`: We define annotations as part of the usual definition of statements, e.g., `define person owns name @card(1..2)`.
- `undefine`: We remove annotations with O/V statements of the form `undefine @card(1..2) from person owns name`.
- `redefine`: Most annotations can be meaningfully redefined, e.g., `redefine person owns name @card(1..3)` replaces the previous annotation `@card(1..2)`.
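The annotation lifecycle above as three consecutive schema queries, in the article's syntax (released TypeQL 3.0 may differ in details), assuming `person` and `name` exist. Note that `undefine` names the annotation as it currently stands.

```typeql
# Query 1 (schema): each person owns between one and two names.
define person owns name @card(1..2);

# Query 2 (schema): relax the upper bound; replaces @card(1..2).
redefine person owns name @card(1..3);

# Query 3 (schema): drop the cardinality annotation (O/V order with `from`).
undefine @card(1..3) from person owns name;
```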
9. Labels and aliases
Labels are a new feature of TypeDB 3.0. They provide aliases for types, which allow a more flexible approach to querying, schema migration, and data migration. Every type has a primary label, which is the label it was originally introduced with. A non-primary label is also called an alias.
- `define`: Additional (non-primary!) labels are defined using `define person alias personne`. Now our database is multilingual! Note: `person` can be either the primary label or any alias of the type to which we want to add an additional alias.
- `undefine`: Aliases can be undefined using `undefine alias personne from person`. Again, this uses O/V order. And again, `person` can be either the primary label or any alias of the type from which we want to remove the additional alias.
- `redefine`: The primary label of a type can be redefined using `redefine old_type label new_type`. Aliases cannot be redefined, as they either exist or not.
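A brief sketch of the label operations, in the article's syntax (released TypeQL 3.0 may differ in details). The new primary label `human` is hypothetical; each commented block is a separate query.

```typeql
# Query 1 (schema): add a French alias for person.
define person alias personne;

# The type can now be addressed by either label, e.g.:
#   match $p isa personne;

# Query 2 (schema): remove the alias again (O/V order with `from`).
undefine alias personne from person;

# Query 3 (schema): rename the primary label of the type.
redefine person label human;
```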
Powerful migrations
The design of TypeDB’s migration language using `update` and `redefine` statements closely follows the fundamental principles of its type system. Taken together, this allows for powerful migration operations in extremely simple syntax. For example,
```typeql
match $T sub! person;
redefine $T sub humanoid;
```
moves all direct subtypes of `person` (`sub!` matches direct subtypes only) under a new supertype, `humanoid`. And
```typeql
match person owns $T;
define humanoid owns $T;
```
makes `humanoid` own every attribute type that `person` owns, i.e., it copies all of `person`’s ownership implementations to `humanoid`. That’s simple and intuitive!
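The two examples can be chained into one migration. In the sketch below (article syntax; released TypeQL 3.0 may differ in details), ownerships are copied before the subtypes are moved. This ordering is an assumption on our part: subtypes of `person` inherit its `owns` statements, so moving them under a `humanoid` that does not yet own those attributes could leave their existing data without a valid ownership. Each commented block is a separate query.

```typeql
# Query 1 (schema): introduce the new supertype.
define entity humanoid;

# Query 2 (schema): humanoid owns every attribute type person owns,
# so types moved in Query 3 keep their inherited ownerships.
match person owns $T;
define humanoid owns $T;

# Query 3 (schema): move every direct subtype of person under humanoid.
match $T sub! person;
redefine $T sub humanoid;
```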