TypeDB Fundamentals
Understanding variables, bindings, and type-checking in TypeDB
In this article, we explore the key principles underlying keywords, variables, and bindings in TypeDB. You will also learn about the simple design mechanisms that govern the type-checking of TypeQL patterns used in the match
clauses of query pipelines.
Keywords
To begin, let us highlight that in TypeDB 3.0, we filled in a missing keyword, links
, which you have seen in action in some of the previous examples in this article series. This means TypeQL keywords (including assignments and comparisons) now neatly organize into the following table.
(sub-) typing | relation dependency | attribute dependency | variable assignment | equality comparison | |
type | sub | relates | owns | — | is |
instance | isa | links | has | = (*) | is / == (**) |
value | — | — | — | = | == |
binding? | yes | yes | yes | yes | no |
Table notes
- (*) The
=
for data instances can only be used when the RHS is a function call (whose return, in fact, may be a tuple mixing instances and values). - (**)
==
is a “comparison by value” and can be only used for attribute instances. In contrast, theis
keyword not only compares attribute values, but also ensures that both attributes are of the same type! Theis
keyword can also be used for entity and relation instances.
Variables
With TypeDB 3.0 we abolished the need to differentiate between “value variables” (for value types like int
) and “concept variables” (for schema types or their data instances). All variables are now indicates using the dollar symbol $
.
This makes the overall coding experience more consistent and more concise—indeed, most other programming language do not distinguish objects from primitive values in their syntax for variables.
Importantly, this unification means that we may tacitly cast variables between different types. For example
match
$person has name $color;
$person has favorite_color $color;
The query will match persons, whose name is also their favorite color. Under the hood, $color
is inferred to be a variable of type string
, both name
s and favorite_color
s can be cast into string
so the above type checks! It is equivalent to writing:
match
$person has name $name;
$person has favorite_color $color;
$color == $name; # both are cast into strings
In contrast to the previous case, in this pattern, $name
can be safely inferred to be of type name
and $color
is inferred to be of type favorite_color
. To understand how this works, let us next explore the binding and type-checking principles of TypeDB.
Bindings
Any variable in a pattern needs to be bound to something—otherwise TypeDB will return an error to prevent infinitely looping searches. We now explain the “theory” of variable such bindings in TypeQL’s declarative patterns.
Variables are bound by certain statements. In the above table, we indicated in the last row which keywords and operators bind variables and which don’t. In addition to these keywords, we allow binding syntax for working with function returns, lists, and structs, that can be used to bind variables as follows.
- Single-return binding. The operator
=
binds its left-hand side (LHS), which can be a tuple of variables, and the right-hand side (RHS) calls a single-return function. Note: this is the only case where we assign schema type instance variables with an assignment operator (see table above). - Stream-return binding. The keyword
in
binds its LHS, which can be a tuple of variables, and the RHS calls a stream-return function. - List-member binding. The keyword
in
binds its LHS, which is a single variable, and the RHS is a list (or list expression). - List-index binding. Variables appearing as list indices
[$index]
of a listLIST
are implicitly bound to range over the available indices (i.e.,$index in [0...length(LIST)]
). The same is not true if the index is an arithmetic expression. - Destructuring binding. The operator
=
binds variables in its LHS, which is a struct with variables as fields, and the RHS is a struct expression.
Just as in TypeDB pre-3.0, value variable assignments need to be made in non-circular order—otherwise, we may run into impossible-to-solve equations!
Type-checking
Type-checking can often seen bit mysterious, but in TypeDB it follows straight-forward principles.
- Each variable in TypeQL pattern is either a type variable (representing a type) or an term variable which can be either a data instance from our database or a value computed as part of the schema.
- A solution to a pattern assigns each variable
$x
a conceptconcept($x)
, i.e. a term or a type, which we call the concept-assignment. We also assign each term variable$x
a type, which we call the type-assignment, which is writtentype($x)
. Ifconcept($x)
is data instances,type($x)
is the type of that instance as stored in the database. Forconcept($x)
is a value, thentype($x)
is the (uniquely) inferred value type of that value.
Principles of the concept and type assignment
The concept and type-assignments must follow a set of simple principles. We explain these in the following sections. For notational convenience, let’s write T ⊆ S
to mean the elements of a type T
can be cast into elements of another type S
; in other words, T
is a subtype of S
.
Schema statements
We start with the schema statements sub
, relates
, owns
, and plays
.
- For statements
$t sub $s
we must haveconcept($t) ⊆ concept($s)
andconcept($t) != concept($s)
. - For statements
$t relates $s
we must haveconcept($t)
relates the role typeconcept($s)
either by direct declaration in the schema, or by inheritance from a supertype (without overriding it). - For statements
$t plays $s
we must haveconcept($t)
plays the role typeconcept($s)
either by direct declaration in the schema or by inheritance from a supertype. - For statements
$t owns $s
we must haveconcept($t)
plays the role typeconcept($s)
either by direct declaration in the schema or by inheritance from a supertype.
Data statements
Next, let’s consider TypeQL’s core data statements.
- For statements
$x isa TYP
, we requiretype($x) ⊆ TYP
. - For
$x links ($r: $y)
, we requiretype($x) ⊆ R
for a relation typeR
such thatR relates $r
(as defined above), andtype($y) ⊆ ROL
. - If
ROL
is kept implicit by writing$x links ($y)
, then we requiretype($y) ⊆ ROL
for any role type oftype($x)
. - For
$x links (ROL[]: $y)
we requiretype($x)
to be a relation type with role typeROL
in list form, andtype($y) = list(ROL)
(we also write this typeROL[]
).ROL
cannot be kept implicit in this case. - For
$x has ATT $y
, we require thattype($x)
must own the attribute typeATT
, and eithertype($y) ⊆ ATT
ortype($y) = val(ATT)
, whereval(ATT)
is the value type ofATT
(*). IfATT
is kept implicit by writing$x has $y
, then we requiretype($y) ⊆ ATT
for any attributeATT
owned bytype($x)
. - For
$x has ATT[] $y
, we require thattype($x)
must own the attribute typeATT
in list form, and eithertype($y) ⊆ list(ATT)
ortype($y) = list(val(ATT))
, whereval(ATT)
is the value type ofATT
.ATT
cannot be kept implicit in this case. - In the case of list types
T[]
(i.e.ROL[]
andATT[]
) this works similarly, noting thatS ⊆ T
impliesT[] ⊆ S[]
.
(*) The inclusion of the case type($y) = val(ATT)
is purely for convenience of the developer. Under the hood, we think of $x has ATT $y
has syntactic sugar for $x has $z; $z isa ATT; $z == $y;
. In contrast, $x links (ROL: $y)
cannot be considered syntactic sugar ($x links ($y); $y isa ROL;
is not valid TypeQL since ROL
is an interface type, not a data-storing type!).
Type-checking failure
Extending our previous example, here is an example that as no solution:
match
$person has name $color;
$person has favorite_color $color;
$color isa name;
This combines the constraints: either type($color) ⊆ name
or type($color) = string
, either type($color) ⊆ val(favorite)
or type($color) = string
, and type($color) ⊆ name
.
Uniqueness of type-assignment for each solution
A remark for the theoretical computer scientists among us: for each solution of a TypeDB pattern, the type assignment is unique subject to the condition that we also choose the most specialized possible types (i.e. assign subtypes rather than super types whenever possible).
Summary
TypeQL is a highly principled language; types fall into four simple classes, and keyword organize accordingly. Variables (thanks to changes to how attributes work) have now been streamlined, making developing with TypeDB simpler and more intuitive. Finally, we clarified some of the principles working in the background of patterns governing how variables represented data point and computed variables are “bound” to types.