Officially out now: The TypeDB 3.0 Roadmap

TypeDB Fundamentals

Understanding variables, bindings, and type-checking in TypeDB



This article is part of our TypeDB 3.0 preview series. Sign up to our newsletter to stay up-to-date with future updates and webinars on the topic!

In this article, we explore the key principles underlying keywords, variables, and bindings in TypeDB. You will also learn about the simple design mechanisms that govern the type-checking of TypeQL patterns used in the match clauses of query pipelines.

Keywords

To begin, let us highlight that in TypeDB 3.0, we filled in a missing keyword, links, which you have seen in action in some of the previous examples in this article series. This means TypeQL keywords (including assignments and comparisons) now neatly organize into the following table.

(sub-) typingrelation dependencyattribute dependencyvariable assignmentequality comparison
typesubrelatesownsis
instanceisalinkshas= (*)is / == (**)
value===
binding?yesyesyesyesno
TypeQL 3.0 modeling and matching keywords.

Table notes

  • (*) The = for data instances can only be used when the RHS is a function call (whose return, in fact, may be a tuple mixing instances and values).
  • (**) == is a “comparison by value” and can be only used for attribute instances. In contrast, the is keyword not only compares attribute values, but also ensures that both attributes are of the same type! The is keyword can also be used for entity and relation instances.

Variables

With TypeDB 3.0 we abolished the need to differentiate between “value variables” (for value types like int) and “concept variables” (for schema types or their data instances). All variables are now indicates using the dollar symbol $.

This makes the overall coding experience more consistent and more concise—indeed, most other programming language do not distinguish objects from primitive values in their syntax for variables.

Importantly, this unification means that we may tacitly cast variables between different types. For example

match
  $person has name $color;
  $person has favorite_color $color;

The query will match persons, whose name is also their favorite color. Under the hood, $color is inferred to be a variable of type string, both names and favorite_colors can be cast into string so the above type checks! It is equivalent to writing:

match
  $person has name $name;
  $person has favorite_color $color;
  $color == $name; # both are cast into strings

In contrast to the previous case, in this pattern, $name can be safely inferred to be of type name and $color is inferred to be of type favorite_color. To understand how this works, let us next explore the binding and type-checking principles of TypeDB.

Bindings

Any variable in a pattern needs to be bound to something—otherwise TypeDB will return an error to prevent infinitely looping searches. We now explain the “theory” of variable such bindings in TypeQL’s declarative patterns.

Variables are bound by certain statements. In the above table, we indicated in the last row which keywords and operators bind variables and which don’t. In addition to these keywords, we allow binding syntax for working with function returns, lists, and structs, that can be used to bind variables as follows.

  1. Single-return binding. The operator = binds its left-hand side (LHS), which can be a tuple of variables, and the right-hand side (RHS) calls a single-return function. Note: this is the only case where we assign schema type instance variables with an assignment operator (see table above).
  2. Stream-return binding. The keyword in binds its LHS, which can be a tuple of variables, and the RHS calls a stream-return function.
  3. List-member binding. The keyword in binds its LHS, which is a single variable, and the RHS is a list (or list expression).
  4. List-index binding. Variables appearing as list indices [$index] of a list LIST are implicitly bound to range over the available indices (i.e., $index in [0...length(LIST)]). The same is not true if the index is an arithmetic expression.
  5. Destructuring binding. The operator = binds variables in its LHS, which is a struct with variables as fields, and the RHS is a struct expression.

Just as in TypeDB pre-3.0, value variable assignments need to be made in non-circular order—otherwise, we may run into impossible-to-solve equations!

Type-checking

Type-checking can often seen bit mysterious, but in TypeDB it follows straight-forward principles.

  1. Each variable in TypeQL pattern is either a type variable (representing a type) or an term variable which can be either a data instance from our database or a value computed as part of the schema.
  2. A solution to a pattern assigns each variable $x a concept concept($x), i.e. a term or a type, which we call the concept-assignment. We also assign each term variable $x a type, which we call the type-assignment, which is written type($x). If concept($x) is data instances, type($x) is the type of that instance as stored in the database. For concept($x) is a value, then type($x) is the (uniquely) inferred value type of that value.

Principles of the concept and type assignment

The concept and type-assignments must follow a set of simple principles. We explain these in the following sections. For notational convenience, let’s write T ⊆ S to mean the elements of a type T can be cast into elements of another type S; in other words, T is a subtype of S.

Schema statements

We start with the schema statements sub, relates, owns, and plays.

  • For statements $t sub $s we must have concept($t) ⊆ concept($s) and concept($t) != concept($s).
  • For statements $t relates $s we must have concept($t) relates the role type concept($s) either by direct declaration in the schema, or by inheritance from a supertype (without overriding it).
  • For statements $t plays $s we must have concept($t) plays the role type concept($s) either by direct declaration in the schema or by inheritance from a supertype.
  • For statements $t owns $s we must have concept($t) plays the role type concept($s) either by direct declaration in the schema or by inheritance from a supertype.

Data statements

Next, let’s consider TypeQL’s core data statements.

  • For statements $x isa TYP, we require type($x) ⊆ TYP.
  • For $x links ($r: $y), we require type($x) ⊆ R for a relation type R such that R relates $r (as defined above), and type($y) ⊆ ROL.
  • If ROL is kept implicit by writing $x links ($y), then we require type($y) ⊆ ROL for any role type of type($x).
  • For $x links (ROL[]: $y) we require type($x) to be a relation type with role type ROL in list form, and type($y) = list(ROL) (we also write this type ROL[]). ROL cannot be kept implicit in this case.
  • For $x has ATT $y, we require that type($x) must own the attribute type ATT, and either type($y) ⊆ ATT or type($y) = val(ATT), where val(ATT) is the value type of ATT (*). If ATT is kept implicit by writing $x has $y, then we require type($y) ⊆ ATT for any attribute ATT owned by type($x).
  • For $x has ATT[] $y, we require that type($x) must own the attribute type ATT in list form, and either type($y) ⊆ list(ATT) or type($y) = list(val(ATT)), where val(ATT) is the value type of ATT. ATT cannot be kept implicit in this case.
  • In the case of list types T[] (i.e. ROL[] and ATT[]) this works similarly, noting that S ⊆ T implies T[] ⊆ S[].

(*) The inclusion of the case type($y) = val(ATT) is purely for convenience of the developer. Under the hood, we think of $x has ATT $y has syntactic sugar for $x has $z; $z isa ATT; $z == $y;. In contrast, $x links (ROL: $y) cannot be considered syntactic sugar ($x links ($y); $y isa ROL; is not valid TypeQL since ROL is an interface type, not a data-storing type!).

Type-checking failure

Extending our previous example, here is an example that as no solution:

match
  $person has name $color;
  $person has favorite_color $color;
  $color isa name;

This combines the constraints: either type($color) ⊆ name or type($color) = string, either type($color) ⊆ val(favorite) or type($color) = string, and type($color) ⊆ name.

Uniqueness of type-assignment for each solution

A remark for the theoretical computer scientists among us: for each solution of a TypeDB pattern, the type assignment is unique subject to the condition that we also choose the most specialized possible types (i.e. assign subtypes rather than super types whenever possible).

Summary

TypeQL is a highly principled language; types fall into four simple classes, and keyword organize accordingly. Variables (thanks to changes to how attributes work) have now been streamlined, making developing with TypeDB simpler and more intuitive. Finally, we clarified some of the principles working in the background of patterns governing how variables represented data point and computed variables are “bound” to types.

Share this article

TypeDB Newsletter

Stay up to date with the latest TypeDB announcements and events.

Subscribe to Newsletter

Further Learning

The TypeDB 3.0 Roadmap

The upcoming release of version 3.0 will represent a major milestone for TypeDB: it will bring about fundamental improvements to the architecture and feel, and incorporate pivotal insights from our research and user feedback.

Read article

Pipelines (3.0 Preview)

In this article we describe how, from a set of basic query operations, complex data pipelines can be crafted in TypeDB.

Read article

Functions (3.0 Preview)

Functions provide powerful abstractions of query logic, which can be nested, recursed, or negated, and they natively embed into TypeQL's declarative patterns.

Read article

Feedback