Data and query model
This page gives a brief technical overview of TypeDB’s type system and query model.
In short, TypeDB’s data model is based on a type system (with single-inheritance subtyping and trait-based referencing). Its query model builds on this type system and extends it by allowing users to chain together queries and data operations in data pipelines.
Type system
Data and direct types
All data in TypeDB has a type. Due to subtyping, data may have more than one type (for example, an entity of type admin may also be of type user). However, all data in TypeDB’s type system has a unique, unambiguous direct type.
- For data instances, i.e. data stored in the database, the direct type is determined when the data is created. For example, insert $x isa admin will create an entity of direct type admin, whereas insert $x isa user will create an entity of direct type user.
- For data literals (like 1, 1.0, true), the direct type is identified based on syntax: for example, 1 will be an integer, whereas 1.0 is a double. (Note that when working with expressions of a certain type, implicit and explicit casts may be applied.)
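The relationship between a direct type and the other types of an instance can be pictured with a small toy model. The Python sketch below is not TypeDB code: the SUPERTYPE table and the types_of helper are hypothetical names, and the sketch assumes only the single-inheritance rule described above.

```python
# Toy model of single-inheritance subtyping (illustrative only, not TypeDB's implementation).
# Hypothetical schema: each type label maps to its direct supertype, or None for a root type.
SUPERTYPE = {"user": None, "admin": "user"}


def types_of(direct_type):
    """Return all types of an instance, starting with its direct type."""
    chain = []
    current = direct_type
    while current is not None:
        chain.append(current)
        current = SUPERTYPE[current]
    return chain


# An entity created with `insert $x isa admin` has direct type admin and is also a user ...
print(types_of("admin"))  # ['admin', 'user']
# ... whereas one created with `insert $x isa user` is only a user.
print(types_of("user"))   # ['user']
```

Because every type has at most one supertype, the chain is unique, and its first element is always the direct type.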
Type taxonomy
TypeDB’s type system distinguishes types as follows.
- User-defined types. Types defined in the schema of the database.
  - Entity types, specified using the keyword entity. Instances of entity types, simply called entities, can be created independently, i.e., without reference to other data instances.
  - Relation types, specified using the keyword relation, and required to have at least one associated role type, specified with the keyword relates. Instances of relation types may reference (or "link") zero or more data instances for each associated role type, called the players of that role (these data instances must have types with the corresponding player trait). For example, a friendship relation may link two friends (implemented by, say, two user entities). Relations without role players will be removed (no “dangling relations”).
  - Attribute types, specified using the keyword attribute, and required to have an associated value type (either structured or primitive), specified with the keyword value. Instances of attribute types carry an associated value of the associated value type. They may also reference other data instances, called their owners (these data instances must have direct types with the corresponding owner trait). Two attributes are considered equal if they have the same associated value and the same (origin) type. Attributes without owners will be removed (no “dangling attributes”, unless the attribute type is marked as independent).
  - Structured value types, specified using the keyword struct and used in the context of defining attribute types via the keyword value. Coming soon.
- Built-in types (and type operators).
  - Primitive value types, used in the context of defining attribute types via the keyword value. Two instances of a value type are equal exactly when their literal values are equal (as checked with equality comparison ==).
  - List types, written using the syntax [<T>] for a non-list type <T> (which can be either built-in or user-defined). Coming soon.
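The equality rule for attributes can be made concrete with a short sketch. The following Python model is an illustration under assumptions, not TypeDB's storage format: the Attribute class and the name and nickname type labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    """Toy attribute instance: a type label plus a value (hypothetical model)."""
    type_label: str
    value: object


a = Attribute("name", "Alice")
b = Attribute("name", "Alice")
c = Attribute("nickname", "Alice")

# Same value and same type: considered the same attribute.
print(a == b)  # True
# Same value but a different attribute type: not equal.
print(a == c)  # False
# In a set, equal attributes collapse into one element.
print(len({a, b, c}))  # 2
```

Comparing on the pair of type and value is what lets several owners share one attribute in this model, rather than each holding a private copy.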
Type traits
Entity types can have traits, i.e. play roles or own attribute types. This allows us to create connections of entities with other data.
Relation types can also have traits, i.e. play roles or own attribute types. This, for example, allows the creation of nested relations (i.e., relations playing roles in other relations).
Since entity and relation types together make up “first-class” types in TypeDB’s type system (when it comes to type traits) they are, together, also referred to as object types, whose instances are data objects.
Attributes cannot have traits; they are merely typed associations of values to objects.
Definitions
User-defined types, their traits, and their subtyping structure, are created and modified through definition statements in schema queries.
Data pipelines
Concept rows
A concept row is an association of named variables (the “column names”) to concepts, i.e., type labels, data instances, or literal values:
{ $var_1=concept, $var_2=concept, ..., $var_n=concept }
When working with optional variables in data pipelines, a concept assignment may be left empty, indicated by (), e.g.:
{ $var_1=type_label, $var_2=3.1415, $var_3=(), $var_4=0x001 }
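A concept row can be pictured as a mapping from variable names to concepts. The sketch below uses a plain Python dict and represents an empty assignment with None; both choices are assumptions made for illustration and say nothing about how TypeDB drivers expose rows.

```python
# Toy representation of the example row above (illustrative only).
row = {
    "var_1": "type_label",  # a type label
    "var_2": 3.1415,        # a literal value
    "var_3": None,          # an optional variable left empty, written () above
    "var_4": "0x001",       # presumably a data instance, shown here by an identifier
}


def assigned_variables(row):
    """Return the names of the variables that actually hold a concept."""
    return [name for name, concept in row.items() if concept is not None]


print(assigned_variables(row))  # ['var_1', 'var_2', 'var_4']
```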
Stages
Data pipelines are chains of stages: each stage receives an input stream of concept rows and itself outputs a stream of concept rows. The first stage in a pipeline receives the unit row {} as its single input (note: this is not an empty stream).
There are two classes of stages:
- Stateful stages, like match or insert, are stages whose execution results depend on the state of (i.e. the data in) the database.
- Stateless stages, like reduce or select, depend only on the input stream and can thus be executed without querying the database.
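The distinction between the two classes of stages can be sketched by treating a stage as a function from a stream of rows to a stream of rows. In the Python toy model below, the function names, the DATABASE dict, and the simplified semantics of match, select, and a count-style reduce are all assumptions for illustration; they are not TypeDB's API.

```python
# Hypothetical database state: instances grouped by type label.
DATABASE = {"user": ["u1", "u2", "u3"]}


def match_isa(rows, var, type_label, database):
    """Stateful stage: its output depends on the database contents."""
    for row in rows:
        for instance in database.get(type_label, []):
            yield {**row, var: instance}


def select(rows, *variables):
    """Stateless stage: keeps only the named variables of each input row."""
    for row in rows:
        yield {name: row[name] for name in variables}


def reduce_count(rows):
    """Stateless stage: collapses the input stream into a single count row."""
    yield {"count": sum(1 for _ in rows)}


unit_stream = [{}]  # the first stage receives the unit row, not an empty stream
matched = list(match_isa(unit_stream, "x", "user", DATABASE))
print(matched)                      # [{'x': 'u1'}, {'x': 'u2'}, {'x': 'u3'}]
print(list(reduce_count(matched)))  # [{'count': 3}]
print(list(select([{"x": "u1", "y": 1}], "x")))  # [{'x': 'u1'}]
```

Only match_isa takes the database as an argument; select and reduce_count are pure functions of their input stream, which is the property that makes a stage stateless in this model.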
Queries vs types
The match stage is the only data pipeline stage which filters and retrieves data using TypeQL patterns: patterns are a direct counterpart to TypeDB’s type system and express typing and reference constraints on data.
For the curious reader: TypeDB adheres to the principle of “queries as types” (an analog of propositions as types). This means that the patterns in a match stage, taken together, may be thought of as a struct (or “product”) type. Similarly, a TypeQL disjunction may be thought of as an enum (or “sum”) type. TypeQL’s patterns provide an elegant way of producing yet more general composite types in this way.
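For readers who think in terms of programming-language types, the analogy can be sketched as follows. The Python classes below are hypothetical and only mirror the shape of the answers to the query match $x isa user; { $x isa VIP; } or { $x isa admin, has access-level $lvl; $lvl > 9000; }; that appears later on this page: a branch that binds several variables behaves like a struct, and the disjunction behaves like a union of its branches.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class VipBranch:
    """Struct ("product") for the first branch: binds only x."""
    x: str


@dataclass(frozen=True)
class AdminBranch:
    """Struct ("product") for the second branch: binds x and lvl together."""
    x: str
    lvl: int


# Enum ("sum") of the two branches: an answer comes from one branch or the other.
Answer = Union[VipBranch, AdminBranch]


def describe(answer: Answer) -> str:
    """Dispatch on the variant, as one would with an enum."""
    if isinstance(answer, AdminBranch):
        return f"{answer.x} is an admin with level {answer.lvl}"
    return f"{answer.x} is a VIP"


print(describe(VipBranch("u1")))          # u1 is a VIP
print(describe(AdminBranch("u2", 9001)))  # u2 is an admin with level 9001
```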
Variable categories
Stages in a TypeQL data pipeline contain statements with variables. Any variable in a stage belongs to exactly one of four categories.
- Instance variables represent data instances from your database. Instance variables are identified by their usage in statements. For example:
  match $someone isa $user;
  implies that $someone is an instance variable.
- Value variables represent values that are computed as part of your query. Value variables are identified by their usage in statements. For example:
  match let $computed_val = $other_val + 1;
  implies that $computed_val is a value variable.
- Type variables represent type labels. As before, they are identified by their usage in statements. For example:
  match $something isa $type;
  implies that $type is a type variable.
- List variables. Coming soon.
Variable modes
Each variable in a stage can appear in one of the following three modes.
- Input (or “bound”) variables are variables that are bound in a preceding stage. For example,
  match $x isa user, has age 33;
  limit 1;
  match friendship ($y, $x); # $x is an input for this stage
- Output (or “returnable”) variables are unbound variables that are part of the present stage’s output (in at least one or-branch of the query). For example, in the query
  match $x isa user; { $x isa VIP; } or { $x isa admin, has access-level $lvl; $lvl > 9000; };
  both $x and $lvl are output variables.
  Optional variables are a subclass of output variables: these are variables that may be left empty in the stage’s output (you may think of this as assigning them the unit value () of the unit type, which is implicit in our type system). Further optionality features coming soon!
- Internal variables are variables that are not returned by the stage. The prototypical example is a not query, like
  match $x isa user; not { friendship ($x, $y); };
  where the variable $y is internal. The class of internal variables also includes anonymous variables (indicated by $_).
  An important aspect of internal variables is that they do not lead to result duplication. For example,
  match $x isa user; friendship ($x, $_);
  will return each user who has a friend exactly once (instead of one copy of the same user for each of their friends).
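The effect of internal variables on result multiplicity can be illustrated with a toy evaluation. The Python sketch below uses hypothetical data and helper names and models friendships as a list of pairs; it only mimics the behaviour described above and is not how TypeDB evaluates queries.

```python
# Hypothetical data: u1 has two friends, u2 and u3 have one each, u4 has none.
USERS = ["u1", "u2", "u3", "u4"]
FRIENDSHIPS = [("u1", "u2"), ("u1", "u3")]


def friends_of(user):
    """Return everyone linked to `user` by a friendship, in either position."""
    return [b if a == user else a for a, b in FRIENDSHIPS if user in (a, b)]


def match_with_output_friend():
    """Like `friendship ($x, $y)`: $y is an output, so one row per (user, friend) pair."""
    return [{"x": u, "y": f} for u in USERS for f in friends_of(u)]


def match_with_internal_friend():
    """Like `friendship ($x, $_)`: the friend is internal, so one row per user."""
    return [{"x": u} for u in USERS if friends_of(u)]


print(len(match_with_output_friend()))  # 4
print(match_with_internal_friend())     # [{'x': 'u1'}, {'x': 'u2'}, {'x': 'u3'}]
```

In this model u1 appears twice when the friend is an output variable but only once when it is internal, and u4 appears in neither result because it has no friendship at all.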
Query evaluation
Queries are evaluated in two stages:
- Type inference. TypeDB checks which combinations of types are valid given the constraints in your query. If a non-type variable cannot be assigned a type, the query will fail at this stage with a type inference error.
- Stage-by-stage evaluation. Stages in the data pipeline are evaluated one after the other. For each output row of the previous stage, the present stage produces n output rows (n may be zero) and passes each such row, after augmenting it with the input row it was produced from, as an input row to the next stage.
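Stage-by-stage evaluation can be sketched as a loop over the stages, where every stage maps each input row to zero or more output rows that extend it. The Python model below is a simplification under stated assumptions: the stage functions, the AGES table, and run_pipeline are hypothetical, and type inference is not modelled.

```python
# Hypothetical data: the age attribute owned by each user.
AGES = {"u1": 33, "u2": 33, "u3": 41}


def match_users_aged_33(row):
    """For one input row, produce one extended row per matching user."""
    return [{**row, "x": user} for user, age in AGES.items() if age == 33]


def match_older_than_x(row):
    """Uses the input variable x; may produce zero rows for a given input."""
    return [{**row, "y": user} for user, age in AGES.items() if age > AGES[row["x"]]]


def run_pipeline(stages):
    """Feed the unit row to the first stage, then pass rows from stage to stage."""
    rows = [{}]  # the single unit row
    for stage in stages:
        rows = [out for row in rows for out in stage(row)]
    return rows


print(run_pipeline([match_users_aged_33, match_older_than_x]))
# [{'x': 'u1', 'y': 'u3'}, {'x': 'u2', 'y': 'u3'}]
```

Each output row of the second stage still carries the x it received as input, which corresponds to the augmentation step described above.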