Query Variables and Patterns
TypeQL is a declarative language: it focuses on describing the schema, data, or query through patterns, rather than specifying how to perform the query. A pattern can contain:
- constraints;
- negated patterns;
- optional patterns;
- disjunctions of patterns.
TypeQL conjunctions are commutative and composable: the order of patterns is irrelevant, and you can (almost always) add or remove constraints while the resulting conjunction remains a valid query. This makes TypeQL easy to write and easy to debug!
Constraints & Variables
The basic building block of a pattern is a constraint between one or more concepts. A concept is a type, an instance of a type, or a value.
The concepts in a constraint can be specified directly, as we do when defining the schema:
define
entity person; # 'person' is an entity type.
attribute name; # 'name' is an attribute type.
name value string; # 'name' holds string values.
person owns name; # 'person' instances can own 'name' instances.
Or, we can use variables as placeholders for concepts, and query the database for concepts which "match" the pattern.
match # Find $e and $n such that:
entity $e; # $e is an entity type.
attribute $n; # $n is an attribute type.
$n value string; # $n holds string values.
$e owns $n; # $e instances can own instances of $n.
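Because conjunctions are commutative, the same constraints listed in a different order should describe the same pattern and return the same answers. A sketch of the query above with its constraints reversed:

```typeql
match
# Same constraints as above, in reverse order.
$e owns $n;       # $e instances can own instances of $n.
$n value string;  # $n holds string values.
attribute $n;     # $n is an attribute type.
entity $e;        # $e is an entity type.
```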
Since instances aren’t named the same way types are, they are always bound through variables and constraints.
match
# find the instances $n, $p, $e such that:
## $n is a 'name' with value "John"
$n isa name; $n == "John";
## $p is a 'person', who has the attribute $n
$p isa person; $p has $n;
## $e is an 'employment', where $p plays the 'employee' role
$e isa employment; $e links (employee: $p);
TypeQL provides shorthands for writing queries in more concise, natural ways. Using these, the above query can be written more compactly.
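A sketch of the shorthand form, assuming the comma-separated chaining of constraints on one variable and the inline `has name "John"` value used by the insert examples later on this page. Because the name is matched by value directly, the attribute no longer needs its own variable, so $n does not appear in the answers.

```typeql
match
# $p is a 'person' who has the name "John"
$p isa person, has name "John";
# $e is an 'employment' where $p plays the 'employee' role
$e isa employment, links (employee: $p);
```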
We will continue writing out the constraints individually for illustrative purposes.
Matching & inserting patterns
Match
What does it mean to "match" a pattern? A conjunctive pattern is "satisfied" if each variable can be mapped to some concept so that every constraint in the pattern holds true in the database, and every subpattern is also satisfied.
Each such mapping from variables to concepts is an answer to the pattern. TypeDB will return all such mappings which satisfy the pattern.
Since satisfying a constraint means the variables in it must be mapped to some concept, we say the variable is "bound" by that constraint.
A variable is bound in a conjunction if it is bound by any constraint or in any subpattern.
Insert
We use the same syntax for inserting "patterns" as we did for matching patterns in the data.
The difference between insert clause patterns and match clause patterns is:
- match patterns search the database and bind the variables to the matching concepts;
- insert clauses specify the concepts to bind to the variables and create those patterns in the database.
Notice how naturally the query and comments from the match example above carry over to an insert clause:
insert
## create the instances $n, $p, $e such that:
## $n is a 'name' with value "John"
$n isa name; $n == "John";
## $p is a 'person', who has the attribute $n
$p isa person; $p has $n;
## $e is an 'employment', where $p plays the 'employee' role
$e isa employment; $e links (employee: $p);
Variables are also bound to concepts when those concepts are created by an insert clause.
A quick note on variables
Variable names are unique within a pipeline: every usage of a variable with a certain name refers to the same variable.
In the following sections, the term "local" is used to mean a variable is bound in that pattern but not in its parent. This contrasts with the usage in programming languages where it refers to the scope of a variable name. As mentioned above, the scope of variable names in TypeQL is the pipeline.
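For example, in this sketch of a two-stage pipeline (assuming the person, company, and employment types used throughout this page), $p and $c in the insert stage refer to the very same variables bound by the match stage, not to new ones:

```typeql
match
$p isa person, has name "John";
$c isa company, has name "TypeDB";
# $p and $c are the same variables as in the match stage above,
# so the new employment links the concepts that were matched.
insert
$e isa employment, links (employer: $c, employee: $p);
```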
Disjunctions, negations & optionals
We often want to write more complex patterns than pure conjunctions, in which every subpattern must be satisfied. This section describes the meaning of disjunctive, negated, and optional patterns, and discusses the visibility of the variables that occur in each of these subpatterns.
We use the following data as our running example:
insert
$james isa person, has name "James", has username "@james";
$typedb isa company, has name "TypeDB", has username "@typedb";
$emp isa employment, links (employer: $typedb, employee: $james);
$john isa person, has name "John", has username "@john";
$sky-high isa school, has name "Sky High School", has username "@sky-high";
$edu isa education, links (institute: $sky-high, attendee: $john);
$jeff isa person, has name "Jeff", has username "@jeff";
$shut-shop isa company, has name "Shut Shop", has username "@shut-shop";
Disjunctions
Any number of patterns can be combined in a disjunction using the or keyword. An answer to any pattern in the disjunction is an answer to the disjunction.
A variable is "bound" in a disjunction if it is bound in every branch of the disjunction. Variables which are present in & bound by only some of the branches of the disjunction (and not present outside it) are considered "local" to that disjunction.
match
$p isa person, has name $p-name;
{
$emp isa employment, links (employer: $company, employee: $p);
$company has name $org-name;
} or {
$edu isa education, links (institute: $institute, attendee: $p);
$institute has name $org-name;
};
Here, $emp, $company, $edu, and $institute are local to their branches. $p occurs outside the disjunction. $org-name is present in every branch, so it is bound by the disjunction and returned in the answers.
The query returns:
---------------
$p | isa person, iid 0x1e00030000000000000004
$p-name | isa name "John"
$org-name | isa name "Sky High School"
---------------
$p | isa person, iid 0x1e00030000000000000005
$p-name | isa name "James"
$org-name | isa name "TypeDB"
---------------
Finished. Total rows: 2
Negations
A pattern is negated by wrapping it in a not block.
An answer "satisfies" the negation if and only if it cannot be (part of) an answer to the negated pattern.
Hence, answers are "eliminated" by negations if the pattern inside is matched.
This resembles the idea of "negation as failure" from logic programming.
Any variable that is present both outside and inside the negation is considered input to the negation and must be bound before matching the negated pattern. This requires that the variable is bound in the parent conjunction.
Variables that are present only inside the negation are considered local to the negation. A negation without any inputs is extremely unusual.
Local variables are not bound by the negation, since the negation is satisfied only if there are no answers to the negated pattern.
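To see why a negation without inputs is unusual, consider this sketch. $e occurs only inside the negation, so the negation does not depend on $p at all: by the semantics above, the query returns every person if the database holds no employment, and no answers otherwise.

```typeql
match
$p isa person;
# $e is local to the negation; there are no input variables.
not { $e isa employment; };
```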
The following query finds persons who are not involved in any employment relation with an employer.
match
$p isa person, has name $p-name;
not {
$e isa employment, links (employer: $c, employee: $p);
};
Here, $p is an input to the negation, while $e and $c are local. The query returns:
-------------
$p-name | isa name "Jeff"
$p | isa person, iid 0x1e00030000000000000007
-------------
$p | isa person, iid 0x1e00030000000000000004
$p-name | isa name "John"
-------------
Finished. Total rows: 2
As a second example, consider the query below:
match
$p isa person, has name $p-name;
$c isa company, has name $c-name;
not {
$e isa employment, links (employer: $c, employee: $p);
};
Here, $c is also an input variable. This query finds pairs of person and company such that the person does not work at the company:
-------------
$p | isa person, iid 0x1e00030000000000000004
$p-name | isa name "John"
$c | isa company, iid 0x1e00050000000000000002
$c-name | isa name "TypeDB"
-------------
$p | isa person, iid 0x1e00030000000000000007
$p-name | isa name "Jeff"
$c | isa company, iid 0x1e00050000000000000002
$c-name | isa name "TypeDB"
-------------
$p | isa person, iid 0x1e00030000000000000004
$p-name | isa name "John"
$c | isa company, iid 0x1e00050000000000000003
$c-name | isa name "Shut Shop"
-------------
$p | isa person, iid 0x1e00030000000000000005
$p-name | isa name "James"
$c | isa company, iid 0x1e00050000000000000003
$c-name | isa name "Shut Shop"
-------------
$p | isa person, iid 0x1e00030000000000000007
$p-name | isa name "Jeff"
$c | isa company, iid 0x1e00050000000000000003
$c-name | isa name "Shut Shop"
-------------
Finished. Total rows: 5
Optionals
A pattern is made optional by wrapping it in a try block. A variable that occurs only inside a try block is an optional variable; the other variables can be considered input to the pattern. To avoid ambiguity, an optional variable can originate in only one try block.
If the entire optional pattern matches the data, it returns one answer per match. If it has no matches (i.e., any part of it fails), it produces a single answer with every optional variable bound to None.
match
$p isa person, has name $p-name;
try {
$e isa employment, links (employer: $c, employee: $p);
$c has name $c-name;
};
Here, $e, $c, and $c-name are optional variables, and $p is input to the optional pattern.
The query returns:
-------------
$p | isa person, iid 0x1e00030000000000000005
$p-name | isa name "James"
$e | isa employment, iid 0x1f00090000000000000002
$c | isa company, iid 0x1e00050000000000000002
$c-name | isa name "TypeDB"
-------------
$p | isa person, iid 0x1e00030000000000000004
$p-name | isa name "John"
$e |
$c |
$c-name |
-------------
$p | isa person, iid 0x1e00030000000000000007
$p-name | isa name "Jeff"
$e |
$c |
$c-name |
-------------
Finished. Total rows: 3
Optional variables are bound by the optional block, not local to it.
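As a sketch of the single-origin rule stated above, the following query would be ambiguous: $org-name occurs only inside try blocks, but in two of them, so it is unclear which block should bind it. Following that rule, we would expect TypeQL to reject it.

```typeql
match
$p isa person, has name $p-name;
# $org-name originates in this try block...
try { $e isa employment, links (employer: $c, employee: $p); $c has name $org-name; };
# ...and again in this one, which the single-origin rule forbids.
try { $edu isa education, links (institute: $i, attendee: $p); $i has name $org-name; };
```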
Notes on variables
Variables in an answer
A variable that is "bound" in a conjunction is guaranteed to be bound in ALL execution paths, i.e., regardless of which branch is taken in each disjunction. An answer to a pattern contains only those variables bound in the root conjunction.
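For example, in the following sketch $name and $handle are each bound in only one branch of the disjunction, so neither is bound in the root conjunction; by the rule above, each answer contains only $p. (The username attribute is the one used in the running example data.)

```typeql
match
$p isa person;
# $name is local to the first branch, $handle to the second.
{ $p has name $name; } or { $p has username $handle; };
```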
Invalid patterns: unbound negation inputs
Notice we can write a pattern where the input variable to a negation is not bound in the parent. For example:
match
$p1 isa person; $p2 isa person;
{
$emp1 isa employment, links (employer: $company, employee: $p1);
} or {
$edu1 isa education, links (institute: $institute, attendee: $p1);
};
not { $emp2 isa employment, links (employer: $company, employee: $p2); };
not { $edu2 isa education, links (institute: $institute, attendee: $p2); };
At first glance, this looks like a reasonable query: we query for persons $p1 and $p2 who neither worked for the same company, nor attended the same institute. However, the input variables of the negations ($company and $institute) are not bound in the parent conjunction, since each is bound in only one branch of the disjunction. Hence, this is an invalid query.
DNF
The best way to think about these requirements is to convert the query to Disjunctive Normal Form (DNF), rewriting the pattern using distributivity, and to examine each branch:
A; { B; } or { C; };
becomes
{ A; B; } or { A; C; };
In this case, we get the pattern:
match
{
$p1 isa person; $p2 isa person;
$emp1 isa employment, links (employer: $company, employee: $p1);
not { $emp2 isa employment, links (employer: $company, employee: $p2); };
not { $edu2 isa education, links (institute: $institute, attendee: $p2); };
} or {
$p1 isa person; $p2 isa person;
$edu1 isa education, links (institute: $institute, attendee: $p1);
not { $emp2 isa employment, links (employer: $company, employee: $p2); };
not { $edu2 isa education, links (institute: $institute, attendee: $p2); };
};
Although this could now be a valid logic query, the first branch requires that $p2 did not attend any institute (since $institute is not bound in that branch), and the second branch requires that $p2 was not employed by any employer (since $company is not bound in that branch). This is clearly not what we meant. Hence, we flag these as invalid TypeQL queries.
There is, of course, a way to express the intended query.
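A sketch of one way to write it: move both relations of each comparison into a single negation. $p1 and $p2 are then bound in the parent conjunction and are the inputs to each negation, while $company and $institute are local to their respective negations, so the pattern meets the requirements above.

```typeql
match
$p1 isa person; $p2 isa person;
# Exclude pairs where some company employs both $p1 and $p2.
not {
  $emp1 isa employment, links (employer: $company, employee: $p1);
  $emp2 isa employment, links (employer: $company, employee: $p2);
};
# Exclude pairs where some institute was attended by both $p1 and $p2.
not {
  $edu1 isa education, links (institute: $institute, attendee: $p1);
  $edu2 isa education, links (institute: $institute, attendee: $p2);
};
```

Note that nothing in this sketch prevents $p1 and $p2 from being the same person.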
Invalid patterns: Disjoint variable re-use
Consider another case of questionable query composition:
match
$p1 isa person; $p2 isa person;
{
$emp1 isa employment, links (employer: $company, employee: $p1);
} or {
$edu1 isa education, links (institute: $institute, attendee: $p1);
};
{
$emp2 isa employment, links (employer: $company, employee: $p2);
} or {
$edu2 isa education, links (institute: $institute, attendee: $p2);
};
Ideally, this would be a query to find two persons $p1 and $p2 who were either employed by the same company, or attended the same institute.
The DNF quickly reveals the mistake:
match
{
$p1 isa person; $p2 isa person;
$emp1 isa employment, links (employer: $company, employee: $p1);
$emp2 isa employment, links (employer: $company, employee: $p2);
} or {
$p1 isa person; $p2 isa person;
$edu1 isa education, links (institute: $institute, attendee: $p1);
$emp2 isa employment, links (employer: $company, employee: $p2);
} or {
$p1 isa person; $p2 isa person;
$emp1 isa employment, links (employer: $company, employee: $p1);
$edu2 isa education, links (institute: $institute, attendee: $p2);
} or {
$p1 isa person; $p2 isa person;
$edu1 isa education, links (institute: $institute, attendee: $p1);
$edu2 isa education, links (institute: $institute, attendee: $p2);
};
You can see the query we meant to write in two of those branches:
match
$p1 isa person; $p2 isa person;
{
$emp1 isa employment, links (employer: $company, employee: $p1);
$emp2 isa employment, links (employer: $company, employee: $p2);
} or {
$edu1 isa education, links (institute: $institute, attendee: $p1);
$edu2 isa education, links (institute: $institute, attendee: $p2);
};
The problem lies in the other two branches.
match
{
$p1 isa person; $p2 isa person;
$emp1 isa employment, links (employer: $company, employee: $p1);
$edu2 isa education, links (institute: $institute, attendee: $p2);
} or {
$p1 isa person; $p2 isa person;
$edu1 isa education, links (institute: $institute, attendee: $p1);
$emp2 isa employment, links (employer: $company, employee: $p2);
};
This will return any persons $p1 and $p2 when either (1) $p1 is employed by any company and $p2 attended any institute; or (2) $p2 is employed by any company and $p1 attended any institute.
Notice $company is "local" to both the first and the second disjunction (the same is the case for $institute).
TypeQL throws a "disjoint variable re-use" error for such cases.
The select statement
select is exceptional in that it frees up all variable names except those of the selected variables.
This is a workaround for the uniqueness requirement on variable names within a pipeline.
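A minimal sketch, assuming a match stage may follow select in the same pipeline: after select $p, the name $n is freed, so the $n in the second match is a new variable, here bound to usernames rather than to names.

```typeql
match
$p isa person, has name $n;
# Keep only $p; the name $n becomes free again.
select $p;
# This $n is a new variable, unrelated to the $n above.
match
$p has username $n;
```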