Advanced: Invalid Patterns

This section dives into the semantics of TypeQL, using Disjunctive Normal Form (DNF) to explain why disjoint variable reuse makes a pattern invalid.

Schema for the examples
#!test[schema, commit]
define
attribute name, value string;
relation employment, relates employee, relates employer;
relation education, relates institute, relates attendee;
entity person owns name, plays employment:employee, plays education:attendee;
entity company owns name, plays employment:employer;
entity school owns name, plays education:institute;

Disjoint variable reuse

This section demonstrates some of the cases that can arise from disjoint variable reuse.

Unbound negation inputs

Notice that we can write a pattern in which an input variable to a negation is not bound in the parent conjunction. For example:

#!test[read, fail_at=runtime]
match
$p1 isa person; $p2 isa person;
{
  $emp1 isa employment, links (employer: $company, employee: $p1);
} or {
  $edu1 isa education, links (institute: $institute, attendee: $p1);
};

not { $emp2 isa employment, links (employer: $company, employee: $p2); };
not { $edu2 isa education, links (institute: $institute, attendee: $p2); };

At first glance, this looks like a reasonable query: we want persons $p1 and $p2 who neither worked for the same company nor attended the same institute. However, the input variables of the negations ($company and $institute) are not local to the negations, yet they are not bound in the parent conjunction either: each one is bound in only one branch of the disjunction.

Disjunctive Normal Form

The best way to reason about these requirements is to convert the query to Disjunctive Normal Form (DNF): rewrite the pattern using "distributivity", then examine each branch on its own:

A; { B; } or { C; };   becomes   { A; B; } or { A; C; };
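To make the rewrite concrete, here is a minimal Python sketch of distributivity over a toy pattern representation. The `Or` class and the `dnf` function are hypothetical illustrations, not part of TypeDB or any TypeQL driver: atoms are plain strings (a whole `not { ... };` is kept as a single atom), and each disjunction is an `Or` whose branches are lists of items.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Or:
    """Hypothetical toy node for `{ ... } or { ... };`.

    Each branch is a list of items; an item is either an atom (a string)
    or a nested Or.
    """
    branches: list


def dnf(conj):
    """Rewrite a conjunction (a list of items) into a list of flat branches."""
    alternatives = []
    for item in conj:
        if isinstance(item, Or):
            # A branch may contain further disjunctions, so expand it recursively.
            alternatives.append([flat for b in item.branches for flat in dnf(b)])
        else:
            # An atom, including a whole `not { ... };`, has a single alternative.
            alternatives.append([[item]])
    # Distributivity: pick one alternative per item, in every combination,
    # and concatenate the picks into one flat branch.
    return [sum(choice, []) for choice in product(*alternatives)]


# The rule itself: A; { B; } or { C; };  ->  { A; B; } or { A; C; };
print(dnf(["A", Or([["B"], ["C"]])]))  # [['A', 'B'], ['A', 'C']]

# The shape of the first query: both negations are copied into each branch.
for branch in dnf(["p1", "p2", Or([["emp1"], ["edu1"]]), "not emp2", "not edu2"]):
    print(branch)
# ['p1', 'p2', 'emp1', 'not emp2', 'not edu2']
# ['p1', 'p2', 'edu1', 'not emp2', 'not edu2']
```

Because atoms outside a disjunction have exactly one alternative, they end up in every branch, which is why the negations appear in both branches of the expansion.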

In this case, we get the pattern:

#!test[read, fail_at=runtime]
match
{
  $p1 isa person; $p2 isa person;
  $emp1 isa employment, links (employer: $company, employee: $p1);
  not { $emp2 isa employment, links (employer: $company, employee: $p2); };
  not { $edu2 isa education, links (institute: $institute, attendee: $p2); };
} or {
  $p1 isa person; $p2 isa person;
  $edu1 isa education, links (institute: $institute, attendee: $p1);
  not { $emp2 isa employment, links (employer: $company, employee: $p2); };
  not { $edu2 isa education, links (institute: $institute, attendee: $p2); };
};

Although this is now a well-defined logical query, it is clearly not what we intended. In the first branch $institute is not bound, so the second negation requires that $p2 did not attend any institute at all; in the second branch $company is not bound, so the first negation requires that $p2 was not employed by any company. Hence, TypeQL flags such queries as invalid.
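A small evaluation over hypothetical toy data (not from this documentation) shows the effect on the first branch. The Python sketch models only that branch; `persons`, `employments` and `educations` are illustrative sets standing in for the database.

```python
# Hypothetical toy data (not from this documentation).
persons = {"alice", "bob", "carol"}
employments = {("acme", "alice"), ("globex", "bob")}  # (employer, employee)
educations = {("uni", "carol")}                       # (institute, attendee)

# First DNF branch: $company is bound by $emp1, but $institute is not bound at
# all, so the second negation rejects any $p2 who attended *any* institute.
branch_1 = {
    (p1, p2)
    for (company, p1) in employments
    for p2 in persons
    if (company, p2) not in employments
    and not any(attendee == p2 for (_, attendee) in educations)
}
print(sorted(branch_1))  # [('alice', 'bob'), ('bob', 'alice')]
```

carol shares neither a company nor an institute with alice or bob, yet no pair with carol as $p2 survives, only because she attended some institute.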

There is, of course, a way to express the intended query:

The correct query
#!test[read]
match
$p1 isa person; $p2 isa person;
not {
  $emp1 isa employment, links (employer: $company, employee: $p1);
  $emp2 isa employment, links (employer: $company, employee: $p2);
};
not {
  $edu1 isa education, links (institute: $institute, attendee: $p1);
  $edu2 isa education, links (institute: $institute, attendee: $p2);
};

Reusing branch local variables in disjunctions

Consider another case of questionable query composition:

#!test[read, fail_at=runtime]
match
$p1 isa person; $p2 isa person;
{
  $emp1 isa employment, links (employer: $company, employee: $p1);
} or {
  $edu1 isa education, links (institute: $institute, attendee: $p1);
};
{
  $emp2 isa employment, links (employer: $company, employee: $p2);
} or {
  $edu2 isa education, links (institute: $institute, attendee: $p2);
};

Ideally, this would be a query to find two persons $p1 and $p2 who were either employed by the same company, or attended the same institute.

The DNF quickly reveals the mistake:

#!test[read, fail_at=runtime]
match
{
  $p1 isa person; $p2 isa person;
  $emp1 isa employment, links (employer: $company, employee: $p1);
  $emp2 isa employment, links (employer: $company, employee: $p2);
} or {
  $p1 isa person; $p2 isa person;
  $edu1 isa education, links (institute: $institute, attendee: $p1);
  $emp2 isa employment, links (employer: $company, employee: $p2);
} or {
  $p1 isa person; $p2 isa person;
  $emp1 isa employment, links (employer: $company, employee: $p1);
  $edu2 isa education, links (institute: $institute, attendee: $p2);
} or {
  $p1 isa person; $p2 isa person;
  $edu1 isa education, links (institute: $institute, attendee: $p1);
  $edu2 isa education, links (institute: $institute, attendee: $p2);
};

You can see the query we meant to write in two of those branches:

#!test[read]
match
$p1 isa person; $p2 isa person;
{
  $emp1 isa employment, links (employer: $company, employee: $p1);
  $emp2 isa employment, links (employer: $company, employee: $p2);
} or {
  $edu1 isa education, links (institute: $institute, attendee: $p1);
  $edu2 isa education, links (institute: $institute, attendee: $p2);
};

The problem lies in the other two branches.

#!test[read]
match
{
  $p1 isa person; $p2 isa person;
  $emp1 isa employment, links (employer: $company, employee: $p1);
  $edu2 isa education, links (institute: $institute, attendee: $p2);
} or {
  $p1 isa person; $p2 isa person;
  $edu1 isa education, links (institute: $institute, attendee: $p1);
  $emp2 isa employment, links (employer: $company, employee: $p2);
};

This returns any persons $p1 and $p2 where either (1) $p1 is employed by any company and $p2 attended any institute, or (2) $p2 is employed by any company and $p1 attended any institute.
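The same kind of toy evaluation illustrates the two cross branches. Again the data is hypothetical: alice works at Acme, bob studied at Uni, and the two share neither a company nor an institute.

```python
# Hypothetical toy data: alice works at Acme, bob studied at Uni.
employments = {("acme", "alice")}  # (employer, employee)
educations = {("uni", "bob")}      # (institute, attendee)

# Intended branches: a shared company, or a shared institute. In this model
# nothing stops $p1 and $p2 from being the same person, so self-pairs appear.
intended = {(p1, p2) for (c1, p1) in employments for (c2, p2) in employments if c1 == c2}
intended |= {(p1, p2) for (i1, p1) in educations for (i2, p2) in educations if i1 == i2}

# Cross branches: $company and $institute each occur once per branch,
# so they join nothing.
cross = {(p1, p2) for (_, p1) in employments for (_, p2) in educations}
cross |= {(p1, p2) for (_, p1) in educations for (_, p2) in employments}

print(sorted(intended))  # [('alice', 'alice'), ('bob', 'bob')]
print(sorted(cross))     # [('alice', 'bob'), ('bob', 'alice')]
```

The cross branches pair alice with bob even though they have nothing in common.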

Notice that $company is "internal" to both the first and the second disjunction while not being bound in the parent conjunction (the same is true of $institute). TypeQL throws a "disjoint variable re-use" error in such cases.
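The rule can be approximated by a simple check, sketched below in Python under stated assumptions: each constraint is reduced to the set of variables it mentions; a variable counts as bound in a conjunction if one of its own constraints binds it, or every branch of one of its disjunctions does (an assumption of this sketch); and the check flags any unbound variable that appears in two or more sibling sub-patterns (disjunctions, negations or try blocks). `Conj`, `mentioned`, `bound` and `disjoint_reuse` are hypothetical names; this is not TypeDB's actual validation code.

```python
from dataclasses import dataclass, field


# Hypothetical toy model of a TypeQL conjunction (not TypeDB's API).
@dataclass
class Conj:
    # Each constraint is reduced to the set of variable names it mentions.
    constraints: list = field(default_factory=list)
    # Sub-patterns as (kind, branches): kind is "or", "not" or "try";
    # "not" and "try" carry exactly one branch.
    subpatterns: list = field(default_factory=list)


def mentioned(c):
    """Every variable appearing anywhere inside c."""
    out = set().union(*c.constraints)
    for _, branches in c.subpatterns:
        for branch in branches:
            out |= mentioned(branch)
    return out


def bound(c):
    """Variables c binds in every answer (a simplifying assumption): its own
    constraints, plus variables bound in every branch of a disjunction."""
    out = set().union(*c.constraints)
    for kind, branches in c.subpatterns:
        if kind == "or":
            out |= set.intersection(*(bound(b) for b in branches))
    return out


def disjoint_reuse(c):
    """Unbound variables of c that occur in two or more sibling sub-patterns."""
    parent_bound = bound(c)
    seen, reused = set(), set()
    for _, branches in c.subpatterns:
        # Branches of one disjunction may share local names, so union them first.
        local = set().union(*(mentioned(b) for b in branches)) - parent_bound
        reused |= seen & local
        seen |= local
    return reused


EMP1, EMP2 = {"emp1", "company", "p1"}, {"emp2", "company", "p2"}
EDU1, EDU2 = {"edu1", "institute", "p1"}, {"edu2", "institute", "p2"}

# Unbound negation inputs: the negations reuse variables of the disjunction.
unbound_negation = Conj([{"p1"}, {"p2"}], [
    ("or", [Conj([EMP1]), Conj([EDU1])]),
    ("not", [Conj([EMP2])]),
    ("not", [Conj([EDU2])]),
])
# "The correct query": each negation keeps its shared variable local.
correct_negation = Conj([{"p1"}, {"p2"}], [
    ("not", [Conj([EMP1, EMP2])]),
    ("not", [Conj([EDU1, EDU2])]),
])
# Two sibling disjunctions sharing $company and $institute.
two_disjunctions = Conj([{"p1"}, {"p2"}], [
    ("or", [Conj([EMP1]), Conj([EDU1])]),
    ("or", [Conj([EMP2]), Conj([EDU2])]),
])
# The intended single disjunction.
one_disjunction = Conj([{"p1"}, {"p2"}], [
    ("or", [Conj([EMP1, EMP2]), Conj([EDU1, EDU2])]),
])
# Two try blocks in one match clause, both introducing $x.
two_tries = Conj([], [("try", [Conj([{"x"}])]), ("try", [Conj([{"x"}])])])

for name, pattern in [("unbound_negation", unbound_negation),
                      ("correct_negation", correct_negation),
                      ("two_disjunctions", two_disjunctions),
                      ("one_disjunction", one_disjunction),
                      ("two_tries", two_tries)]:
    print(name, sorted(disjoint_reuse(pattern)))
# unbound_negation ['company', 'institute']
# correct_negation []
# two_disjunctions ['company', 'institute']
# one_disjunction []
# two_tries ['x']
```

The sketch flags both faulty queries, accepts both rewrites, and also flags $x in two sibling try blocks, the case covered under Single origin for optionals.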

Single origin for optionals

An optional variable can be used in only one try block within a given match clause. Consider:

#!test[read, fail_at=runtime]
match
  try { $x isa person, has name "John"; };
  try { $x isa person, has name "James"; };

This is banned for a reason similar to the Unbound negation inputs case above: it is unclear whether $x is an input to either of the try blocks or an optional variable.

The same variable can, however, be used in try blocks in the next stage. By then the variable is bound (either to a concept or to None), so it is an input to the block. The block simply fails if it tries to use a variable bound to None in a constraint.

The semantics of try blocks dictate the equivalence of

try { P }; and { P } or { not { P' }; };

where P' is the pattern obtained by renaming all the 'optional' variables in P to fresh ones. Rewriting the above query with this equivalence (treating $x as 'optional' and renaming it, although this is ambiguous) gives:

#!test[read, fail_at=runtime]
match
{ $x isa person, has name "John"; } or { not { $x_1 isa person, has name "John"; }; };
{ $x isa person, has name "James"; } or { not { $x_2 isa person, has name "James"; }; };

Taking the DNF gives four cases, with some very unintuitive behaviour:

#!test[read, fail_at=runtime]
match
{ $x isa person, has name "John"; $x isa person, has name "James"; } or                 # persons named both John and James
{ $x isa person, has name "John"; not { $x_2 isa person, has name "James"; }; } or      # if nobody is named James: persons named John
{ not { $x_1 isa person, has name "John"; }; $x isa person, has name "James"; } or      # if nobody is named John: persons named James
{ not { $x_1 isa person, has name "John"; }; not { $x_2 isa person, has name "James"; }; }; # if nobody is named either: a single answer, with $x as None
# If one person is named John and a different person is named James: no answers
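The four branches can be checked against small hypothetical datasets with a Python model. Here `people` maps each person to their set of names, and `None` stands for the answer in which the optional $x is left unbound; the function `answers` is illustrative only and does not call TypeDB.

```python
def answers(people):
    """Answers of the four DNF branches above, as a set of $x bindings.

    `people` maps a (hypothetical) person to the set of names they own;
    None stands for the answer in which the optional $x is not bound.
    """
    john = {p for p, names in people.items() if "John" in names}
    james = {p for p, names in people.items() if "James" in names}
    result = set()
    result |= john & james            # branch 1: named both John and James
    if not james:
        result |= john                # branch 2: nobody is named James
    if not john:
        result |= james               # branch 3: nobody is named John
    if not john and not james:
        result.add(None)              # branch 4: nobody is named either
    return result


print(answers({"p1": {"John"}, "p2": {"James"}}))  # set(): no answers at all
print(answers({"p1": {"John"}}))                   # {'p1'}
print(answers({"p1": {"John", "James"}}))          # {'p1'}
print(answers({}))                                 # {None}
```

With one John and a different James, the model returns nothing, matching the last comment in the query above.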