TypeDB Fundamentals
Modular database programming using functions
In TypeDB 3.0, functions provide powerful abstractions of query logic and are a cornerstone of the functional database programming model. Function calls can be nested, recursive, or negated, and they natively embed into TypeQL’s declarative patterns.
At a glance: syntax essentials
Functions will bring several syntactic novelties to TypeQL, including:
- The keyword fun for defining functions. Each function has a type signature (input) -> output, where input is a list of (typed) variables and output a valid return type.
- A new “stream” type {T1, T2, ..} describing streams of (potentially partial) tuples (t1, t2, ...) of type t1 : T1, t2 : T2, ...
- return statements for specifying the data returned by a function.
- An in statement for accessing elements in the return stream of a function.
Defining functions at query-level
Functions can be defined directly at the level of queries using a with clause (we can have one or more such clauses, each defining a single function, but they must all precede the query pipeline). Let’s define a simple function to aid us with a query for finding employees of a given company and country that earn more than their company’s average salary in that country.
with fun mean_salary($c: company, $n: nationality) -> double?:
  match
    (employer: $c, employee: $emp) isa employment, has salary $sal;
    $emp has nationality $n;
  return mean($sal);
match
  $e_corp isa company, has name "E-Corp";
  $e isa person, has nationality $nation;
  $mean_sal = mean_salary(country=$nation, company=$e_corp);
  (employer: $e_corp, employee: $e) isa employment, has salary >= $mean_sal;
fetch {
  "above-average-earner": {
    "name": $e.name,
    "birthday": $e.birthday
  }
}
Function definition
Looking at the above query, note the input of the function is a list of typed variables ($c: company, $n: nationality). In general, the types could actually be omitted (and we plan to implement this in the future), in which case TypeDB’s type inference engine could be used to infer the types of the inputs from the patterns in the function body. This would drastically increase our users’ freedom in creating functions.
Next, note that the return statement calls upon the stream-reducing function mean, which computes the average of a given variable’s values across the entire stream. This means the function will return either a single double value or be undefined (if the stream is empty or there are zero occurrences of the variable in the stream, which can happen for optional variables). The return type is defined accordingly as double?: an optional double.
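The behaviour of the mean reduction can be modelled outside TypeDB. The following Python sketch only illustrates the semantics described above and is not TypeDB's implementation: the helper name mean_or_none is hypothetical, and None stands in both for an unbound optional variable and for the undefined result.

```python
from typing import Iterable, Optional


def mean_or_none(values: Iterable[Optional[float]]) -> Optional[float]:
    """Average the bound values of a stream; return None if there are none."""
    # Skip rows in which the (optional) variable is unbound.
    bound = [v for v in values if v is not None]
    if not bound:
        # Empty stream, or the variable never occurs: the result is undefined.
        return None
    return sum(bound) / len(bound)
```

Under these assumptions, an empty stream and a stream in which the variable is never bound both give None, which is the situation the optional return type double? accounts for.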
Function call
Next, in the match clause of our pipeline, we call the function as mean_salary(country=$nation, company=$e_corp). This uses named-argument call syntax. As an alternative, a positional call syntax, à la mean_salary($e_corp, $nation), would also be possible. The returned data of the function is assigned to a variable $mean_sal, which can then be used as usual in patterns.
Our example returns a single result; in our case this is a single double?, but in general it could be a tuple of results (e.g. with return type double?, int, name[]). We speak of a single-return function, which is the first of two cases of function return types. The second case is that of functions returning streams; we speak of stream-return functions. We will see an example of the latter case shortly.
Failure on empty return
Importantly, if no average salary for a company is returned by our mean_salary function (this can happen since the return type is an optional double), then our pattern will fail: indeed, there is no result that can be assigned to the variable $mean_sal, which is not an optional variable in our pattern.
To avoid such failure, we can use the anonymous variable $_. For example, when a function returns a tuple mixing optional and non-optional types (e.g., mean_salary_and_employee_count(...) -> double?, int), we can indicate to TypeDB that we don’t need or care about the optional return by calling the function as $_, $count = mean_salary_and_employee_count(...). In this way, even if there are no employees (so the count is 0), the pattern will match the left-hand side $_, $count of the assignment.
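To make the difference between binding and discarding an optional result concrete, here is a small Python model of one assignment. It is a sketch under stated assumptions rather than a description of TypeDB's engine: bind_result and its keep_mean flag are hypothetical, None models a missing optional value, and a returned None models a pattern that fails to match.

```python
from typing import Dict, Optional, Tuple, Union


def bind_result(
    result: Tuple[Optional[float], int], keep_mean: bool
) -> Optional[Dict[str, Union[float, int]]]:
    """Return the variable bindings for one call, or None if the pattern fails."""
    mean, count = result
    if keep_mean:
        # Models `$mean, $count = f(...)`: $mean is a non-optional variable,
        # so a missing mean leaves nothing to assign and the match fails.
        if mean is None:
            return None
        return {"mean": mean, "count": count}
    # Models `$_, $count = f(...)`: the optional component is discarded,
    # so its absence cannot make the match fail.
    return {"count": count}
```

With a result of (None, 0), the first form yields no match while the second still binds the count to 0.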
Where and where not to call a function
We remark that functions can only be called in the following three situations:
- As part of a query pattern in a match clause (so that’s the case above!),
- As part of a condition to be checked in an assert clause,
- As part of constructing the final JSON output in a fetch clause (see our pipeline fundamentals on this and the previous point).
In particular, functions cannot be called in query clauses that have an effect on the database itself (such as insert, delete, and put). This is because the execution of the function and the effect that these statements have on the database may interfere with one another!
Defining functions at schema-level
It is often the case that there are fundamental “features” of our data that can be programmatically extracted. In this case, it makes sense to define functions that do exactly that, and make these definitions part of the schema for our data model. This is achieved using a usual define query in a schema transaction. Let’s see an example!
define
  customer sub entity, owns age, owns license_type;
  age sub attribute, value int;
  license_type sub attribute, value string;
  vehicle_type sub attribute, value string;
  ...
  fun can_rent($customer: customer, $vehicle: vehicle) -> bool:
    match
      $customer has age >= 25, has license_type $t;
      $vehicle has vehicle_type $t;
    return check;
The function defined above, again, is a single-returning function: the return type is bool. Like mean, the call to check performs a stream reduction: in fact, this is the simplest kind of reduction as it only “checks whether a stream is non-empty”.
It’s also worth highlighting how the vehicle_type attribute $t is matched to a license_type attribute: an implicit to-string conversion is happening in the background. (Pre-3.0, we’d have had to write $customer has license_type $t1; $vehicle has vehicle_type $t2; $t1 == $t2; instead!)
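The check reduction and the shared variable $t can be pictured with a short Python model. This is an illustrative sketch, not TypeDB code: the Customer and Vehicle classes and the matches helper are hypothetical stand-ins for the schema types and for the stream produced by the function body.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Customer:
    age: int
    license_types: List[str]  # license_type attributes owned by the customer


@dataclass
class Vehicle:
    vehicle_types: List[str]  # vehicle_type attributes owned by the vehicle


def matches(customer: Customer, vehicle: Vehicle) -> Iterator[str]:
    """Yield every value t shared by a license_type and a vehicle_type."""
    if customer.age < 25:
        return  # the age constraint fails, so the stream is empty
    for t in customer.license_types:
        if t in vehicle.vehicle_types:
            yield t


def can_rent(customer: Customer, vehicle: Vehicle) -> bool:
    """Model of `return check`: true exactly when the stream is non-empty."""
    return any(True for _ in matches(customer, vehicle))
```

In this model the reduction never inspects the matched values; it only asks whether at least one match exists.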
Calling a schema-level function
Let’s see how to use the above schema-level function in action, with the following hypothetical query:
match
  $request(requester: $c, requested: $v) isa rental_request;
  can_rent(customer= $c, vehicle= $v) == true;
insert
  $request has status "approved" @replace;
It’s easy to read the query and it does the obvious: it searches through car rental requests, where the car requested is of a type that the requester is in fact allowed to rent, and then marks the request as approved.
The match clause
In the match clause, we see our schema-defined function in action. Our function appears as the left-hand side of an equality comparison == whose right-hand side is a boolean, so the types check out! We believe this should not need any further explanation.
Updating the DB state
In the final insert clause, we then update the state of our database with the matches we found. We will revisit the usage of the @replace annotation in the section on constraint annotations!
From rules to functions
In a way, the previous two examples nicely illustrated that functions can elegantly replace (pre-3.0) rules. But let us say this more explicitly: functions replace rules! Let’s see another example of this. This time, we will finally consider a stream-returning function.
The rule way (“the old way”)
In TypeQL 2.x, we may have written a rule of the following form.
define rule extend_surnames_by_marriage:
  when {
    $x has surname $name;
    ($x, $y) isa marriage;
  } then {
    $y has surname $name;
  }
When inference is “switched on”, this rule ensures that, when querying for surnames of a person we automatically consider the surnames of their (potential) spouses as well. If we don’t want to consider these extensions of the type of surnames, we can leave inference “switched off”.
The function way (“this is the way”)
With functions we can easily achieve the same. But there no longer is an “inference switch”, and so we will have to introduce an additional concept (let’s call it “extended surnames”) to distinguish between the version of surnames with or without inference switched on.
define fun extended_surname($x: person) -> {surname}:
  match
    { $x has surname $name; }
    or
    { ($x, $y) isa marriage; $y has surname $name; };
  return { $name };
The return of this function is a stream of surnames, indicated by writing {surname}. (In general, we may have streams of tuples as explained earlier, e.g., {name, birthday, bool}.)
In TypeDB 3.0, there is no need for an inference toggle. Depending on whether we may want inference to be “toggled” on or off, we simply write our queries accordingly as follows:
# inference "off"
match $x has surname $n;
# inference "on"
match $n in extended_surname($x);
In the second case, when using our stream-returning function extended_surname, note that we use the keyword in for unwinding the elements of a stream. The resulting pattern expresses “$n is a member of the stream extended_surname($x)”. Simple enough!
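A stream-returning function behaves much like a generator, and the in keyword like iterating over it. The Python sketch below models extended_surname on plain dictionaries; it is an assumption-laden illustration, not TypeDB's evaluation strategy. The surnames and marriages arguments are hypothetical stand-ins for the database, and duplicates are deliberately not removed because the text does not say how TypeDB treats them.

```python
from typing import Dict, FrozenSet, Iterator, List, Set


def extended_surname(
    person: str,
    surnames: Dict[str, List[str]],
    marriages: Set[FrozenSet[str]],
) -> Iterator[str]:
    """Yield the person's own surnames, then those of their spouses."""
    # First branch of the `or`: { $x has surname $name; }
    yield from surnames.get(person, [])
    # Second branch: { ($x, $y) isa marriage; $y has surname $name; }
    for couple in marriages:
        if person in couple:
            for spouse in couple - {person}:
                yield from surnames.get(spouse, [])
```

Iterating with `for n in extended_surname(...)` then plays the role of `$n in extended_surname($x)`: each element of the stream produces one binding of the variable.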
Differentiating single-return and stream-return functions
Note the syntactic difference between single-return and stream-return functions in the above: extended_surname returns a stream of results and we use the keyword in to match a variable with a result from that stream.
The keyword also works when functions return streams of tuples. In this case we would write, e.g., $nam, $rel in names_and_relatives($person). The corresponding return statement would look like return { $x, $y };.
In contrast, for single-return functions, we always use the assignment operation = instead of in. This similarly works when the function returns tuples of results, e.g., we could have $mean, $deviation = salary_mean_and_dev($ecorp). The corresponding return statement would look like return mean($x), std($x); in this case.
Next, we look at a more advanced use case of functions: recursion.
Recursion with functions
TypeDB is crafted for handling complex, interconnected data. Frequently, this involves querying for arbitrarily sized connections, which requires (one way or the other) the recursive construction of results. This naturally fits into the functional database paradigm.
Let’s see a simple example of how to construct flight paths for multi-leg flights. Assuming appropriate types to be defined in our schema, we define a schema-level function as follows:
define fun multi_leg_flight($start: city, $end: city, $max_hops: int) -> {flight[]}:
  match
    $first_leg isa flight, links (dept: $start, dest: $dest);
    { $dest == $end; $path = [ $first_leg ]; } or
    { $dest != $end; $path = [ $first_leg ] + $other_legs;
      $other_legs in multi_leg_flight($dest, $end, $max_hops - 1); };
    $max_hops >= 1;
  return { $path };
Note that this function is stream-returning: it returns a stream of “flight paths”, which we model as lists of flights, flight[]. The paths will run between the cities $start and $end with a given (positive) maximum number $max_hops of hops.
Constructing flight paths
Let’s look at the function’s body. Besides matching a $first_leg flight from the $start city to some destination $dest, the function has two or-cases.
- First, we consider the case that $dest is in fact already our $end destination, in which case we have found a (single-leg) $path from $start to $end as requested.
- In the second case, $dest is not our $end destination. In this case, our $path is the concatenation of the $first_leg and the $other_legs, where we recursively search for the $other_legs to cover the remaining journey.
It’s worth emphasizing that our syntax here is declarative; the order of statements does not matter in the function body. For example, we could have placed the pattern constraint $max_hops >= 1; at the beginning of the pattern.
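The two or-cases and the hop bound translate directly into a recursive generator. The following Python sketch models the function body on an in-memory list of flights. It is a hypothetical illustration under stated assumptions, not how TypeDB evaluates recursion: the Flight tuple and the flights argument stand in for the database, and, unlike the declarative pattern, the Python code has to check max_hops first.

```python
from typing import Iterator, List, NamedTuple


class Flight(NamedTuple):
    number: str
    dept: str  # departure city
    dest: str  # destination city


def multi_leg_flight(
    flights: List[Flight], start: str, end: str, max_hops: int
) -> Iterator[List[Flight]]:
    """Yield every path from start to end that uses at most max_hops flights."""
    if max_hops < 1:  # models the constraint `$max_hops >= 1`
        return
    for first_leg in flights:
        if first_leg.dept != start:
            continue
        if first_leg.dest == end:
            # First case: a single leg already reaches the destination.
            yield [first_leg]
        else:
            # Second case: prepend the first leg to every path covering the rest.
            for other_legs in multi_leg_flight(
                flights, first_leg.dest, end, max_hops - 1
            ):
                yield [first_leg] + other_legs
```

Because every recursive call lowers max_hops by one, the search terminates even when the flight network contains cycles.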
Querying a recursive function
While recursive searches are never effortless, the above syntax closely approximates how we would describe “what a multi-leg flight is” in natural language. And here is how you could use such a function to get all the possible multi-leg flights in your flight database from Milan to Santiago de Chile in 3 hops or fewer, excluding those which stop in London.
match
  $milan isa city, has name "Milan";
  $santiago isa city, has name "Santiago";
  $london isa city, has name "London";
  $legs in multi_leg_flight(start=$milan, end=$santiago, max_hops=3);
  not { $leg in $legs; $leg links ($london); };
fetch {
  "legs": [
    match $leg = $legs[$i];
    fetch {
      "flight-number": $leg.flight_no,
      "leg-number": $i
    }
  ]
}
That’s concise and readable!
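For readers who prefer to see the negation and the indexed fetch spelled out, here is a small Python model of those two steps. It is a sketch with hypothetical names (Flight, avoiding, fetch_legs) operating on already computed paths, and it assumes zero-based leg numbers, which the query above does not specify.

```python
from typing import Dict, List, NamedTuple, Union


class Flight(NamedTuple):
    number: str
    dept: str  # departure city
    dest: str  # destination city


Path = List[Flight]


def avoiding(paths: List[Path], city: str) -> List[Path]:
    """Models the `not { ... }` block: drop paths with a leg touching `city`."""
    return [
        path
        for path in paths
        if not any(city in (leg.dept, leg.dest) for leg in path)
    ]


def fetch_legs(path: Path) -> List[Dict[str, Union[str, int]]]:
    """Models the inner fetch: one JSON-like object per leg, with its index."""
    return [
        {"flight-number": leg.number, "leg-number": i}  # zero-based (assumption)
        for i, leg in enumerate(path)
    ]
```

A path survives the filter only if none of its legs links to the excluded city, mirroring how the negated pattern must fail to match for every leg.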
Summary
Functions are a fundamental part of TypeDB’s functional database programming model, and they fit naturally into the pattern-based syntax of TypeQL. Functions allow users to write queries in a modular fashion, thereby minimizing code repetition, and they allow expressing complex logic via recursive and branching calls.
What do you think about TypeDB’s functional approach? We are excited to hear your opinions and suggestions!