Modular database programming using functions



This article is part of our TypeDB 3.0 preview series. Sign up to our newsletter to stay up-to-date with future updates and webinars on the topic!

In TypeDB 3.0, functions provide powerful abstractions of query logic and are a cornerstone of the functional database programming model. Function calls can be nested, recursive, or negated, and they natively embed into TypeQL’s declarative patterns.

At a glance: syntax essentials

Functions will bring several syntactic novelties to TypeQL, including:

  • The keyword fun for defining functions. Each function has a type signature (input) -> output, where input is a list of (typed) variables and output a valid return type.
  • A new “stream” type {T1, T2, ..} describing streams of (potentially partial) tuples (t1, t2, ...) of type t1 : T1, t2 : T2, ....
  • return statements for specifying the data returned by a function.
  • An in statement for accessing elements in the return stream of a function.

Defining functions at query-level

Functions can be defined directly at the level of queries using a with clause (we can have one or more such clauses, each defining a single function, but they must all precede the query pipeline). Let’s define a simple function to aid us with a query for finding employees of a given company and country that earn at least their company’s median salary in that country.

with fun median_salary($c: company, $n: nationality) -> double?: 
  match 
    (employer: $c, employee: $emp) isa employment, has salary $sal;
    $emp has nationality $n;
  return median($sal);
match
  $e_corp isa company, has name "E-Corp";
  $e isa person, has nationality $nation;
  $median_sal = median_salary(country=$nation, company=$e_corp);
  (employer: $e_corp, employee: $e) isa employment, has salary >= $median_sal;
fetch $e as "above-median-earner": name, birthday;

Function definition

Looking at the above query, note that the input of the function is a list of typed variables ($c: company, $n: nationality). In general, the types could actually be omitted (and we plan to implement this in the future), in which case TypeDB’s type inference engine could be used to infer the input types from the patterns in the function body. This would drastically increase our users’ freedom in creating functions.

Next, note that the return statement calls upon the stream-reducing function median, which computes the median of a given variable’s values across the entire stream. This means the function will either return a single double value or be undefined (if the stream is empty or there are zero occurrences of the variable in the stream, which can happen for optional variables). The return type is defined accordingly as double?: an optional double.
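To make the "single value or undefined" behaviour concrete, here is a minimal Python sketch of such a reduction. It is only a model of the semantics described above, not TypeDB's implementation; the name median_reduce and the use of None for an unbound optional variable are assumptions made for illustration.

```python
from statistics import median
from typing import Iterable, Optional


def median_reduce(salaries: Iterable[Optional[float]]) -> Optional[float]:
    """Reduce a stream of possibly missing values to one optional double.

    Hypothetical model: None stands for a row in which the (optional)
    variable is unbound.
    """
    # Keep only the rows in which the variable actually has a value.
    present = [s for s in salaries if s is not None]
    # Empty stream, or no occurrences of the variable: the result is undefined.
    return float(median(present)) if present else None
```

In this model, the double? return type corresponds to Optional[float]: callers have to be prepared for the "no value" case.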

Function call

Next, in the match clause of our pipeline, we call the function as median_salary(country=$nation, company=$e_corp). This uses named-argument call syntax. As an alternative, a positional call syntax, à la median_salary($e_corp, $nation), would also be possible. The returned data of the function is assigned to a variable $median_sal, which can then be used as usual in patterns.

Our example returns a single result; in our case this is a single double?, but in general it could be a tuple of results (e.g. with return type double?, int, name[]). We speak of a single-return function, which is the first of two cases of function return types. The second case is that of functions returning streams; we speak of stream-return functions. We will see an example of the latter case shortly.

Failure on empty return

Importantly, if no median salary for a company is returned by our median_salary function (this can happen since the return type is an optional double), then our pattern will fail: indeed, there is no result that can be assigned to the variable $median_sal, which is not an optional variable in our pattern.

To avoid such a failure, we can use the anonymous variable $_. For example, when a function returns a tuple mixing optional and non-optional types (e.g., median_salary_and_employee_count(...) -> double?, int), we can indicate to TypeDB that we don’t need or care about the optional return by calling the function as $_, $count = median_salary_and_employee_count(...). In this way, even if there are no employees (so the count is 0 and the median is undefined), the pattern will match the left-hand side $_, $count of the assignment.
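The matching rule just described can be modelled in a few lines of Python. This is a rough sketch under stated assumptions, not how TypeDB evaluates patterns: bind is a hypothetical helper, None stands for an undefined optional return, and the string "$_" stands for the anonymous variable.

```python
from typing import Optional, Sequence

ANON = "$_"  # stands in for TypeQL's anonymous variable


def bind(targets: Sequence[str],
         values: Sequence[Optional[object]]) -> Optional[dict]:
    """Return the variable bindings, or None if the pattern fails to match."""
    if len(targets) != len(values):
        raise ValueError("arity mismatch between variables and returned tuple")
    bindings = {}
    for name, value in zip(targets, values):
        if name == ANON:
            continue  # anonymous: the value is discarded, so absence is fine
        if value is None:
            return None  # a named, non-optional variable has nothing to bind
        bindings[name] = value
    return bindings
```

With this model, bind(["$median_sal"], [None]) fails, while bind(["$_", "$count"], [None, 0]) succeeds and binds only $count.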

Where and where not to call a function

We remark that functions can only be called in the following three situations:

  1. As part of a query pattern in a match clause (so that’s the case above!),
  2. As part of a condition to-be-checked in an assert clause,
  3. As part of constructing the final JSON output in a fetch clause (see our pipeline fundamentals on this and the previous point).

In particular, functions cannot be called in query clauses that have an effect on the database itself (such as insert, delete, and put). This is because the execution of the function and the effect that these clauses have on the database may interfere with one another!

Defining functions at schema-level

It is often the case that there are fundamental “features” of our data that can be programmatically extracted. In this case, it makes sense to define functions that do exactly that, and to make these definitions part of the schema for our data model. This is achieved using a usual define query in a schema transaction. Let’s see an example!

define 
  customer sub entity, owns age, owns license_type;
  age sub attribute, value int;
  license_type sub attribute, value string;
  vehicle_type sub attribute, value string;
  ...

  fun can_rent($customer: customer, $vehicle: vehicle) -> bool:
    match 
      $customer has age >= 25, has license_type $t;
      $vehicle has vehicle_type $t;
    return check;

The function defined above is, again, a single-return function: the return type is bool. Like median, the call to check performs a stream reduction; in fact, this is the simplest kind of reduction, as it only checks whether a stream is non-empty.
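As an illustration of check as a "non-empty" reduction, the following Python sketch models can_rent over plain dictionaries. The dictionary layout (age, license_types, vehicle_type) and the helper matching_license_types are assumptions made for this example; they only mimic the match body above.

```python
from typing import Iterable, Iterator


def check(stream: Iterable[object]) -> bool:
    """True if the stream yields at least one result (consumes at most one)."""
    for _ in stream:
        return True
    return False


def matching_license_types(customer: dict, vehicle: dict) -> Iterator[str]:
    """Hypothetical stand-in for the match body of can_rent."""
    if customer["age"] >= 25:
        for license_type in customer["license_types"]:
            if license_type == vehicle["vehicle_type"]:
                yield license_type


def can_rent(customer: dict, vehicle: dict) -> bool:
    # The reduction only asks whether the body produced any answer at all.
    return check(matching_license_types(customer, vehicle))
```

Because check stops at the first answer, the body never needs to be evaluated exhaustively in this model.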

It’s also worth highlighting how the vehicle_type attribute $t is matched to a license_type attribute: an implicit to-string conversion is happening in the background. (Pre-3.0, we’d have had to write $customer has license_type $t1; $vehicle has vehicle_type $t2; $t1 == $t2; instead!)

Calling a schema-level function

Let’s see how to use the above schema-level function in action, with the following hypothetical query:

match
  $request(requester: $c, requested: $v) isa rental_request;
  can_rent(customer= $c, vehicle= $v) == true;
insert
  $request has status "approved" @replace;

It’s easy to read the query and it does the obvious: it searches through car rental requests, where the car requested is of a type that the requester is in fact allowed to rent, and then marks the request as approved.

The match clause

In the match clause, we see our schema-defined function in action. Our function appears as the left-hand side of an equality comparison == whose right-hand side is a boolean, so the types check out! We believe this should not need any further explanation.

Updating the DB state

In the final insert clause we then update the state of our database with the matches we found. We will revisit the usage of the @replace annotation in the section on constraint annotations!

From rules to functions

In a way, the previous two examples nicely illustrated that functions can elegantly replace (pre-3.0) rules. But let us say this more explicitly: functions replace rules! Let’s see another example of this. This time, we will finally consider a stream-returning function.

The rule way (“the old way”)

In TypeQL 2.x, we might have written a rule of the following form.

define rule extend_surnames_by_marriage:
  when {
    $x has surname $name;
    ($x, $y) isa marriage;
  } then { 
    $y has surname $name; 
  };

When inference is “switched on”, this rule ensures that, when querying for surnames of a person we automatically consider the surnames of their (potential) spouses as well. If we don’t want to consider these extensions of the type of surnames, we can leave inference “switched off”.

The function way (“this is the way”)

With functions we can easily achieve the same. But there no longer is an “inference switch”, and so we will have to introduce an additional concept (let’s call it “extended surnames”) to distinguish between the version of surnames with or without inference switched on.

define fun extended_surname($x: person) -> {surname}: 
  match
    { $x has surname $name; }
    or
    { ($x, $y) isa marriage; $y has surname $name; };
  return { $name };

The return of this function is a stream of surnames, indicated by writing {surname}. (In general, we may have streams of tuples as explained earlier, e.g., {name, birthday, bool}).

In TypeDB 3.0, there is no need for an inference toggle. Depending on whether we may want inference to be “toggled” on or off, we simply write our queries accordingly as follows:

# inference "off"
match $x has surname $n;

# inference "on"
match $n in extended_surname($x);

In the second case, when using our stream-returning function extended_surname, note that we use the keyword in for unwinding the elements of a stream. The resulting pattern expresses “$n is a member of the stream extended_surname($x)”. Simple enough!
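A stream-return function behaves much like a generator. The Python sketch below mirrors the two branches of extended_surname over toy in-memory data; the SURNAMES and MARRIAGES tables and the spouses helper are hypothetical and exist only for this illustration.

```python
from typing import Iterator

# Hypothetical toy data: each person's own surnames, and marriages as pairs.
SURNAMES = {"alice": ["Smith"], "bob": ["Jones"], "carol": ["Lee"]}
MARRIAGES = [("alice", "bob")]


def spouses(person: str) -> Iterator[str]:
    """Yield everyone married to the given person (marriage is symmetric)."""
    for a, b in MARRIAGES:
        if a == person:
            yield b
        elif b == person:
            yield a


def extended_surname(person: str) -> Iterator[str]:
    """Stream the person's own surnames, then the surnames of their spouses."""
    # First branch of the "or": the person's own surnames.
    yield from SURNAMES.get(person, [])
    # Second branch: surnames of anyone married to the person.
    for spouse in spouses(person):
        yield from SURNAMES.get(spouse, [])
```

Iterating with for n in extended_surname("alice") plays the role of $n in extended_surname($x), whereas a single-return function would be a plain call whose result is assigned with =.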

Differentiating single-return and stream-return functions

Note the syntactic difference between single-return and stream-return functions in the above: extended_surname returns a stream of results and we use the keyword in to match a variable with a result from that stream.

The keyword also works when functions return streams of tuples. In this case we would write, e.g., $name, $rel in names_and_relatives($person). The corresponding return statement would look like return { $x, $y };.

In contrast, for single-return functions, we always use the assignment operation = instead of in. This similarly works when the function returns tuples of results, e.g., we could have $median, $deviation = salary_med_and_dev($ecorp). The corresponding return statement would look like return median($x), std($x); in this case.

Next, we look at a more advanced use case of functions: recursion.

Recursion with functions

TypeDB is crafted for handling complex, interconnected data. Frequently, this involves querying for arbitrarily sized connections, which requires (one way or the other) the recursive construction of results. This naturally fits into the functional database paradigm.

Let’s see a simple example of how to construct flight paths for multi-leg flights. Assuming appropriate types to be defined in our schema, we define a schema-level function as follows:

define fun multi_leg_flight($start: city, $end: city, $max_hops: int) -> {flight[]}:
  match 
    $first_leg isa flight, links (dept: $start, dest: $dest);
    { $dest == $end; $path = [ $first_leg ]; } or
    { $dest != $end; $path = [ $first_leg ] + $other_legs;
      $other_legs in multi_leg_flight($dest, $end, $max_hops - 1); };
    $max_hops >= 1;
  return { $path };

Note that this function is stream-returning: it returns a stream of “flight paths”, which we model as lists of flights, flight[]. The paths will run between the cities $start and $end and use at most $max_hops hops, where $max_hops is a positive integer.

Constructing flight paths

Let’s look at the function’s body. Besides matching a $first_leg flight from the $start city to some destination $dest, the function has two or-cases.

  1. First, we consider the case that $dest is in fact already our $end destination, in which case we have found a (single-leg) $path from $start to $end as requested.
  2. In the second case, $dest is not our $end destination. In this case, our $path is the concatenation of the $first_leg and the $other_legs, where we recursively search for the $other_legs to cover the remaining journey.

It’s worth emphasizing that our syntax here is declarative; the order of statements does not matter in the function body. For example, we could have placed the pattern constraint $max_hops >= 1; at the beginning of the pattern.
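The recursive construction can be traced with a small Python model. This is a sketch of the logic only, assuming flights are simple records held in a list; the Flight tuple, the FLIGHTS data and the flight codes are invented for illustration. Unlike the declarative TypeQL body, the Python version has to pick an evaluation order, so the $max_hops >= 1 guard is checked first.

```python
from typing import Iterator, List, NamedTuple


class Flight(NamedTuple):
    code: str
    dept: str
    dest: str


# Hypothetical toy data; F6 creates a cycle Milan -> Madrid -> Milan.
FLIGHTS = [
    Flight("F1", "Milan", "London"),
    Flight("F2", "London", "Santiago"),
    Flight("F3", "Milan", "Madrid"),
    Flight("F4", "Madrid", "Santiago"),
    Flight("F5", "Milan", "Santiago"),
    Flight("F6", "Madrid", "Milan"),
]


def multi_leg_flight(flights: List[Flight], start: str, end: str,
                     max_hops: int) -> Iterator[List[Flight]]:
    """Stream every path from start to end that uses at most max_hops flights."""
    if max_hops < 1:  # mirrors the constraint $max_hops >= 1
        return
    for first_leg in flights:
        if first_leg.dept != start:
            continue
        if first_leg.dest == end:
            # First or-branch: a single-leg path.
            yield [first_leg]
        else:
            # Second or-branch: recurse for the remaining journey.
            for other_legs in multi_leg_flight(
                    flights, first_leg.dest, end, max_hops - 1):
                yield [first_leg] + other_legs
```

The hop budget shrinks on every recursive call, which is what keeps the search finite even though the toy data contains a cycle. Excluding paths through London then becomes a filter over the stream, e.g. keeping only paths where no leg has "London" as dept or dest.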

Querying a recursive function

While recursive searches are never effortless, the above syntax closely approximates how we would describe “what a multi-leg flight is” in natural language. And here is how you could use such a function to get all the possible multi-leg flights in your flight database from Milan to Santiago de Chile in 3 hops or fewer, excluding those which stop in London.

match
  $milan isa city, has name "Milan";
  $santiago isa city, has name "Santiago";
  $london isa city, has name "London";
  $legs in multi_leg_flight(start=$milan, end=$santiago, max_hops=3);
  not { $leg in $legs; $leg links ($london); };
fetch $legs;

That’s concise and readable!

Summary

Functions are a fundamental part of TypeDB’s functional database programming model, and they fit naturally into the pattern-based syntax of TypeQL. Functions allow users to write queries in a modular fashion, thereby minimizing code repetition, and they allow expressing complex logic via recursive and branching calls.

What do you think about TypeDB’s functional approach? We are excited to hear your opinions and suggestions!
