TypeDB Blog

The TypeDB 3.0 Roadmap

TypeDB 3.0 is a major milestone, delivering significant updates to the architecture and capabilities based on our research and user feedback.

Dr. Christoph Dorn

May 24, 2024

TypeDB is the first database of its kind, replacing multi-layered tech stacks with a single, high-level, and declarative database programming model. TypeDB has been crafted to put developers’ focus back on user intent and business logic, with a query language that values conciseness and readability and can easily serve everyone from senior developer to data analyst.

The upcoming release of version 3.0 will further streamline and refine TypeDB’s database programming model, extending its type system with powerful new constructs, and drastically improving developer experience across the board. This transformation will be driven by three major changes:

Functions will replace rules as a more powerful, modular, and generalizable construct. They will firmly bridge reasoning and querying in our type system.
Structured value types and list types will extend our type system, becoming first-class citizens of TypeDB’s query language, TypeQL.
Query clauses will become typed and composable, which will enable users to build powerful data pipelines for handling complex workflows.

In today’s blog post, we will have a first look at these and other deep changes for the upcoming release of TypeDB 3.0.

A new “rusty” codebase

Before we get going, another fundamental transition is happening in the background: TypeDB 3.0 will be written in Rust, completely replacing the Java codebase and JRE. For us, Rust is not only a language built for safety and performance, but it is also a language that shares many of the principles of TypeDB. As such, the decision to transition to Rust was not a difficult one.

Rust is gaining popularity along many fronts, seeing increasing adoption, for example, in the development of the Linux kernel. This momentum means a lot of the tooling around the language has drastically improved over the past few years, making now a great time for our transition to Rust!

Roadmap overview

Now, let’s jump into version 3.0’s new features and learn a bit more about its streamlined Functional Database Programming model — we will provide useful links to more detailed explanations as we go along.

As a small disclaimer: when we say “3.0” we are not exclusively referring to the very first release (which will be version 3.0.0), but rather to the larger vision and program of bringing TypeDB’s new functional/declarative/typed database programming model to life. We foresee the following rough order of events in this program:

The first versions of 3.0 will ship with functions for query logic, the new pipeline model for constructing queries, and a refined constraint language for the underlying data model.
The next two core features, shipped after that in 3.0.x, will integrate structured and optional values.
Finally, in 3.1.x, we plan to extend our type system to include lists and list variables, which bring exciting opportunities for dynamically sized custom datatypes to TypeDB.

Let’s look at these components in detail (but not in order).

Functions for modularizing query logic

Functions can be thought of as pieces of query logic that can be called (even recursively) from within other queries. Importantly, such function calls can be nested, parallel, negated, and they naturally embed into TypeQL’s declarative pattern syntax. In practical terms, functions are described with a match pattern, from which the function may then return either a stream of results or a single computed result.

A quick example

As an example, let’s define a function in our schema for retrieving the average salary of employees at a given company.

define fun mean_salary($c: company) -> double:
  match
    (company: $c, employee: $_) isa employment, has salary $s;
  return mean($s);

Based on this function definition, here’s a simple query for retrieving employees who earn at or above their company’s mean salary.

match 
  (company: $c, employee: $e) isa employment, has salary >= mean_salary($c);
fetch {
  "high-earning-employee": $e.name
};

Pre-3.0, even with the use of rules, this question had to be broken up into two separate queries. Now it can be run in a single query, and all the logic is in one place!
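To make the evaluation order concrete, here is a minimal Python model of the function and the query above. It is not TypeQL and says nothing about how TypeDB executes functions; the sample records and the names EMPLOYMENTS and high_earning_employees are hypothetical.

```python
from statistics import mean

# Hypothetical sample data: (company, employee, salary) employment records.
EMPLOYMENTS = [
    ("acme", "Ana", 50_000.0),
    ("acme", "Bob", 70_000.0),
    ("acme", "Cy", 90_000.0),
    ("globex", "Dee", 40_000.0),
    ("globex", "Eli", 60_000.0),
]


def mean_salary(company: str) -> float:
    """Counterpart of mean_salary: match the salaries, then reduce to one value."""
    return mean(s for c, _, s in EMPLOYMENTS if c == company)


def high_earning_employees() -> list[str]:
    """Counterpart of the query: the function is called for each matched row."""
    return [e for c, e, s in EMPLOYMENTS if s >= mean_salary(c)]
```

With this sample data, high_earning_employees() returns Bob, Cy, and Eli: the mean is computed per company, inside the same query that filters on it.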

Functions open up a new dimension of expressivity for TypeDB’s database model, which will allow users to drastically simplify code complexity. For further examples and insights about functions, check out the corresponding fundamentals article in our Functional Database Programming series.

Lists for data series and serialized results

TypeDB 3.0 will introduce lists, enabling users to effortlessly manage data series. Lists can be bound to variables and worked with directly from within TypeQL patterns. Moreover, streams can be collected into lists, which also allows users to reason over (ordered) query results from within their patterns.

Lists integrate tightly with TypeDB’s type system. A list may hold either values or concepts, but all its members must be of a single type T (we can call it a “T-list”).

At the level of schemas, lists integrate in two ways. Relations can relate lists of role players, and entities or relations can own lists of attributes. At the level of queries, lists may be declaratively described in query patterns, and streams of query results aggregated into single lists as described above. This mechanism is extremely powerful, as it allows us to access the (ordered) results of a query through a single variable $list in our pattern.

A quick example

Order is a ubiquitous feature in data: series of measurements, ledgers of records, paths through nodes, and compositions of relations are all common examples of ordered data. In TypeDB 3.0, we can leverage lists to natively work with such ordered data. As an example, let’s create a simple traveled_route type in our schema, comprising a number of stops:

define 
  traveled_route relates stop[], relates traveler;
  city plays traveled_route:stop;
  employee plays traveled_route:traveler;

We can then start recording the routes traveled by our employees as follows:

insert
  $maria isa employee, has name "Maria";
  $london isa city, has name "London";
  $berlin isa city, has name "Berlin";
  ...
  (stop[]: [$london, $berlin, ...], traveler: $maria) isa traveled_route;

Let’s try to find all of those routes which do not make a stop in Berlin, and visit London immediately after Paris. For those routes, we’ll list their cost, the traveling employee’s name, and the stop number of Paris on the route:

match
  $route isa traveled_route, links (stop[]: $stops);
  $stops[$stop_number] has name "Paris";
  $stops[$stop_number + 1] has name "London";
  not { $berlin in $stops, has name "Berlin"; };
  $route links (traveler: $employee);
fetch {
  "route cost": $route.cost,
  "traveling employee": $employee.name,
  "Paris stop number": $stop_number
};
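The index-based pattern can be read as a search over positions in an ordered list. The following Python sketch models that reading on hypothetical sample routes. It is not TypeQL, it leaves out the route cost, and it uses 0-based stop numbers as an assumption, since the post does not say how TypeQL lists are indexed.

```python
# Hypothetical sample routes: (traveler, ordered list of stops).
ROUTES = [
    ("Maria", ["Rome", "Paris", "London"]),
    ("Omar", ["Paris", "London", "Berlin"]),
    ("Lena", ["Paris", "Madrid", "London"]),
]


def paris_then_london(routes):
    """Yield (traveler, stop_number) for routes that avoid Berlin and
    visit London immediately after Paris."""
    for traveler, stops in routes:
        if "Berlin" in stops:  # models: not { $berlin in $stops, ... }
            continue
        for i in range(len(stops) - 1):
            # models: $stops[$stop_number] and $stops[$stop_number + 1]
            if stops[i] == "Paris" and stops[i + 1] == "London":
                yield traveler, i


matches = list(paris_then_london(ROUTES))
```

Only Maria's route qualifies here: Omar's route stops in Berlin, and Lena's route has Madrid between Paris and London.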

If you want to dive deeper into lists, check out the corresponding fundamentals article in our Functional Database Programming series.

Structured value types for compound attributes

Real-world applications often require a plethora of differently structured values to work with, which means databases need to provide and maintain a large range of different value types. Structured values, or structs for short, are values composed from other existing values. They provide a general solution to the problem: they let users define precisely the structured values they need.

A quick example

A new type of structured values can be introduced at the schema-level, comprising a tuple of named fields. For example:

define
  struct dated_coordinate:
    longitude value double,
    latitude value double,
    date value datetime;

We can then use structs like any other value type. For example, let’s define a gps_history attribute, whose values will be dated coordinates as defined by our struct above:

define 
  attribute gps_history, value dated_coordinate;
  entity transit_item, owns gps_history[];

Struct values are written using constructor syntax: { field: val, ... }. For example, we could insert data into the GPS history list of an item as follows:

insert
  transit_item 
    has description "important letter",
    has gps_history[] [
      { longitude: 13.4050, latitude: 52.5200, date: 2024-05-27T23:07:51 },
      { longitude: 2.3514, latitude: 48.8575, date: 2024-05-28T02:37:04 },
      { longitude: -0.1276, latitude: 51.5072, date: 2024-05-28T06:02:31 }
    ];

The query inserts a list of three dated_coordinate values into the list type gps_history[] owned by a transit_item.
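For readers who think in application code, a struct is close to an immutable record type, and a list-valued attribute is close to an ordered list of such records. The Python sketch below illustrates that analogy only. The class name DatedCoordinate, the helper latest, and the sample readings are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DatedCoordinate:
    """Rough analogue of the dated_coordinate struct: three named fields."""
    longitude: float
    latitude: float
    date: datetime


# Analogue of gps_history[]: an ordered list of struct values.
gps_history = [
    DatedCoordinate(longitude=13.4050, latitude=52.5200,
                    date=datetime(2024, 5, 27, 23, 7, 51)),
    DatedCoordinate(longitude=2.3514, latitude=48.8575,
                    date=datetime(2024, 5, 28, 2, 37, 4)),
]


def latest(history: list[DatedCoordinate]) -> DatedCoordinate:
    """Return the most recent reading in a history."""
    return max(history, key=lambda reading: reading.date)
```

Because each value carries its fields by name, code such as latest(gps_history).latitude stays readable without any positional bookkeeping.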

To see more examples of structs, check out the corresponding fundamentals article in our Functional Database Programming series.

Advanced constraint language

Constraints are a key part of any data model: they provide fine-grained ways of organizing and maintaining our data. Since TypeDB’s base data model is already highly expressive, constraints have so far played a mostly secondary role. With the release of TypeDB 3.0, we will greatly expand the constraint system, providing a variety of advanced mechanisms for modeling with constraints.

Importantly, this addresses the long-standing pain point of automatically cleaning up “dangling” attributes (i.e. attributes not owned by any entity or relation)! This clean-up will happen by default in TypeDB 3.0, but the old behaviour, which retains such attributes, can be recovered with a new annotation @independent as we will now see.

A quick example

The following schema showcases several new constraint annotations:

define
  entity employee,
    owns office_number @card(1..1), 
    plays print_permission:user;
  entity printer,
    plays print_permission:device;
  attribute office_number @independent, value string;
  relation print_permission @cascade, relates user, relates device;

Let’s go through the constraints in order, and see what they do:

The cardinality constraint @card(n..m) indicates that a type must have between n and m elements. For our example, the annotation @card(1..1) above means each employee will have exactly one office_number, which overrides the TypeDB default of @card(0..1) for owned attribute types.
The @independent annotation on office_number indicates that office numbers are an independent attribute: even if no employee has a certain office number, we want to keep that number in our database (even empty offices are offices!). This overrides the default behavior of attributes without owners being cleaned up automatically.
Finally, the @cascade annotation indicates that if a deletion causes relations to have insufficient role players (the default is @card(1..1) for each role), then these relations should be cleaned up rather than blocking the deletion. In our case, this ensures that print_permissions are cleaned up automatically when we remove either the device or the user from our system.
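The first two annotations can be read as simple predicates. The Python sketch below models that reading: a bounds check for @card and a clean-up rule for attributes with and without @independent. It is an illustration under those assumptions, not TypeDB's implementation, and the function names are hypothetical.

```python
def within_card(count: int, lower: int, upper: int) -> bool:
    """Model of @card(lower..upper): the number of owned attributes
    of a type must lie within the bounds."""
    return lower <= count <= upper


def surviving_attributes(attributes, owned, independent_types):
    """Model of attribute clean-up: an attribute survives if some owner
    has it, or if its type is marked independent."""
    return {
        (attr_type, value)
        for attr_type, value in attributes
        if (attr_type, value) in owned or attr_type in independent_types
    }


# Hypothetical state: neither attribute currently has an owner.
attributes = {("office_number", "101"), ("nickname", "Laz")}
kept = surviving_attributes(attributes, owned=set(),
                            independent_types={"office_number"})
```

In this state only the office number is kept, because office_number is the only type treated as independent.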

To illustrate the last point, let’s run the following:

match $larry isa employee, has name "Lazy Larry";
delete $larry;

This query would automatically remove all print permissions linked to Larry. It is equivalent to the more verbose query:

match 
  $larry isa employee, has name "Lazy Larry";
  $permission isa print_permission, links (user: $larry);
delete 
  $larry;
  $permission;
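The difference between cascading and blocking deletion can be sketched in a few lines of Python. The model below is hypothetical: relations are plain dictionaries from role name to role player, and delete_entity is an illustrative name, not a TypeDB API.

```python
def delete_entity(entity, entities, permissions, cascade):
    """Delete an entity. Relations that would lose a role player are removed
    when cascade is True, and block the deletion otherwise."""
    affected = [p for p in permissions if entity in p.values()]
    if affected and not cascade:
        raise ValueError(f"{entity} still plays a role in {len(affected)} relation(s)")
    remaining = [p for p in permissions if p not in affected]
    return entities - {entity}, remaining


# Hypothetical sample data.
ENTITIES = {"larry", "ann", "hp-1"}
PERMISSIONS = [
    {"user": "larry", "device": "hp-1"},
    {"user": "ann", "device": "hp-1"},
]

entities_after, permissions_after = delete_entity(
    "larry", ENTITIES, PERMISSIONS, cascade=True
)
```

With cascade=True, Larry's permission disappears together with Larry, while Ann's permission is untouched. With cascade=False, the same call raises an error instead of leaving a relation with a missing role player.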

To learn more about constraints in TypeDB 3.0, check out the corresponding fundamentals article in our Functional Database Programming series.

Pipelines for composable querying

In TypeDB 3.0, we’ve re-thought how queries fit into our overarching type system from first principles.

You’ve already met a related case above: functions are typed and may return “T-streams” of results. Queries, similarly, should return streams of results. A key difference is the format of these results: functions only return streams of positional tuples (t1: T1, t2: T2, ...) (where the ts are elements of the Ts). In contrast, in a query, results are not positional, but they are associated to variable names. That is, the output type of a query is a stream of maps ($x1 -> t1: T1, $x2 -> t2: T2, ...) where the $xs are named variables.

To summarize, while functions return tuple streams (which, as types, may look like {person, name, bool}), queries operate on map streams (which, as types, may look like {$x -> person, $n -> name, $b -> bool}). Based on this systematic perspective, data queries can now become composable just like functions!
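The two result shapes can be pictured with ordinary Python iterators: a tuple stream yields positional tuples, and a map stream yields dictionaries keyed by variable name. The sketch below uses hypothetical sample values and a hypothetical helper bind to show how naming each position turns one into the other.

```python
from typing import Iterable, Iterator


def tuple_stream() -> Iterator[tuple[str, str, bool]]:
    """Function-style output: positional tuples of type {person, name, bool}."""
    yield ("person-1", "Ada", True)
    yield ("person-2", "Bo", False)


def bind(names: tuple[str, ...], tuples: Iterable[tuple]) -> Iterator[dict]:
    """Turn a tuple stream into a map stream by naming each position."""
    for values in tuples:
        yield dict(zip(names, values))


# Query-style output: maps of type {$x -> person, $n -> name, $b -> bool}.
map_stream = list(bind(("$x", "$n", "$b"), tuple_stream()))
```

Each element of map_stream carries the same values as the corresponding tuple, but a later stage can refer to them by variable name rather than by position.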

A quick example

To illustrate how this all works, let’s have a look at the following query pipeline.

with fun costliest_printer($employee: employee) -> printer:
  match 
    ($printer, $employee) isa print_permission;
    $printer has cost_per_page $cost;
  sort $cost desc;
  return first $printer;
match
  $printer isa printer, has office_number $n, has newly_installed true;
  $employee isa employee, has office_number $n;
put ($employee, $printer) isa print_permission;
update $printer has newly_installed false;
match 
  let $high_cost_printer = costliest_printer($employee);
  $high_cost_printer has printer_name $name;
  not { $printer is $high_cost_printer; };
  $employee has contact $address;
insert 
  $notice isa queued_email, has recipient $address, 
  has content "Do you still need the printer " + $name + "?";

Step 1: `with` preamble

The pipeline begins with a with clause: this is itself not a part of the query pipeline, but rather a preamble that can be used to define any auxiliary functions. For our example, we are defining a function costliest_printer which returns the costliest printer that a given employee can print on.

Step 2: `match` clause

The first query clause in our pipeline is the subsequent match clause. It checks for any newly installed printers, and for employees in the same office as a newly installed printer.

Step 3: `put` clause

This is followed by a put clause, which adds a permission for each employee in the same office as the newly installed printers to print on that printer.

The newly introduced put query clause provides powerful functionality here. In a nutshell, the query clause put <statements> works as follows. First, we run match <statements> (interpreting <statements> as a pattern to be matched). If we find any results, the pipeline continues as if put were a match. However, if no results were found, then we run insert <statements> before continuing with our pipeline.
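The match-then-insert behavior described above can be modeled as an idempotent operation on a set of facts. The Python sketch below is a simplified illustration for a single statement and a single input row; the function name put and the pair representation of print permissions are assumptions of this sketch.

```python
def put(permissions: set, employee: str, printer: str) -> bool:
    """Model of put for one row: behave like match if the permission
    already exists, otherwise insert it. Returns True if it inserted."""
    row = (employee, printer)
    if row in permissions:  # the match found a result: nothing to insert
        return False
    permissions.add(row)    # the match found nothing: run the insert
    return True


permissions = {("ann", "hp-1")}
first = put(permissions, "ben", "hp-1")   # inserts a new permission
second = put(permissions, "ben", "hp-1")  # now behaves like a match
```

Running the same put twice leaves the data unchanged after the first run, which is what makes it safe to use in the middle of a pipeline.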

Step 4: `update` clause

The query then sets the “newly installed” status of these printers to false, updating the (potentially existing) previous status. This is achieved through our new update clause, which works for owned attribute types with an upper cardinality bound of 1: using update we can perform an insert which, if we would exceed the specified cardinality, instead replaces the existing attribute!
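For an attribute type with an upper cardinality bound of 1, this behaves much like assignment into a dictionary keyed by owner and attribute type. The Python sketch below models that reading; the function name update and the dictionary layout are assumptions of this sketch, not TypeDB internals.

```python
def update(owned: dict, owner: str, attr_type: str, value):
    """Model of update for an attribute type with upper cardinality 1:
    insert the value, replacing any existing one. Returns the old value."""
    key = (owner, attr_type)
    previous = owned.get(key)
    owned[key] = value
    return previous


owned = {("hp-1", "newly_installed"): True}
replaced = update(owned, "hp-1", "newly_installed", False)  # replaces True
inserted = update(owned, "hp-2", "newly_installed", False)  # plain insert
```

The first call replaces an existing value and the second simply inserts one, yet the caller writes the same statement in both cases.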

Step 5: second `match` clause

For all employees that have just gotten (or already had) access to a newly installed printer, the query checks whether the costliest printer among all printers they currently have access to is different from the newly installed one.

Step 6: `insert` clause

Finally, we add an email to our email queue inquiring whether employees still need access to the costliest printer.

The example illustrates how pipelines will enable a powerful compositional interaction-style with TypeDB: our example gives a concise high-level representation of a complex database workflow!
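One way to picture this compositionality is to treat every clause as a function from a stream of variable maps to a stream of variable maps, so that a pipeline is just function composition. The Python sketch below models only that idea. The stage names, the sample data, and the pipeline helper are hypothetical and much simpler than the printer example above.

```python
from functools import reduce
from typing import Callable, Iterable, Iterator

Row = dict
Stage = Callable[[Iterable[Row]], Iterator[Row]]

# Hypothetical sample data.
EMPLOYEES = [("Ann", "101"), ("Ben", "101"), ("Cat", "303")]
CONTACTS = {"Ann": "ann@example.com"}


def match_same_office(rows: Iterable[Row]) -> Iterator[Row]:
    """match-like stage: extend each row with every employee in the same office."""
    for row in rows:
        for employee, office in EMPLOYEES:
            if office == row["$n"]:
                yield {**row, "$employee": employee}


def match_contact(rows: Iterable[Row]) -> Iterator[Row]:
    """match-like stage: keep rows whose employee has a contact, and bind it."""
    for row in rows:
        if row["$employee"] in CONTACTS:
            yield {**row, "$address": CONTACTS[row["$employee"]]}


def pipeline(*stages: Stage) -> Stage:
    """Compose stages left to right into a single stage."""
    def run(rows: Iterable[Row]) -> Iterator[Row]:
        return reduce(lambda stream, stage: stage(stream), stages, iter(rows))
    return run


start = [{"$printer": "hp-1", "$n": "101"}, {"$printer": "hp-2", "$n": "202"}]
result = list(pipeline(match_same_office, match_contact)(start))
```

Because every stage has the same input and output type, stages can be added, removed, or reordered without changing the others, provided the variables they rely on are bound upstream.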

To learn more about query pipelines, check out the corresponding fundamentals article in our Functional Database Programming series.

Flexible patterns via optional variables

Finally, TypeDB’s declarative patterns will become even more flexible with the introduction of optional variables (closely mirroring the functional programming concept of option types). An optional variable need not appear in every solution to a pattern.

This means that results in the output stream of a query may not all bind the same set of variables. To efficiently and intuitively handle this optionality in streams, TypeQL provides the keyword or (giving several options for a pattern) and the new keyword try (extending a pattern only if possible).

A quick example

As a quick example, let’s think about the following query:

match
  $p isa printer;
  { $p has system_status $status; } or { $p has logging_status false; };
  try { let $open_jobs = get_queue($p); };
insert
  try { (printer: $p, queue[]: $open_jobs) isa job_queue; };
  try { 
    $job_log (printer: $p) isa log, 
      has status $status, 
      has job_length length($open_jobs);
  };

Let’s focus on the match clause first. The pattern will retrieve either printers which have a set status $status, or just printers if they have their logging status set to false. The variable $status is thus optional: it may not appear in all the answers! Similarly, the (list) variable $open_jobs is optional as it is wrapped in a try clause: the variable will only be assigned if there are any jobs.

In the subsequent insert clause (and similarly for delete clauses), we must account for optional variables by wrapping any usage of them in a try block. In our example, we insert a new job_queue only if the variable $open_jobs has been set. Similarly, we only record a new log if both the $status and $open_jobs variables have been set.
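The behavior described above can be modeled in Python by letting each answer be a dictionary in which optional variables are simply absent when unset. The sketch below is an illustration on hypothetical printer data, not TypeQL. It assumes that each matching branch of the disjunction contributes its own answer, which the post does not specify.

```python
# Hypothetical sample data.
PRINTERS = {
    "hp-1": {"system_status": "ok", "queue": ["job-a", "job-b"]},
    "hp-2": {"logging_status": False, "queue": []},
    "hp-3": {"system_status": "jammed", "queue": []},
}


def match_printers():
    """Model of the match clause: $status and $open_jobs are optional."""
    for printer, attrs in PRINTERS.items():
        branches = []
        if "system_status" in attrs:               # first branch binds $status
            branches.append({"$status": attrs["system_status"]})
        if attrs.get("logging_status") is False:   # second branch binds nothing
            branches.append({})
        for bindings in branches:
            row = {"$p": printer, **bindings}
            if attrs["queue"]:                     # try { ... }: bind only if possible
                row["$open_jobs"] = list(attrs["queue"])
            yield row


def insert(rows):
    """Model of the insert clause: each try block runs only if every
    variable it uses is bound in the row."""
    queues, logs = [], []
    for row in rows:
        if "$open_jobs" in row:
            queues.append((row["$p"], row["$open_jobs"]))
        if "$status" in row and "$open_jobs" in row:
            logs.append((row["$p"], row["$status"], len(row["$open_jobs"])))
    return queues, logs


queues, logs = insert(match_printers())
```

All three printers are matched, but only hp-1 has open jobs, so it is the only one that receives a job queue and a log entry.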

To learn more about optionality, check out the corresponding fundamentals article in our Functional Database Programming series.

Developing with TypeDB 3.0

Let’s wrap up this blog post with a few general comments about how 3.0 will change the “feel” of TypeDB for developers.

Refined design principles

For 3.0 we sharpened many of the underlying design principles of TypeQL. This includes, for example, a unified treatment of variables (no more distinction of $x vs. ?x!), a consistent keyword language (you may have noticed the new keyword links in earlier examples!), and a rigorous theory of variable bindings in the setting of declarative querying. If you want to learn more about the deeper principles, check out our fundamentals article on the topic.

Transaction model changes

As a practical change, we will simplify our transaction model in TypeDB 3.0. Instead of the previous four combinations (schema-read, schema-write, data-read, and data-write), there will now be three transaction types and no sessions at all:

schema for defining the schema of your database.
write for writing data instances to your database.
read for read-only access to your database.

Driver API changes

We are in the process of finalizing changes to our Driver API. You can expect these changes to reflect the compositional structure of query pipelines which we discussed above, which will yield a simple but extremely general API structure.

Simultaneously, we will slowly transition away from maintaining our specialized Concept API. As a result, define queries will become more powerful: they will allow users to rename types and move them within the existing inheritance hierarchy. We haven’t talked about this much here, but it will be a (relatively speaking) minor change.

Key takeaways

To summarize, TypeDB 3.0 will bring several game-changing innovations:

Modular querying via functions, as part of a refined functional database programming model.
Handling data series and serialized query results via lists.
Defining custom value data structures via structs.
Composing queries into pipelines, enabling users to build complex workflows.
Allowing patterns to become more flexible through the introduction of optional variables.
Making the language more consistent and principled, which will also be reflected in the Driver API.

We will keep this blog post updated as we go along, and very much look forward to your feedback!
