The TypeDB 3.0 Roadmap

The upcoming release of version 3.0 will represent a major milestone for TypeDB: it will bring about fundamental improvements to the architecture and feel, and incorporate pivotal insights from our research and user feedback.

Dr. Christoph Dorn

TypeDB is the first database of its kind, replacing multi-layered tech stacks with a single, high-level, and declarative database programming model, which directly focuses on user intent and business logic.

The upcoming release of version 3.0 will further streamline and refine TypeDB’s innovative database programming model, extending TypeDB’s type system with powerful new constructs, and drastically improving developer experience across the board. This transformation will be driven by three major changes:

  1. Functions will replace rules as a more powerful, modular, and generalizable construct. They will firmly bridge reasoning and querying in our type system.
  2. Structured value types and list types will extend our type system, becoming first-class citizens of TypeDB’s query language, TypeQL.
  3. Query clauses will become typed and composable, which will enable users to build powerful data pipelines for handling complex workflows.

In today’s blog post, we will have a first look at these and other deep changes for the upcoming release of TypeDB 3.0.

A new “rusty” codebase

Before we get going, another fundamental transition is happening in the background: TypeDB 3.0 will be written in Rust, completely replacing the Java codebase and JRE. For us, Rust is not only a language built for safety and performance, but it is also a language that shares many of the principles of TypeDB. As such, the decision to transition to Rust was not a difficult one.

Rust is gaining popularity along many fronts, seeing increasing adoption, for example, in the development of the Linux kernel. This momentum means a lot of the tooling around the language has drastically improved over the past few years, making now a great time for our transition to Rust!

Roadmap overview

Now, let’s jump into version 3.0’s new features and learn about its streamlined Functional Database Programming model — we will provide useful links to more detailed explanations as we go along.

Functions for modular querying

We’ve already mentioned functions. In a nutshell, functions can be thought of as query templates that can be called from within other queries. Importantly, such function calls can be nested, recursive, negated, and they naturally embed into TypeQL’s declarative pattern syntax. In practical terms, functions are described with a match pattern, from which the function may then return either a stream of results or a single computed result.

A quick example

As an example, let’s define a function in our schema for retrieving the median salary of employees at a given company.

define fun median_salary($c: company) -> double? :
  match
    (company: $c, employee: $_) isa employment, has salary $s;
  return median($s);

Based on this function definition, here’s a simple query for retrieving employees who earn at least their company’s median salary.

match
  (company: $c, employee: $e) isa employment, has salary >= median_salary($c);
fetch
  $e as "high-earning-employee": name, DOB;

Pre-3.0, even with the usage of rules, this question had to be broken up into two separate queries. Now it can be run in a single query, and all the logic is in one place!
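
To make the intended semantics concrete, here is a minimal Python sketch (not TypeQL) of what the function and the query compute together. The sample records and the helper high_earning_employees are hypothetical and exist only for illustration; the sketch assumes that median_salary returns no result for a company without employments, mirroring the optional return type double?.

```python
from statistics import median

# Hypothetical employment records: (company, employee, salary).
EMPLOYMENTS = [
    ("acme", "ana", 50_000.0),
    ("acme", "bob", 70_000.0),
    ("acme", "cy", 90_000.0),
    ("globex", "dee", 40_000.0),
]


def median_salary(company):
    """Median salary at a company, or None if it has no employments."""
    salaries = [s for c, _, s in EMPLOYMENTS if c == company]
    return median(salaries) if salaries else None


def high_earning_employees():
    """Employees whose salary is at least their own company's median."""
    return [e for c, e, s in EMPLOYMENTS if s >= median_salary(c)]
```

The point of the sketch is that the per-company median is computed inside the same pass that filters employees, which is what calling a function from within a pattern allows.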

Functions open up a new dimension of expressivity for TypeDB’s database model, which will allow users to drastically simplify code complexity. For further examples and insights about functions, check out the corresponding fundamentals article in our Functional Database Programming series.

Lists for data series and serialized results

TypeDB 3.0 will introduce lists, enabling users to effortlessly manage data series. Lists can be variabilized, and worked with directly from within TypeQL patterns. Moreover, streams can be collected into lists, which also allows users to reason over (ordered) query results from within their patterns.

Lists integrate tightly with TypeDB’s type system. A list may hold either values or concepts, but all its members must be of a single type T (we can call it a “T-list”).

At the level of schemas, lists integrate in two ways. Relations can relate lists of role players, and entities or relations can own lists of attributes. At the level of queries, lists may be declaratively described in query patterns, and streams of query results aggregated into single lists as described above. This mechanism is extremely powerful, as it allows us to access the (ordered) results of a query through a single variable $list in our pattern.

A quick example

Order is a ubiquitous feature in data: series of measurements, ledgers of records, paths through nodes, and compositions of relations are all common examples of ordered data. In TypeDB 3.0, we can leverage lists to natively work with such ordered data. As an example, let’s create a simple traveled_route type in our schema, comprising a number of stops:

define
  traveled_route relates stop[], relates traveler;
  city plays traveled_route:stop;
  employee plays traveled_route:traveler;

We can then start recording the routes traveled by our employees as follows:

insert
  $maria isa employee, has name "Maria";
  $london isa city, has name "London";
  $berlin isa city, has name "Berlin";
  (stop[]: [$london, $berlin, ...], traveler: $maria) isa traveled_route;

Let’s try to find all of those routes which do not make a stop in Berlin, and visit London immediately after Paris. For those routes, we’ll list their cost, the traveling employee’s name, and the stop number of Paris on the route:

match
  $route isa traveled_route, links (stop[]: $stops);
  $stops[$stop_number] has name "Paris";
  $stops[$stop_number + 1] has name "London";
  not { $berlin in $stops, has name "Berlin"; };
  $route links (traveler: $employee);
fetch
  $route: cost;
  $employee as "traveling employee": name;
  $stop_number as "Paris stop number";
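
The logic of this query can be sketched in plain Python (not TypeQL) as follows. The sample routes and the function paris_then_london are hypothetical, and the sketch assumes 0-based stop numbers, which the roadmap does not specify.

```python
# Hypothetical routes; each holds an ordered list of stop names.
ROUTES = [
    {"traveler": "Maria", "cost": 420.0, "stops": ["Rome", "Paris", "London"]},
    {"traveler": "Omar", "cost": 310.0, "stops": ["Paris", "London", "Berlin"]},
    {"traveler": "Lena", "cost": 150.0, "stops": ["London", "Paris"]},
]


def paris_then_london(routes):
    """Routes avoiding Berlin in which London directly follows Paris."""
    results = []
    for route in routes:
        stops = route["stops"]
        if "Berlin" in stops:  # the negated pattern: no stop named Berlin
            continue
        for i in range(len(stops) - 1):
            if stops[i] == "Paris" and stops[i + 1] == "London":
                results.append({
                    "cost": route["cost"],
                    "traveling employee": route["traveler"],
                    "Paris stop number": i,  # assumed 0-based
                })
    return results
```

Only Maria's route qualifies here: Omar's route stops in Berlin, and Lena's visits the two cities in the wrong order.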

If you want to dive deeper into lists, check out the corresponding fundamentals article in our Functional Database Programming series.

Structured value types for compound attributes

Real-world applications often require a plethora of differently structured values to work with, which means databases need to provide and maintain a large range of different value types. Structured values, or structs for short, are values composed from other existing values. They provide a general solution to the problem: they let users define precisely the structured values they need.

A quick example

A new type of structured values can be introduced at the schema-level, comprising a tuple of named fields. For example:

define
  struct dated_coordinate:
    longitude value double,
    latitude value double,
    date value datetime;

We can then use structs like any other value type. For example, let’s define a gps_history attribute, whose values will be dated coordinates as defined by our struct above:

define
  gps_history sub attribute, value dated_coordinate;
  transit_item sub entity, owns gps_history[];

Values of structs can be constructed using constructor syntax: { field: val, ... }. For example, we could insert data into the GPS history list of an item as follows:

insert
  $item isa transit_item,
    has description "important letter",
    has gps_history[] [
      { longitude: 13.4050, latitude: 52.5200, date: 2024-05-27T23:07:51 },
      { longitude: 2.3514, latitude: 48.8575, date: 2024-05-28T02:37:04 },
      { longitude: -0.1276, latitude: 51.5072, date: 2024-05-28T06:02:31 }
    ];

The query inserts a list of three dated_coordinate values into the list type gps_history[] owned by a transit_item.
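
For readers more familiar with general-purpose languages, a struct behaves much like an immutable record type. The following Python sketch (not TypeQL) models dated_coordinate as a frozen dataclass and an owned list as an ordinary list; the class name DatedCoordinate, the sample values, and the helper latest are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DatedCoordinate:
    """Python stand-in for the dated_coordinate struct."""
    longitude: float
    latitude: float
    date: datetime


# A list of struct values, analogous to an owned gps_history[] list.
gps_history = [
    DatedCoordinate(10.0, 50.0, datetime(2024, 1, 1, 8, 0, 0)),
    DatedCoordinate(11.5, 50.5, datetime(2024, 1, 1, 12, 30, 0)),
]


def latest(history):
    """Most recent coordinate in a history list."""
    return max(history, key=lambda c: c.date)
```

Fields are read by name, for example latest(gps_history).longitude, and a value as a whole cannot be modified after construction.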

To see more examples of structs, check out the corresponding fundamentals article in our Functional Database Programming series.

Advanced constraint language

Constraints are a key part of any data model: they provide fine-grained ways of organizing and maintaining our data. Since TypeDB’s base data model is already highly expressive, constraints have so far played a mostly secondary role. With the release of TypeDB 3.0, we will greatly expand the constraint system, providing a variety of advanced mechanisms for modeling with constraints.

A quick example

The following schema showcases several new constraint annotations:

define
  employee sub entity,
    owns office_number @card(1, 1),
    plays print_permission:user;
  printer sub entity,
    plays print_permission:device;
  office_number sub attribute @independent, value string;
  print_permission sub relation @cascade, relates user, relates device;

Let’s go through the constraints in order, and see what they do:

  • The cardinality constraint @card(n, m) indicates that each instance must have between n and m (inclusive) of the annotated attributes or role players. For our example, the annotation @card(1, 1) above means each employee will have exactly one office_number; this overrides the TypeDB default of @card(0, 1) for owned attribute types.
  • The @independent annotation on office_number indicates that office numbers are an independent attribute: even if no employee has a certain office number, we want to keep that number in our database (even empty offices are offices!). This overrides the default behavior, in which attributes without owners are cleaned up automatically.
  • Finally, the @cascade annotation indicates that if a deletion causes relations to have insufficient role players (the default is @card(1, 1) for each role), then these relations should be cleaned up rather than blocking the deletion. In our case, this ensures that print_permissions are cleaned up automatically when we remove either the device or the user from our system.
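
The first two annotations can be modeled with a small Python sketch (not TypeQL). The helpers satisfies_card and surviving_attributes are hypothetical names, and the sketch assumes that both bounds of @card are inclusive, as the @card(1, 1) example suggests.

```python
def satisfies_card(values, lower, upper):
    """True when the number of owned values lies within [lower, upper]."""
    return lower <= len(values) <= upper


def surviving_attributes(owners_by_value, independent):
    """Attribute values kept after cleanup.

    Values without owners are dropped unless the attribute type is
    independent.
    """
    return sorted(
        value
        for value, owners in owners_by_value.items()
        if independent or owners
    )
```

For example, an employee without an office number fails satisfies_card([], 1, 1) but would pass under the default bounds (0, 1), and an ownerless office number survives cleanup only when independent is true.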

To illustrate the last point, let’s run the following:

match $larry isa employee, has name "Lazy Larry";
delete $larry isa employee;

This query would automatically remove all print permissions linked to Larry. It is equivalent to the more verbose query:

match
  $larry isa employee, has name "Lazy Larry";
  $permission isa print_permission, links (user: $larry);
delete
  $larry isa employee;
  $permission isa print_permission;
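
The difference between cascading and blocking deletion can be sketched in Python (not TypeQL). The function delete_employee and its data layout are hypothetical; the sketch assumes that, without @cascade, a deletion that would leave a relation short of role players is rejected.

```python
def delete_employee(employees, permissions, name, cascade):
    """Delete an employee; return the remaining employees and permissions.

    With cascade=True, permissions that would lose their user are removed
    as well. With cascade=False, such a deletion is rejected instead.
    """
    affected = [p for p in permissions if p["user"] == name]
    if affected and not cascade:
        raise ValueError(f"{name} still plays a role in {len(affected)} relation(s)")
    remaining_permissions = [p for p in permissions if p["user"] != name]
    remaining_employees = [e for e in employees if e != name]
    return remaining_employees, remaining_permissions
```

With cascade enabled, deleting "Lazy Larry" removes his print permissions in the same step, which corresponds to the verbose match and delete query above.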

To learn more about constraints in TypeDB 3.0 check out the corresponding fundamentals article in our Functional Database Programming series.

Pipelines for composable querying

In TypeDB 3.0, we’ve re-thought how queries fit into our overarching type system from first principles.

You’ve already met a related case above: functions are typed and may return “T-streams” of results. Queries, similarly, should return streams of results. A key difference is the format of these results: functions only return streams of positional tuples (t1: T1, t2: T2, ...), where each ti is an element of the type Ti. In contrast, the results of a query are not positional but are associated with variable names. That is, the output type of a query is a stream of maps ($x1 -> t1: T1, $x2 -> t2: T2, ...), where the $xi are named variables.

To summarize, while functions return tuple streams (which, as types, may look like {person, name, bool}), queries operate on map streams (which, as types, may look like {$x -> person, $n -> name, $b -> bool}). Based on this systematic perspective, data queries can now become composable just like functions!
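
As a rough analogy in Python (not TypeQL), a map stream can be modeled as an iterator of dictionaries keyed by variable name, and a query clause as a function from one such stream to another. The names pipeline, match_adults, and add_flag are hypothetical, as is the sample data.

```python
from functools import reduce


def pipeline(*stages):
    """Compose stages, each mapping a stream of maps to a stream of maps."""
    def run(rows):
        return reduce(lambda stream, stage: stage(stream), stages, rows)
    return run


def match_adults(rows):
    """Filtering stage: keep answers whose $age is at least 18."""
    for row in rows:
        if row["$age"] >= 18:
            yield row


def add_flag(rows):
    """Extending stage: bind an additional variable $b in every answer."""
    for row in rows:
        yield {**row, "$b": True}
```

Because every stage consumes and produces the same kind of stream, stages can be chained in any number, for example list(pipeline(match_adults, add_flag)(answers)).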

A quick example

To illustrate how this all works, let’s have a look at the following query pipeline.

with fun costliest_printer($employee: employee) -> printer? :
  match
    ($printer, $employee) isa print_permission;
    $printer has cost_per_page $cost;
  sort $cost desc;
  return first($printer);
match
  $printer isa printer, has office_number $n, has newly_installed true;
  $employee isa employee, has office_number $n;
put ($employee, $printer) isa print_permission;
insert $printer has newly_installed false @replace;
match
  $high_cost_printer = costliest_printer($employee), has printer_name $name;
  not { $printer is $high_cost_printer; };
  $employee has contact $address;
insert
  $notice isa queued_email, has recipient $address,
  has content "Do you still need the printer " + $name + "?";

Step 1: with preamble

The pipeline begins with a with clause: this is itself not a part of the query pipeline, but rather a preamble that can be used to define any auxiliary functions. For our example, we are defining a function costliest_printer which returns the costliest printer that a given employee can print on.

Step 2: match clause

The first query clause in our pipeline is the subsequent match clause. It finds all newly installed printers, together with each employee in the same office as a newly installed printer.

Step 3: put clause

This is followed by a put clause, which adds a permission for each employee in the same office as the newly installed printers to print on that printer.

The newly introduced put query clause provides powerful functionality here. In a nutshell, the query clause put <statements> works as follows. First, we run match <statements> (interpreting <statements> as a pattern to be matched). If we find any results, the pipeline continues as if put were a match. However, if no results were found, then we run insert <statements> before continuing with our pipeline.
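
The match-then-insert behavior of put can be sketched in Python (not TypeQL). This is a simplification under stated assumptions: the database is modeled as a set of (employee, printer) pairs, the statement is fully bound, and the function name put and its return value are hypothetical.

```python
def put(permissions, employee, printer):
    """Ensure a permission exists; insert it only when no match is found.

    Returns True if the permission was inserted, False if it already
    existed. Either way, the permission is present afterwards.
    """
    pair = (employee, printer)
    inserted = pair not in permissions  # the "match" step found nothing
    if inserted:
        permissions.add(pair)           # the "insert" step
    return inserted
```

Running put twice with the same arguments leaves a single permission in place, which is why the pipeline can safely be re-run for employees who already have access.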

Step 4: insert with @replace

The query then sets the “newly installed” status of these printers to false, replacing the (potentially existing) previous status. This is achieved through our new @replace annotation, which works for owned attribute types with an upper cardinality of 1: using @replace, we indicate that if an insert would exceed the specified cardinality, the existing attribute is replaced instead!
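
A minimal Python sketch (not TypeQL) of this behavior follows. The function insert_owned is a hypothetical name, and the sketch assumes that an insert exceeding the upper cardinality without @replace is rejected with an error.

```python
def insert_owned(current, new_value, upper=1, replace=False):
    """Return the owned values after inserting new_value.

    If the insert would exceed the upper cardinality, the existing value
    is replaced when replace=True and upper is 1; otherwise it is rejected.
    """
    if len(current) < upper:
        return current + [new_value]
    if replace and upper == 1:
        return [new_value]
    raise ValueError("insert would exceed the upper cardinality")
```

For a printer that currently has newly_installed true, insert_owned([True], False, replace=True) yields [False], while the same call without replace raises an error.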

Step 5: match number #2

For all employees that have just gotten (or already had) access to a newly installed printer, the query checks whether the costliest printer among all printers they currently have access to is different from the newly installed one.

Step 6: insert number #2

Finally, we add an email to our email queue inquiring whether employees still need access to the costliest printer.

The example illustrates how pipelines will enable a powerful compositional interaction-style with TypeDB: our example gives a concise high-level representation of a complex database workflow!

To learn more about query pipelines check out the corresponding fundamentals article in our Functional Database Programming series.

Flexible patterns via optional variables

Finally, TypeDB’s declarative patterns will become even more flexible through the novel concept of optional variables (closely mirroring the functional programming concept of option types). An optional variable need not appear in every solution to a pattern.

This means that results in the output stream of a query may not all bind the same variables. To handle this optionality in streams efficiently and intuitively, TypeQL provides the keyword or (giving several options for a pattern) and the new keyword try (extending a pattern only if possible).

A quick example

As a quick example, let’s think about the following query:

match
  $p isa printer;
  { $p has system_status $status; } or { $p has logging_status false; };
  try { $open_jobs = get_queue($p); };
insert
  try { (printer: $p, queue[]: $open_jobs) isa job_queue; };
  try {
    $job_log (printer: $p) isa log,
      has status $status,
      has job_length length($open_jobs); };

Let’s focus on the match clause first. The pattern will retrieve either printers which have a set status $status, or just printers if they have their logging status set to false. The variable $status is thus optional: it may not appear in all the answers! Similarly, the (list) variable $open_jobs is optional as it is wrapped in a try clause: the variable will only be assigned if there are any jobs.

In the subsequent insert clause (and similarly for delete clauses), we must account for optional variables by wrapping any usage of them in a try block. In our example, we insert a new job_queue only if the variable $open_jobs has been set. Similarly, we only record a new log if both the $status and $open_jobs variables have been set.
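
One way to picture optional variables is as answers in which some keys may be missing. The following Python sketch (not TypeQL) models each answer as a dictionary and each try block as a guard on the variables it uses; the function insert_for and the tuple encoding of inserted data are hypothetical.

```python
def insert_for(answer):
    """Data inserted for one answer; each try block runs only if bound."""
    inserted = []
    # First try block: needs $open_jobs.
    if "$open_jobs" in answer:
        inserted.append(("job_queue", answer["$p"], tuple(answer["$open_jobs"])))
    # Second try block: needs both $status and $open_jobs.
    if "$status" in answer and "$open_jobs" in answer:
        inserted.append(
            ("log", answer["$p"], answer["$status"], len(answer["$open_jobs"]))
        )
    return inserted
```

An answer binding only $p inserts nothing, an answer binding $p and $open_jobs inserts just the job queue, and an answer binding all three variables inserts both the job queue and the log.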

To learn more about optionality, check out the corresponding fundamentals article in our Functional Database Programming series.

Developing with TypeDB 3.0

Let’s wrap up this blog post with a few general comments about how 3.0 will change the “feel” of TypeDB for developers.

Refined design principles

For 3.0, we sharpened many of the underlying design principles of TypeQL. This includes, for example, a unified treatment of variables (no more distinction of $x vs. ?x!), a consistent keyword language (you may have noticed the new keyword links in earlier examples!), and a rigorous theory of variable bindings in the setting of declarative querying. If you want to learn more about the deeper principles, check out our fundamentals article on the topic.

Transaction model changes

As a practical change, we will simplify our transaction model in TypeDB 3.0. Instead of the previous four combinations of schema-read, schema-write, data-read, data-write there will now be three transaction types:

  1. schema for defining the schema of your database.
  2. write for writing data instances to your database.
  3. read for read-only access to your database.

Driver API changes

We are in the process of finalizing changes to our Driver API. You can expect these changes to reflect the compositional structure of query pipelines which we discussed above, which will yield a simple but extremely general API structure.

Simultaneously, we will slowly transition away from maintaining our specialized Concept API. As a result, define queries will become more powerful: they will allow users to rename types and move them within the existing inheritance hierarchy. We haven’t talked about this much here, but it will be a (relatively speaking) minor change.

Key takeaways

To summarize, TypeDB 3.0 will bring several game-changing innovations:

  • Modular querying via functions, as part of a refined functional database programming model.
  • Handling data series and serialized query results via lists.
  • Defining custom value data structures via structs.
  • Composing queries into pipelines, enabling users to build complex workflows.
  • Allowing patterns to become more flexible through the introduction of optional variables.
  • Making the language more consistent and principled, which will also be reflected in the Driver API.

We will keep this blog post updated as we go along, and very much look forward to your feedback!
