Officially out now: The TypeDB 3.0 Roadmap

Lesson 1: Introduction to TypeDB

The fast advances of modern, high-level declarative programming models in parallel with the rise of distributed and multithreaded systems make it harder and harder to continue piling layers of abstractions on the relational model. TypeDB tackles this issue by re-thinking database systems from first principles, building on a new declarative language paradigm, a polymorphic data model, a powerful logic engine, and a new query execution model.

Complexity will only increase in the years ahead, essentially requiring new declarative programming models focused on intent, the user, and business logic.
— Amin Vahdat (VP/GM at Google)
Coming of age in the fifth epoch of distributed computing, 2024

Modern query language

While SQL is a declarative language, it requires users to write their query declarations in terms of certain low-level operations to manipulate tables. For example, consider the following query, which queries for folder locations of media files, that could be either image or video files, and returns the folder’s name and the (image or video) formats of the files in the folder.

SELECT folder.name AS folder_name,
       media.image_format AS file_format
FROM file_locations loc
    INNER JOIN folders folder ON folder.ID=loc.location_id
    INNER JOIN image_files media ON media.ID=loc.file_id
UNION
SELECT folder.name AS folder_name,
       media.video_format AS file_format
FROM file_locations loc
    INNER JOIN folders folder ON folder.ID=loc.location_id
    INNER JOIN video_files media ON media.ID=loc.file_id

The query uses a sequence of JOIN and UNION operations that are to be performed on our tables. While the above query is still pretty simplistic, the "operation-by-operation" coding style can lead to lengthy, repetitive, and hard-to-maintain queries, which allows subtle errors to propagate in unexpected and undeclared ways. This operation-focused approach to writing queries (which is also found across many other database languages) does not focus on the intent of the query.

TypeDB takes a different approach: it focuses on intent at every step of the query. A query’s intent, in particular, includes the types of data it queries, the relations between data, and literal attributes of data. In TypeQL, TypeDB’s query language, the above SQL query could take the following form.

match
$folder isa folder, has folder_name $name;
(location: $folder, file: $media) isa file_location;
{ $ext isa image_format; } or { $ext isa video_format; };
$media isa media_file, has $ext;
fetch
$name as folder_name;
$ext as file_format;

Notice how the TypeQL query captures the intent of our earlier SQL query simply by stating each variable’s intended function. No sequences of operations are needed: in fact, the statements in the match clause can be given in any order. Under the hood, this high-level, fully declarative approach of TypeQL’s language is powered by a robust type system and a type-inference engine. This is why we also refer to TypeQL as a type-theoretic query language!

The design of TypeQL is based on four main pillars, which we’ll learn about in this course. They closely reflect fundamental advances in modern high-level programming models over the past decades.

  • Declarative: TypeQL allows users to directly express the intent of their query, without the need to specify sequences of operations.

  • Safe: Users work within the type system of TypeDB, which alerts users about subtle errors before they happen, and ensures the high-level semantic integrity of their data at all times.

  • Expressive: TypeDB’s type system is flexible and adaptable. Polymorphism is a native feature of TypeDB’s type system and so are logical rules, which allow us to untangle otherwise complex database tasks into a handful of simple declarations.

  • Maintainable: Together, the type system and declarative query language strongly simplify the long-term maintenance of database applications. For example, queries can automatically adapt to schema changes, as we will see shortly!

Programmable databases

In order to allow users to craft fully programmable database applications, TypeDB comprises several key components that build on its modern type system and declarative language. These components address the business logic and behavior of applications, continuous application upgrades, and native integration with existing languages.

Database logic

Pre-computing data is a core feature of many database applications. TypeDB takes this one step further, by providing a powerful rule-inference engine that dynamically computes data only as needed. Conditions for inference are declaratively defined by rules, which naturally fit into TypeDBs type system. For example, the following rule states that any media file (whether an image, video, or something else) with a file size greater than 1GB will be marked as unavailable to download.

define
rule disable_download_of_large_files:
    when {
        $m isa media_file;
        $m has size_kb > 1024 ^ 2;
    } then {
        $m has download_status "unavailable";
    };

Notice how the same declarative language is used for the definition of rules as we previously saw in our TypeQL query. Rules can be used to augment query results in real-time, and can be applied in sequence, in parallel, and recursively with other rules. TypeDB’s rule-inference engine will automatically figure out in which order rules should be applied.

Schema continuity

Continuous integration is an important concern for most modern application architectures. TypeDB’s approach accounts for this process, and makes structural updates to the database particularly easy. One aspect of this concern is modifying the database schema. Continuing our earlier example, consider the following query which defines a new type audio_file as a subtype of media_file, and gives it ownership of an audio_format attribute which collects the file formats of audio files in our database.

define
audio_file sub media_file, owns audio_format;

This query will extend our type hierarchy. Previously only image_file and video_file were kinds of media_file, and this query now adds a new audio_file type as another kind of media_file. More precisely, we say audio_file is a subtype of media_file.

What does this mean for our earlier query? Well, let’s revisit that query and rewrite the match clause to be of the following simpler form.

match
$folder isa folder, has folder_name $name;
(location: $folder, file: $media) isa file_location;
$media isa media_file, has file_format $ext;
fetch
$name as folder_name;
$ext as file_format;

Here, file_format is now a joint super-attribute of audio_format, video_format, and audio_format. Notice how the resulting query is completely agnostic to the kinds of media files we are considering: it simply states $media is a media_file and has the file_format $ext. So, since audio files are media files, and since audio files have audio formats which are file formats, audio files will automatically be considered by the above query! No query refactoring is needed, even if we write the query before introducing new media file types and file formats.

To summarize: by letting declarative queries adapt to schema changes, TypeDB manages to avoid a large class of pitfalls that we’d usually encounter when making structural changes to our database.

Programmatic migrations

Queries, like the Define query above, provide a high-level, declarative approach to database operations. TypeDB also provides access to a programmatic and object-centric layer of such operations. This gives advanced control to developers, and can be used in a programming language of their choice. For example, the above creation of a new audio_file type could be alternatively achieved with the following Python code using TypeDB’s Python driver.

transaction: TypeDBTransaction
media_file = transaction.concepts.get_entity_type("media_file").resolve()
audio_format = transaction.concepts.get_attribute_type("audio_format").resolve()
audio_file = transaction.concepts.put_entity_type("audio_file").resolve()
audio_file.set_supertype(transaction, media_file).resolve()
audio_file.set_owns(transaction, audio_format).resolve()

Using the programmatic route, refactoring the details of our schema also becomes easy: for example, changing the label audio_file to raw_audio_file, can be achieved with the following call.

audio_file.set_label(transaction, "raw_audio_file").resolve()

Stateful data objects

The programmatic way of interacting with a TypeDB database extends all the way down to the data-level. In fact, here, TypeDB introduces a new stateful data object paradigm. As a quick example of this, consider the following basic Insert query which creates a new audio file object with file format "mp3".

insert
$new_audio isa audio_file, has audio_format "mp3";

The very same data insert can be achieved by manipulating data objects directly from your application code. For example, the above query could take the following form using TypeDB’s Python driver.

new_audio = media_file.create(transaction).resolve()
mp3_format = audio_format.put(transaction, "mp3").resolve()
new_audio.set_has(transaction, mp3_format).resolve()

There are, of course, many further operations that TypeDB’s data objects support, and those above merely provide a first taste!

Resilient architecture

The rise of distributed computing has brought many incredible advances, as well as many hard challenges. TypeDB is architected to work natively in the realm of distributed systems. It integrates concurrent computation at various levels, and guarantees data integrity at all steps of its execution model.

Native concurrency

Once a query has been written, we still have to execute it. TypeDB takes care of several steps in this process, based on a custom execution model. While we will touch on all aspects of this model in more detail throughout this course, let us give a brief overview of the main ideas at play.

  • TypeDB batches individual queries into transactions. To enable concurrent transactions, TypeDB uses snapshot isolation, meaning a user can freely operate on their data throughout the duration of a transaction without worrying about race conditions. Transactions can then be committed in order to be persisted in the database, at which point data integrity will be verified and invalid transactions rejected.

  • Transactions themselves are organized into sessions, which determine what type of transactions can be performed, such as reads or writes. For the duration of a session, communication between the user and the database is upheld. Sessions enable tighter control over concurrent operations: for example, in data-read sessions, transactions can always be run in parallel.

Within each transaction, the execution of queries by TypeDB may further involve the following steps:

  1. Queries by the user are first type-checked by TypeDB’s type-inference engine, which validates them against the database’s schema.

  2. A query plan is drawn up, deciding on the order and parallelization of data traversals.

  3. TypeDB’s rule-inference engine, based on a concurrent actor model, augments stored data with data materialized based on user-defined rules.

  4. The results of read queries are streamed to make them available to the user as soon as possible.

Modern security

To round things up, let us also briefly point to the security features that TypeDB implements, though they will not feature prominently in this course.

  • In-flight encryption: TypeDB Cloud supports modern encryption, to keep your data safe from prying eyes.

  • User and role management: Not all users will have the same privileges in your organization and the same will apply to your database. Using sessions types, such restrictions can be effectively imposed.

  • ACID guarantees: TypeDB provides users with ACID guarantees. This includes the use of a write-ahead log, which ensures no data is lost in case of unexpected crashes.

Summary and outlook

In this lesson, we have gotten a bird’s eye view on the TypeDB landscape. In particular, we’ve learned how TypeDB is based on a novel, declarative, type-centric data model that focusses on user intent, maintainability, and direct implementation of business logic. The result is a high-level programmable database that makes many common engineering tasks as simple as they should be. In the following lessons, we will dive a bit deeper into these features. Starting with more basic database operations and working our way up to advanced querying techniques, we will explore TypeDB in much more depth!

Further learning

Learn how current databases lack the expressivity to natively model polymorphism, leading to key challenges in database engineering.

Learn about TypeDB’s core features, including polymorphic data models and declarative querying, and about their impact on database engineering.

Learn about the unification of paradigms backed by modern type-theoretic mathematics, laying a novel foundation for modern databases.