Lesson 1: Introduction to TypeDB

The rapid advance of modern, high-level declarative programming models, in parallel with the rise of distributed and multithreaded systems, makes it harder and harder to keep piling layers of abstraction onto the relational model. TypeDB tackles this issue by rethinking database systems from first principles, building on a new declarative language paradigm, a polymorphic data model, a powerful logic engine, and a new query execution model.

Complexity will only increase in the years ahead, essentially requiring new declarative programming models focused on intent, the user, and business logic.
— Amin Vahdat (VP/GM at Google)
Coming of age in the fifth epoch of distributed computing, 2024

Modern query language

While SQL is a declarative language, it requires users to phrase their queries in terms of low-level operations that manipulate tables. For example, consider the following query, which finds the folders that contain media files (either image or video files) and returns each folder’s name together with the image or video formats of the files in that folder.

SELECT folder.name AS folder_name,
       media.image_format AS file_format
FROM file_locations loc
    INNER JOIN folders folder ON folder.ID=loc.location_id
    INNER JOIN image_files media ON media.ID=loc.file_id
UNION
SELECT folder.name AS folder_name,
       media.video_format AS file_format
FROM file_locations loc
    INNER JOIN folders folder ON folder.ID=loc.location_id
    INNER JOIN video_files media ON media.ID=loc.file_id

The query spells out a sequence of JOIN and UNION operations to be performed on our tables. While the above query is still fairly simple, this "operation-by-operation" coding style can lead to lengthy, repetitive, and hard-to-maintain queries, which allow subtle errors to propagate in unexpected and undeclared ways. This operation-focused approach to writing queries (which is also found in many other database languages) does not focus on the intent of the query.

TypeDB takes a different approach: it focuses on intent at every step of the query. A query’s intent, in particular, includes the types of data it queries, the relations between data, and literal attributes of data. In TypeQL, TypeDB’s query language, the above SQL query could take the following form.

match
  $folder isa folder, has folder_name $name;
  file_location (location: $folder, file: $media);
  { $ext isa image_format; } or { $ext isa video_format; };
  $media isa media_file, has $ext;
fetch {
  "folder_name": $name,
  "file_format": $ext
};

Notice how the TypeQL query captures the intent of our earlier SQL query simply by stating each variable’s intended function. No sequences of operations are needed: in fact, the statements in the match clause can be given in any order. Under the hood, TypeQL’s high-level, fully declarative approach is powered by a robust type system and a type-inference engine. This is why we also refer to TypeQL as a type-theoretic query language!
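The order-independence of match statements can be illustrated outside TypeDB with a toy model. The sketch below is plain Python, not TypeQL and not the TypeDB driver: it treats each statement as a constraint on variable bindings and checks that the answer set is the same whichever order the constraints are listed in. The sample data and the helper `solve` are hypothetical, and the brute-force search stands in for a real query planner.

```python
from itertools import permutations, product

# Hypothetical toy data standing in for folders, file locations and formats.
FOLDERS = {"holiday", "work"}
LOCATIONS = {("holiday", "beach.png"), ("work", "demo.mp4")}
FORMATS = {"beach.png": "png", "demo.mp4": "mp4"}

# Each "statement" is a predicate over a candidate binding of the variables.
CONSTRAINTS = [
    lambda b: b["folder"] in FOLDERS,
    lambda b: (b["folder"], b["media"]) in LOCATIONS,
    lambda b: FORMATS.get(b["media"]) == b["ext"],
]

def solve(constraints):
    """Return every (folder, ext) pair whose binding satisfies all constraints."""
    answers = set()
    for folder, media, ext in product(FOLDERS, FORMATS, set(FORMATS.values())):
        binding = {"folder": folder, "media": media, "ext": ext}
        if all(check(binding) for check in constraints):
            answers.add((folder, ext))
    return answers

# Solve once per ordering of the constraints and collect the distinct answer sets.
results = {frozenset(solve(list(order))) for order in permutations(CONSTRAINTS)}
```

Here `results` contains a single answer set, because a conjunction of constraints describes what must hold rather than how to compute it.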

The design of TypeQL is based on four main pillars, which we’ll learn about in this course. They closely reflect fundamental advances in modern high-level programming models over the past decades.

  • Declarative: TypeQL allows users to directly express the intent of their query, without the need to specify sequences of operations.

  • Safe: Users work within the type system of TypeDB, which alerts users about subtle errors before they happen, and ensures the high-level semantic integrity of their data at all times.

  • Expressive: TypeDB’s type system is flexible and adaptable. Polymorphism is a native feature of TypeDB’s type system and so are functions, which allow us to untangle otherwise complex database tasks into a handful of simple declarations.

  • Maintainable: Together, the type system and declarative query language strongly simplify the long-term maintenance of database applications. For example, queries can automatically adapt to schema changes, as we will see shortly!

Programmable databases

In order to allow users to craft fully programmable database applications, TypeDB comprises several key components that build on its modern type system and declarative language. These components address the business logic and behavior of applications, continuous application upgrades, and native integration with existing languages.

Modular logic

Many databases offer only clunky ways to modularize common or important database operations. TypeDB enables reuse through a familiar programming-language construct: functions.

define
fun large_files() -> { media_file }:
  match
    $m isa media_file;
    $m has size_kb > 1024 ^ 2;
  return { $m };

Notice how the same declarative language is used for the definition of functions as we previously saw in our TypeQL query. Functions can capture simple or complex logic, and recursively interact with other functions. Function execution is automatically optimized like any other query.

Schema continuity

Continuous extension is an important concern for most modern application architectures. TypeDB’s approach accounts for this process, and makes structural updates to the database particularly easy. One aspect of this concern is modifying the database schema. Continuing our earlier example, consider the following query which defines a new type audio_file as a subtype of media_file, and gives it ownership of an audio_format attribute which collects the file formats of audio files in our database.

define
entity audio_file sub media_file, owns audio_format;

This query will extend our type hierarchy. Previously only image_file and video_file were kinds of media_file, and this query now adds a new audio_file type as another kind of media_file. More precisely, we say audio_file is a subtype of media_file.

What does this mean for our earlier query? Well, let’s revisit that query and rewrite the match clause to be of the following simpler form.

match
  $folder isa folder, has folder_name $name;
  file_location (location: $folder, file: $media);
  $media isa media_file, has file_format $ext;
fetch {
  "folder_name": $name,
  "file_format": $ext
};

Here, file_format is now a joint super-attribute of image_format, video_format, and audio_format. Notice how the resulting query is completely agnostic to the kinds of media files we are considering: it simply states that $media is a media_file and has the file_format $ext. So, since audio files are media files, and since audio files have audio formats, which are file formats, audio files will automatically be considered by the above query! No query refactoring is needed, even if we wrote the query before introducing the new media file types and file formats.
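To see why no refactoring is needed, it can help to model subtyping by hand. The following is a minimal Python sketch, not TypeDB code: the `SUPERTYPE` table, the helper `is_subtype`, and the sample instances are all hypothetical stand-ins for the schema and data discussed above. The "query" is written once against the supertypes, then the schema is extended without touching it.

```python
# Hypothetical toy schema: each type maps to its direct supertype (None = root).
SUPERTYPE = {
    "media_file": None,
    "image_file": "media_file",
    "video_file": "media_file",
    "file_format": None,
    "image_format": "file_format",
    "video_format": "file_format",
}

def is_subtype(t, ancestor):
    """True if `t` equals `ancestor` or inherits from it, directly or indirectly."""
    while t is not None:
        if t == ancestor:
            return True
        t = SUPERTYPE[t]
    return False

def media_formats(instances):
    """Toy analogue of `$media isa media_file, has file_format $ext`."""
    return [
        value
        for entity_type, attr_type, value in instances
        if is_subtype(entity_type, "media_file")
        and is_subtype(attr_type, "file_format")
    ]

data = [("image_file", "image_format", "png"), ("video_file", "video_format", "mp4")]
before = media_formats(data)

# Schema extension: the query function above is left untouched.
SUPERTYPE["audio_file"] = "media_file"
SUPERTYPE["audio_format"] = "file_format"
data.append(("audio_file", "audio_format", "flac"))
after = media_formats(data)
```

After the extension, `after` includes the audio format even though `media_formats` never mentions audio files, which mirrors how the polymorphic query picks up new subtypes.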

To summarize: by letting declarative queries adapt to schema changes, TypeDB manages to avoid a large class of pitfalls that we’d usually encounter when making structural changes to our database.

Atomic migrations

TypeDB’s snapshot-isolated ACID transactions give the schema mutation mechanism even more power. Schema changes are not always add-only: they may also require removing or rearranging parts of the schema. TypeDB’s extensive, type-based validation system means that you can move type hierarchies, rearrange attribute ownerships, detach or specialize relationships, and much more.

For example, given the following type hierarchy:

define
  entity resource, owns resource-id;
  entity file sub resource, owns name;
  entity folder sub file; # inherit owns name

Let’s say we want to arrive at the following schema instead:

define
  entity resource, owns resource-id;
  entity file sub resource, owns name;
  entity folder sub resource, owns folder-name;

We need to mutate the schema but also rewrite any existing folders that own name to instead own folder-name.

This can be done atomically, in a single schema transaction, in three simple steps:

  1. Define the new ownership

    define entity folder, owns folder-name;

  2. Rewrite the old folders with names into folder-names

    match $f isa folder, has name $name;
    delete has $name of $f;
    insert $f has folder-name == $name;

  3. Update the schema

    redefine entity folder, sub resource;

The system will automatically revalidate relevant data to ensure owned attributes and linked relations are still valid, cardinalities are respected, and the schema is intact on commit.
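The all-or-nothing character of such a migration can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how TypeDB is implemented: the dictionary layout, `validate`, `run_migration`, and the three step functions are hypothetical. All steps run against a private copy, and the copy only replaces the original if validation succeeds at "commit".

```python
import copy

class ValidationError(Exception):
    pass

def validate(db):
    """Every attribute an instance has must be owned by its type or a supertype."""
    for inst in db["instances"]:
        allowed, t = set(), inst["type"]
        while t is not None:
            allowed |= db["owns"].get(t, set())
            t = db["sub"].get(t)
        extra = set(inst["has"]) - allowed
        if extra:
            raise ValidationError(f"{inst['type']} may not own {sorted(extra)}")

def run_migration(db, steps):
    """Apply all steps to a private copy; return it only if validation passes."""
    working = copy.deepcopy(db)
    for step in steps:
        step(working)
    validate(working)  # raises, leaving `db` untouched, if the result is invalid
    return working

db = {
    "sub": {"resource": None, "file": "resource", "folder": "file"},
    "owns": {"resource": {"resource-id"}, "file": {"name"}},
    "instances": [{"type": "folder", "has": {"resource-id": "r1", "name": "docs"}}],
}

def add_ownership(d):   # step 1: folder owns folder-name
    d["owns"].setdefault("folder", set()).add("folder-name")

def rewrite_names(d):   # step 2: move each folder's name into folder-name
    for inst in d["instances"]:
        if inst["type"] == "folder" and "name" in inst["has"]:
            inst["has"]["folder-name"] = inst["has"].pop("name")

def move_folder(d):     # step 3: folder now subtypes resource directly
    d["sub"]["folder"] = "resource"

migrated = run_migration(db, [add_ownership, rewrite_names, move_folder])

# Skipping the data rewrite would leave folders owning `name` illegally: rejected.
try:
    run_migration(db, [add_ownership, move_folder])
    rejected = False
except ValidationError:
    rejected = True
```

In this sketch the incomplete migration is rejected as a whole and `db` is left exactly as it was, which is the property the three-step transaction above relies on.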

Resilient architecture

The rise of distributed computing has brought many incredible advances, as well as many hard challenges. TypeDB is architected to work natively in the realm of distributed systems. It integrates concurrent computation at various levels, and guarantees data integrity at all steps of its execution model.

Native concurrency

Once a query has been written, we still have to execute it. TypeDB takes care of several steps in this process, based on a custom execution model. While we will touch on all aspects of this model in more detail throughout this course, let us give a brief overview of the main ideas at play.

  • TypeDB batches individual queries into transactions. To enable concurrent transactions, TypeDB uses snapshot isolation, meaning a user can freely operate on their data throughout the duration of a transaction without worrying about race conditions. Transactions can then be committed in order to be persisted in the database, at which point data integrity will be verified and invalid transactions rejected.
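As a rough illustration of snapshot isolation, here is a minimal in-memory Python sketch. It is not TypeDB’s implementation: the `Database` and `Transaction` classes are hypothetical, and the commit check uses a simple "first committer wins" rule on written keys, which is one common way to detect conflicts and is assumed here for illustration only.

```python
class Database:
    def __init__(self):
        self.data = {}      # committed key -> value
        self.version = {}   # committed key -> commit counter of its last write
        self.commits = 0

    def begin(self):
        return Transaction(self)

class Transaction:
    def __init__(self, db):
        self.db = db
        self.snapshot = dict(db.data)  # private, frozen view of committed state
        self.start = db.commits
        self.writes = {}

    def get(self, key):
        # Own writes first, then the snapshot; later commits are never visible.
        return self.writes.get(key, self.snapshot.get(key))

    def put(self, key, value):
        self.writes[key] = value       # buffered until commit

    def commit(self):
        # Reject if a key we wrote was committed by another transaction
        # after this transaction began.
        for key in self.writes:
            if self.db.version.get(key, 0) > self.start:
                raise RuntimeError(f"conflict on {key!r}")
        self.db.commits += 1
        for key, value in self.writes.items():
            self.db.data[key] = value
            self.db.version[key] = self.db.commits

db = Database()
setup = db.begin()
setup.put("x", 1)
setup.commit()

t1, t2 = db.begin(), db.begin()
t1.put("x", 2)
t1.commit()
seen_by_t2 = t2.get("x")   # t2 still reads from its own snapshot

t2.put("x", 3)
try:
    t2.commit()
    conflict = False
except RuntimeError:
    conflict = True
```

The second transaction keeps seeing the value from its snapshot while it runs, and its conflicting write is only rejected at commit time.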

Within each transaction, the execution of queries by TypeDB may further involve the following steps:

  1. Queries are first compiled by TypeDB. This includes type-checking, which validates them against the database’s schema.

  2. Some static optimization is done to rewrite parts of the query into more efficient forms.

  3. A query plan is drawn up, deciding on the order of execution of parts of the query, based on statistical optimization.

  4. Read queries are executed lazily and incrementally, so that results become available to the user as soon as possible. Write queries are executed eagerly to avoid intermediate states.
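The difference between lazy reads and eager writes in the last step can be sketched with Python generators. This is an analogy rather than TypeDB’s execution engine: `read_query`, `write_query`, and the `produced` list are hypothetical names used only to make the evaluation order observable.

```python
from itertools import islice

produced = []  # records which rows were actually computed

def read_query(rows):
    """Lazy read: each answer is computed only when the consumer asks for it."""
    for row in rows:
        produced.append(row)
        yield {"folder_name": row}

def write_query(store, rows):
    """Eager write: all changes are applied before the call returns."""
    store.extend(rows)
    return len(rows)

answers = read_query(["a", "b", "c", "d"])
first_two = list(islice(answers, 2))  # consume only the first two answers

store = []
written = write_query(store, ["x", "y"])
```

After taking two answers, `produced` holds only the two rows that were needed, whereas the write has been applied in full by the time `write_query` returns.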

Modern security

To round things off, let us also briefly point to the security features that TypeDB implements, though they will not feature prominently in this course.

  • In-flight encryption: TypeDB supports encryption of data in transit, keeping it safe from prying eyes.

  • User and role management: Not all users in your organization will have the same privileges, and the same applies to your database.

  • ACID guarantees: TypeDB provides users with ACID guarantees up to snapshot isolation. This includes the use of a write-ahead log, which ensures no data is lost in case of unexpected crashes.

Summary and outlook

In this lesson, we have taken a bird’s-eye view of the TypeDB landscape. In particular, we’ve learned how TypeDB is based on a novel, declarative, type-centric data model that focuses on user intent, maintainability, and direct implementation of business logic. The result is a high-level programmable database that makes many common engineering tasks as simple as they should be. In the following lessons, we will dive deeper into these features. Starting with basic database operations and working our way up to advanced querying techniques, we will explore TypeDB in much more depth!

Further learning

Learn how current databases lack the expressivity to natively model polymorphism, leading to key challenges in database engineering.

Learn about TypeDB’s core features, including polymorphic data models and declarative querying, and about their impact on database engineering.

Learn about the unification of paradigms backed by modern type-theoretic mathematics, laying a novel foundation for modern databases.