A Quick Overview of TypeDB's Data Model


The TypeDB database is based on a novel, highly expressive, and strongly typed data model called the polymorphic entity-relation-attribute (PERA) model. The PERA model combines three distinct themes in database and programming language design into a single, simple model.

  1. First, the PERA model builds on core ideas from conceptual and semantic data modeling, which ensures that the model is structurally close to, and just as intuitive as, natural language.
  2. Second, the model integrates polymorphic types, which organize concepts into inheritance type hierarchies and model the “behavioral traits” of these types through interfaces, which other types depend on.
  3. And third, the PERA model is accessible through a robust type-theoretic querying paradigm, in which composite types act as declarative queries. The result is a novel high-level query language: TypeQL.

In this article, we will give a brief overview of these components and of the core concepts of TypeDB’s language, TypeQL, which is tailored to the PERA model. We also include links to companion articles in which you can explore the topics in more detail.

The bigger picture

To get us started, let us briefly review the larger picture, including the theory and vision, the pragmatic motivation, and the practical application of TypeDB as a high-level database for building robust modern applications. This will help us put our discussion of the PERA model into context.

The theory and vision for the PERA model

TypeDB’s design, following the emerging trend of modern languages providing high-level “zero-cost” abstractions, is firmly rooted in the theory of types. This stands in stark contrast to common existing databases which are based on classical predicate-based and imperative thinking. In these classical approaches, types only arise as an afterthought, which means that many of the benefits of typed programming cannot be fully realized. Type systems provide an expressive way of declaring and controlling the behavior of programs. This makes applications safer, easier to compose, and easier to maintain.

In our article on the type-theoretic paradigm for modern databases we discuss how type theory allows us to re-think databases: types are declarations of “domains of data” and thus a great starting point for designing a truly declarative query language. Moreover, since types provide the primary framework for polymorphism, and since dependencies in data can be elegantly represented with type dependencies, we argue that there is really no way around type theory for building the next generation of databases.

The practical motivation of type-theoretic DBs

Modern programming languages, often driven by scientific and open-source movements, have evolved dramatically over the last several decades. In contrast, databases seem to be stuck with paradigms that are, by now, rather aged; this phenomenon may be partially explained by the more commercialized environment that drives database engineering, or possibly by the much higher requirements for robustness, maturity, and theoretical understanding of database systems. As a result of these different speeds of development, today we regularly deal with mismatches between modern type- and object-oriented programming on one hand and database languages on the other.

In our article on the need for a polymorphic database we discuss these mismatches and shortcomings in more detail, along with the negative consequences that they have for guaranteeing the semantic integrity, modularity, and maintainability of database applications. We also observe that attempts to implicitly translate between high-level programming languages and database languages (e.g. via ORMs) often result in costly abstractions, in the form of non-optimal queries or overhead compute for generating object representations. These mismatches compound at scale, with custom-built solutions providing a rather expensive last resort for many customers.

Implementing the vision in TypeDB

TypeDB puts the theory and vision of type-theoretic databases to use in order to address the aforementioned pain points. As a database, TypeDB builds directly on a polymorphic conceptual data model, comprising entity, relation, and attribute types, as well as their inheritance hierarchies and interfaces. TypeDB comes with a high-level, type-theoretic query language, TypeQL, that lets users directly interact with the PERA data model.

The type system of TypeDB ensures semantic integrity of data, and its language is compositional, modular, and intuitive, which ensures that even highly complex applications remain maintainable and can be fearlessly modified at any level of granularity without having to introduce breaking changes. All this and more is discussed in detail in our fundamentals article introducing TypeDB.

The data model, from first principles

Keeping the larger picture of TypeDB, its theory, and its motivation in mind, let’s see what goes into the design of its data model.

From named concepts to types

Common to all data models is the idea of a “concept”. Conceptualization describes the process of introducing named concepts (like “person”, “car”, or “marriage”), which are understood by the user, in order to categorize our data: this means that we identify specific data or data structures as instances belonging to the concepts.

There are various examples of how conceptualization comes into play in common database paradigms:

  • In relational databases, our data is organized in two ways: tables are named concepts (say employee) whose data instances are rows. But the entries of each row are themselves instances of named concepts (say, name and age), namely, the columns of the table.
  • In graph databases, data is organized into nodes and edges, both of which are categorized by named concepts referred to as node labels (e.g. person) and edge labels (e.g. knows), respectively. Often we also attach further properties to these concept instances, which are themselves organized by named concepts (“property keys” like name).
  • In document databases, our named concepts are collections (say employee) which contain data instances called documents. Each document itself is organized and structured using further named concepts, namely, key-value pairs in which named keys (say, friend) point to values or sub-documents.

As these examples illustrate, named concepts are the essence of organizing data. Type theory abstracts the idea of a concept into the notion of a type. This extends to the observation that some concepts depend on others, which is captured by the powerful notion of dependent types. For example, it makes no sense to add a value to a column without having a row for the value to live in; in this way, the column concept depends on the table concept.

Taking this type-theoretic perspective on concepts is an important step: it opens the flexible and expressive toolbox of type theory to our data model and query language. For a more in-depth discussion of this step, check out our fundamentals article on concepts and types!

Key model components

Based on the type-theoretic perspective on concepts outlined above, the key components of the PERA model organize into a simple 2-by-2 matrix. On the y-axis we record whether our types are independent or dependent (recall, e.g., that column values depend on rows, so columns were an example of a dependent type in the relational data model). On the x-axis we distinguish our types based on their data instances: instances may either be literal values (e.g., integers, strings, booleans, …) or abstract objects. As a result we obtain the following table:

            | Objects        | Values
Independent | entity types   | (independent) attribute types
Dependent   | relation types | (dependent) attribute types

The 2 x 2 matrix of concept types

Let’s see some simple examples:

  • The type of persons p could be modeled as an entity type. A person p doesn’t really correspond to a single “literal value”, but rather to an abstract object, so this choice is pretty clear.
  • The type of marriages m between persons p1 and p2 is a natural example of a relation type. The fact that we added the terms “between persons …” gives a strong indicator that there is a dependency here: in order to create a marriage instance we must refer to existing instances p1 and p2 of type person.
  • The type of dates d of an event e is an example of a (dependent) attribute type. Indeed, its instances should be simple values: dates!

Type dependencies therefore play a special role in the PERA model. In fact, a core innovation of the model, enabled by its type-theoretic approach, is that all dependencies are polymorphic: e.g., dates depend on an event, and this event could be many things (e.g. a marriage, a birth, or something else entirely that has dates). This flexibility is crucial when comparing the PERA model to other, less structured data models, like the graph and document models. To find out more about the PERA modeling primitives, check out our accompanying fundamentals article on the PERA model components!
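The three examples above, together with the polymorphic dependency of dates on events, could be sketched in a TypeQL schema roughly as follows. This is only an illustration in the same syntax as the snippets later in this article: the names birth, spouse, child, and event_date are hypothetical and do not come from any official schema.

```typeql
define
# Entity type: independent, its instances are abstract objects.
person sub entity,
    plays marriage:spouse,
    plays birth:child;
# Relation types: dependent, each instance refers to the objects filling its roles.
marriage sub relation,
    relates spouse,
    owns event_date;
birth sub relation,
    relates child,
    owns event_date;
# Attribute type: its instances are values. Any type declaring
# "owns event_date" can be the event that a date depends on.
event_date sub attribute, value datetime;
```

Note that event_date is declared once and then owned by both marriage and birth; this is the sense in which the dependency of dates on events is polymorphic.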

Comparison to other data models

So how does the PERA model really stack up against other traditional data models? The short answer is that it is compatible with most of them. This is unsurprising given the generalist starting point of the model: concepts feature in any sensible data model, and the PERA model simply abstracts this by the notion of types.

In fact, it is easy to develop translations of other data models into the PERA model. The key step is simply to exhibit the following aspects of any given model:

  1. The concepts used by the model such as, for example, tables and columns in the relational model, node and edge labels (and property keys) in the labeled-property graph model, and collections and nested keys in the document model.
  2. The dependencies between concepts, i.e., which concepts can only be instantiated with reference to which other concepts. For example, in the relational model we saw that column entries structurally depend on rows of tables.

Having understood the concepts and concept dependencies in other models, we can straightforwardly translate them into the PERA model. For an in-depth comparison of the PERA model with other data models along these lines, check out our accompanying fundamentals article on model comparison!
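As a rough illustration of such a translation, the relational table employee (with columns name and age) and the graph labels person and knows from the earlier examples might be rendered in the PERA model as shown below. This is a sketch under assumptions rather than a canonical mapping: the role name acquaintance and the choice of value types are hypothetical.

```typeql
define
# Relational model: the table becomes an entity type,
# its columns become attribute types owned by that entity type.
employee sub entity,
    owns name,
    owns age;
# Graph model: the node label becomes an entity type,
# the edge label becomes a relation type, the property key an attribute type.
person sub entity,
    owns name,
    plays knows:acquaintance;
knows sub relation,
    relates acquaintance;
name sub attribute, value string;
age sub attribute, value long;
```

The dependency of column entries on rows shows up here as ownership: a name or age value is always attached to an owning employee instance.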

But this basic translation of data models and data structures doesn’t really do justice to the novelty of the PERA model. The data model is merely one of three pillars of TypeDB; it stands alongside two further pillars: first, the query language TypeQL, which implements the “queries as types” paradigm, and, second, type functions (a.k.a. rules) for constructing views and inferring data “on-the-fly”.

The query language and inference engine

The polymorphic conceptual framework provided by the PERA model is a large step up from the existing data modeling paradigms in terms of expressivity. Equipped with a novel polymorphic and type-theoretic toolset, we can take this line of thinking even further. Let us briefly mention two directions of exciting developments enabled by the PERA model.

  1. The type-theoretic underpinnings of the PERA model also lend themselves to the design of a query language, which led to the inception of TypeQL. TypeQL is used to define schemas and insert data into a database, but it also naturally acts as a query language for data retrieval, tailored to support polymorphism.
  2. Type theory is a natural framework for working with logical derivations. TypeDB integrates a powerful rule engine, which can be used to construct inferred data and views, and which is also natively accessible through TypeQL.

We now give a small taste of these ideas, but defer more in-depth discussion of either point to the future.

Querying with interface polymorphism

Data retrieval

Queries in TypeDB use the same syntax as schema definition and data insertion statements, but any object, value, or type in these statements may now be replaced by a variable. As an example, consider the following simple query:

match
$eng_team isa team, has name "Engineering";
(team: $eng_team, member: $m) isa team_membership;
$m has name $n;
get $m, $n;

Let us inspect the query line by line:

  1. The first line introduces a variable $eng_team of type team. The object(s) represented by that variable are required to have the name "Engineering";
  2. The second line, reusing the variable $eng_team, introduces another variable $m. This variable is required to represent objects that are in a team_membership relation with $eng_team.
  3. In the last line of the match clause, we introduce yet another variable, $n, which is a name of $m.

In summary, the query matches and returns all employees in the engineering team and their names!
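For reference, a schema along the following lines would be enough to support this query. It is an assumed reconstruction based only on the types and roles the query mentions (team, name, team_membership, and the employees named in the summary), not necessarily the exact schema the query was written against.

```typeql
define
# Employees own a name and can be members of teams.
employee sub entity,
    owns name,
    plays team_membership:member;
# Teams own a name and fill the team role of a membership.
team sub entity,
    owns name,
    plays team_membership:team;
team_membership sub relation,
    relates team,
    relates member;
name sub attribute, value string;
```

Nothing in the query itself mentions employee: it only asks for objects that play the member role and own a name.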

Schema modification

Now, consider extending the employee-team schema behind this query as follows.

define
contractor sub entity,
    owns name,
    plays team_membership:member;

This adds to our schema the concept of contractor. Like employees, contractors can own names, and be counted as members of teams. So what happens to our earlier query? It still works exactly the same! This time around, the query matches and returns all employees or contractors in the engineering team and their names!
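To make this concrete, a contractor could be added to the engineering team with a query like the one below. The name "Bob" is made up for illustration, and the sketch assumes that a team named "Engineering" already exists in the database.

```typeql
# Look up the existing team, then attach a new contractor to it.
match
$t isa team, has name "Engineering";
insert
$c isa contractor, has name "Bob";
(team: $t, member: $c) isa team_membership;
```

After this insert, the unchanged match query from above would also return the new contractor and the name "Bob", because contractor implements the same two interfaces (owning a name and playing the member role) that the query relies on.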

Our example illustrates one way in which interface polymorphism can be extremely useful. Interfaces often lead to conceptually simpler and more flexible code, e.g. when emulating a foreign key to multiple tables.

Querying with inheritance polymorphism

The question of how to represent and model inheritance is a long-known issue for practical databases. In fact, during the rise of object-oriented programming, it led to the inception of a whole new class of databases, object-oriented databases, though the approach arguably turned out to be not simple and “conceptual” enough to scale and compete with other database paradigms. Based on its type-theoretic foundations, the PERA model integrates inheritance via subtyping in a simple and pure form.

A polymorphic query example

As an example, consider the following variation of our schema of employees and teams from the previous section:

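define
# A person is an entity that owns a name.
person sub entity,
    owns name;
# Employees are persons that also own an ID and can be team members.
employee sub person,
    owns EmployeeID,
    plays team_membership:member;
# Directors are employees that can additionally lead teams.
director sub employee,
    plays team_leadership:leader;
# Teams own a name and fill the team role of memberships.
team sub entity,
    owns name,
    plays team_membership:team;
team_membership sub relation,
    relates member,
    relates team;
# Leadership specializes membership: the leader role overrides member.
team_leadership sub team_membership,
    relates leader as member;
name sub attribute, value string;
EmployeeID sub attribute, value string;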

Now we may consider the following query.

match
$e isa employee, has EmployeeID $id;
(team: $t, leader: $e) isa team_leadership;
get $e, $id, $t;

As before, let’s go through the query line by line:

  1. The first line defines $e to be a variable representing an object of type employee, and a variable $id representing an employee ID of the employee $e. Note, since director is a subtype of employee, $e could, in particular, represent a director object.
  2. In the second line, we declare that $e is the leader of a team $t.

The query thus returns all employees $e with ID $id who lead some team $t. Note that, since our schema only allows directors to play the leader role, the query will only return directors for the variable $e!
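Inheritance polymorphism also works in the other direction: querying a supertype returns instances of all of its subtypes. The two queries below sketch this against the schema above; they are illustrative and are meant to be run separately.

```typeql
# Matches instances of person and of all its subtypes (employee, director).
match
$p isa person, has name $n;
get $p, $n;

# Variant that also returns the exact type of each match:
# "isa!" binds $type to the direct type of $p only.
match
$p isa! $type, has name $n;
$type sub person;
get $type, $n;
```

In the second query the type itself is a variable, so the result distinguishes plain persons from employees and directors without the query having to list those subtypes.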

Rules for inferring new data on-the-fly

A final aspect of the PERA model concerns the ability to reason over data, meaning we may infer new instances of concepts from existing ones. This can be thought of as a (much more general) form of, say, constructing views in the relational model. The PERA model naturally incorporates rules due to its type-theoretic foundations: in fact, in TypeQL, rules use essentially the same syntax as queries or data specifications!

As a short and simple example, consider the following rule:

define
rule director-team-members-are-leader:
when {
    $d isa director;
    (team: $t, member: $d) isa team_membership;
} then {
    (team: $t, leader: $d) isa team_leadership;
};

The rule states simply that, whenever a director is a member of a team, then they will also be considered a leader of that same team.

Rules can be chained sequentially, and reasoning can branch over multiple derivations of the same instance. This makes reasoning a powerful tool that can capture complex application logic in concise form.
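The effect of the rule can be sketched with the two queries below. Several things are assumed here and not stated in the text above: that the schema contains a team entity type that owns name and plays team_membership:team, that inference is enabled for the reading transaction, and the made-up values "Engineering" and "D-001".

```typeql
# Query 1 (write): store a director who is an ordinary member of a team.
insert
$t isa team, has name "Engineering";
$d isa director, has EmployeeID "D-001";
(team: $t, member: $d) isa team_membership;

# Query 2 (read, with inference enabled): ask for team leaders.
match
(team: $t, leader: $d) isa team_leadership;
$d has EmployeeID $id;
get $t, $id;
```

No team_leadership instance is ever inserted explicitly. With the rule in place, the second query would be expected to return the director with ID "D-001" as a leader, derived at query time from the stored membership.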

Summary

The PERA model is compatible with most existing database paradigms individually, but it provides a substantially more elegant and unified approach to the organization of data than any single one of them. Moreover, it integrates modern polymorphic programming paradigms through its type-theoretic perspective on conceptual modeling. This addresses several pragmatic pain points of modern database engineering, increases application robustness, and reduces maintenance costs. TypeDB implements the PERA model, and comes with an intuitive query language, TypeQL, which makes it easy to get started!
