TypeDB Fundamentals
Type Theory as the Unifying Foundation for Modern Databases
If the term “type theory” sounds exciting to you, chances are you are either a theoretical computer scientist or a mathematician. But with several high-profile successes of the field over the last decade or two, it is becoming evident that type theory has a lot to teach us about designing the foundations of practical modern technology. To quote Bob Harper:
Types are the central organizing principle of the theory of programming languages. Language features are manifestations of type structure. The syntax of a language is governed by the constructs that define its types, and its semantics is determined by the interactions among those constructs. The soundness of a language design—the absence of ill-defined programs—follows naturally.
We are already seeing many powerful type theoretical ideas crop up in languages like, say, Rust (including, e.g., the affine type system that governs borrowing, or pattern matching definitions that are used to destructure algebraic types).
Transferring these fundamental ideas from mathematics to practical computer science can yield substantial benefits in terms of expressivity, interoperability, and safety of our systems and applications. As a central organizational principle of (programming) languages, it can make it much easier to understand the behavior of even the most complex systems.
So what about database languages? How could we integrate modern type-theoretic ideas as we see it happening with programming languages? As a matter of fact, type-theoretic thinking has so far found little to no application in the field, at least when it comes to the design of database models and query languages.
Traditional models are built on rigid foundations that are not easily amenable to integrating type-theoretical language. This leaves much room for improvement and research at the intersection of programming and query languages. But it also means that we must re-think databases from first principles: with type-theoretic programming language at our disposal, how would we re-think and re-design the foundations of modern databases in the best possible way?
Through our work at TypeDB, we want to give a first comprehensive and practical answer to this question, which just recently started to draw more and more interest from both academia and industry. In the next few posts, we will take you on a whirlwind tour through the classical mathematical foundations for databases, modern type theory, conceptual database modeling, and a novel synthesis of all of these ideas.
How type theory powers languages
The first protagonist of our story is the field of type theory. Type theory re-envisions how formal languages are created, putting emphasis on how types interact (e.g. via functions between them) and making these interactions first class-citizens of our language. In contrast, traditional approach to formal languages were phrased in the language of classical logic in which only truth-valued predicates were allowed to be first-class citizens. This perspective underlies classical logic programming languages (e.g. Prolog) and the relational data model.
Towards the second half of the 20th century, mathematicians realized that interactions of mathematical structures are really most of what they care about, and it turned out that type theory provided the right approach to building the foundational language to capture this. A particularly important kind of interaction is that of type dependency: the construction of some types may have functional dependencies on others. This simple idea turns out to be fundamental in describing how mathematical structures are formed.
While crafting a language for mathematics is one of the purest applications of (dependent) type theory, its structured approach to language lends itself to many other application as well: type theory had a huge influence on programming language development, in particular, in the development of the functional programming paradigm. For a quick refresher on type theory and its relation to the above topics read our fundamentals article on type theory.
Polymorphic-enhanced conceptual data models
The second protagonist of our story is the classical modeling framework of databases: the entity-relationship (ER) model. Like type theory, the central ideas of ER modeling were introduced in the second half of the 20th century. The goal of ER modeling was to act as a “universal” framework in which to understand inter-dependent data. As Chen, the original author of the model, explains, the ER model was directly inspired by study of the structure of natural language.
To this day, ER modeling, as well as more general semantic modeling techniques, are taught as part of any database foundations lecture course. While many more “specific” data models have been build in the meantime (like the relational, graph, or document model), the ER model is still a powerful conceptual layer in which these domain-specific models can be compared.
For those of us who haven’t had a full lecture course on ER modeling yet, we’ve written up an opinionated summary of some of its core ideas in a dedicated fundamentals article on the ER-model. The article also explains how polymorphism may enter the picture in conceptual modeling!
The confluence of database languages and type theory
A key issue of the conceptual modeling layer is, of course, that it is a high-level language that has no direct, simple “algebraic” counterpart. In contrast, such formal algebraic languages are well-known for other, less expressive data models: e.g., the relational model has a formal language based on relation algebra, the labeled property graph model has a formal language based on graphs, and the document model has a formal language based on trees.
Relations (a.k.a. predicates), graphs, and trees are all fundamental structures in classical mathematics that have been studied since a long time. As we mentioned earlier, the focus as modern mathematics shifted towards how these structure interact and depend on one another. It turns out, this shift of perspective very much applies to the ER model as well: using dependent types, the ER model can be elegantly “understood” in algebraic terms.
A key realization for this construction is that the primitives of the ER model, including entities, relations, and attributes, have type-theoretic analogues. This perspective draws an elegant parallel between Chen’s original ideas about natural language and modern type theory. This lends itself to not only develop a formal language for the ER model (which, in fact, is even more expressive than the original model), but also to a completely new paradigm for querying data via constructing types.
The “queries as types” paradigm is a main building block for our query language TypeQL. Check out our fundamentals article on type-theoretic querying for a more in-depth introduction to this powerful idea!
Summary
Type systems are in fashion for a good reason. They “guide” the avid programmer to write safe (and working!) programs, without constraining their creativity. We believe that this very much also applies to database languages, however, the connection of database languages and type theory hasn’t been well-studied previously.
Through the development of TypeDB we aim to finally bring types to the database masses. This is certainly no small feat, and required the development of a completely new querying paradigm. This new approach to database queries has been published already in peer-reviewed research, but its development will surely continue for a while (with the help of our active community). After all, SQL has had a small head-start.
While we are still in the early days of our TypeDB journey, we are excited to finally share all our thoughts on the “unified theory of databases” with you. There will be much more to come, so if you’d like to stay in touch, please do so: subscribe to our TypeDB newsletter, join us on Discord, and follow us on Github!