New ACM paper, free-tier cloud, and open-source license

TypeDB Blog

The theory of TypeQL: pioneering typeful query languages

TypeQL is directly built on the principles of modern programming language theory, which distill how we intuitively interact with computers using simple, yet formal, structures: types.

Dr. Christoph Dorn

TypeQL: A Type-Theoretic & Polymorphic Query Language, C. Dorn and H. Pribadi
Accepted for publication at ACM SIGMOD/PODS 2024
Download PDF

Types for databases

Despite breaking with the traditions of long-established database query languages, TypeQL often feels deeply familiar even to first-time users of the language—this is because TypeQL is directly build on the principles of modern programming language theory which distill how we intuitively interact with computers using simple, yet powerful, structures: types. To quote from Harper’s 2012 book on the “Practical Foundations for programming languages”:

Types are the central organizing principle of the theory of programming languages. Language features are manifestations of type structure. The syntax of a language is governed by the constructs that define its types, and its semantics is determined by the interactions among those constructs. The soundness of a language design—the absence of ill-defined programs—follows naturally.

R. Harper

TypeQL is not alone in its mission of bringing the theory of types and the practice of industrial-strength languages together—typeful languages (a term coined by L. Cardelli) are trending. Types enable high-level, declarative constructions of programs using subtyping and polymorphism; they add regularity to languages and thereby clarify many programming issues long before they arise; and they are backed by a rigorous mathematical theory which accommodates low-level optimizations. This balance enables modern typeful programming languages (like Rust, OCaml, Haskell) to be fully-featured and performant languages, while at the same time bringing a maximal degree of declarativity and safety. The requirement of having to be precise about declaring the types in our code is a small price to pay for knowing “for sure” that our successfully type-checked program will stick precisely to the structure that we declared for it.

The theory and practice of TypeQL

Database languages have, so far, benefitted very little from the progress towards performant, high-level declarative languages based on types, neither in science nor in industry. TypeQL, the query language of our database TypeDB, aims to finally bring the key benefits of typeful programming languages to databases. After some intense research in our lab last year, the theory synthesizing typeful programming and database languages is now written out in detail in our most recent paper, which brings together many interesting topics from classical database theory and modern programming language theory—here it is:

“TypeQL: A Type-Theoretic and Polymorphic Query Language”, C. Dorn & H. Pribadi [pdf]

(final version to be published in Proceedings of the ACM on Management of Data)

The paper explores a novel way of integrating typeful programming with performant database design, promoting traditional relations to so-called dependent types. This key step is based on a deep correspondence of classical logic and modern type theory known as the Propositions as Types paradigm. To quote from Wadler’s same-named paper:

Such a synthesis is offered by the principle of Propositions as Types, which links logic to computation. At first sight it appears to be a simple coincidence—almost a pun—but it turns out to be remarkably robust, inspiring the design of automated proof assistants and programming languages, and continuing to influence the forefronts of computing.

P. Wadler

In our paper, we evolve this fundamental principle further: we turn Propositions as Types into Queries as Types. This means that in TypeQL the user constructs types in order to formulate queries—of course, all this happens “under the hood”, packaged into TypeQL’s near-natural query language. The Queries as Types paradigm is where TypeQL strongly differs from existing QL paradigms:

  • Traditionally, database language have taken a somewhat more algebraic (or operational) viewpoint: for example, in SQL queries boil down sequences of operations that transform tables into other tables.
  • This similarly applies to prominent document database query languages, which operate on documents through “aggregation pipelines”.
  • Purely declarative, pattern-based approaches do exist, for example, in the realm of graph databases, but all existing approaches lack a sophisticated type system.

The PERA model

TypeQL is underpinned by powerful type system, which includes the expected key components of a typeful programming language such as subtyping and polymorphism (note that in TypeQL-lingo we speak of ‘interfaces’ where others may speak of ‘typeclasses’ or ‘traits’), but adapts these components directly to the setting of databases. Indeed, in databases, language cannot be overly complicated to avoid convoluting storage and retrieval of data, and so TypeQL’s type system breaks down its data-storing types down into only three key kinds:

  1. Independent types that hold unique object identifiers, a.k.a. entities,
  2. Dependent types that hold unique object identifiers, a.k.a. relations,
  3. Independent types that hold literal values, a.k.a. global attributes,
  4. Dependent types that hold literal values, a.k.a. owned attributes.

The resulting data model of TypeQL is called the PERA model, the polymorphic entity-relation-attribute model, and in our paper we show that the model is straight-forwardly described in the framework of type theory. Its simple ingredients combine to yield highly expressive model, unifying core aspects of (essentially all!) existing popular data models. This highlights a central and important aspect of our work: the PERA model is a unifying generalization of existing database paradigms—for the interested reader, we illustrate how this unification works with a range of detailed examples in Appendix A of our paper.

Epilogue: what’s next?

Our terminology of “entities”, “relations”, and “attributes” is, of course, a direct reflection of classical conceptual entity-relationship modeling—importantly, we view these concepts through a fresh, purely type-theoretic lens, which highlights an interesting convergence of classical, natural-language inspired thinking in databases and the modern theoretical framework of types. In fact, as we mention in our paper, this convergence goes even further: dependent types and subtypes mix very well, giving rise to a new type system feature which we call type functions that neatly generalizes Datalog-like reasoning. This, too, is an exciting contribution to database language theory—and you can expect to hear more about this idea soon!

More generally, as computer scientists, we are excited by the potential of typeful database programming for shedding new light on a plethora of open questions and recent scientific work concerning database, knowledge representation, and reasoning systems, including, for example:

… and more! If these topics sounds interesting to you, don’t hesitate to be in touch.

Share this article

TypeDB Newsletter

Stay up to date with the latest TypeDB announcements and events.

Subscribe to Newsletter

Further Learning

TypeQL ACM Paper

Read the preprint of the paradigm-defining paper behind TypeQL, due to be published in Proceedings of the ACM on Management of Data.

Read paper

Polymorphic Data Model

Learn about the key ingredients of TypeDB's polymorphic entity-relation-attribute model, which unifies existing database paradigms into a single streamlined framework.

Watch lecture

Introducing Type Theory

In the last 40 years since relational algebra and SQL, databases have yet to provide more powerful abstractions to express more advanced logic.

Read article