TypeDB Blog
The theory of TypeQL: pioneering typeful query languages
Types for databases
Despite breaking with the traditions of long-established database query languages, TypeQL often feels deeply familiar even to first-time users of the language—this is because TypeQL is directly build on the principles of modern programming language theory which distill how we intuitively interact with computers using simple, yet powerful, structures: types. To quote from Harper’s 2012 book on the “Practical Foundations for programming languages”:
Types are the central organizing principle of the theory of programming languages. Language features are manifestations of type structure. The syntax of a language is governed by the constructs that define its types, and its semantics is determined by the interactions among those constructs. The soundness of a language design—the absence of ill-defined programs—follows naturally.
R. Harper
TypeQL is not alone in its mission of bringing the theory of types and the practice of industrial-strength languages together—typeful languages (a term coined by L. Cardelli) are trending. Types enable high-level, declarative constructions of programs using subtyping and polymorphism; they add regularity to languages and thereby clarify many programming issues long before they arise; and they are backed by a rigorous mathematical theory which accommodates low-level optimizations. This balance enables modern typeful programming languages (like Rust, OCaml, Haskell) to be fully-featured and performant languages, while at the same time bringing a maximal degree of declarativity and safety. The requirement of having to be precise about declaring the types in our code is a small price to pay for knowing “for sure” that our successfully type-checked program will stick precisely to the structure that we declared for it.
The theory and practice of TypeQL
Database languages have, so far, benefitted very little from the progress towards performant, high-level declarative languages based on types, neither in science nor in industry. TypeQL, the query language of our database TypeDB, aims to finally bring the key benefits of typeful programming languages to databases. After some intense research in our lab last year, the theory synthesizing typeful programming and database languages is now written out in detail in our most recent paper, which brings together many interesting topics from classical database theory and modern programming language theory—here it is:
“TypeQL: A Type-Theoretic and Polymorphic Query Language”, C. Dorn & H. Pribadi, Download PDF
(final version to be published in Proceedings of the ACM on Management of Data)
The paper explores a novel way of integrating typeful programming with performant database design, promoting traditional relations to so-called dependent types. This key step is based on a deep correspondence of classical logic and modern type theory known as the Propositions as Types paradigm. To quote from Wadler’s same-named paper:
Such a synthesis is offered by the principle of Propositions as Types, which links logic to computation. At first sight it appears to be a simple coincidence—almost a pun—but it turns out to be remarkably robust, inspiring the design of automated proof assistants and programming languages, and continuing to influence the forefronts of computing.
P. Wadler
In our paper, we evolve this fundamental principle further: we turn Propositions as Types into Queries as Types. This means that in TypeQL the user constructs types in order to formulate queries—of course, all this happens “under the hood”, packaged into TypeQL’s near-natural query language. The Queries as Types paradigm is where TypeQL strongly differs from existing QL paradigms:
- Traditionally, database language have taken a somewhat more algebraic (or operational) viewpoint: for example, in SQL queries boil down sequences of operations that transform tables into other tables.
- This similarly applies to prominent document database query languages, which operate on documents through “aggregation pipelines”.
- Purely declarative, pattern-based approaches do exist, for example, in the realm of graph databases, but all existing approaches lack a sophisticated type system.
The PERA model
TypeQL is underpinned by powerful type system, which includes the expected key components of a typeful programming language such as subtyping and polymorphism (note that in TypeQL-lingo we speak of ‘interfaces’ where others may speak of ‘typeclasses’ or ‘traits’), but adapts these components directly to the setting of databases. Indeed, in databases, language cannot be overly complicated to avoid convoluting storage and retrieval of data, and so TypeQL’s type system breaks down its data-storing types down into only three key kinds:
- Independent types that hold unique object identifiers, a.k.a. entities,
- Dependent types that hold unique object identifiers, a.k.a. relations,
- Independent types that hold literal values, a.k.a. independent attributes,
- Dependent types that hold literal values, a.k.a. dependent attributes.
The resulting data model of TypeQL is called the PERA model, the polymorphic entity-relation-attribute model, and in our paper we show that the model is straight-forwardly described in the framework of type theory. Its simple ingredients combine to yield highly expressive model, unifying core aspects of (essentially all!) existing popular data models. This highlights a central and important aspect of our work: the PERA model is a unifying generalization of existing database paradigms—for the interested reader, we illustrate how this unification works with a range of detailed examples in Appendix A of our paper.
Epilogue: what’s next?
Our terminology of “entities”, “relations”, and “attributes” is, of course, a direct reflection of classical conceptual entity-relationship modeling—importantly, we view these concepts through a fresh, purely type-theoretic lens, which highlights an interesting convergence of classical, natural-language inspired thinking in databases and the modern theoretical framework of types. In fact, as we mention in our paper, this convergence goes even further: dependent types and subtypes mix very well, giving rise to a new type system feature which we call type functions that neatly generalizes Datalog-like reasoning. This, too, is an exciting contribution to database language theory—and you can expect to hear more about this idea soon!
More generally, as computer scientists, we are excited by the potential of typeful database programming for shedding new light on a plethora of open questions and recent scientific work concerning database, knowledge representation, and reasoning systems, including, for example:
- Algorithms for efficient incremental view maintenance for rich query languages and reasoning systems,
- Building complex real-world database benchmarks and that make use of the semantic expressivity of types,
- Integrating neural and symbolic AI methods for information retrieval.
… and more! If these topics sounds interesting to you, don’t hesitate to be in touch.