Officially out now: The TypeDB 3.0 Roadmap

TypeDB Fundamentals

How Type Theory Powers Formal Languages


What are the fundamental building blocks of language? Mathematics has been trying to answer this question for a long time. A first contender for an answer emerged in the early 20th century in the form of Set Theory, the study of sets and their relations (predicates over sets). The predicate-centric perspective was powerful and provided inspiration for many subsequent scientific developments, including relational databases and logic programming. Set Theory, however, was still too rigid to capture much of mathematicians cared about: transformations of structure (e.g. programs acting on an input data structure) could only be indirectly described.

Type Theory is a powerful alternative to Set Theory, which considers type operations and type constructs as first-class citizen. Type Theory has been hugely influential in programming language research, in particular, for the functional programming paradigm. While the subject is a vast area of research nowadays, in this article will try to shed some light on some of its key ideas.

Classical formal languages

The first successful attempts at nailing down the formal foundations of mathematics emerged at the beginning of the 20th century in the work of Russell and Whitehead, even though there was no “precise statement of the syntax of the formalism” at that time (as Goedel pointed out). Nonetheless, early work was paving the way for what was to come: mathematicians identified sets and relations (also called predicates by mathematicians) as a starting point from which all other mathematics was to be constructed by certain axiomatic set operations (like unions or powersets).

For example, functions F(x) = y (say, Date-of-Event(e) = d) where conceived as specific relations F(x,y) in which, for all x, there exists exactly one y such that F(x,y) holds true. In contrast, general relations R(x,y) (say, Marriage-of-Persons(p1,p2)) need not have this property.

This line of thinking was, most certainly, hugely influential in the inception of relational algebra by Codd in the 1970s. Codd precisely envisioned relations as predicates, and all information was to be encoded in such relations. The key algebraic operations provided by Codd, moreover, were inspired by those of predicate logic: Codd’s main theorem precisely states that relational algebra is just as expressive as domain-independent predicate logic.

While expressive, the relational algebra approach is also extremely restrictive: the idea that ‘everything is a set or relation’ wasn’t satisfactory to many computer scientists, as many data structures are not natively modeled by sets and relations, and the search for other database paradigms continued. In fact, even vanilla SQL does not follow the relational paradigm exactly: in contrast to Codd’s set semantics, modern RDMSs work with bag semantics, which allows the row (x,y)to appear multiple times in a relation R . This choice, of course, reflects the needs of practical data modeling.

But what do mathematicians think of set theory? Did the theory of sets and relations solve the foundations of mathematics once and for all? The short answer is “no”. Indeed, while mathematicians have spent considerable effort in making set-theoretical foundations practical, the reality of mathematical practice turned out to be very far from “just sets and relations”.

Very similar to the concerns of many computer scientists, mathematicians found that most of the ideas they care about cannot be natively modeled by sets and relations. As a result, the search for better foundations of mathematics soon continued, and we made substantial progress on the question in the latter part of the 20th century. This success was based on a rather different line of thought than what was originally envisioned by set theorists.

Moving on to compositional systems

Type theory is about types and their interactions. In contrast to just sets, this adds a new level of descriptiveness to our thinking: for example, in type theory, functions between types become first-class citizens as they capture how types interact. Types and their functions now organize into a new kind of structure: a category (not a set)!

Nowadays, almost all areas of modern mathematics work in the language of category theory. Categories provide a bird’s eye perspective on type interaction, and on how interactions compose. This has led some mathematicians to call category and type theory the “compositional foundation” of mathematics.

Types are about composability

B. Milewksi, “Category theory for Programmers”

Importantly, not just mere functions of types compose in type theory. The birds-eye perspective taken by type theorists provided us with a clear picture of the integration of classical logic (say, predicate logic) and type theory: this is now known as the propositions-as-types paradigm.

In this paradigm, classical predicates become so-called dependent types: while predicates R(x,y) describe when a relation is true depending on the given x and y, a dependent type R(x,y) describes which terms z : R(x,y) belong to some type depending on x and y. Not only does this immediately resolve our earlier issue with “bag semantics” (we can have multiple “row” terms z1, z2, z3, ... in a “relation” R(x,y) for the same x and y), but it now also allows us to compose dependencies. Indeed, while classical predicates depend on sets, dependent types depend on, well, types again!

We can use this iteratively and consider types depending on types depending on types… etc. This leads to composed dependent expressions such as z : R(x,y); w : S(z); . From a more “functional” perspective, this composition could also be written as w : S(z : R(x,y)). Such composed dependencies turn out to be an extremely natural and ubiquitous data structure, as we will soon learn. As a first example, consider the composition d : Date-of-Event(m : Marriage-of-Persons(p1, p2))… but more on this soon!

Dependent types, and the propositions-as-types paradigm, are key ingredients of modern type-theoretic “compositional” mathematics and programming language research. The approach has celebrated many recent successes. To quote Philip Wadler:

Such a synthesis is offered by the principle of Propositions as Types, which links logic to computation. At first sight it appears to be a simple coincidence—almost a pun—but it turns out to be remarkably robust, inspiring the design of automated proof assistants and programming languages, and continuing to influence the forefronts of computing.

To most of us, all this still probably sounds a bit abstract. As we will now learn, it is not: each of us already uses dependent types on a daily basis.

Crash course in type theory

If you have read until here, have never heard of “type theory” before, and have no idea what the previous section was going on about, do not give up just yet. Here is a crash in type theory, not for mathematicians, but for English speakers.

What is a type?

Types appear in many different domains in computer science, ranging from conceptual data modeling to programming languages to formal proof assistants. As a starting point for our journey we consider the following broad definition of what a type is:

A type is a description of a domain that a variable can range over.

The definition is general enough to encompass interesting examples from a variety of domains. Let’s focus on examples from both programming languages and natural language. The former comes with familiar examples, such as the following.

  • Consider the typing x : Integer, stating that a variable x has the type of integers Integer. This type precisely describes the domain that x can range over, namely that x can be any integral number like 1, -2, or 13.
  • Similarly, the typing x : String tells us that x will range over the domain of strings, such as "abc", "John Doe", or "typedb.com".
  • The typing x : Divisor-of(y) is more interesting, here the domain that x can range over depends on a variable y : Integer. This precisely is a dependent type! If we initialize the variable to, say, y = 26 then this yields an independent domain description of x, namely that x can be among the integers 1, 2, 13, and 26.

Types in natural language

Our definition of what a type is also (in a more loose sense) applies to natural language. Let’s look at some examples.

  • Consider the typing x : Person. This again describes the domain that x can range over, namely, x can be any object identified as a “person”. You might argue that this domain is not as “universally” described as, say, the domain of integers, but let us fix a universe and time to make it precise (where of course, by a universe, we will mean a database!).
  • Similarly, the typing x : Pet-of-Person(y) is a valid description of a domain for x, but (as in our earlier case of divisors of an integer variable) our description here contains a variable y (which we take to be of type Person). So this is a dependent type. For any given person y , x will range over the domain of objects identified as “pets of person y“.
  • Just for the sake of mention, the standard example from database theory: consider the type x : Marriage-of-Persons(y,z) . This, again, is a dependent type, now depending on two variables y and z. The variable x ranges over the domain of objects identified as “marriages between y and z”. We make two important observations. First, “objects” can refer to abstract concepts like “marriage”. Second, recall that types can contain more than one object: in particular, the same two persons y and z may have had multiple marriages (it happens!).

So, what should we take away from the above? Well, first of all that the basic idea behind types and dependent types is really not that abstract. And secondly that dependent are, in fact, very much ubiquitous. Indeed, as we have just learned, whenever you say “John is one of my best friends”, you are secretly initializing the dependent type Best-friend-of-Person(y) with y set to yourself, and John a term in the resulting type!

An example of type composition

The real power of type theory derives from the fact that term and type expressions compose. This need not be explained in complicated mathematical terms either. It can be understood by analogy with natural language.

We have just learned that, from the perspective of natural language, types are descriptions of domains for variables… such descriptions compose. One fundamental way of composition, which we have already met in a previous example, is the chaining of descriptions. For example, let’s go back to our earlier typing d : Date-of-Event(m : Marriage-of-Persons(p1 : Person, p2 : Person)): this composes the descriptions, “p1 and p2 are persons”, “m is a marriage of p1 and p2” and “d is a (or the) date of m” into a single combined description “d is a date of the marriage m of persons p1 and p2”. Type theorists have a fancy term for this composed type: you are looking at a dependent pair type. Type theorists may write this as:

Σ[ p1 ∈ Person ] Σ[ p2 ∈ Person ] Σ[ m ∈ Marriage p1 p2 ] Σ[ d ∈ Date m ] ⊤

But, in all honesty, that looks pretty hard to read! And it need not be: getting ahead of ourselves, here is a “near-natural language” pattern in the query language TypeQL, which does the very same thing:

$p1 isa person;
$p2 isa person;
$m ($p1, $p2) isa marriage;
$m has date $d;

Reading this is easy as cake!

Besides sequentially combining descriptions, we may also parallelize them, and this yields a second important way of composing types. For example, the type x : Cat (“x is a cat”) and type x : Dog (“x is a dog”) can be “parallelized” in natural language by the description “x is a cat or a dog”. A type theorist, would speak of a sum type and write x : Cat + Dog. Again, in terms of intuitivity and readability, we believe that we can do much better than that. Here is a TypeQL pattern that expresses this type:

{ $x isa cat; } or { $x isa dog; };

At this point, you have already understood a substantial part of the idea behind “composability of types”. The fact that most of our explanation can be expressed in natural language goes to show, well, how natural the approach taken by type theory is.

Functions and dependencies

A fundamental class of type operations involves types of functions between types. In type theory, functions are first-class citizen that describe how types “interact”, i.e., how we can pass from one type to another . In dependent type theory, function types are generalized by the notion of dependent function types. These form a key ingredient of modern mathematical proof assistants.

Type theorists write dependent function types using fancy symbols like Π (in a way, dependent pair types, Σ, are only “half” of dependent type theory, with the other half being dependent function types). But they can be understood in rather elementary terms: elements of dependent functions types are functions, but the output type of that function may dependent on its input.

For example, consider a function that, given two persons p1 and p2, returns the (by-date) first of any of the (zero or more) marriages between them. The output type of this function is Option<Marriage(p1,p2)>, where we used an option type in our notation. This output type, indeed, depends on p1 and p2. The example demonstrate dependent functions are an innate part of natural language, albeit implicitly so.

Let’s remark that, while important for representing type interactions, function types are “much bigger” in size than pair types, which makes them less suitable for efficiently describing data with them. This is why, for the purposes of database design (and similarly in the everyday usage of natural language), function types most appear in first-order… i.e., it is rare for us to consider functions of functions of functions …. (PS: this is why in our database TypeDB we only work with first-order functions.)

Summary

The preceding examples were consciously chosen to lie in familiars domain of natural language. Indeed, while Type Theory is usually considered an abstract research area in mathematics and theoretical computer science, most of its key ideas are directly reflected in how natural language is structured. At the same time, by making type constructs and interactions first class citizens of its framework, type theory becomes a immensely more powerful tool for studying language than classical predicate-based logic. In our work at TypeDB, we make use of modern type-theoretic principle in developing novel, highly expressive query languages and database paradigms.

Share this article

TypeDB Newsletter

Stay up to date with the latest TypeDB announcements and events.

Subscribe to Newsletter

Further Learning

Type Theory Fundamentals

Discover the powerful unification of paradigms backed by modern type-theoretic mathematics, laying a novel foundation for modern database applications.

Read article

Type Theory CIKM Paper

Higher levels of abstraction including, in particular, the integration of high-level programming and reasoning techniques, will pave the way forward for future knowledge management systems.

Read paper

The Theory of TypeQL

TypeQL is directly built on the principles of modern programming language theory, which distill how we intuitively interact with computers using simple, yet formal, structures: types.

Read article

Feedback