
TypeDB Fundamentals

Polymorphism-Enhanced Conceptual Data Models


When we built TypeDB, we set out to craft a generalist database that can natively express and translate between diverse data and models. A long-established subfield of classical database theory has worked towards precisely this goal of “generalism”: the field of conceptual modeling and, in particular, of entity-relationship (ER) modeling.

Both are vast fields of research by now, but we will focus on their core components. In this article we give a very brief overview of these components for both conceptual and ER modeling. We will also see that type theory provides a powerful perspective on these fields. The type-theoretic perspective then enables a further improvement of our models via the direct integration of polymorphism.

The categorization of language

The idea of “entity-relation-attribute” (ERA) models, which also underlies most conceptual modeling approaches, was born out of the observation that language can be usefully divided into three categories. Up to slight simplifications, these categories may be summarized as follows.

The first category of language comprises common nouns, such as “person”, “car”, or “marriage”. These describe independent domains (and thus types!) of objects. Specific objects, such as a specific person p, are referred to as ‘proper nouns’. A conceptual modeler would call a proper noun an entity, and speak of an entity type when referring to a well-described domain of individual entities.

Next up, we have verbs, which connect nouns by some relationship, as captured e.g. by the usual “subject-verb-object” statements. For example, we could say that “a person marries another person”. A conceptual modeler would speak of a relation between these objects; in contrast, a relation type would be an abstract description of the domain of specific relations (e.g. the domain of all “marriages”).

In the third category, we find qualities of nouns or verbs: in the former case, we speak of adjectives, and in the latter, of adverbs. Speaking of a “young person” is an example of the former, while “a person marries another person today” is an example of the latter. In conceptual modeling, these qualities of specific entities and relations are called attributes, whereas the abstract description of the domain of specific attributes would be an attribute type (such as the “dates of an event”).

The above three categories are summarized in the table below.

| Natural language | Conceptual modeling |
| --- | --- |
| noun (a person) | entity (the person p) |
| verb (a person marries a person) | relation (the marriage m between persons p1 and p2) |
| adjective/adverb (a person marries a person today) | attribute (the date d of the event e) |

From natural language to concept types

But how does this relate to our earlier discussion of type theory? Well, the above table gives a first hint by introducing “variables” as part of our example descriptions in the conceptual modeling column. Let’s see in detail how type theory comes back into play.

The type-theoretic perspective

Looking at the conceptual modeling examples in the previous table, we find a striking structural similarity to our earlier discussion of types and type dependencies. Indeed, the “conceptual distinction” into entities, relations, and attributes made in conceptual modeling can also be recovered from a purely type-theoretic point of view.

Entity and relation types are types containing objects, meaning “abstract” concepts that themselves do not have intrinsic structure. In a database, we often represent individual objects only by “addresses” or (in database theory lingo) by object identifiers. Entity types are independent types (like “person”), while relation types are dependent types (like “marriage of persons p1 and p2”).

Attribute types are types containing values, meaning quantifiable concepts with intrinsic structure (such as integers, whose algebraic structure allows for various forms of representation like sequences of binary bits, or decimal digits). Attribute types are traditionally considered to be dependent types (like “date of event e”).

The next table summarizes the correspondence between conceptual modeling and type theory.

| Conceptual modeling | Type theory |
| --- | --- |
| entity type (containing the person p) | type without dependencies containing objects |
| relation type (containing the marriages m between persons p1 and p2) | type with dependencies containing objects |
| attribute type (containing the dates d of events e) | type with dependencies containing values |

The type-theoretic perspective on concept types
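
To make the correspondence more concrete, here is a minimal Python sketch. It is only an analogy and not how TypeDB represents data: the class names `Person`, `Marriage`, and `EventDate`, and the `oid` field standing in for an object identifier, are assumptions chosen for illustration. A dependency of a type is modeled as a constructor argument.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Person:
    """Entity type: no dependencies; an object known only by its identifier."""
    oid: int


@dataclass(frozen=True)
class Marriage:
    """Relation type: an object that depends on the persons p1 and p2."""
    oid: int
    p1: Person
    p2: Person


@dataclass(frozen=True)
class EventDate:
    """Attribute type: a value (a date) that depends on an event e."""
    event: Marriage
    value: date


p1, p2 = Person(oid=1), Person(oid=2)
m = Marriage(oid=3, p1=p1, p2=p2)                # cannot exist without p1 and p2
d = EventDate(event=m, value=date(2024, 6, 1))   # cannot exist without m
```

Note that a `Person` can be constructed on its own, while a `Marriage` or an `EventDate` can only be constructed once the concepts it depends on exist.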

Attributes are traditionally considered to be dependent on other concepts (namely, the concepts to which a given attribute associates a quality). In the 2 x 2 matrix of “independent/dependent” types and types containing “objects/values”, this leaves a gap: what are the independent analogs of attributes? These turn out to form a natural class of concepts too!

There is no good term for these “independent attributes” in classical modeling, but the rigorous perspective of type-theoretic mathematics leads us to fill in gaps in the language. In fact, the resulting notion of “independent attributes” can be very useful in the context of a database, as it allows for the easy storage (and manipulation) of “global constants”, which can be used to directly represent data pertaining to the domain itself (like “the current date”).

The following table shows the resulting “completed” matrix of concepts:

| | Objects | Values |
| --- | --- | --- |
| Independent | entities | global constants |
| Dependent | relations | attributes |

The 2 x 2 matrix of concept types
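
The matrix can also be read as a function of two yes/no questions. The following Python sketch is illustrative only; the names `Concept` and `classify` are hypothetical and do not come from TypeDB.

```python
from enum import Enum


class Concept(Enum):
    ENTITY = "entity"
    RELATION = "relation"
    GLOBAL_CONSTANT = "global constant"
    ATTRIBUTE = "attribute"


def classify(has_dependencies: bool, contains_values: bool) -> Concept:
    """Look up a cell of the 2 x 2 matrix of concept types."""
    if contains_values:
        return Concept.ATTRIBUTE if has_dependencies else Concept.GLOBAL_CONSTANT
    return Concept.RELATION if has_dependencies else Concept.ENTITY


# "the current date": a value that depends on no other concept
current_date_kind = classify(has_dependencies=False, contains_values=True)
```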

Let us briefly add one important remark. We are not linguists, and we also have no intention of splitting hairs. Whether a concept is independent or not depends on the perspective of the data modeler. Consider, for example, our earlier dependent type Pet-of-Person(p) capturing pet animals of a given person p. We can now identify this as a unary relation type. But other conceptions of that type are possible: you could argue that pets should be considered independent entities (though maybe in this case the type Animal would be a more accurate description of the model’s intent, since the word Pet usually implies some form of ownership by a person).

Putting type polymorphism to use

In our above examples, we have seen an attribute type “date of event e” and a relation type “marriage of persons p1 and p2”. But why is it the case that we can speak of the “date of marriages”? What’s the connection between “dated events” and “marriages” here?

The answer to this question stems from type polymorphism, a ubiquitous idea in type theory and (relatedly) in object-oriented programming. There are three main forms of polymorphism found in the wild (for the reader’s convenience, let’s switch to “object-oriented lingo”, but the reader may replace “classes” with “types” as well):

  1. The most well-known form of polymorphism is inheritance polymorphism. This lets classes inherit essential functionality from “parent” classes. This is used to model the “specialization” and “generalization” of concepts.
  2. Interface polymorphism lets classes implement features specified by interfaces. This is used to model specified “traits” or “capabilities” of objects, without capturing “everything” there is to these concepts. In particular, while an object will have a single parent, it may have multiple traits. And it will inherit all traits from its parent.
  3. In parametric polymorphism we write programs (or queries!) in a general fashion without reference to specific classes. This can be used to express statements or questions which are of “general form”, and apply to a variety of classes at the same time.

We can now answer the question we set out to solve: what is the relation of marriages and dated events, precisely? Well, the type Marriage has the trait of being an “event with an associated Date”. So, here, we employ a form of interface polymorphism!
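
In object-oriented terms, the relationship between marriages and dated events might be sketched as follows. This is a hedged Python analogy, not TypeDB syntax: `DatedEvent`, `Relation`, `Graduation`, and `year_of` are hypothetical names introduced for illustration. `Marriage` has a single parent (inheritance polymorphism) and, independently of that, the trait of being a dated event (interface polymorphism).

```python
from dataclasses import dataclass
from datetime import date
from typing import Protocol


class DatedEvent(Protocol):
    """Interface (trait): any event with an associated date."""
    event_date: date


@dataclass
class Relation:
    """Hypothetical parent class shared by all relation types."""
    oid: int


@dataclass
class Marriage(Relation):
    """Inherits from Relation and also has the DatedEvent trait."""
    spouse1: str
    spouse2: str
    event_date: date


@dataclass
class Graduation:
    """Unrelated to Marriage by inheritance, but has the same trait."""
    graduate: str
    event_date: date


def year_of(event: DatedEvent) -> int:
    """Works for every type that has the DatedEvent trait."""
    return event.event_date.year


m = Marriage(oid=1, spouse1="p1", spouse2="p2", event_date=date(2024, 6, 1))
g = Graduation(graduate="p1", event_date=date(2020, 7, 15))
years = [year_of(m), year_of(g)]
```

The function `year_of` never mentions `Marriage`: it relies only on the trait, which is what lets us speak of the “date of marriages” without tying dates to marriages specifically.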

However, all three forms of polymorphism provide powerful functionality for our data model and query language. In our fundamentally re-designed database, we want them all! Indeed, while inheritance hierarchies are an already well-established part of database design, interfaces often allow us to much more directly capture the true logic underlying our data (as we have seen in our above example).

Finally, parametric polymorphism enables a highly powerful feature of our approach: by writing general “polymorphic” queries, we “decouple” our queries from our database schema. This drastically reduces the maintenance that queries require when the structure of our database changes. We defer a full explanation of these types of queries to another fundamentals article, however!
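
As a rough analogy in Python (an illustrative sketch only; TypeDB’s actual polymorphic queries are written in its own query language and are covered elsewhere), a parametrically polymorphic “query” takes the type as a parameter instead of hard-coding it. The function `instances_of`, the in-memory list `db`, and the classes `Person` and `Car` are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Type, TypeVar

T = TypeVar("T")


def instances_of(db: Iterable[object], kind: Type[T]) -> List[T]:
    """Generic query: the type to look up is a parameter, not hard-coded."""
    return [x for x in db if isinstance(x, kind)]


@dataclass
class Person:
    name: str


@dataclass
class Car:
    plate: str


db = [Person("Ana"), Car("X-1"), Person("Ben")]
people = instances_of(db, Person)
cars = instances_of(db, Car)
```

Adding a new class to this toy “schema” requires no change to `instances_of`, which is the sense in which a general query is decoupled from the schema.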

Summary

The classical ER model was directly inspired by natural language. However, type theory, a powerful tool for the description of general language, provides another perspective on understanding the primitives of ER modeling. As a result of this shift of perspective, incorporating other features of modern type systems, in particular polymorphism, becomes straightforward. TypeDB is directly built on this insight, and we will explore its resulting “ER-inspired” type system in future articles.


