Lesson 9.4: Using type hierarchies

In the previous two lessons, we created and developed a data model for book data. In this lesson, we’ll see how we can introduce type hierarchies into the model so that we can leverage inheritance polymorphism in our queries.

Data table

ISBN-13	ISBN-10	Title	Format	Authors	Editors	Illustrators	Other contributors	Publisher	Year	City	State	Country	Page count	Genres	Price	Stock
9780008627843	0008627843	The Hobbit	ebook	J.R.R. Tolkien		J.R.R. Tolkien		Harper Collins	2023	New York City	New York	United States	310	fantasy;fiction	16.99
9780060929794	0060929790	One Hundred Years of Solitude	paperback	Garcia Marquez, Gabriel				Perennial	1998	New York City	New York	United States	458	fiction;historical fiction	6.12	4
9780195153446	0195153448	Classical Mythology	paperback	Lenardon, Robert J.;Morford, Mark P. O.				Oxford University Press	2002	New York City	New York	United States	820	history;nonfiction	34.98	12
9780375801679	0375801677	The Iron Giant	ebook	Hughes, Ted		Davidson, Andrew		Knopf Books for Young Readers	1999	New York City	New York	United States	79	fiction;children’s fiction	33.97
9780387881355	0387881352	Electron Backscatter Diffraction in Materials Science	hardback		Schwartz, Adam J.;Kumar, Mukul;Adams, Brent L.;Field, David P.			Springer	2009	New York City	New York	United States	425	nonfiction;technology	230.37	9
9780393045215	0393045218	The Mummies of Urumchi	paperback	Barber, Elizabeth Wayland				W.W. Norton & Company	1999	New York City	New York	United States	240	history;nonfiction	21.6	1
9780393634563	0393634566	The Odyssey	ebook	Homer			Wilson, Emily	W.W. Norton & Company	2017	New York City	New York	United States	656	fiction;classics	13.99
9780446310789	0446310786	To Kill a Mockingbird	paperback	Harper Lee				Grand Central Publishing	1988	New York City	New York	United States	281	fiction;historical fiction	21.64	16
9780451162076	0451162072	Pet Sematary	paperback	King, Stephen				Signet	1984	New York City	New York	United States	374	horror;fiction	93.22	1
9780500026557	0500026556	Hokusai’s Fuji	paperback	Wada, Kyoko		Katsushika, Hokusai		Thames & Hudson	2024	London		United Kingdom	416	art;nonfiction	24.47	11
9780500291221	0500291225	Great Discoveries in Medicine	paperback		Bynum, William;Bynum, Helen			Thames & Hudson	2023	London		United Kingdom	352	history;nonfiction	12.05	18
9780553212150	055321215X	Pride and Prejudice	paperback	Austen, Jane				Bantam Classics	1983	New York City	New York	United States	295	fiction;historical fiction	17.99	15
9780575104419	0575104414	Dune	ebook	Herbert, Frank				Hachette Book Group	2010	New York City	New York	United States	624	fiction;science fiction	5.49
9780671461492	0671461494	The Hitchhiker’s Guide to the Galaxy	paperback	Adams, Douglas				Pocket	1982	New York City	New York	United States	215	fiction;science fiction	91.47	9
9780679425601	0679425608	Under the Black Flag: The Romance and the Reality of Life Among the Pirates	hardback	Cordingly, David				Random House	1996	New York City	New York	United States	296	history;nonfiction	34.73	13
9780740748479	0740748475	The Complete Calvin and Hobbes	hardback	Watterson, Bill		Watterson, Bill		Andrews McMeel Publishing	2005	Kansas City	Missouri	United States	1451	comics;fiction	128.71	6
9781098108274	1098108272	Fundamentals of Data Engineering	ebook	Reis, Joe;Housley, Matt				O’Reilly Media	2022	Sebastopol	California	United States	450	nonfiction;technology;children’s fiction	47.99
9781489962287	148996228X	Interpretation of Electron Diffraction Patterns	paperback	Keown, Samuel Robert;Andrews, Kenneth William;Dyson, David John				Springer	1967	New York City	New York	United States	199	nonfiction;technology	47.17	15
9781859840665	1859840663	The Motorcycle Diaries: A Journey Around South America	paperback	Guevara, Ernesto			Wright, Ann	Verso	1996	London		United Kingdom	160	biography;nonfiction	14.52	4
9783319398778		Physical Principles of Electron Microscopy: An Introduction to TEM, SEM, and AEM	ebook	Egerton, R.F.				Springer	2016	London		United Kingdom	196	nonfiction;technology	19.5
9798691153570		Business Secrets of The Pharoahs	paperback	Crorigan, Mark				British London Publishing	2020	London		United Kingdom	260	business;nonfiction	11.99	8

Lesson 9.3 schema

define
book sub entity,
    owns isbn-13,
    owns isbn-10,
    owns title,
    owns format,
    owns page-count,
    owns genre,
    owns price,
    owns stock,
    plays contribution:work,
    plays publishing:published;
author sub entity,
    owns name,
    plays contribution:contributor;
editor sub entity,
    owns name,
    plays contribution:contributor;
illustrator sub entity,
    owns name,
    plays contribution:contributor;
other-contributor sub entity,
    owns name,
    plays contribution:contributor;
publisher sub entity,
    owns name,
    plays publishing:publisher;
city sub entity,
    owns name,
    plays publishing:location,
    plays locating:located;
state sub entity,
    owns name,
    plays locating:location,
    plays locating:located;
country sub entity,
    owns name,
    plays locating:location;
contribution sub relation,
    relates contributor,
    relates work;
publishing sub relation,
    relates publisher,
    relates published,
    relates location,
    owns year;
locating sub relation,
    relates located,
    relates location;
isbn-13 sub attribute, value string;
isbn-10 sub attribute, value string;
title sub attribute, value string;
format sub attribute, value string;
page-count sub attribute, value long;
genre sub attribute, value string;
price sub attribute, value double;
stock sub attribute, value long;
name sub attribute, value string;
year sub attribute, value long;

There are two ways we can go about identifying potential type hierarchies in our schemas: bottom-up and top-down. In the bottom-up approach, we identify a set of types that should share a common supertype. In the top-down approach, we identify a single type that should have multiple subtypes. We’ll explore both approaches here.

Bottom-up hierarchy design

We’ll begin with the bottom-up method. The entity types city, state, and country are clearly very alike. Conceptually, these all represent kinds of place, so it makes sense to give them a common supertype place. This supertype should be abstract, as every place must be one of these kinds or of some other kind not yet defined in the schema.

define
place sub entity,
    abstract;
city sub place,
    owns name,
    plays publishing:location,
    plays locating:located;
state sub place,
    owns name,
    plays locating:location,
    plays locating:located;
country sub place,
    owns name,
    plays locating:location;

Next, we must consider which properties of the subtypes are actually properties that should be inherited from the supertype. The first is ownership of name. All three subtypes own name, so we can move it to the supertype with no change in the expressivity of the current model. However, this is a significant change for the future development of the model. If we give ownership of name to place, not only will all three current subtypes inherit the ownership, but all future subtypes will inherit it as well. For that reason, we should consider whether all kinds of places have names, not just the ones in the data model so far. We decide that this is the case, so we move ownership of name to the supertype.

Now we should consider the other interfaces implemented by the subtypes:

publishing:location implemented by city.
locating:located implemented by city and state.
locating:location implemented by state and country.

None of these interfaces are implemented by all three subtypes, so it might seem that we should leave them defined there rather than at the supertype level. Once again though, we need to consider what the behaviour should be if we introduce more subtypes of place. Should all kinds of place be publishing locations? Probably not: by convention, the front matter of a book lists the city of publication, so only city needs to play this role. Should all kinds of place be locations for other places? That seems more likely. We might later introduce a "region" type that is superior to countries, or a "street" type that is inferior to cities. As such, we’ll move the playing of locating:located and locating:location to the supertype, but keep that of publishing:location exclusive to city.

define
place sub entity,
    abstract,
    owns name,
    plays locating:location,
    plays locating:located;
city sub place,
    plays publishing:location;
state sub place;
country sub place;
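
As a minimal sketch of what this hierarchy makes possible, the query below retrieves the name of every place regardless of its kind. It assumes that place data has already been inserted and that the TypeDB version in use supports fetch queries; neither is established by the schema above.

```typeql
# Matches instances of city, state, and country alike,
# because all three are subtypes of place.
match
$place isa place;
fetch
$place: name;
```

Replacing place with city in the match clause would narrow the results to cities only, without any other change to the query.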

Grouping existing types into hierarchies in this way is not always a good idea. As types in the PERA model can only have a single supertype, we must choose our types carefully. The fact that several types exhibit common behaviours is not necessarily an indicator that they are all subtypes of a common supertype, as we can alternatively encode those common behaviours by having the types independently implement the same interfaces. We will see an example of such a case in Lesson 9.5.

A type should only be considered a subtype of another type if every instance of the subtype is necessarily an instance of the supertype. For instance, every city is necessarily a place. If this is not the case, then subtyping is likely a poor modeling choice for the types in question.

Top-down hierarchy design

Next, we’ll try the top-down method. Building type hierarchies top-down can be harder because it is often less obvious that a hierarchy should be present at all. The biggest indicator of a potential top-down hierarchy is when typing information is being stored as data. Let’s examine an example.

Currently, the schema shows that book owns format. If we examine the data table, we see that each book has exactly one format and that there are only a small number of these formats: "paperback", "hardback", and "ebook". As such, it would make sense for these to be subtypes of book. This is a case where typing information has been stored in a way that is structurally indistinguishable from data, as is often the case when working with tabulated data or non-polymorphic databases. Because other data modeling paradigms lack subtyping, types must either be stored as properties in this way, or the data must be divided into separate structures (tables, documents, etc.) per type.

There is an additional hint that this field in the data table represents a type: the stock attribute is absent for any book with the format "ebook". This makes sense of course, as an ebook cannot possibly go out of stock. The exclusivity of the stock property to physical books is another indicator that formats are best modelled as a type hierarchy, as we can then assign the property only to the appropriate subtypes. Based on this, we will create three subtypes of book: paperback, hardback, and ebook, and move ownership of stock down the hierarchy. As a result, we will also make the supertype abstract, as every book must necessarily have a format.

define
book sub entity,
    abstract,
    owns isbn-13,
    owns isbn-10,
    owns title,
    owns page-count,
    owns genre,
    owns price,
    plays contribution:work,
    plays publishing:published;
paperback sub book,
    owns stock;
hardback sub book,
    owns stock;
ebook sub book;
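
To illustrate the effect of moving stock down the hierarchy, here is a hedged sketch of an insert query using two rows from the data table. It assumes the schema above has been defined as written; the variable names are arbitrary.

```typeql
insert
# An ebook owns everything inherited from book, but no stock.
$hobbit isa ebook,
    has isbn-13 "9780008627843",
    has title "The Hobbit",
    has page-count 310,
    has price 16.99;
# A paperback additionally owns stock.
$pet-sematary isa paperback,
    has isbn-13 "9780451162076",
    has title "Pet Sematary",
    has page-count 374,
    has price 93.22,
    has stock 1;
```

Adding a stock value to the ebook statement would be rejected, as ebook does not own stock. Attempting to insert an instance of book directly would also fail, because the type is abstract.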

In general, the following characteristics are good indicators of types stored as data in a particular field:

The field always contains exactly one value or is empty.
There are a small number of possible values for the field, including being empty.
The presence or absence of other fields depends on the value of this field.

Where the field always contains a value, this indicates that the common supertype is abstract, as we have seen with book. If the field is sometimes empty, this indicates that the common supertype is not abstract, and that it is instantiated directly as a non-specialised variant for the cases where the field is empty. We will see an example of such a hierarchy shortly.

Conversely, the following would be indicators against a field containing typing information:

The field can contain more than one value.
There are many possible values for the field.
There are no other fields whose presence or absence depends on this field.

An example of a field like this in the current schema would be genre.
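
Because genre remains an attribute rather than a type, a book can own several genres and be filtered by their values. The following sketch assumes the data table has been loaded with one genre attribute per listed value, and that the TypeDB version in use supports fetch queries.

```typeql
# Filters books of any format by the value of one of their genres,
# then retrieves the title and all genres of each match.
match
$book isa book, has genre "science fiction";
fetch
$book: title, genre;
```

Under those assumptions, this would match Dune and The Hitchhiker’s Guide to the Galaxy from the data table, one an ebook and the other a paperback.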

Using non-abstract supertypes

In both of the cases we have examined, those of place and book, it made sense to make the supertype abstract. This is primarily driven by semantic considerations, as it is not possible to have a place without a specific type, or a book without a format. However, non-abstract supertypes are very useful for constructing hierarchies in which some instances will have generalized behaviour, and some will have specialized behaviours in addition to the general behaviours. Let’s consider an example.

The current schema includes the types author, editor, illustrator, and other-contributor. Adopting a bottom-up approach to hierarchy design, it is clear that these types should have a common supertype contributor, especially since they have identical interface implementations.

define
contributor sub entity,
    abstract,
    owns name,
    plays contribution:contributor;
author sub contributor;
editor sub contributor;
illustrator sub contributor;
other-contributor sub contributor;

However, this hierarchy includes the redundant type other-contributor. Such a contributor is merely a contributor that is not an author, editor, or illustrator. In other words, this type is a generalised catch-all subtype. But the type that is supposed to represent the generalised behaviour in a type hierarchy is the supertype. As a result, we should make contributor non-abstract and instantiate it directly instead of other-contributor.

define
contributor sub entity,
    owns name,
    plays contribution:contributor;
author sub contributor;
editor sub contributor;
illustrator sub contributor;

Now, if there is a contributor that is an author, editor, or illustrator, we can instantiate the appropriate specialized subtype. Meanwhile, if there is a contributor without a specific role listed, we can instantiate the generalized supertype.
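
As a sketch of how this looks in practice, the insert query below uses the contributors listed for The Odyssey in the data table. It assumes the schema above has been defined; the variable names are arbitrary.

```typeql
insert
# Homer is listed as an author, so we use the specialized subtype.
$homer isa author, has name "Homer";
# Emily Wilson is listed under "Other contributors" with no specific role,
# so we instantiate the generalized supertype directly.
$other isa contributor, has name "Wilson, Emily";
```

A later query matching on contributor would return both instances, while one matching on author would return only the first.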
