Lesson 9.4: Using type hierarchies

In the previous two lessons, we created and developed a data model for book data. In this lesson, we’ll see how we can introduce type hierarchies into the model so that we can leverage inheritance polymorphism in our queries.

Data table
ISBN-13 ISBN-10 Title Format Authors Editors Illustrators Other contributors Publisher Year City State Country Page count Genres Price Stock

9780008627843

0008627843

The Hobbit

ebook

J.R.R. Tolkien

J.R.R. Tolkien

Harper Collins

2023

New York City

New York

United States

310

fantasy;fiction

16.99

9780060929794

0060929790

One Hundred Years of Solitude

paperback

Garcia Marquez, Gabriel

Perennial

1998

New York City

New York

United States

458

fiction;historical fiction

6.12

4

9780195153446

0195153448

Classical Mythology

paperback

Lenardon, Robert J.;Morford, Mark P. O.

Oxford University Press

2002

New York City

New York

United States

820

history;nonfiction

34.98

12

9780375801679

0375801677

The Iron Giant

ebook

Hughes, Ted

Davidson, Andrew

Knopf Books for Young Readers

1999

New York City

New York

United States

79

fiction;children’s fiction

33.97

9780387881355

0387881352

Electron Backscatter Diffraction in Materials Science

hardback

Schwartz, Adam J.;Kumar, Mukul;Adams, Brent L.;Field, David P.

Springer

2009

New York City

New York

United States

425

nonfiction;technology

230.37

9

9780393045215

0393045218

The Mummies of Urumchi

paperback

Barber, Elizabeth Wayland

W.W. Norton & Company

1999

New York City

New York

United States

240

history;nonfiction

21.6

1

9780393634563

0393634566

The Odyssey

ebook

Homer

Wilson, Emily

W.W. Norton & Company

2017

New York City

New York

United States

656

fiction;classics

13.99

9780446310789

0446310786

To Kill a Mockingbird

paperback

Harper Lee

Grand Central Publishing

1988

New York City

New York

United States

281

fiction;historical fiction

21.64

16

9780451162076

0451162072

Pet Sematary

paperback

King, Stephen

Signet

1984

New York City

New York

United States

374

horror;fiction

93.22

1

9780500026557

0500026556

Hokusai’s Fuji

paperback

Wada, Kyoko

Katsushika, Hokusai

Thames & Hudson

2024

London

United Kingdom

416

art;nonfiction

24.47

11

9780500291221

0500291225

Great Discoveries in Medicine

paperback

Bynum, William;Bynum, Helen

Thames & Hudson

2023

London

United Kingdom

352

history;nonfiction

12.05

18

9780553212150

055321215X

Pride and Prejudice

paperback

Austen, Jane

Bantam Classics

1983

New York City

New York

United States

295

fiction;historical fiction

17.99

15

9780575104419

0575104414

Dune

ebook

Herbert, Frank

Hachette Book Group

2010

New York City

New York

United States

624

fiction;science fiction

5.49

9780671461492

0671461494

The Hitchhiker’s Guide to the Galaxy

paperback

Adams, Douglas

Pocket

1982

New York City

New York

United States

215

fiction;science fiction

91.47

9

9780679425601

0679425608

Under the Black Flag: The Romance and the Reality of Life Among the Pirates

hardback

Cordingly, David

Random House

1996

New York City

New York

United States

296

history;nonfiction

34.73

13

9780740748479

0740748475

The Complete Calvin and Hobbes

hardback

Watterson, Bill

Watterson, Bill

Andrews McMeel Publishing

2005

Kansas City

Missouri

United States

1451

comics;fiction

128.71

6

9781098108274

1098108272

Fundamentals of Data Engineering

ebook

Reis, Joe;Housley, Matt

O’Reilly Media

2022

Sevastopol

California

United States

450

nonfiction;technology;children’s fiction

47.99

9781489962287

148996228X

Interpretation of Electron Diffraction Patterns

paperback

Keown, Samuel Robert;Andrews, Kenneth William;Dyson, David John

Springer

1967

New York City

New York

United States

199

nonfiction;technology

47.17

15

9781859840665

1859840663

The Motorcycle Diaries: A Journey Around South America

paperback

Guevara, Ernesto

Wright, Ann

Verso

1996

London

United Kingdom

160

biography;nonfiction

14.52

4

9783319398778

Physical Principles of Electron Microscopy: An Introduction to TEM, SEM, and AEM

ebook

Egerton, R.F.

Springer

2016

London

United Kingdom

196

nonfiction;technology

19.5

9798691153570

Business Secrets of The Pharoahs

paperback

Crorigan, Mark

British London Publishing

2020

London

United Kingdom

260

business;nonfiction

11.99

8

Lesson 9.3 schema
define
book sub entity,
    owns isbn-13,
    owns isbn-10,
    owns title,
    owns format,
    owns page-count,
    owns genre,
    owns price,
    owns stock,
    plays contribution:work,
    plays publishing:published;
author sub entity,
    owns name,
    plays contribution:contributor;
editor sub entity,
    owns name,
    plays contribution:contributor;
illustrator sub entity,
    owns name,
    plays contribution:contributor;
other-contributor sub entity,
    owns name,
    plays contribution:contributor;
publisher sub entity,
    owns name,
    plays publishing:publisher;
city sub entity,
    owns name,
    plays publishing:location,
    plays locating:located;
state sub entity,
    owns name,
    plays locating:location,
    plays locating:located;
country sub entity,
    owns name,
    plays locating:location;
contribution sub relation,
    relates contributor,
    relates work;
publishing sub relation,
    relates publisher,
    relates published,
    relates location,
    owns year;
locating sub relation,
    relates located,
    relates location;
isbn-13 sub attribute, value string;
isbn-10 sub attribute, value string;
title sub attribute, value string;
format sub attribute, value string;
page-count sub attribute, value long;
genre sub attribute, value string;
price sub attribute, value double;
stock sub attribute, value long;
name sub attribute, value string;
year sub attribute, value long;

There are two ways we can go about identifying potential type hierarchies in our schemas: bottom-up and top-down. In the bottom-up approach, we identify a set of types that should share a common supertype. In the top-down approach, we identify a single type that should have multiple subtypes. We’ll explore both approaches here.

Bottom-up hierarchy design

We’ll begin with the bottom-up method. The entity types city, state, and country are clearly very alike. Conceptually, these all represent kinds of place, so it makes sense to give them a common supertype place. This supertype should be abstract, as every place must be one of these kinds or of some other kind not yet defined in the schema.

define
place sub entity,
    abstract;
city sub place,
    owns name,
    plays publishing:location,
    plays locating:located;
state sub place,
    owns name,
    plays locating:location,
    plays locating:located;
country sub place,
    owns name,
    plays locating:location;

Next, we must consider which properties of the subtypes should instead be inherited from the supertype. The first is ownership of name. All three subtypes own name, so we can move it to the supertype with no change in the expressivity of the current model. However, this is a significant change for the future development of the model. If we give ownership of name to place, not only will all three current subtypes inherit the ownership, but all future subtypes will inherit it as well. For that reason, we should consider whether all kinds of places have names, not just the ones in the data model so far. We decide that this is the case, so ownership of name is moved to the supertype.

Now we should consider the other interfaces implemented by the subtypes:

  • publishing:location implemented by city.

  • locating:located implemented by city and state.

  • locating:location implemented by state and country.

None of these interfaces is implemented by all three subtypes, so it might seem that we should leave them defined there rather than at the supertype level. Once again though, we need to consider what the behaviour should be if we introduce more subtypes of place. Should all kinds of place be publishing locations? Probably not: by convention, the front matter of a book lists the city of publication, not some other kind of place. Should all kinds of place be locations for other places? That seems more likely. We might later introduce a "region" type that is superior to countries, or a "street" type that is inferior to cities. As such, we’ll move the playing of locating:located and locating:location to the supertype, but keep that of publishing:location exclusive to city.

define
place sub entity,
    abstract,
    owns name,
    plays locating:location,
    plays locating:located;
city sub place,
    plays publishing:location;
state sub place;
country sub place;

Grouping existing types into hierarchies in this way is not always a good idea. As types in the PERA model can only have a single supertype, we must choose our types carefully. The fact that several types exhibit common behaviours is not necessarily an indicator that they are all subtypes of a common supertype, as we can alternatively encode those common behaviours by having the types independently implement the same interfaces. We will see an example of such a case in Lesson 9.5.

A type should only be considered a subtype of another type if every instance of the subtype is necessarily an instance of the supertype. For instance, every city is necessarily a place. If this is not the case, then subtyping is likely a poor modeling choice for the types in question.

Top-down hierarchy design

Next, we’ll try the top-down method. Building type hierarchies top-down can be harder because it is often less obvious that a hierarchy should be present at all. The biggest indicator of a potential top-down hierarchy is when typing information is being stored as data. Let’s examine an example.

Currently, the schema shows that book owns format. If we examine the data table, we see that each book has exactly one format and that there are only a small number of these formats: "paperback", "hardback", and "ebook". As such, it would make sense for these to be subtypes of book. This is a case where typing information has been stored in a way that is structurally indistinguishable from data, as is often the case when working with tabulated data or non-polymorphic databases. Because of the lack of subtyping in other data modeling paradigms, types must be stored as properties in this way, or that data must be divided into separate structures (tables, documents, etc.) per type.
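
To make the problem concrete, here is a sketch of how ebooks would be selected under the Lesson 9.3 schema. It assumes the format values were loaded exactly as they appear in the data table; the query is illustrative and not taken from the lesson's dataset.

```typeql
# Lesson 9.3 schema: format is an ordinary string attribute, so
# selecting ebooks means comparing against a literal value.
match
$book isa book, has format "ebook", has title $title;
get $title;
```

Nothing in the schema distinguishes "ebook" from any other string, so a misspelled format would be accepted as valid data.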

There is an additional hint that this field in the data table represents a type: the stock attribute is absent for any book with the format "ebook". This makes sense of course, as an ebook cannot possibly go out of stock. The exclusivity of the stock property to physical books is another indicator that formats are best modelled as a type hierarchy, as we can then assign the property only to the appropriate subtypes. Based on this, we will create three subtypes of book: paperback, hardback, and ebook, and move ownership of stock down the hierarchy. As a result, we will also make the supertype abstract, as every book must necessarily have a format.

define
book sub entity,
    abstract,
    owns isbn-13,
    owns isbn-10,
    owns title,
    owns page-count,
    owns genre,
    owns price,
    plays contribution:work,
    plays publishing:published;
paperback sub book,
    owns stock;
hardback sub book,
    owns stock;
ebook sub book;
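
With formats expressed as types, queries can be as general or as specific as needed. The sketches below assume the data has been re-inserted under the new schema, with each book created as a paperback, hardback, or ebook. The variable names and the stock threshold are arbitrary, and each query would be run separately.

```typeql
# Physical books that are running low. Only paperback and hardback
# own stock, so this pattern can never match an ebook.
match
$book isa book, has title $title, has stock $stock;
$stock < 5;
get $title, $stock;

# Recover each book's format from its type rather than from data.
# `isa!` binds $format to the exact type of the instance.
match
$book isa! $format, has title $title;
$format sub book;
get $title, $format;
```

The first query is written against the abstract supertype and relies on the schema to exclude ebooks, while the second shows that the information formerly held in the format attribute is still available through the type itself.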

In general, the following characteristics are good indicators of types stored as data in a particular field:

  • The field always contains exactly one value or is empty.

  • There are a small number of possible values for the field, including being empty.

  • The presence or absence of other fields depends on the value of this field.

Where the field always contains a value, this is indicative that the common supertype is abstract, as we have seen with book. If the field is sometimes empty, this would indicate that the common supertype is not abstract and serves as the non-specialised variant for cases where the field is empty. We will see an example of such a hierarchy shortly.

Conversely, the following would be indicators against a field containing typing information:

  • The field can contain more than one value.

  • There are many possible values for the field.

  • There are no other fields whose presence or absence depends on this field.

An example of a field like this in the current schema would be genre.
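
The multi-valued nature of genre can be seen in a query such as the following sketch, which looks for books that have two genres at once. It assumes that the semicolon-separated genres in the data table were loaded as separate genre attribute values, which is an assumption about how the data was inserted.

```typeql
# A single book can own several genre values at the same time,
# which would not be possible if each genre were a subtype of book.
match
$book isa book,
    has title $title,
    has genre "fiction",
    has genre "historical fiction";
get $title;
```

As an instance has exactly one direct type, a field that can hold several values per record is better kept as an attribute.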

Using non-abstract supertypes

In both of the cases we have examined, those of place and book, it made sense to make the supertype abstract. This is primarily driven by semantic considerations, as it is not possible to have a place without a specific type, or a book without a format. However, non-abstract supertypes are very useful for constructing hierarchies in which some instances will have generalized behaviour, and some will have specialized behaviours in addition to the general behaviours. Let’s consider an example.

The current schema includes the types author, editor, illustrator, and other-contributor. Adopting a bottom-up approach to hierarchy design, it is clear these types should have a common supertype contributor, especially seeing as they have identical interface implementations.

define
contributor sub entity,
    abstract,
    owns name,
    plays contribution:contributor;
author sub contributor;
editor sub contributor;
illustrator sub contributor;
other-contributor sub contributor;

However, this hierarchy includes the redundant representation other-contributor. Such a contributor is merely a contributor that is not an author, editor, or illustrator. In other words, this type is a generalised catch-all subtype. But the type that is supposed to represent the generalised behaviour in a type hierarchy is the supertype. As a result, we should make contributor non-abstract and instantiate it directly instead of other-contributor.

define
contributor sub entity,
    owns name,
    plays contribution:contributor;
author sub contributor;
editor sub contributor;
illustrator sub contributor;

Now, if there is a contributor that is an author, editor, or illustrator, we can instantiate the appropriate specialized subtype. Meanwhile, if there is a contributor without a specific role listed, we can instantiate the generalized supertype.
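
The following sketch shows how this might look in practice. The assignment of people to roles is an assumption made purely for illustration, as are the variable names, and each query would be run separately.

```typeql
# Hypothetical inserts: one specialised contributor, and one
# contributor with no specific role, created as the supertype.
insert
$author isa author, has name "Homer";
$other isa contributor, has name "Wilson, Emily";

# All contributors, specialised or not.
match
$contributor isa contributor, has name $name;
get $name;

# Only contributors without a specialised role:
# `isa!` excludes instances of author, editor, and illustrator.
match
$contributor isa! contributor, has name $name;
get $name;
```

The last query plays the part that matching on other-contributor would have played before, without needing a dedicated catch-all type.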
