Lesson 9.4: Using type hierarchies
In the previous two lessons, we created and developed a data model for book data. In this lesson, we’ll see how we can introduce type hierarchies into the model so that we can leverage inheritance polymorphism in our queries.
Data table
ISBN-13 | ISBN-10 | Title | Format | Authors | Editors | Illustrators | Other contributors | Publisher | Year | City | State | Country | Page count | Genres | Price | Stock |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
9780008627843 |
0008627843 |
The Hobbit |
ebook |
J.R.R. Tolkien |
J.R.R. Tolkien |
Harper Collins |
2023 |
New York City |
New York |
United States |
310 |
fantasy;fiction |
16.99 |
|||
9780060929794 |
0060929790 |
One Hundred Years of Solitude |
paperback |
Garcia Marquez, Gabriel |
Perennial |
1998 |
New York City |
New York |
United States |
458 |
fiction;historical fiction |
6.12 |
4 |
|||
9780195153446 |
0195153448 |
Classical Mythology |
paperback |
Lenardon, Robert J.;Morford, Mark P. O. |
Oxford University Press |
2002 |
New York City |
New York |
United States |
820 |
history;nonfiction |
34.98 |
12 |
|||
9780375801679 |
0375801677 |
The Iron Giant |
ebook |
Hughes, Ted |
Davidson, Andrew |
Knopf Books for Young Readers |
1999 |
New York City |
New York |
United States |
79 |
fiction;children’s fiction |
33.97 |
|||
9780387881355 |
0387881352 |
Electron Backscatter Diffraction in Materials Science |
hardback |
Schwartz, Adam J.;Kumar, Mukul;Adams, Brent L.;Field, David P. |
Springer |
2009 |
New York City |
New York |
United States |
425 |
nonfiction;technology |
230.37 |
9 |
|||
9780393045215 |
0393045218 |
The Mummies of Urumchi |
paperback |
Barber, Elizabeth Wayland |
W.W. Norton & Company |
1999 |
New York City |
New York |
United States |
240 |
history;nonfiction |
21.6 |
1 |
|||
9780393634563 |
0393634566 |
The Odyssey |
ebook |
Homer |
Wilson, Emily |
W.W. Norton & Company |
2017 |
New York City |
New York |
United States |
656 |
fiction;classics |
13.99 |
|||
9780446310789 |
0446310786 |
To Kill a Mockingbird |
paperback |
Harper Lee |
Grand Central Publishing |
1988 |
New York City |
New York |
United States |
281 |
fiction;historical fiction |
21.64 |
16 |
|||
9780451162076 |
0451162072 |
Pet Sematary |
paperback |
King, Stephen |
Signet |
1984 |
New York City |
New York |
United States |
374 |
horror;fiction |
93.22 |
1 |
|||
9780500026557 |
0500026556 |
Hokusai’s Fuji |
paperback |
Wada, Kyoko |
Katsushika, Hokusai |
Thames & Hudson |
2024 |
London |
United Kingdom |
416 |
art;nonfiction |
24.47 |
11 |
|||
9780500291221 |
0500291225 |
Great Discoveries in Medicine |
paperback |
Bynum, William;Bynum, Helen |
Thames & Hudson |
2023 |
London |
United Kingdom |
352 |
history;nonfiction |
12.05 |
18 |
||||
9780553212150 |
055321215X |
Pride and Prejudice |
paperback |
Austen, Jane |
Bantam Classics |
1983 |
New York City |
New York |
United States |
295 |
fiction;historical fiction |
17.99 |
15 |
|||
9780575104419 |
0575104414 |
Dune |
ebook |
Herbert, Frank |
Hachette Book Group |
2010 |
New York City |
New York |
United States |
624 |
fiction;science fiction |
5.49 |
||||
9780671461492 |
0671461494 |
The Hitchhiker’s Guide to the Galaxy |
paperback |
Adams, Douglas |
1982 |
New York City |
New York |
United States |
215 |
fiction;science fiction |
91.47 |
9 |
||||
9780679425601 |
0679425608 |
Under the Black Flag: The Romance and the Reality of Life Among the Pirates |
hardback |
Cordingly, David |
Random House |
1996 |
New York City |
New York |
United States |
296 |
history;nonfiction |
34.73 |
13 |
|||
9780740748479 |
0740748475 |
The Complete Calvin and Hobbes |
hardback |
Watterson, Bill |
Watterson, Bill |
Andrews McMeel Publishing |
2005 |
Kansas City |
Missouri |
United States |
1451 |
comics;fiction |
128.71 |
6 |
||
9781098108274 |
1098108272 |
Fundamentals of Data Engineering |
ebook |
Reis, Joe;Housley, Matt |
O’Reilly Media |
2022 |
Sevastopol |
California |
United States |
450 |
nonfiction;technology;children’s fiction |
47.99 |
||||
9781489962287 |
148996228X |
Interpretation of Electron Diffraction Patterns |
paperback |
Keown, Samuel Robert;Andrews, Kenneth William;Dyson, David John |
Springer |
1967 |
New York City |
New York |
United States |
199 |
nonfiction;technology |
47.17 |
15 |
|||
9781859840665 |
1859840663 |
The Motorcycle Diaries: A Journey Around South America |
paperback |
Guevara, Ernesto |
Wright, Ann |
Verso |
1996 |
London |
United Kingdom |
160 |
biography;nonfiction |
14.52 |
4 |
|||
9783319398778 |
Physical Principles of Electron Microscopy: An Introduction to TEM, SEM, and AEM |
ebook |
Egerton, R.F. |
Springer |
2016 |
London |
United Kingdom |
196 |
nonfiction;technology |
19.5 |
||||||
9798691153570 |
Business Secrets of The Pharoahs |
paperback |
Crorigan, Mark |
British London Publishing |
2020 |
London |
United Kingdom |
260 |
business;nonfiction |
11.99 |
8 |
Lesson 9.3 schema
define
book sub entity,
    owns isbn-13,
    owns isbn-10,
    owns title,
    owns format,
    owns page-count,
    owns genre,
    owns price,
    owns stock,
    plays contribution:work,
    plays publishing:published;
author sub entity,
    owns name,
    plays contribution:contributor;
editor sub entity,
    owns name,
    plays contribution:contributor;
illustrator sub entity,
    owns name,
    plays contribution:contributor;
other-contributor sub entity,
    owns name,
    plays contribution:contributor;
publisher sub entity,
    owns name,
    plays publishing:publisher;
city sub entity,
    owns name,
    plays publishing:location,
    plays locating:located;
state sub entity,
    owns name,
    plays locating:location,
    plays locating:located;
country sub entity,
    owns name,
    plays locating:location;
contribution sub relation,
    relates contributor,
    relates work;
publishing sub relation,
    relates publisher,
    relates published,
    relates location,
    owns year;
locating sub relation,
    relates located,
    relates location;
isbn-13 sub attribute, value string;
isbn-10 sub attribute, value string;
title sub attribute, value string;
format sub attribute, value string;
page-count sub attribute, value long;
genre sub attribute, value string;
price sub attribute, value double;
stock sub attribute, value long;
name sub attribute, value string;
year sub attribute, value long;
There are two ways we can go about identifying potential type hierarchies in our schemas: bottom-up and top-down. In the bottom-up approach, we identify a set of types that should share a common supertype. In the top-down approach, we identify a single type that should have multiple subtypes. We’ll explore both approaches here.
Bottom-up hierarchy design
We’ll begin with the bottom-up method. The entity types city, state, and country are clearly very alike. Conceptually, these all represent kinds of place, so it makes sense to give them a common supertype place. This supertype should be abstract, as it is not possible to have a place that is not one of these kinds, or some other kind not yet defined in the schema.
define
place sub entity,
    abstract;
city sub place,
    owns name,
    plays publishing:location,
    plays locating:located;
state sub place,
    owns name,
    plays locating:location,
    plays locating:located;
country sub place,
    owns name,
    plays locating:location;
Next, we must consider which properties of the subtypes should actually be inherited from the supertype. The first is ownership of name. All three subtypes own name, so we can move it to the supertype with no change in the expressivity of the current model. However, this is a significant change for the future development of the model. If we give ownership of name to place, not only will all three current subtypes inherit the ownership, but all future subtypes will inherit it as well. For that reason, we should consider whether all kinds of places have names, not just the ones in the data model so far. We decide that this is indeed the case, so ownership of name is moved to the supertype.
Now we should consider the other interfaces implemented by the subtypes:
- publishing:location, implemented by city.
- locating:located, implemented by city and state.
- locating:location, implemented by state and country.
None of these interfaces are implemented by all three subtypes, so it might seem that we should leave them defined there rather than at the supertype level. Once again though, we need to consider what the behaviour should be if we introduce more subtypes of place. Should all kinds of place be publishing locations? Probably not: by convention, the front matter of a book lists the city of publication. Should all kinds of place be locations for other places? That seems more likely. We might later introduce a "region" type that is superior to countries, or a "street" type that is inferior to cities. As such, we’ll move the playing of locating:located and locating:location to the supertype, but keep that of publishing:location exclusive to city.
define
place sub entity,
    abstract,
    owns name,
    plays locating:location,
    plays locating:located;
city sub place,
    plays publishing:location;
state sub place;
country sub place;
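With place as a common supertype, a single query can range over cities, states, and countries at once. The following is a minimal sketch, assuming the schema above has been loaded and that the places and locating relations from the data table have been inserted. The variable names are illustrative only, and the two queries would be run separately.

```typeql
# Names of every place, whether it is a city, a state, or a country.
match
$place isa place, has name $name;
get $name;

# Names of the places in which London is directly located.
# $parent may be any subtype of place that plays locating:location.
match
$city isa city, has name "London";
(located: $city, location: $parent) isa locating;
$parent has name $name;
get $name;
```

Neither query needs to enumerate the subtypes of place, so they would continue to work unchanged if further subtypes were added later.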
Grouping existing types into hierarchies in this way is not always a good idea. As types in the PERA model can only have a single supertype, we must choose our types carefully. The fact that several types exhibit common behaviours is not necessarily an indicator that they are all subtypes of a common supertype, as we can alternatively encode those common behaviours by having the types independently implement the same interfaces. We will see an example of such a case in Lesson 9.5.
A type should only be considered a subtype of another type if every instance of the subtype is necessarily an instance of the supertype. For instance, every city is necessarily a place. If this is not the case, then subtyping is likely a poor modeling choice for the types in question.
Top-down hierarchy design
Next, we’ll try the top-down method. Building type hierarchies top-down can be harder because it is often less obvious that a hierarchy should be present at all. The biggest indicator of a potential top-down hierarchy is when typing information is being stored as data. Let’s examine an example.
Currently, the schema shows that book owns format. If we examine the data table, we see that each book has exactly one format and that there are only a small number of these formats: "paperback", "hardback", and "ebook". As such, it would make sense for these to be subtypes of book. This is a case where typing information has been stored in a way that is structurally indistinguishable from data, as is often the case when working with tabulated data or non-polymorphic databases. Because other data modeling paradigms lack subtyping, types must either be stored as properties in this way, or the data must be divided into separate structures (tables, documents, etc.) per type.
There is an additional hint that this field in the data table represents a type: the stock attribute is absent for any book with the format "ebook". This makes sense of course, as an ebook cannot possibly go out of stock. The exclusivity of the stock property to physical books is another indicator that formats are best modelled as a type hierarchy, as we can then assign the property only to the appropriate subtypes. Based on this, we will create three subtypes of book: paperback, hardback, and ebook, and move ownership of stock down the hierarchy. As a result, we will also make the supertype abstract, as every book must necessarily have a format.
define
book sub entity,
    abstract,
    owns isbn-13,
    owns isbn-10,
    owns title,
    owns page-count,
    owns genre,
    owns price,
    plays contribution:work,
    plays publishing:published;
paperback sub book,
    owns stock;
hardback sub book,
    owns stock;
ebook sub book;
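To illustrate how the format now lives in the type rather than in an attribute, here is a hedged sketch of inserting and querying books under this schema. It assumes the schema above has been loaded; the attribute values are taken from the data table, the variable names are illustrative, and each statement would be run as a separate query.

```typeql
# Insert Dune as an ebook. No format attribute is needed, and no stock
# can be given because ebook does not own stock.
insert
$book isa ebook,
    has isbn-13 "9780575104419",
    has isbn-10 "0575104414",
    has title "Dune",
    has page-count 624,
    has genre "fiction",
    has genre "science fiction",
    has price 5.49;

# Titles and prices of all books, regardless of format.
match
$book isa book, has title $title, has price $price;
get $title, $price;

# Titles and stock levels. Only paperbacks and hardbacks own stock,
# so ebooks cannot appear in the results.
match
$book isa book, has title $title, has stock $stock;
get $title, $stock;
```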
In general, the following characteristics are good indicators of types stored as data in a particular field:
- The field always contains exactly one value or is empty.
- There are a small number of possible values for the field, including being empty.
- The presence or absence of other fields depends on the value of this field.
Where the field always contains a value, this is indicative that the common supertype is abstract, as we have seen with book. If the field is sometimes empty, this would indicate that the common supertype is not abstract, and applies as a non-specialised variant covering cases where the field is empty. We will see an example of such a hierarchy shortly.
Conversely, the following would be indicators against a field containing typing information:
- The field can contain more than one value.
- There are many possible values for the field.
- There are no other fields whose presence or absence depends on this field.
An example of a field like this in the current schema would be genre.
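Because genre stays an attribute, a book can simply own several values of it. As a small sketch, assuming the schema above and data inserted from the data table (the variable names are illustrative), the genres of a single book could be retrieved like this:

```typeql
# All genres of Dune. One result is returned per genre the book owns.
match
$book isa book, has title "Dune", has genre $genre;
get $genre;
```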
Using non-abstract supertypes
In both of the cases we have examined, those of place and book, it made sense to make the supertype abstract. This is primarily driven by semantic considerations, as it is not possible to have a place without a specific type, or a book without a format. However, non-abstract supertypes are very useful for constructing hierarchies in which some instances will have generalized behaviour, and some will have specialized behaviours in addition to the general behaviours. Let’s consider an example.
The current schema includes the types author, editor, illustrator, and other-contributor. Adopting a bottom-up approach to hierarchy design, it is clear these types should have a common supertype contributor, especially seeing as they have identical interface implementations.
define
contributor sub entity,
    abstract,
    owns name,
    plays contribution:contributor;
author sub contributor;
editor sub contributor;
illustrator sub contributor;
other-contributor sub contributor;
However, this hierarchy includes the redundant type other-contributor. Such a contributor is merely a contributor that is not an author, editor, or illustrator. In other words, this type is a generalised catch-all subtype. But the type that is supposed to represent the generalised behaviour in a type hierarchy is the supertype. As a result, we should make contributor non-abstract and instantiate it directly instead of other-contributor.
define
contributor sub entity,
    owns name,
    plays contribution:contributor;
author sub contributor;
editor sub contributor;
illustrator sub contributor;
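The difference between the generalized supertype and its specialized subtypes also shows up when querying. As a minimal sketch, assuming the schema above and contributors inserted from the data table (the variable names are illustrative, and the queries would be run separately), isa matches instances of a type and all of its subtypes, while isa! matches only direct instances:

```typeql
# Every contributor: authors, editors, illustrators, and contributors
# with no specific role.
match
$contributor isa contributor, has name $name;
get $name;

# Only contributors with no specific role, that is, direct instances
# of the contributor supertype.
match
$contributor isa! contributor, has name $name;
get $name;
```

The second query plays the role that matching on other-contributor would have played in the earlier version of the hierarchy.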