Lesson 9.2: Determining object types
Determining entity types
Data might come from a variety of sources, from tabulated data files to real-time data streams from microservices. In this lesson, we’ll examine how we might go about modeling tabulated book data for the bookstore database using the PERA model, as shown here.
Data table
| ISBN-13 | ISBN-10 | Title | Format | Authors | Editors | Illustrators | Other contributors | Publisher | Year | City | State | Country | Page count | Genres | Price | Stock |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9780008627843 | 0008627843 | The Hobbit | ebook | J.R.R. Tolkien | | J.R.R. Tolkien | | Harper Collins | 2023 | New York City | New York | United States | 310 | fantasy;fiction | 16.99 | |
| 9780060929794 | 0060929790 | One Hundred Years of Solitude | paperback | Garcia Marquez, Gabriel | | | | Perennial | 1998 | New York City | New York | United States | 458 | fiction;historical fiction | 6.12 | 4 |
| 9780195153446 | 0195153448 | Classical Mythology | paperback | Lenardon, Robert J.;Morford, Mark P. O. | | | | Oxford University Press | 2002 | New York City | New York | United States | 820 | history;nonfiction | 34.98 | 12 |
| 9780375801679 | 0375801677 | The Iron Giant | ebook | Hughes, Ted | | Davidson, Andrew | | Knopf Books for Young Readers | 1999 | New York City | New York | United States | 79 | fiction;children’s fiction | 33.97 | |
| 9780387881355 | 0387881352 | Electron Backscatter Diffraction in Materials Science | hardback | | Schwartz, Adam J.;Kumar, Mukul;Adams, Brent L.;Field, David P. | | | Springer | 2009 | New York City | New York | United States | 425 | nonfiction;technology | 230.37 | 9 |
| 9780393045215 | 0393045218 | The Mummies of Urumchi | paperback | Barber, Elizabeth Wayland | | | | W.W. Norton & Company | 1999 | New York City | New York | United States | 240 | history;nonfiction | 21.6 | 1 |
| 9780393634563 | 0393634566 | The Odyssey | ebook | Homer | | | Wilson, Emily | W.W. Norton & Company | 2017 | New York City | New York | United States | 656 | fiction;classics | 13.99 | |
| 9780446310789 | 0446310786 | To Kill a Mockingbird | paperback | Harper Lee | | | | Grand Central Publishing | 1988 | New York City | New York | United States | 281 | fiction;historical fiction | 21.64 | 16 |
| 9780451162076 | 0451162072 | Pet Sematary | paperback | King, Stephen | | | | Signet | 1984 | New York City | New York | United States | 374 | horror;fiction | 93.22 | 1 |
| 9780500026557 | 0500026556 | Hokusai’s Fuji | paperback | Wada, Kyoko | | Katsushika, Hokusai | | Thames & Hudson | 2024 | London | | United Kingdom | 416 | art;nonfiction | 24.47 | 11 |
| 9780500291221 | 0500291225 | Great Discoveries in Medicine | paperback | | Bynum, William;Bynum, Helen | | | Thames & Hudson | 2023 | London | | United Kingdom | 352 | history;nonfiction | 12.05 | 18 |
| 9780553212150 | 055321215X | Pride and Prejudice | paperback | Austen, Jane | | | | Bantam Classics | 1983 | New York City | New York | United States | 295 | fiction;historical fiction | 17.99 | 15 |
| 9780575104419 | 0575104414 | Dune | ebook | Herbert, Frank | | | | Hachette Book Group | 2010 | New York City | New York | United States | 624 | fiction;science fiction | 5.49 | |
| 9780671461492 | 0671461494 | The Hitchhiker’s Guide to the Galaxy | paperback | Adams, Douglas | | | | | 1982 | New York City | New York | United States | 215 | fiction;science fiction | 91.47 | 9 |
| 9780679425601 | 0679425608 | Under the Black Flag: The Romance and the Reality of Life Among the Pirates | hardback | Cordingly, David | | | | Random House | 1996 | New York City | New York | United States | 296 | history;nonfiction | 34.73 | 13 |
| 9780740748479 | 0740748475 | The Complete Calvin and Hobbes | hardback | Watterson, Bill | | Watterson, Bill | | Andrews McMeel Publishing | 2005 | Kansas City | Missouri | United States | 1451 | comics;fiction | 128.71 | 6 |
| 9781098108274 | 1098108272 | Fundamentals of Data Engineering | ebook | Reis, Joe;Housley, Matt | | | | O’Reilly Media | 2022 | Sevastopol | California | United States | 450 | nonfiction;technology;children’s fiction | 47.99 | |
| 9781489962287 | 148996228X | Interpretation of Electron Diffraction Patterns | paperback | Keown, Samuel Robert;Andrews, Kenneth William;Dyson, David John | | | | Springer | 1967 | New York City | New York | United States | 199 | nonfiction;technology | 47.17 | 15 |
| 9781859840665 | 1859840663 | The Motorcycle Diaries: A Journey Around South America | paperback | Guevara, Ernesto | | | Wright, Ann | Verso | 1996 | London | | United Kingdom | 160 | biography;nonfiction | 14.52 | 4 |
| 9783319398778 | | Physical Principles of Electron Microscopy: An Introduction to TEM, SEM, and AEM | ebook | Egerton, R.F. | | | | Springer | 2016 | London | | United Kingdom | 196 | nonfiction;technology | 19.5 | |
| 9798691153570 | | Business Secrets of The Pharoahs | paperback | Crorigan, Mark | | | | British London Publishing | 2020 | London | | United Kingdom | 260 | business;nonfiction | 11.99 | 8 |
The first step is to determine what entity types exist in the data. If we examine the data table, it might appear that each row represents a book entity and every value in that row represents an attribute of the book. This would be a perfectly valid model, but it wouldn’t be particularly expressive.
    define
      book sub entity,
        owns isbn-13,
        owns isbn-10,
        owns title,
        owns format,
        owns author,
        owns editor,
        owns illustrator,
        owns other-contributor,
        owns publisher,
        owns year,
        owns city,
        owns state,
        owns country,
        owns page-count,
        owns genre,
        owns price,
        owns stock;
For simplicity, throughout most of Lesson 9, we’ll omit attribute type definitions from schema definitions where they are not required.
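As a point of reference, the omitted attribute type definitions for the `book` type above might look like the following sketch. The value types are assumptions inferred from the data table, not definitions given in this lesson. ISBNs are kept as strings because they can begin with a zero or end in `X`, and the remaining name-like attributes (such as `author` or `publisher`) would plausibly be strings as well.

```typeql
define
  # Identifiers and descriptive text: assumed to be strings.
  isbn-13 sub attribute, value string;
  isbn-10 sub attribute, value string;
  title sub attribute, value string;
  format sub attribute, value string;
  genre sub attribute, value string;
  # Years and counts: assumed to be whole numbers.
  year sub attribute, value long;
  page-count sub attribute, value long;
  stock sub attribute, value long;
  # Prices carry a fractional part, so a double is assumed here.
  price sub attribute, value double;
```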
The lack of expressivity causes several problems. Consider, for instance, an author who has published different books under different names, as is often the case. We would not be able to tell which names (and therefore which books) correspond to the same author. Similarly, if two different authors published books under the same name, we would not be able to tell that the books were written by different people.
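As a rough illustration of the problem, a query against this flat model can only reach an author through a name string. The sketch below assumes the single-entity `book` type defined above and a `get` clause in the same TypeQL version as this lesson's `define` statements. The name is taken from the data table.

```typeql
match
  # The only handle on the author is a string owned by the book.
  $book isa book, has author "King, Stephen", has title $title;
get $title;
```

Nothing in the result can tell us whether two matching books share an author or merely a name, and books the same person published under another name are not returned at all.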
In order to distinguish cases like these, we need a notion of identity for authors. The identity of a concept is independent of any properties it has. In this case, an author remains the same author no matter what different names they are known by, and two authors are not the same person simply because they have the same name. Concepts with identities are generally best categorised as belonging to entity types. This would indicate that authors should be entities, and we can apply the same logic to editors, illustrators, and other contributors. We can also apply it to publishers, cities, states, and countries, when considering the following:
- A publisher may have multiple imprints.
- Two cities may have the same name.
- A state or country may change its name.
These facts indicate that these concepts have identities not determined by their names (or any other property), and so should likewise be considered entities.
    define
      book sub entity,
        owns isbn-13,
        owns isbn-10,
        owns title,
        owns format,
        owns year,
        owns page-count,
        owns genre,
        owns price,
        owns stock;
      author sub entity,
        owns name;
      editor sub entity,
        owns name;
      illustrator sub entity,
        owns name;
      other-contributor sub entity,
        owns name;
      publisher sub entity,
        owns name;
      city sub entity,
        owns name;
      state sub entity,
        owns name;
      country sub entity,
        owns name;
Here we’ve made use of interface polymorphism to have several unrelated types implement ownership of `name`.
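To make the identity argument concrete, the following sketch inserts three authors under the schema above. The name "Smith, John" is a hypothetical placeholder, and the query assumes that `name` is defined as a string-valued attribute type.

```typeql
insert
  # Two different people who happen to share a name: two separate entities.
  $a1 isa author, has name "Smith, John";
  $a2 isa author, has name "Smith, John";
  # One person known by two names: a single entity owning both names.
  $a3 isa author, has name "King, Stephen", has name "Bachman, Richard";
```

Because each `author` is an entity, the first two remain distinct despite owning equal names, while the third keeps a single identity across both of its names.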
Determining relation types
Now that properties have been moved from `book` to other entity types, we need a way of recording the dependencies between `book` and those types so that we can query those properties. For instance, we need a way to record the dependency between a book and an author, indicating that the author wrote the book, allowing us to retrieve the author’s name from the book. We establish these dependencies with relation types. To begin with, we’ll link each entity type to `book` using a relation type. It is fairly evident how we should do so for some of those types.
    define
      contribution sub relation,
        relates contributor,
        relates work;
      publishing sub relation,
        relates publisher,
        relates published;
      book plays contribution:work,
        plays publishing:published;
      author plays contribution:contributor;
      editor plays contribution:contributor;
      illustrator plays contribution:contributor;
      other-contributor plays contribution:contributor;
      publisher plays publishing:publisher;
Once again, we’ve leveraged interface polymorphism, this time to reuse the same `contribution` relation type to represent the dependencies between `book` and several contributor types: `author`, `editor`, `illustrator`, and `other-contributor`.
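With these relation types in place, a query can walk from a book to its contributors. The sketch below assumes a `get` clause in the same TypeQL version as this lesson's `define` statements. It also assumes that the books from the data table have been loaded along with their `contribution` relations, which this lesson does not show.

```typeql
match
  $book isa book, has title "Dune";
  # Any entity playing the contributor role matches here: author, editor,
  # illustrator, or other-contributor.
  (work: $book, contributor: $contributor) isa contribution;
  $contributor has name $name;
get $name;
```

Adding `$contributor isa author;` to the `match` clause would restrict the results to authors.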
However, the connection from `city`, `state`, and `country` to `book` is less obvious. What should this relation be? Examining the dataset, we can tell that these fields refer to the location in which the book was published. As such, `publishing` is the appropriate relation, and it follows that we should extend `publishing` into a ternary relation type to account for this.
    define
      publishing relates location,
        owns year;
      city plays publishing:location;
      state plays publishing:location;
      country plays publishing:location;
We have also moved the `year` attribute from `book` to `publishing`, as it is more accurately a property of the publication event than of the book itself. This is a good first iteration, but there are still many ways this model can be improved. We’ll explore them in the next few lessons.
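A query over the extended relation might look like the following sketch. It assumes a `get` clause in the same TypeQL version as this lesson's `define` statements. It also assumes that each `publishing` relation has been loaded with the city as one of its `location` role players; how the data is actually loaded is not covered in this lesson.

```typeql
match
  $book isa book, has title "Dune";
  # One relation instance ties together the book, its publisher, a location,
  # and the year of publication.
  (published: $book, publisher: $publisher, location: $city) isa publishing, has year $year;
  $publisher has name $publisher-name;
  $city isa city, has name $city-name;
get $publisher-name, $city-name, $year;
```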