
Lesson 9.2: Determining object types

Determining entity types

Data might come from a variety of sources, from tabulated data files to real-time data streams from microservices. In this lesson, we’ll examine how we might go about modeling tabulated book data for the bookstore database using the PERA model, as shown here.

Data table
ISBN-13 | ISBN-10 | Title | Format | Authors | Editors | Illustrators | Other contributors | Publisher | Year | City | State | Country | Page count | Genres | Price | Stock

9780008627843 | 0008627843 | The Hobbit | ebook | J.R.R. Tolkien | J.R.R. Tolkien | Harper Collins | 2023 | New York City | New York | United States | 310 | fantasy;fiction | 16.99
9780060929794 | 0060929790 | One Hundred Years of Solitude | paperback | Garcia Marquez, Gabriel | Perennial | 1998 | New York City | New York | United States | 458 | fiction;historical fiction | 6.12 | 4
9780195153446 | 0195153448 | Classical Mythology | paperback | Lenardon, Robert J.;Morford, Mark P. O. | Oxford University Press | 2002 | New York City | New York | United States | 820 | history;nonfiction | 34.98 | 12
9780375801679 | 0375801677 | The Iron Giant | ebook | Hughes, Ted | Davidson, Andrew | Knopf Books for Young Readers | 1999 | New York City | New York | United States | 79 | fiction;children’s fiction | 33.97
9780387881355 | 0387881352 | Electron Backscatter Diffraction in Materials Science | hardback | Schwartz, Adam J.;Kumar, Mukul;Adams, Brent L.;Field, David P. | Springer | 2009 | New York City | New York | United States | 425 | nonfiction;technology | 230.37 | 9
9780393045215 | 0393045218 | The Mummies of Urumchi | paperback | Barber, Elizabeth Wayland | W.W. Norton & Company | 1999 | New York City | New York | United States | 240 | history;nonfiction | 21.6 | 1
9780393634563 | 0393634566 | The Odyssey | ebook | Homer | Wilson, Emily | W.W. Norton & Company | 2017 | New York City | New York | United States | 656 | fiction;classics | 13.99
9780446310789 | 0446310786 | To Kill a Mockingbird | paperback | Harper Lee | Grand Central Publishing | 1988 | New York City | New York | United States | 281 | fiction;historical fiction | 21.64 | 16
9780451162076 | 0451162072 | Pet Sematary | paperback | King, Stephen | Signet | 1984 | New York City | New York | United States | 374 | horror;fiction | 93.22 | 1
9780500026557 | 0500026556 | Hokusai’s Fuji | paperback | Wada, Kyoko | Katsushika, Hokusai | Thames & Hudson | 2024 | London | United Kingdom | 416 | art;nonfiction | 24.47 | 11
9780500291221 | 0500291225 | Great Discoveries in Medicine | paperback | Bynum, William;Bynum, Helen | Thames & Hudson | 2023 | London | United Kingdom | 352 | history;nonfiction | 12.05 | 18
9780553212150 | 055321215X | Pride and Prejudice | paperback | Austen, Jane | Bantam Classics | 1983 | New York City | New York | United States | 295 | fiction;historical fiction | 17.99 | 15
9780575104419 | 0575104414 | Dune | ebook | Herbert, Frank | Hachette Book Group | 2010 | New York City | New York | United States | 624 | fiction;science fiction | 5.49
9780671461492 | 0671461494 | The Hitchhiker’s Guide to the Galaxy | paperback | Adams, Douglas | Pocket | 1982 | New York City | New York | United States | 215 | fiction;science fiction | 91.47 | 9
9780679425601 | 0679425608 | Under the Black Flag: The Romance and the Reality of Life Among the Pirates | hardback | Cordingly, David | Random House | 1996 | New York City | New York | United States | 296 | history;nonfiction | 34.73 | 13
9780740748479 | 0740748475 | The Complete Calvin and Hobbes | hardback | Watterson, Bill | Watterson, Bill | Andrews McMeel Publishing | 2005 | Kansas City | Missouri | United States | 1451 | comics;fiction | 128.71 | 6
9781098108274 | 1098108272 | Fundamentals of Data Engineering | ebook | Reis, Joe;Housley, Matt | O’Reilly Media | 2022 | Sevastopol | California | United States | 450 | nonfiction;technology;children’s fiction | 47.99
9781489962287 | 148996228X | Interpretation of Electron Diffraction Patterns | paperback | Keown, Samuel Robert;Andrews, Kenneth William;Dyson, David John | Springer | 1967 | New York City | New York | United States | 199 | nonfiction;technology | 47.17 | 15
9781859840665 | 1859840663 | The Motorcycle Diaries: A Journey Around South America | paperback | Guevara, Ernesto | Wright, Ann | Verso | 1996 | London | United Kingdom | 160 | biography;nonfiction | 14.52 | 4
9783319398778 | Physical Principles of Electron Microscopy: An Introduction to TEM, SEM, and AEM | ebook | Egerton, R.F. | Springer | 2016 | London | United Kingdom | 196 | nonfiction;technology | 19.5
9798691153570 | Business Secrets of The Pharoahs | paperback | Crorigan, Mark | British London Publishing | 2020 | London | United Kingdom | 260 | business;nonfiction | 11.99 | 8

The first step is to determine what entity types exist in the data. If we examine the data table, it might appear that each row represents a book entity and every value in that row represents an attribute of the book. This would be a perfectly valid model, but it wouldn’t be particularly expressive.

define
book sub entity,
    owns isbn-13,
    owns isbn-10,
    owns title,
    owns format,
    owns author,
    owns editor,
    owns illustrator,
    owns other-contributor,
    owns publisher,
    owns year,
    owns city,
    owns state,
    owns country,
    owns page-count,
    owns genre,
    owns price,
    owns stock;

For simplicity, throughout most of Lesson 9, we’ll be omitting the attribute type definitions in schema definitions where not required.
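As a hedged sketch of what those omitted definitions might look like for the flat model above, the attribute types could be declared as follows. The value types are assumptions and are not taken from the lesson: the ISBNs are kept as strings because values in the data table can begin with 0 or end in X, the year and counts are assumed to be integers, and the price is assumed to be a double.

```typeql
define
# Value types below are illustrative assumptions.
isbn-13 sub attribute, value string;
isbn-10 sub attribute, value string;
title sub attribute, value string;
format sub attribute, value string;
author sub attribute, value string;
editor sub attribute, value string;
illustrator sub attribute, value string;
other-contributor sub attribute, value string;
publisher sub attribute, value string;
year sub attribute, value long;
city sub attribute, value string;
state sub attribute, value string;
country sub attribute, value string;
page-count sub attribute, value long;
genre sub attribute, value string;
price sub attribute, value double;
stock sub attribute, value long;
```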

The lack of expressivity causes several problems. Suppose an author has published different books under different names, as is often the case. We would not be able to tell which names (and therefore which books) correspond to the same author. Similarly, if two different authors published books under the same name, we would not be able to tell that the books were written by different people.
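To make the problem concrete, here is a minimal sketch of a query against the flat model, assuming author is a string-valued attribute type. The name used is taken from the data table.

```typeql
# Flat model: the author is only a string attribute owned by the book.
# This matches books by the exact name string and nothing else.
match
$book isa book, has author "King, Stephen", has title $title;
get $title;
```

A book recorded under a different name string for the same person would not be matched by this query, and books by a different person who happens to share the name would be matched, because the model has nothing but the string to go on.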

In order to distinguish cases like these, we need a notion of identity for authors. The identity of a concept is independent of any properties it has. In this case, an author remains the same author no matter what different names they are known by, and two authors are not the same person simply because they have the same name. Concepts with identities are generally best categorized as belonging to entity types. This would indicate that authors should be entities, and we can apply the same logic to editors, illustrators, and other contributors. We can also apply it to publishers, cities, states, and countries, when considering the following:

  • A publisher may have multiple imprints.

  • Two cities may have the same name.

  • A state or country may change its name.

These facts indicate that these concepts have identities not determined by their names (or any other property), and so should likewise be considered entities.

define
book sub entity,
    owns isbn-13,
    owns isbn-10,
    owns title,
    owns format,
    owns year,
    owns page-count,
    owns genre,
    owns price,
    owns stock;
author sub entity,
    owns name;
editor sub entity,
    owns name;
illustrator sub entity,
    owns name;
other-contributor sub entity,
    owns name;
publisher sub entity,
    owns name;
city sub entity,
    owns name;
state sub entity,
    owns name;
country sub entity,
    owns name;

Here we’ve made use of interface polymorphism to have several unrelated types implement ownership of name.
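With authors modeled as entities, a single author can own more than one name without becoming more than one author. The following is a sketch only: it assumes name has been defined as a string-valued attribute type, and the second name is a hypothetical alternative spelling that does not appear in the data table.

```typeql
# One author entity with two names.
# "Tolkien, J.R.R." is a hypothetical alternative spelling for illustration.
insert
$author isa author,
    has name "J.R.R. Tolkien",
    has name "Tolkien, J.R.R.";

# Separate query: look the author up by one name
# and retrieve every name recorded for that author.
match
$author isa author, has name "J.R.R. Tolkien", has name $alias;
get $alias;
```

The identity of the author now lives in the entity itself, so adding or changing a name does not change which author is being described.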

Determining relation types

Now that properties have been moved from book to other entity types, we need a way of recording the dependencies between book and those types so that we can query those properties. For instance, we need a way to record the dependency between a book and an author, indicating that the author wrote the book, allowing us to retrieve the author’s name from the book. We establish these dependencies with relation types. To begin with, we’ll link each entity type to book using a relation type. It is fairly evident how we should do so for some of those types.

define
contribution sub relation,
    relates contributor,
    relates work;
publishing sub relation,
    relates publisher,
    relates published;
book plays contribution:work,
    plays publishing:published;
author plays contribution:contributor;
editor plays contribution:contributor;
illustrator plays contribution:contributor;
other-contributor plays contribution:contributor;
publisher plays publishing:publisher;

Once again, we’ve leveraged interface polymorphism to reuse the same contribution relation type to represent the dependencies between book and several contributor types: author, editor, illustrator, and other-contributor.
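As an illustration of how these relation types might be used, the sketch below links a book to its author and then retrieves contributor names starting from the book. It assumes that the book and author entities from the Dune row of the data table have already been inserted, and that isbn-13 and name are string-valued attribute types.

```typeql
# Link an existing book to an existing author.
match
$book isa book, has isbn-13 "9780575104419";
$author isa author, has name "Herbert, Frank";
insert
(work: $book, contributor: $author) isa contribution;

# Separate query: retrieve the names of all contributors to the book.
match
$book isa book, has isbn-13 "9780575104419";
(work: $book, contributor: $contributor) isa contribution;
$contributor has name $name;
get $name;
```

Because the second query does not constrain the type of $contributor, it would return the names of editors, illustrators, and other contributors as well as authors.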

However, the connection from city, state, and country to book is less obvious. What should this relation be? Examining the dataset, we can tell that these fields refer to the location in which the book was published. As such, it makes sense that the publishing relation is appropriate, and it follows that we should extend publishing to be a ternary relation type to account for this.

define
publishing relates location,
    owns year;
city plays publishing:location;
state plays publishing:location;
country plays publishing:location;
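A hedged example of recording one publication with this ternary relation, using the Dune row from the data table: it assumes that the book, publisher, and city entities already exist, and that year is an integer-valued attribute type.

```typeql
# Record who published the book, where, and in which year.
match
$book isa book, has isbn-13 "9780575104419";
$publisher isa publisher, has name "Hachette Book Group";
$city isa city, has name "New York City";
insert
(publisher: $publisher, published: $book, location: $city) isa publishing,
    has year 2010;
```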

We have also moved the year attribute from book to publishing, as it is more accurately a property of the publication event than of the book itself. This is a good first iteration, but there are still many ways this model can be improved. We’ll explore them in the next few lessons.
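If the earlier schema, in which book owns year, had already been committed to a database, the definition above would only add the ownership to publishing. Under that assumption, completing the move might also need an undefine query along these lines:

```typeql
# Assumes book was previously defined as owning year.
undefine
book owns year;
```

This is a sketch rather than a required step. Any year values already attached to book instances would likely need to be migrated or removed before the ownership can be undefined.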
