New ACM paper, free-tier cloud, and open-source license

Lesson 9.3: Avoiding data redundancies

Identifying redundancies

In the previous lesson, we created a first iteration of a data model for book data. In this lesson, we’ll see how we can eliminate data redundancies generated by the model.

Data table
ISBN-13 ISBN-10 Title Format Authors Editors Illustrators Other contributors Publisher Year City State Country Page count Genres Price Stock

9780008627843

0008627843

The Hobbit

ebook

J.R.R. Tolkien

J.R.R. Tolkien

Harper Collins

2023

New York City

New York

United States

310

fantasy;fiction

16.99

9780060929794

0060929790

One Hundred Years of Solitude

paperback

Garcia Marquez, Gabriel

Perennial

1998

New York City

New York

United States

458

fiction;historical fiction

6.12

4

9780195153446

0195153448

Classical Mythology

paperback

Lenardon, Robert J.;Morford, Mark P. O.

Oxford University Press

2002

New York City

New York

United States

820

history;nonfiction

34.98

12

9780375801679

0375801677

The Iron Giant

ebook

Hughes, Ted

Davidson, Andrew

Knopf Books for Young Readers

1999

New York City

New York

United States

79

fiction;children’s fiction

33.97

9780387881355

0387881352

Electron Backscatter Diffraction in Materials Science

hardback

Schwartz, Adam J.;Kumar, Mukul;Adams, Brent L.;Field, David P.

Springer

2009

New York City

New York

United States

425

nonfiction;technology

230.37

9

9780393045215

0393045218

The Mummies of Urumchi

paperback

Barber, Elizabeth Wayland

W.W. Norton & Company

1999

New York City

New York

United States

240

history;nonfiction

21.6

1

9780393634563

0393634566

The Odyssey

ebook

Homer

Wilson, Emily

W.W. Norton & Company

2017

New York City

New York

United States

656

fiction;classics

13.99

9780446310789

0446310786

To Kill a Mockingbird

paperback

Harper Lee

Grand Central Publishing

1988

New York City

New York

United States

281

fiction;historical fiction

21.64

16

9780451162076

0451162072

Pet Sematary

paperback

King, Stephen

Signet

1984

New York City

New York

United States

374

horror;fiction

93.22

1

9780500026557

0500026556

Hokusai’s Fuji

paperback

Wada, Kyoko

Katsushika, Hokusai

Thames & Hudson

2024

London

United Kingdom

416

art;nonfiction

24.47

11

9780500291221

0500291225

Great Discoveries in Medicine

paperback

Bynum, William;Bynum, Helen

Thames & Hudson

2023

London

United Kingdom

352

history;nonfiction

12.05

18

9780553212150

055321215X

Pride and Prejudice

paperback

Austen, Jane

Bantam Classics

1983

New York City

New York

United States

295

fiction;historical fiction

17.99

15

9780575104419

0575104414

Dune

ebook

Herbert, Frank

Hachette Book Group

2010

New York City

New York

United States

624

fiction;science fiction

5.49

9780671461492

0671461494

The Hitchhiker’s Guide to the Galaxy

paperback

Adams, Douglas

Pocket

1982

New York City

New York

United States

215

fiction;science fiction

91.47

9

9780679425601

0679425608

Under the Black Flag: The Romance and the Reality of Life Among the Pirates

hardback

Cordingly, David

Random House

1996

New York City

New York

United States

296

history;nonfiction

34.73

13

9780740748479

0740748475

The Complete Calvin and Hobbes

hardback

Watterson, Bill

Watterson, Bill

Andrews McMeel Publishing

2005

Kansas City

Missouri

United States

1451

comics;fiction

128.71

6

9781098108274

1098108272

Fundamentals of Data Engineering

ebook

Reis, Joe;Housley, Matt

O’Reilly Media

2022

Sevastopol

California

United States

450

nonfiction;technology;children’s fiction

47.99

9781489962287

148996228X

Interpretation of Electron Diffraction Patterns

paperback

Keown, Samuel Robert;Andrews, Kenneth William;Dyson, David John

Springer

1967

New York City

New York

United States

199

nonfiction;technology

47.17

15

9781859840665

1859840663

The Motorcycle Diaries: A Journey Around South America

paperback

Guevara, Ernesto

Wright, Ann

Verso

1996

London

United Kingdom

160

biography;nonfiction

14.52

4

9783319398778

Physical Principles of Electron Microscopy: An Introduction to TEM, SEM, and AEM

ebook

Egerton, R.F.

Springer

2016

London

United Kingdom

196

nonfiction;technology

19.5

9798691153570

Business Secrets of The Pharoahs

paperback

Crorigan, Mark

British London Publishing

2020

London

United Kingdom

260

business;nonfiction

11.99

8

Lesson 9.2 schema
define
book sub entity,
    owns isbn-13,
    owns isbn-10,
    owns title,
    owns format,
    owns page-count,
    owns genre,
    owns price,
    owns stock,
    plays contribution:work,
    plays publishing:published;
author sub entity,
    owns name,
    plays contribution:contributor;
editor sub entity,
    owns name,
    plays contribution:contributor;
illustrator sub entity,
    owns name,
    plays contribution:contributor;
other-contributor sub entity,
    owns name,
    plays contribution:contributor;
publisher sub entity,
    owns name,
    plays publishing:publisher;
city sub entity,
    owns name,
    plays publishing:location;
state sub entity,
    owns name,
    plays publishing:location;
country sub entity,
    owns name,
    plays publishing:location;
contribution sub relation,
    relates contributor,
    relates work;
publishing sub relation,
    relates publisher,
    relates published,
    relates location,
    owns year;
isbn-13 sub attribute, value string;
isbn-10 sub attribute, value string;
title sub attribute, value string;
format sub attribute, value string;
page-count sub attribute, value long;
genre sub attribute, value string;
price sub attribute, value double;
stock sub attribute, value long;
name sub attribute, value string;
year sub attribute, value long;

Currently, when recording the location in which a book was published, we must record the city, state, and country.

match
$book isa book, has isbn-13 "9780575104419";
$publisher isa publisher, has name "Hachette Book Group";
$city isa city, has name "New York City";
$state isa state, has name "New York";
$country isa country, has name "United States";
insert
(
    publisher: $publisher,
    published: $book,
    location: $city,
    location: $state,
    location: $country
) isa publishing, has year 2010;

However, there is a functional dependency of state on city, as knowing the city (by identity, not just by name) in which a book is published is sufficient to determine the state in which it was published. Similarly, there is a functional dependency of country on state, and thus country on city, as functional dependencies are transitive. Simply recording the city should be sufficient, as the state and country should then be inferrable. In fact, we saw this inference performed explicitly in Lesson 3.3.

This highlights a problem with the data model. The entity types state and country have functional dependencies on the entity type city. Meanwhile, city has a functional dependency on book via the publishing relation type, in which they both play roles. Thus, state and country both have transitive function dependencies on book. However, state and country also have direct functional dependencies on book via publishing. This means that state and country have duplicate dependencies on book. When dependencies are duplicated, this creates redundancy in the data model, which can lead to data inconsistencies.

Eliminating redundancies

In order to prevent these inconsistencies, we must refactor the model in such a way that ensures there are no duplicate dependencies. In a relational database, this process is known as lossless join decomposition. To do this, we must first explicitly record the dependencies of country on state and of state on city, which are currently not captured by the data model. Then, we must remove the dependencies of country and state on book. The end result will be:

  • city depends on book.

  • state depends on city.

  • country depends on state.

city, state, and country will then each have only a single functional dependency on book, preventing data redundancies and inconsistencies. To be able to record the new dependencies, we’ll introduce a new locating relation type.

define
locating sub relation,
    relates located,
    relates location;
city plays locating:located;
state plays locating:location,
    plays locating:located;
country plays locating:location;

Next, we’ll record the locations of cities using the new type.

match
$city isa city, has name "New York City";
$state isa state, has name "New York";
$country isa country, has name "United States";
insert
(located: $city, location: $state) isa locating;
(located: $state, location: $country) isa locating;

Now, whenever we record the location a book is published, we only need to store the city.

match
$book isa book, has isbn-13 "9780575104419";
$publisher isa publisher, has name "Hachette Book Group";
$city isa city, has name "New York City";
insert
(publisher: $publisher, published: $book, location: $city) isa publishing,
    has year 2010;

If we wish to know the state or the country it was published, we can easily query this information via the location relations.

Provide Feedback