Lesson 9.7: Avoiding interface redundancies
In the previous lesson, we learned how to use interface hierarchies in our schema to achieve different behaviours. In this lesson, we’ll see how having multiple interfaces that fulfill the same behaviour damages the querying capabilities of the model.
Lesson 9.6 schema
define
entity book @abstract,
owns isbn, # abstract, but instantiated using isbn-10 or isbn-13
owns isbn-13,
owns isbn-10,
owns title,
owns page-count,
owns genre,
owns price,
plays contribution:work,
plays publishing:published;
entity paperback sub book,
owns stock;
entity hardback sub book,
owns stock;
entity ebook sub book;
entity contributor,
owns name,
plays contribution:contributor;
entity publisher,
owns name,
plays publishing:publisher;
entity place @abstract,
owns name,
plays locating:location,
plays locating:located;
entity city sub place,
plays publishing:location;
entity state sub place;
entity country sub place;
relation contribution,
relates contributor,
relates work;
relation authoring sub contribution;
relation editing sub contribution;
relation illustrating sub contribution;
relation publishing,
relates publisher,
relates published,
relates location,
owns year;
relation locating,
relates located,
relates location;
attribute isbn @abstract, value string;
attribute isbn-13 sub isbn;
attribute isbn-10 sub isbn;
attribute title, value string;
attribute page-count, value integer;
attribute genre, value string;
attribute price, value double;
attribute stock, value integer;
attribute name, value string;
attribute year, value integer;
Identifying redundancies
Let’s begin by examining the following patterns to describe things that are located in a particular country:
- To describe states located in the country:
$country isa country, has name $country-name; $state isa state; locating (location: $country, located: $state);
- To describe cities located in the country:
$country isa country, has name $country-name; $city isa city; locating (location: $country, located: $city);
- To describe publications that were located in the country:
$country isa country, has name $country-name; locating (location: $country, located: $city); $publishing isa publishing (location: $city);
What happens if we want to polymorphically query for everything located in the country? Well, we must use the following pattern:
$country isa country, has name $country-name;
{
locating (location: $country, located: $x);
} or {
locating (location: $country, located: $city);
$x isa publishing (location: $city);
};
Here we need a disjunction because publishing cannot be cast into locating:located, so we must handle it as a special case in the second branch of the disjunction. When querying them individually, the patterns for states and cities were structurally identical, while the pattern for publications was different. This means that we cannot query all three together using a structurally common pattern.
Essentially, we are using two different interfaces to represent common behaviour: the locating:location interface and the publishing:location interface. This is bad practice in PERA model design, just as it would be in application model design. While it is not much of a problem at this stage, it could become far more prevalent if we do not build the model’s interfaces in an extensible manner.
Eliminating redundancies
In order to eliminate the interface redundancy, we must choose which interface to keep. Between locating:location and publishing:location, the former is the more general-purpose interface. We will keep that interface, and change the way we record the locations where books are published to use it as well. Let’s begin by examining the section of the current schema used for publishing information.
define
entity book,
plays publishing:published;
entity publisher,
plays publishing:publisher;
entity city,
sub place,
plays publishing:location;
relation publishing,
relates publisher,
relates published,
relates location,
owns year;
We need to choose an object type to play the locating:located role. The entity type publisher would be a very poor choice.
Let’s add this schema on top of our bookstore database:
define
entity book,
plays publishing:published;
entity publisher,
plays publishing:publisher,
plays locating:located;
relation publishing,
relates publisher,
relates published,
owns year;
Publishers can publish different books in different cities, and if we record the publishing location on the publisher, we won’t be able to tell which book was published in which city. Consider if we chose this approach and then inserted, for example, the following data from the dataset.
match
$book-1 isa book, has isbn-13 "9780387881355";
$springer isa publisher, has name "Springer";
$nyc isa city, has name "New York City";
insert
publishing (published: $book-1, publisher: $springer);
locating (located: $springer, location: $nyc);
match
$book-2 isa book, has isbn-13 "9783319398778";
$springer isa publisher, has name "Springer";
$london isa city, has name "London";
insert
publishing (published: $book-2, publisher: $springer);
locating (located: $springer, location: $london);
If we were then to query the location where one of the books was published, we’d get two results back.
match
$book-1 isa book, has isbn-13 "9780387881355";
$publisher isa publisher;
$city isa city, has name $city-name;
publishing (published: $book-1, publisher: $publisher);
locating (located: $publisher, location: $city);
fetch {
"name": $city-name,
};
If you have applied the schema and run the two insert queries above, you’ll get:
{
"name": "New York City"
}
{
"name": "London"
}
This occurs because both publisher and city have functional dependencies on book, but there is no functional dependency between city and publisher. As a result, we can’t use publisher as the roleplayer, because the data model would store the data in a lossy manner, as we have just seen.
Moving to the next option for the new roleplayer of locating:located, the entity type book is a reasonable choice.
define
entity book,
plays publishing:published,
plays locating:located;
entity publisher,
plays publishing:publisher;
relation publishing,
relates publisher,
relates published,
owns year;
We won’t run into the issue we had with publisher, as there is a functional dependency of city on book. But this feels semantically odd. If we chose this approach, then the above query for a book’s publication city would become the following.
match
$book-1 isa book, has isbn-13 "9780387881355";
$city isa city, has name $city-name;
locating (located: $book-1, location: $city);
fetch {
"name": $city-name,
};
While this model would not incur data loss, it is strange to talk about the location of a book as if it is somehow fixed, and the intent of the query is not immediately obvious as there is no mention of the book’s publication. In fact, there is a better choice.
In this case, the best option is to make the relation type publishing the roleplayer of locating:located.
define
entity book,
plays publishing:published;
entity publisher,
plays publishing:publisher;
relation publishing,
relates publisher,
relates published,
owns year,
plays locating:located;
Because publishing is an object type, it can implement any interface, just as an entity type can. And because it depends functionally on book (by definition), doing so does not incur information loss. Here we have created a nested relation type: a relation type in which another relation type plays a role. Now we can query for a book’s publication city in the following manner.
match
$book-1 isa book, has isbn-13 "9780387881355";
$publishing isa publishing (published: $book-1);
$city isa city, has name $city-name;
locating (located: $publishing, location: $city);
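For this query to return anything, the publication data must be recorded under the nested model. The following is a minimal sketch rather than part of the lesson's dataset: it reuses the Springer example from earlier and assumes that the book, publisher, and city already exist in the database. It binds the new publishing relation to a variable using the links keyword, on the assumption that this long form is equivalent to the relation syntax used above, so that the relation itself can play locating:located.

```typeql
match
$book-1 isa book, has isbn-13 "9780387881355";
$springer isa publisher, has name "Springer";
$nyc isa city, has name "New York City";
insert
# Bind the relation to a variable so it can be used as a roleplayer below.
$publishing isa publishing, links (published: $book-1, publisher: $springer);
# The publishing relation, not the publisher, is what is located in the city.
locating (located: $publishing, location: $nyc);
```

Because each book has its own publishing relation, a second Springer book published in London would attach London to that book's publishing relation only, leaving this one untouched.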
With this model, we do not incur information loss, and the intent of the query is clear. If we return to our polymorphic pattern from earlier, it will now match instances of city, state, and publishing for $x, as they can all be upcast to locating:located!
$country isa country, has name $country-name;
locating (location: $country, located: $x);
When we extend the model to include more concepts with locations, we should always make them play locating:located. In this way, this polymorphic query will always be able to find everything in a given location. If we would like to restrict the list of things returned, we can always place further constraints on the type of $x.
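As a sketch of such a restriction, the pattern below adds a disjunction on the type of $x so that only cities and states are matched. The choice of types is purely illustrative; any of the types that play locating:located could be used instead.

```typeql
match
$country isa country, has name $country-name;
locating (location: $country, located: $x);
# Keep only the located things whose type is city or state.
{ $x isa city; } or { $x isa state; };
```

Unlike the disjunction we needed earlier, both branches here constrain only the type of $x, so the structural part of the pattern remains common to everything that is located.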
Nested relation types are an advanced feature of the PERA model. They can be difficult to deal with, and so should be used with caution. Notably, the strategies we explored in Lesson 4.3 for deleting entities and relations cannot be applied universally if there are nested relations in the schema. However, nested relations are also an extremely powerful tool, as we will see in Lesson 11.1, where we will explore the full type-theoretic capabilities of the PERA model.