Officially out now: The TypeDB 3.0 Roadmap

Formatting data

During bulk loading, data is often sourced from text files, most often in CSV or JSON format. Before it can be loaded, it must be formatted into a TypeQL query. This is best done by using scripts that build the queries from the data files. Let’s consider the following JSON object, which describes a book to be loaded into a bookstore database.

{
    "ISBN-13": "9780008627843",
    "ISBN-10": "0008627843",
    "Title": "The Hobbit",
    "Format": "ebook",
    "Authors": ["J.R.R. Tolkien"],
    "Editors": [],
    "Illustrators": ["J.R.R. Tolkien"],
    "Other contributors": [],
    "Publisher": "Harper Collins",
    "Page count": 310,
    "Genres": ["fiction", "fantasy"],
    "Price": 16.99
}

We might format it into an Insert query using the following Python function.

def build_book_query(entry: dict) -> str:
    query_parts = ["insert"]

    query_parts += [
        f"""$book isa {entry["Format"]};""",
        f"""$book has isbn-13 "{entry["ISBN-13"]}";""",
        f"""$book has title "{entry["Title"]}";""",
        f"""$book has page-count {entry["Page count"]};""",
        f"""$book has price {entry["Price"]};""",
    ]

    if "ISBN-10" in entry:
        query_parts += [f"""$book has isbn-10 "{entry["ISBN-10"]}";"""]

    query_parts += [f"""$book has genre "{genre}";""" for genre in entry["Genres"]]

    contributors: dict[str, int] = dict()

    for contributor in entry["Authors"] + entry["Editors"] + entry["Illustrators"] + entry["Other contributors"]:
        if contributor not in contributors:
            contributors[contributor] = len(contributors) + 1

    query_parts += [f"""$contributor_{index} isa contributor;""" for contributor, index in contributors.items()]
    query_parts += [f"""$contributor_{index} has name "{contributor}";""" for contributor, index in contributors.items()]
    query_parts += [f"""(work: $book, author: $contributor_{contributors[author]}) isa authoring;""" for author in entry["Authors"]]
    query_parts += [f"""(work: $book, editor: $contributor_{contributors[editor]}) isa authoring;""" for editor in entry["Editors"]]
    query_parts += [f"""(work: $book, illustrator: $contributor_{contributors[illustrator]}) isa illustrating;""" for illustrator in entry["Illustrators"]]
    query_parts += [f"""(work: $book, contributor: $contributor_{contributors[contributor]}) isa contribution;""" for contributor in entry["Other contributors"]]

    query_parts += [
        f"""$publisher isa publisher;""",
        f"""$publisher has name "{entry["Publisher"]}";""",
        f"""(published: $book, publisher: $publisher) isa publishing;""",
    ]

    return "\n".join(query_parts)

When called with the above JSON object, the function will return the following TypeQL query string.

insert
$book isa ebook;
$book has isbn-13 "9780008627843";
$book has title "The Hobbit";
$book has page-count 310;
$book has price 16.99;
$book has isbn-10 "0008627843";
$book has genre "fiction";
$book has genre "fantasy";
$contributor_1 isa contributor;
$contributor_1 has name "J.R.R. Tolkien";
(work: $book, author: $contributor_1) isa authoring;
(work: $book, illustrator: $contributor_1) isa illustrating;
$publisher isa publisher;
$publisher has name "Harper Collins";
(published: $book, publisher: $publisher) isa publishing;

We have employed a few strategies to help generate the query:

  • Rather than using composite statements, we only use simple statements. This makes queries easier to construct and debug.

  • Because a book might have multiple contributors, we automatically generate variable bindings for them by taking the string "$contributor_" and appending a number. This is a useful way of building a query that includes multiple similar variables.

  • We build the query up as a list of statements and then join them into a string at the end. This helps prevent errors from occurring.