Graph Visualisation for TypeDB Data


This tutorial is based on the Jupyter notebook in the typedb-graph-utils repository, which you can set up and run locally.

This tutorial shows how to visualise TypeDB data as graphs, using the query representation returned by the analyze endpoint together with the query results returned when the include_query_structure QueryOption is set.

It uses the TypeDB Python driver to interact with TypeDB, networkx to build the graph, and matplotlib to visualise it. We chose networkx and matplotlib because they’re widely used, but you could use any libraries; TypeDB Studio, for example, uses sigmajs and graphology.

At the end, we present the typedb-graph-utils Python library, which provides an easy interface for building graphs and other structures from TypeDB query results.

This tutorial requires TypeDB version 3.7.0 or later.

Dataset

For the following sections, we’ll use this toy dataset.

# schema
define
  attribute name, value string;
  attribute age, value integer;
  entity person, owns name, owns age;

# data
insert
  $john isa person, has name "John", has age 20;
  $jane isa person, has name "Jane", has age 30;

# simple_query
match $x isa person, has name $n;
Code to set up
SCHEMA = """
define
  attribute name, value string;
  attribute age, value integer;
  entity person, owns name, owns age;
"""

DATA = """
insert
  $john isa person, has name "John", has age 20;
  $jane isa person, has name "Jane", has age 30;
"""

SIMPLE_QUERY = """
    match $x isa person, has name $n;
"""

from typedb.driver import TypeDB, Credentials, DriverOptions, QueryOptions, TransactionType

DB_ADDRESS = "127.0.0.1:1729"
DB_CREDENTIALS = Credentials("admin", "password")
DRIVER_OPTIONS = DriverOptions(is_tls_enabled=False)
QUERY_OPTIONS = QueryOptions()
QUERY_OPTIONS.include_query_structure = True
DB_NAME = "typedb-graph-tutorial-py"

def setup(driver, schema, data):
    if DB_NAME in [db.name for db in driver.databases.all()]:
        driver.databases.get(DB_NAME).delete()
    driver.databases.create(DB_NAME)
    with driver.transaction(DB_NAME, TransactionType.SCHEMA) as tx:
        tx.query(schema).resolve()
        tx.commit()
    with driver.transaction(DB_NAME, TransactionType.WRITE) as tx:
        rows = list(tx.query(data).resolve())
        assert 1 == len(rows)
        tx.commit()

driver = TypeDB.driver(DB_ADDRESS, DB_CREDENTIALS, DRIVER_OPTIONS)
setup(driver, SCHEMA, DATA)

Analyze & the query graph

TypeDB allows users to analyze a query to get a type-annotated representation of it. This includes the pipeline structure and typing information for the variables in each pattern.

with driver.transaction(DB_NAME, TransactionType.READ) as tx:
    analyzed = tx.analyze(SIMPLE_QUERY).resolve()

A pipeline is made up of PipelineStage objects, some of which (such as match and insert stages) refer to a Conjunction through a ConjunctionID.

from typedb.analyze import Pipeline, Constraint, ConstraintVertex

# Abridged driver signatures, shown for reference:
class Pipeline:
    def stages(self) -> Iterator[PipelineStage]: ...
    def conjunction(self, conjunction_id: ConjunctionID) -> Optional[Conjunction]: ...
    # ...

class MatchStage(PipelineStage):
    def block(self) -> "ConjunctionID": ...

class InsertStage(PipelineStage):
    def block(self) -> "ConjunctionID": ...

# Analyze the simple query
pipeline = analyzed.pipeline()
stages = list(pipeline.stages())
stages
Output
[
    Match { block: ConjunctionID(0) }
]

A conjunction is a collection of Constraint objects, each of which corresponds closely to a TypeQL statement:

# Abridged driver signatures, shown for reference:
class Conjunction:
    def constraints(self) -> Iterator["Constraint"]: ...


class Isa(Constraint, ABC):
    """Represents an 'isa' constraint: <instance> isa(!) <type>"""
    def instance(self) -> "ConstraintVertex": ...
    def type(self) -> "ConstraintVertex": ...
    def exactness(self) -> "ConstraintExactness": ...  # isa or isa!


class Has(Constraint, ABC):
    """Represents a 'has' constraint: <owner> has <attribute>"""
    def owner(self) -> "ConstraintVertex": ...
    def attribute(self) -> "ConstraintVertex": ...

TypeQL statements can be seen as constraints between one or more ConstraintVertex objects. A vertex can be a type Label, a Variable, or a raw Value.

# Get the constraints from the match stage
match_stage = stages[0]
conjunction = pipeline.conjunction(match_stage.block())
constraints = list(conjunction.constraints())
constraints
Output
[
    Isa { instance: Variable(Variable(0)), type: Label(EntityType(EntityType { label: "person" })), exactness: Subtypes },
    Has { owner: Variable(Variable(0)), attribute: Variable(Variable(1)), exactness: Subtypes },
    Isa { instance: Variable(Variable(1)), type: Label(AttributeType(AttributeType { label: "name", value_type: Some(string) })), exactness: Subtypes }
]

From constraints to the query graph

Before we visualise the data returned by queries, we’ll turn the constraints in our query into a query graph and visualise it. We’ll use a to_query_edge function to convert a Constraint into a labelled edge, represented by the tuple (from, label, to).

# Convert all constraints to edges
from typing import Tuple
def to_query_edge(constraint: Constraint) -> Tuple[ConstraintVertex, str, ConstraintVertex]:
    if constraint.is_isa():
        isa = constraint.as_isa()
        return (isa.instance(), "isa", isa.type())
    elif constraint.is_has():
        has = constraint.as_has()
        return (has.owner(), "has", has.attribute())
    else:
        raise NotImplementedError("Not implemented in tutorial.")

query_edges = [to_query_edge(constraint) for constraint in constraints]
query_edges
Output
[
    (Variable(0), 'isa', EntityType(person)),
    (Variable(0), 'has', Variable(1)),
    (Variable(1), 'isa', AttributeType(name))
]

We then use networkx and matplotlib to visualise the graph:

import networkx
from matplotlib import pyplot

query_graph = networkx.MultiDiGraph()
for (u, label, v) in query_edges:
    query_graph.add_edge(u,v)
node_labels = {node: str(node) for node in query_graph.nodes()}
networkx.draw(query_graph, labels=node_labels)
pyplot.show()


Colouring & Labelling the graph

We’ve constructed the structure of the query graph! Now let’s add some colours and labels. We’ll create a node_style function that returns the style for a given vertex, and move the drawing logic into a draw function. Although this code is specific to matplotlib, it can serve as inspiration for other visualisation libraries.

from typing import Any, Dict, List, Tuple
def node_style(pipeline: Pipeline, node: ConstraintVertex) -> Dict[str, Any]:
    color = "g" if node.is_variable() else "b"
    label = ("$" + pipeline.get_variable_name(node.as_variable())) if node.is_variable() else str(node)
    shape = "s" if node.is_label() else "o"
    return {
        "color": color,
        "label": label,
        "shape": shape,
    }

def draw(edges: List[Tuple[ConstraintVertex, str, ConstraintVertex]], node_styles: Dict[ConstraintVertex, Dict[str, Any]]):
    graph = networkx.MultiDiGraph()
    # The edge label is used as the key of each multi-edge
    graph.add_edges_from((u, v, label) for (u, label, v) in edges)
    # forceatlas2_layout is only available in recent networkx versions
    pos = networkx.forceatlas2_layout(graph) if hasattr(networkx, 'forceatlas2_layout') else networkx.planar_layout(graph)

    # matplotlib draws a single marker shape per call, so group the graph's nodes by shape
    nodes_by_shape = {}
    for node in graph.nodes:
        nodes_by_shape.setdefault(node_styles[node]["shape"], []).append(node)

    for (shape, node_subset) in nodes_by_shape.items():
        node_colors = [node_styles[n]["color"] for n in node_subset]
        node_labels = {n: node_styles[n]["label"] for n in node_subset}
        networkx.draw_networkx_nodes(graph, pos, nodelist=node_subset, node_color=node_colors, node_shape=shape)
        networkx.draw_networkx_labels(graph, pos, labels=node_labels)
    networkx.draw_networkx_edges(graph, pos)

    edge_labels = {(u, v): label for (u, label, v) in edges}
    networkx.draw_networkx_edge_labels(graph, pos, edge_labels=edge_labels)
    pyplot.show()

# Prepare node styles
nodes = set(u for (u,_,_) in query_edges).union(set(v for (_,_,v) in query_edges))
node_styles = { n: node_style(pipeline, n) for n in nodes}
# Draw
draw(query_edges, node_styles)


Visualising Data

TypeDB answers are ConceptRow objects, which map the variables in a query to concepts in the database. A ConceptRow has an interface similar to a Python dictionary. If the include_query_structure QueryOption is set, the pipeline structure is also returned, and every row has shared access to it.

# Abridged driver signatures, shown for reference:
class ConceptRow:
    def column_names(self) -> Iterator[str]: ...  # keys
    def concepts(self) -> Iterator[Concept]: ...  # values
    def get(self, column_name: str) -> Optional[Concept]: ...  # get

    def query_structure(self) -> Optional["Pipeline"]: ...  # Shared access to the pipeline structure
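Since a ConceptRow behaves like a read-only dictionary, turning one into a plain dict is a one-liner. The sketch below relies only on the column_names() and get() methods listed above; FakeRow is a hypothetical stand-in so that the snippet runs without a TypeDB server, and its string values merely imitate concepts.

```python
from typing import Any, Dict, Iterator, Optional


def row_to_dict(row) -> Dict[str, Any]:
    """Convert any object exposing column_names() and get() into a dict."""
    return {name: row.get(name) for name in row.column_names()}


class FakeRow:
    """Hypothetical stand-in for ConceptRow, so this sketch runs offline."""

    def __init__(self, values: Dict[str, Any]):
        self._values = values

    def column_names(self) -> Iterator[str]:
        return iter(self._values)

    def get(self, column_name: str) -> Optional[Any]:
        return self._values.get(column_name)


row = FakeRow({"x": "person#0000", "n": 'name:"John"'})
print(row_to_dict(row))  # {'x': 'person#0000', 'n': 'name:"John"'}
```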

Run a query:

with driver.transaction(DB_NAME, TransactionType.READ) as tx:
    answers = list(tx.query(SIMPLE_QUERY, QUERY_OPTIONS).resolve())
assert 2 == len(answers), "TypeDB answer count mismatch"
answers
Output
[
|  $n: Attribute(name: "Jane")  |  $x: Entity(person: 0x1e00000000000000000001)  |,
|  $n: Attribute(name: "John")  |  $x: Entity(person: 0x1e00000000000000000000)  |
]

Get the pipeline stages from it:

# Every answer also has a reference to the pipeline structure
list(answers[0].query_structure().stages())
Output
[
    Match { block: ConjunctionID(0) }
]

From query structure and rows to graphs

To construct the graph representing an answer, we substitute the concepts in the answer for the variables in the query graph. Since there are multiple answers, we then combine the graphs of all answers.

from typing import Optional
from typedb.driver import ConceptRow, Concept
def substitute(pipeline: Pipeline, vertex: ConstraintVertex, row: ConceptRow) -> Optional[Concept]:
    if vertex.is_label():
        # Type labels already refer to concepts (types), so keep them as-is
        return vertex.as_label()
    elif vertex.is_variable():
        # Look up the concept bound to this variable in the answer
        var_name = pipeline.get_variable_name(vertex.as_variable())
        return row.get(var_name) if var_name else None
    else:
        raise NotImplementedError("Not implemented in tutorial. See resolve_constraint_vertex")

answers_as_data_edges = [
    [(substitute(pipeline, u, row), label, substitute(pipeline, v, row)) for (u,label,v) in query_edges]
    for row in answers
]
answers_as_data_edges
Output
[
    [
        (Entity(person: 0x1e00000000000000000001), 'isa', EntityType(person)),
        (Entity(person: 0x1e00000000000000000001), 'has', Attribute(name: "Jane")),
        (Attribute(name: "Jane"), 'isa', AttributeType(name))
    ],
    [
        (Entity(person: 0x1e00000000000000000000), 'isa', EntityType(person)),
        (Entity(person: 0x1e00000000000000000000), 'has', Attribute(name: "John")),
        (Attribute(name: "John"), 'isa', AttributeType(name))
    ]
]
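The substitution itself does not depend on the driver. As a minimal sketch, assume query edges written as plain strings, where a leading "$" marks a variable and anything else is a type label, and answers given as dicts from variable name to a concept's display string. All of these representations are assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]

# Query graph for: match $x isa person, has name $n;
# A leading "$" marks a variable; anything else is treated as a type label.
QUERY_EDGES: List[Edge] = [
    ("$x", "isa", "person"),
    ("$x", "has", "$n"),
    ("$n", "isa", "name"),
]


def substitute_vertex(vertex: str, row: Dict[str, str]) -> str:
    # Replace a variable with the concept bound to it in this answer.
    return row[vertex[1:]] if vertex.startswith("$") else vertex


def answers_to_edges(query_edges: List[Edge], rows: List[Dict[str, str]]) -> Set[Edge]:
    # Substitute each answer into the query graph and union the results.
    data_edges: Set[Edge] = set()
    for row in rows:
        for (u, label, v) in query_edges:
            data_edges.add((substitute_vertex(u, row), label, substitute_vertex(v, row)))
    return data_edges


rows = [
    {"x": "person#0001", "n": "name:Jane"},
    {"x": "person#0000", "n": "name:John"},
]
edges = answers_to_edges(QUERY_EDGES, rows)
print(len(edges))  # 6
```

Collecting the edges into a set both merges the per-answer graphs and removes duplicates, while shared type vertices such as person tie the answers together.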

Drawing the graph

We’ll need a new node style since our vertices are now concepts rather than ConstraintVertex objects.

# We need to update our node_style
from typing import Any, Dict
def data_node_style(node: Concept) -> Dict[str, Any]:
    if node.is_type():
        label = str(node)
        color = "c"
    elif node.is_attribute():
        label = f"{node.get_type().get_label()}:{node.get_value()}"
        color = "g"
    else:
        # Show the last 4 characters of the IID to tell instances apart
        label = f"{node.get_type().get_label()}#{node.get_iid()[-4:]}"
        color = "b"

    shape = "s" if (node.is_attribute() or node.is_attribute_type()) else "o"
    return {
        "color": color,
        "label": label,
        "shape": shape,
    }

# Flatten them and remove any duplicate edges:
data_edges = set(e for answer_edges in answers_as_data_edges for e in answer_edges)

# Prepare node styles
nodes = set(u for (u,_,_) in data_edges).union(set(v for (_,_,v) in data_edges))
node_styles = { n: data_node_style(n) for n in nodes}
# Draw
draw(data_edges, node_styles)


Disjunctions and optionals

TypeQL queries can be more than simple conjunctions of patterns. They can also contain disjunctions and optional patterns, so not every answer satisfies every constraint.

Consider the query:

BRANCHED_QUERY = """
match
$x isa person;
{ $x has name $y; } or { $x has age $y; };
"""

with driver.transaction(DB_NAME, TransactionType.READ) as tx:
    answers = list(tx.query(BRANCHED_QUERY, QUERY_OPTIONS).resolve())
assert 4 == len(answers), "TypeDB answer count mismatch"
answers
Output
[
|  $x: Entity(person: 0x1e00000000000000000000)  |  $y: Attribute(age: 20)  |,
|  $x: Entity(person: 0x1e00000000000000000001)  |  $y: Attribute(age: 30)  |,
|  $x: Entity(person: 0x1e00000000000000000001)  |  $y: Attribute(name: "Jane")  |,
|  $x: Entity(person: 0x1e00000000000000000000)  |  $y: Attribute(name: "John")  |
]

Get the constraints in the trunk, the top-level conjunction of the match stage:

pipeline = answers[0].query_structure()
stages = list(pipeline.stages())
trunk_id = stages[0].as_match().block()
trunk = list(pipeline.conjunction(trunk_id).constraints())
(trunk_id, trunk)
Output
(
    2,
    [
        Isa { instance: Variable(Variable(0)), type: Label(EntityType(EntityType { label: "person" })), exactness: Subtypes },
        Or { branches: [ConjunctionID(0), ConjunctionID(1)] }
    ]
)

Get the constraints by branch:

branches = { branch_id: list(pipeline.conjunction(branch_id).constraints()) for branch_id in trunk[1].as_or().branches() }
branches
Output
{
    0: [
        Has { owner: Variable(Variable(0)), attribute: Variable(Variable(1)), exactness: Subtypes },
        Isa { instance: Variable(Variable(1)), type: Label(AttributeType(AttributeType { label: "name", value_type: Some(string) })), exactness: Subtypes }
    ],
    1: [
        Has { owner: Variable(Variable(0)), attribute: Variable(Variable(1)), exactness: Subtypes },
        Isa { instance: Variable(Variable(1)), type: Label(AttributeType(AttributeType { label: "age", value_type: Some(integer) })), exactness: Subtypes }
    ]
}

How do we know which of these branches each answer satisfied? If we naively drew the constraints from all branches, we would get absurd edges such as name:John isa age. Each ConceptRow has an involved_conjunctions method that tells us exactly this.

[list(answer.involved_conjunctions()) for answer in answers]
Output
[
    [1, 2],
    [1, 2],
    [0, 2],
    [0, 2]
]

As expected, the first two answers satisfy the branch with age, and the last two satisfy the branch with name (all answers satisfy the trunk, of course). Constructing the graph for these is still simple: we take the union of the constraints in each involved conjunction:

from typing import Any, List

def flatten(list_of_lists: List[List[Any]]) -> List[Any]:
    return [x for sublist in list_of_lists for x in sublist]

answers_as_data_edges = []
for row in answers:
    # Collect the constraints from every conjunction this answer satisfied
    involved_constraints = flatten([list(pipeline.conjunction(conj_id).constraints()) for conj_id in row.involved_conjunctions()])
    as_query_edges = [to_query_edge(c) for c in involved_constraints if not c.is_or()]  # Exclude the Or constraint itself
    as_data_edges = [(substitute(pipeline, u, row), label, substitute(pipeline, v, row)) for (u, label, v) in as_query_edges]
    answers_as_data_edges.append(as_data_edges)

answers_as_data_edges
Output
[
    [
        (Entity(person: 0x1e00000000000000000000), 'has', Attribute(age: 20)),
        (Attribute(age: 20), 'isa', AttributeType(age)),
        (Entity(person: 0x1e00000000000000000000), 'isa', EntityType(person))
    ],
    [
        (Entity(person: 0x1e00000000000000000001), 'has', Attribute(age: 30)),
        (Attribute(age: 30), 'isa', AttributeType(age)),
        (Entity(person: 0x1e00000000000000000001), 'isa', EntityType(person))
    ],
    [
        (Entity(person: 0x1e00000000000000000001), 'has', Attribute(name: "Jane")),
        (Attribute(name: "Jane"), 'isa', AttributeType(name)),
        (Entity(person: 0x1e00000000000000000001), 'isa', EntityType(person))
    ],
    [
        (Entity(person: 0x1e00000000000000000000), 'has', Attribute(name: "John")),
        (Attribute(name: "John"), 'isa', AttributeType(name)),
        (Entity(person: 0x1e00000000000000000000), 'isa', EntityType(person))
    ]
]
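The branch selection reduces to a lookup from conjunction ID to constraints. In the sketch below, a hypothetical CONJUNCTIONS dictionary stands in for pipeline.conjunction(...), using the IDs from the outputs above and plain string edges in place of driver objects.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]

# Hypothetical stand-in for the branched query's structure:
# conjunction ID -> query edges (the Or constraint itself is omitted).
CONJUNCTIONS: Dict[int, List[Edge]] = {
    0: [("$x", "has", "$y"), ("$y", "isa", "name")],  # name branch
    1: [("$x", "has", "$y"), ("$y", "isa", "age")],   # age branch
    2: [("$x", "isa", "person")],                     # trunk
}


def involved_edges(involved: List[int]) -> List[Edge]:
    # Keep only the constraints of the conjunctions this answer satisfied.
    return [edge for conj_id in involved for edge in CONJUNCTIONS[conj_id]]


print(involved_edges([1, 2]))
```

With involved conjunctions [1, 2], only the age branch and the trunk contribute, so an age attribute is never drawn with an isa name edge.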

Prepare the styles for the nodes, and draw:

# Prepare node styles
flattened_data_edges = flatten(answers_as_data_edges)
nodes = set(u for (u,_,_) in flattened_data_edges).union(set(v for (_,_,v) in flattened_data_edges))
node_styles = { n: data_node_style(n) for n in nodes }
# Draw
draw(flattened_data_edges, node_styles)


Notes on visualising constraints

We have now covered the basics of visualising TypeDB answers as graphs. In this section, we discuss a few cases where TypeDB constraints don’t directly map to a simple labelled binary edge between two concepts, and the solutions we chose when building TypeDB Studio.

Functions

A query can contain function calls. Functions have arguments and return values, all of which are concepts. Unlike a relation, however, the function call itself is not a concept. We felt the cleanest way to visualise the many-to-many constraint between return values and arguments was to introduce a FunctionCall vertex for each call. Two function calls are the same vertex if they have the same (1) function name, (2) tuple of argument concepts, and (3) tuple of returned concepts.

Expressions

We treat expressions as we did functions, except the string representing the expression is used in place of the function name.
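One way to obtain this identity rule is a hashable value object keyed on the function name (or expression string), the argument tuple and the returned tuple. The CallVertex class, the avg_age function, the concept strings and the arg0/return0 edge labels below are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Edge = Tuple[object, str, object]


@dataclass(frozen=True)
class CallVertex:
    """Hypothetical vertex for a function call or an expression.

    head is the function name, or the expression string for expressions.
    Frozen dataclasses are hashable and compare by value, so two calls with
    the same head, arguments and returned concepts are the same vertex.
    """
    head: str
    arguments: Tuple[str, ...]
    returned: Tuple[str, ...]


def call_edges(call: CallVertex) -> List[Edge]:
    # Arguments point into the call; the call points to its returned values.
    edges: List[Edge] = [(arg, f"arg{i}", call) for i, arg in enumerate(call.arguments)]
    edges += [(call, f"return{i}", ret) for i, ret in enumerate(call.returned)]
    return edges


a = CallVertex("avg_age", ("person#0000", "person#0001"), ("25.0",))
b = CallVertex("avg_age", ("person#0000", "person#0001"), ("25.0",))
expr = CallVertex("$y + 1", ("20",), ("21",))
print(a == b, len({a, b, expr}))  # True 2
```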

Links

Links constraints are ternary constraints involving the relation instance, the player instance and the played role type. We represent each one as a binary edge from the relation to the player, with the role serving as the label of the edge.
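As a sketch of that mapping, the snippet below turns a ternary links constraint into a labelled binary edge. The Links tuple and the employment data are hypothetical; the toy dataset in this tutorial has no relations.

```python
from typing import List, NamedTuple, Tuple


class Links(NamedTuple):
    """Hypothetical stand-in for a links constraint with concepts substituted."""
    relation: str
    player: str
    role: str


def links_to_edge(links: Links) -> Tuple[str, str, str]:
    # Ternary constraint -> binary edge: relation --role--> player.
    return (links.relation, links.role, links.player)


employment: List[Links] = [
    Links("employment#0001", "person#0000", "employee"),
    Links("employment#0001", "company#0000", "employer"),
]
print([links_to_edge(link) for link in employment])
```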

Named roles

TypeDB allows unscoped role names in links and relates constraints. For example, given the schema below, the query match $r relates subject; returns both sentence and email as answers for $r.

define
relation sentence relates subject;
relation email relates subject;

Internally, TypeDB introduces an anonymous variable and constrains it to be any of the role types with the specified name. In the structure returned by the analyze operation, these variables and their associated names are returned as NamedRole vertices. (Since the variable is anonymous, it is unavailable in the ConceptRow; since it does not uniquely determine the type, it cannot be treated as a label.)
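The ambiguity can be reproduced with a small lookup: an unscoped role name matches the role of that name in every relation type that relates it. The RELATES dictionary and the relation:role strings are assumptions for illustration; TypeDB performs this resolution internally.

```python
from typing import Dict, List

# Hypothetical schema summary: relation type -> role names it relates.
RELATES: Dict[str, List[str]] = {
    "sentence": ["subject"],
    "email": ["subject"],
}


def resolve_named_role(role_name: str, relates: Dict[str, List[str]]) -> List[str]:
    # An unscoped role name matches that role in every relation type.
    return sorted(
        f"{relation}:{role}"
        for relation, roles in relates.items()
        for role in roles
        if role == role_name
    )


print(resolve_named_role("subject", RELATES))  # ['email:subject', 'sentence:subject']
```

Because the name resolves to more than one role type, a visualiser cannot treat it as a single label.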

Is & comparison constraints

Since an is constraint is always a self-edge on a single vertex once its variables are substituted, we choose not to visualise it. We also skip comparison constraints to reduce the number of edges, though they can be useful in certain cases; the comparator symbol is available from the comparison constraint.
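These choices amount to a filter over the edges before drawing. The sketch below assumes string edges where an is constraint is labelled "is" and a comparison is labelled with its comparator symbol; the COMPARATORS set is an assumed list for illustration, not taken from the driver.

```python
from typing import List, Tuple

Edge = Tuple[str, str, str]

# Assumed comparator labels; in practice the symbol comes from the comparison constraint.
COMPARATORS = {"==", "!=", "<", "<=", ">", ">=", "like", "contains"}


def visible_edges(edges: List[Edge], keep_comparisons: bool = False) -> List[Edge]:
    kept: List[Edge] = []
    for (u, label, v) in edges:
        if label == "is":
            continue  # both ends bind the same concept, so this would be a self-edge
        if label in COMPARATORS and not keep_comparisons:
            continue  # dropped by default to reduce the number of edges
        kept.append((u, label, v))
    return kept


edges = [
    ("person#0000", "isa", "person"),
    ("person#0000", "is", "person#0000"),
    ("age:20", "<", "age:30"),
]
print(visible_edges(edges))  # [('person#0000', 'isa', 'person')]
```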

The typedb-graph-utils library

In this section, we introduce the typedb-graph-utils Python library developed alongside this tutorial. It follows the structure of the TypeScript library we wrote for TypeDB Studio.

The essence remains the same as what we’ve covered in the tutorial: find the constraints “involved” in each answer, and substitute the concepts for the variables. Instead of converting a constraint into query edges and then applying the substitution to form data edges, we apply the substitution directly to the Constraint objects to obtain corresponding DataConstraint objects. These constrain DataVertex objects instead of ConstraintVertex objects. The library handles the conversion of answers to DataConstraint objects, so you can focus on constructing your target representation.
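The difference between the two approaches can be sketched with a pair of hypothetical dataclasses: one constraint over query vertices and one over concepts. These classes are simplified stand-ins, not the library's DataConstraint types, and concepts are represented as strings.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Label:
    label: str


Vertex = Union[Var, Label]


@dataclass(frozen=True)
class IsaConstraint:
    """Query level: vertices may be variables or labels."""
    instance: Vertex
    type: Vertex


@dataclass(frozen=True)
class IsaData:
    """Data level: vertices are concepts (strings in this sketch)."""
    instance: str
    type: str


def resolve(vertex: Vertex, row: Dict[str, str]) -> str:
    # Variables take their value from the answer; labels stay as they are.
    return row[vertex.name] if isinstance(vertex, Var) else vertex.label


def to_data(isa: IsaConstraint, row: Dict[str, str]) -> IsaData:
    # Substitute once, on the whole constraint, instead of on edge tuples.
    return IsaData(resolve(isa.instance, row), resolve(isa.type, row))


print(to_data(IsaConstraint(Var("x"), Label("person")), {"x": "person#0000"}))
# IsaData(instance='person#0000', type='person')
```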

Data constraints

Here’s the signature of an Isa DataConstraint from the library, and the Isa Constraint from the driver API to compare against:

# From library (abridged signatures)
class Isa(DataConstraint):
    def instance(self) -> DataVertex: ...
    def type(self) -> DataVertex: ...
    def exactness(self) -> ConstraintExactness: ...

# From driver (abridged signatures)
class Isa(Constraint, ABC):
    def instance(self) -> ConstraintVertex: ...
    def type(self) -> ConstraintVertex: ...
    def exactness(self) -> ConstraintExactness: ...  # isa or isa!

The TypeDBAnswerConverter interface

We introduce a TypeDBAnswerConverter[OutputType] abstract class, which defines the methods the user has to implement: one add_ method per data constraint, and a finish(self) -> OutputType method that builds your final representation.

We provide a sample implementation, NetworkXBuilder, which builds a MultiDiGraph as in this tutorial. We also provide a basic MatplotlibVisualizer with a familiar draw function for inspiration.
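To show the shape of this pattern without the driver, here is a deliberately simplified, hypothetical converter: constraints arrive as (kind, u, v) triples, each is dispatched to an add_<kind> method, and finish() returns the result. The real TypeDBAnswerConverter receives answers and DataConstraint objects instead, so its method signatures differ.

```python
from abc import ABC, abstractmethod
from typing import Generic, List, Tuple, TypeVar

OutputType = TypeVar("OutputType")
Edge = Tuple[str, str, str]


class AnswerConverter(ABC, Generic[OutputType]):
    """Hypothetical, simplified converter: one add_ method per constraint kind."""

    def add_answer(self, constraints: List[Tuple[str, str, str]]) -> None:
        # Dispatch each (kind, u, v) triple to the matching add_<kind> method.
        for (kind, u, v) in constraints:
            getattr(self, f"add_{kind}")(u, v)

    @abstractmethod
    def add_isa(self, instance: str, type_: str) -> None: ...

    @abstractmethod
    def add_has(self, owner: str, attribute: str) -> None: ...

    @abstractmethod
    def finish(self) -> OutputType: ...


class EdgeListConverter(AnswerConverter[List[Edge]]):
    """Builds a de-duplicated edge list instead of a graph."""

    def __init__(self) -> None:
        self.edges: List[Edge] = []

    def add_isa(self, instance: str, type_: str) -> None:
        self._add((instance, "isa", type_))

    def add_has(self, owner: str, attribute: str) -> None:
        self._add((owner, "has", attribute))

    def _add(self, edge: Edge) -> None:
        if edge not in self.edges:
            self.edges.append(edge)

    def finish(self) -> List[Edge]:
        return self.edges


converter = EdgeListConverter()
converter.add_answer([("isa", "person#0000", "person"), ("has", "person#0000", "age:20")])
converter.add_answer([("isa", "person#0000", "person"), ("has", "person#0000", "name:John")])
print(converter.finish())
```

Swapping EdgeListConverter for another subclass changes the output representation without touching the dispatch logic, which is the point of the abstract base class.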

Building a visualiser using the library

We now recreate the previous example by implementing the TypeDBAnswerConverter class provided by the library.

from typedb_graph_utils import data_constraint, TypeDBAnswerConverter, MatplotlibVisualizer
from typedb.analyze import Pipeline
from typedb.common.enums import ConstraintExactness
from networkx import MultiDiGraph

class MyTutorialBuilder(TypeDBAnswerConverter[MultiDiGraph]):

    def __init__(self, pipeline: Pipeline):
        super().__init__(pipeline)
        self.graph = MultiDiGraph()

    def finish(self) -> MultiDiGraph:
        return self.graph

    def add_isa(self, isa: data_constraint.Isa):
        edge_type = "isa!" if isa.exactness() == ConstraintExactness.Exact else "isa"
        # Use the edge attributes to store metadata. The visualiser uses them.
        self.graph.add_edge(isa.instance(), isa.type(), label=edge_type)

    def add_has(self, has: data_constraint.Has):
        # Skip the edge if either endpoint is missing (None) for this answer
        if has.owner() is None or has.attribute() is None:
            return
        edge_type = "has!" if has.exactness() == ConstraintExactness.Exact else "has"
        self.graph.add_edge(has.owner(), has.attribute(), label=edge_type)

    # We leave the remaining methods unimplemented for brevity. Check the NetworkXConverter
    def add_comparison(self, c): pass
    def add_expression(self, c): pass
    def add_function_call(self, c): pass
    def add_iid(self, c): pass
    def add_is(self, c): pass
    def add_kind(self, c): pass
    def add_label(self, c): pass
    def add_links(self, c): pass
    def add_owns(self, c): pass
    def add_plays(self, c): pass
    def add_relates(self, c): pass
    def add_sub(self, c): pass
    def add_value(self, c): pass

with driver.transaction(DB_NAME, TransactionType.READ) as tx:
    answers = list(tx.query(BRANCHED_QUERY, QUERY_OPTIONS).resolve())

# Feed every answer to the builder, then collect the finished graph
builder = MyTutorialBuilder(answers[0].query_structure())
for (i, answer) in enumerate(answers):
    builder.add_answer(i, answer)

graph = builder.finish()
MatplotlibVisualizer.draw(graph)


Conclusion

This tutorial combines the rows and query structure returned by a TypeQL query to construct and visualise a graph representation of the result. We hope this new feature and the typedb-graph-utils library will enable users to construct whatever representation their applications may need.

Next steps

Read the core-concepts section on analyzing queries.

Complete documentation for all the analyzed query structures.

TypeDB Studio: the web-based IDE for TypeDB with graph visualization.

typedb-graph-utils: utility libraries for constructing graphs and other structures from TypeDB answers.