Graph Visualisation for TypeDB Data

This tutorial is based on the Jupyter notebook in the typedb-graph-utils repository. You can set it up and run it locally.
This tutorial shows how to visualise TypeDB data as graphs,
using the query structure returned by the analyze endpoint,
or returned alongside query results when the include_query_structure QueryOption is set.
It uses the TypeDB Python driver for interacting with TypeDB, networkx for building the graph, and matplotlib for visualisation. We chose networkx and matplotlib because they’re widely used, but you could use any libraries: TypeDB Studio, for example, uses sigmajs and graphology.
At the end, we present the typedb-graph-utils Python library,
which provides an easy interface to build graphs and other structures from TypeDB query results.
This tutorial requires TypeDB version >= 3.7.0.
Dataset
For the following sections, we’ll use this toy dataset.
# schema
define
attribute name, value string;
attribute age, value integer;
entity person, owns name, owns age;
# data
insert
$john isa person, has name "John", has age 20;
$jane isa person, has name "Jane", has age 30;
# simple_query
match $x isa person, has name $n;
Code to set up
SCHEMA = """
define
attribute name, value string;
attribute age, value integer;
entity person, owns name, owns age;
"""
DATA = """
insert
$john isa person, has name "John", has age 20;
$jane isa person, has name "Jane", has age 30;
"""
SIMPLE_QUERY = """
match $x isa person, has name $n;
"""
from typedb.driver import TypeDB, Credentials, DriverOptions, QueryOptions, TransactionType
DB_ADDRESS = "127.0.0.1:1729"
DB_CREDENTIALS = Credentials("admin", "password")
DRIVER_OPTIONS = DriverOptions(is_tls_enabled=False)
QUERY_OPTIONS = QueryOptions()
QUERY_OPTIONS.include_query_structure = True
DB_NAME = "typedb-graph-tutorial-py"
def setup(driver, schema, data):
    # Start from a clean database on every run
    if DB_NAME in [db.name for db in driver.databases.all()]:
        driver.databases.get(DB_NAME).delete()
    driver.databases.create(DB_NAME)
    # Define the schema
    with driver.transaction(DB_NAME, TransactionType.SCHEMA) as tx:
        tx.query(schema).resolve()
        tx.commit()
    # Insert the data; the single insert yields one row
    with driver.transaction(DB_NAME, TransactionType.WRITE) as tx:
        rows = list(tx.query(data).resolve())
        assert 1 == len(rows)
        tx.commit()
driver = TypeDB.driver(DB_ADDRESS, DB_CREDENTIALS, DRIVER_OPTIONS)
setup(driver, SCHEMA, DATA)
Analyze & the query graph
TypeDB allows users to analyze a query to get a type-annotated representation of it.
This includes the pipeline structure, and typing information for the variables in each pattern.
with driver.transaction(DB_NAME, TransactionType.READ) as tx:
    analyzed = tx.analyze(SIMPLE_QUERY).resolve()
A pipeline is made up of PipelineStages, some of which are made of
Conjunctions.
from typedb.analyze import Pipeline, Constraint, ConstraintVertex
class Pipeline:
    def stages(self) -> Iterator[PipelineStage]
    def conjunction(self, conjunction_id: ConjunctionID) -> Optional[Conjunction]
    # ...
class MatchStage(PipelineStage):
    def block(self) -> "ConjunctionID"
class InsertStage(PipelineStage):
    def block(self) -> "ConjunctionID"
# Analyze the simple query
pipeline = analyzed.pipeline()
stages = list(pipeline.stages())
stages
Output
[
Match { block: ConjunctionID(0) }
]
A conjunction is a collection of Constraints, which should be
familiar from the TypeQL statements:
class Conjunction:
    def constraints(self) -> Iterator["Constraint"]
class Isa(Constraint, ABC):
    """Represents an 'isa' constraint: <instance> isa(!) <type>"""
    def instance(self) -> "ConstraintVertex"
    def type(self) -> "ConstraintVertex"
    def exactness(self) -> "ConstraintExactness" # isa or isa!
class Has(Constraint, ABC):
    """Represents a 'has' constraint: <owner> has <attribute>"""
    def owner(self) -> "ConstraintVertex"
    def attribute(self) -> "ConstraintVertex"
TypeQL statements can be seen as constraints between one or more
ConstraintVertexes. These can be type Labels, Variables or
raw Values.
# Get the constraints from the match stage
match_stage = stages[0]
conjunction = pipeline.conjunction(match_stage.block())
constraints = list(conjunction.constraints())
constraints
Output
[
Isa { instance: Variable(Variable(0)), type: Label(EntityType(EntityType { label: "person" })), exactness: Subtypes },
Has { owner: Variable(Variable(0)), attribute: Variable(Variable(1)), exactness: Subtypes },
Isa { instance: Variable(Variable(1)), type: Label(AttributeType(AttributeType { label: "name", value_type: Some(string) })), exactness: Subtypes }
]
From constraints to the query graph
Before we get to visualising the data returned by queries, we’ll turn
the constraints in our query into a query graph and visualise it. We’ll
use a to_query_edge function to go from a Constraint to a labelled
edge represented by the tuple (from, label, to).
# Convert all constraints to edges
from typing import Tuple
def to_query_edge(constraint: Constraint) -> Tuple[ConstraintVertex, str, ConstraintVertex]:
    if constraint.is_isa():
        isa = constraint.as_isa()
        return (isa.instance(), "isa", isa.type())
    elif constraint.is_has():
        has = constraint.as_has()
        return (has.owner(), "has", has.attribute())
    else:
        raise NotImplementedError("Not implemented in tutorial.")
query_edges = [to_query_edge(constraint) for constraint in constraints]
query_edges
Output
[
(Variable(0), 'isa', EntityType(person)),
(Variable(0), 'has', Variable(1)),
(Variable(1), 'isa', AttributeType(name))
]
We then use networkx and matplotlib to visualise the graph:
import networkx
from matplotlib import pyplot
query_graph = networkx.MultiDiGraph()
for (u, label, v) in query_edges:
    # Keep the constraint name as an edge attribute so it is not lost
    query_graph.add_edge(u, v, label=label)
node_labels = {node: str(node) for node in query_graph.nodes()}
networkx.draw(query_graph, labels=node_labels)
pyplot.show()

Colouring & Labelling the graph
We’ve constructed the structure of the query graph! Now let’s add some colours and labelling.
We’ll create a node_style function which returns the style given a vertex,
and move the drawing logic into a draw function.
Although this code is specific to matplotlib, it can serve as inspiration for other visualisation libraries.
from typing import Dict, List, Tuple
def node_style(pipeline: Pipeline, node: ConstraintVertex) -> Dict[str, any]:
    # Variables are green circles, type labels are blue squares
    color = "g" if node.is_variable() else "b"
    label = ("$" + pipeline.get_variable_name(node.as_variable())) if node.is_variable() else str(node)
    shape = "s" if node.is_label() else "o"
    return {
        "color": color,
        "label": label,
        "shape": shape,
    }
def draw(edges: List[Tuple[ConstraintVertex, str, ConstraintVertex]], node_styles: Dict[ConstraintVertex, Dict[str, any]]):
    graph = networkx.MultiDiGraph()
    graph.add_edges_from((u,v, label) for (u, label, v) in edges)
    pos = networkx.forceatlas2_layout(graph) if hasattr(networkx, 'forceatlas2_layout') else networkx.planar_layout(graph)
    # matplotlib takes one shape per draw call, so group the nodes by shape
    nodes_by_shape = {node_styles[n]["shape"]: [] for n in graph.nodes}
    for node in node_styles:
        nodes_by_shape[node_styles[node]["shape"]].append(node)
    for (shape, node_subset) in nodes_by_shape.items():
        node_colors = [node_styles[n]["color"] for n in node_subset]
        node_labels = {n: node_styles[n]["label"] for n in node_subset}
        networkx.draw_networkx_nodes(graph, pos, nodelist=nodes_by_shape[shape], node_color=node_colors, node_shape=shape)
        networkx.draw_networkx_labels(graph, pos, labels = node_labels)
    networkx.draw_networkx_edges(graph, pos)
    edge_labels = { (u,v): label for (u, label, v) in edges}
    networkx.draw_networkx_edge_labels(graph, pos, edge_labels=edge_labels)
    pyplot.show()
# Prepare node styles
nodes = set(u for (u,_,_) in query_edges).union(set(v for (_,_,v) in query_edges))
node_styles = { n: node_style(pipeline, n) for n in nodes}
# Draw
draw(query_edges, node_styles)
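Since the drawing code above is matplotlib-specific, here is a minimal sketch of handing the same labelled edges to a different front end as plain JSON. It assumes the vertices have already been turned into strings (for example with str(node) or the label from node_style). The "key" / "attributes" layout is loosely modelled on graphology's serialised graph format; that resemblance is an assumption to check against that library's documentation, and edges_to_graph_json is a hypothetical helper, not part of any library used in this tutorial.

```python
import json
from typing import Dict, Iterable, List, Tuple

def edges_to_graph_json(edges: Iterable[Tuple[str, str, str]]) -> Dict[str, List[dict]]:
    """Turn (source, label, target) string triples into a dict of nodes and edges."""
    nodes: Dict[str, dict] = {}
    edge_list: List[dict] = []
    for (u, label, v) in edges:
        # Register each endpoint once, keeping first-seen order
        for n in (u, v):
            nodes.setdefault(n, {"key": n, "attributes": {"label": n}})
        edge_list.append({"source": u, "target": v, "attributes": {"label": label}})
    return {"nodes": list(nodes.values()), "edges": edge_list}

# Stand-in for the stringified query edges of the simple query
example_edges = [("$x", "isa", "person"), ("$x", "has", "$n"), ("$n", "isa", "name")]
graph_json = edges_to_graph_json(example_edges)
print(json.dumps(graph_json, indent=2))
```

The result is an ordinary dictionary, so it can be written to a file or sent to a browser without any dependency on networkx or matplotlib.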

Visualising Data
TypeDB answers are ConceptRows, which map variables in the query to
concepts in the database. A ConceptRow has an interface similar to a Python
dictionary. If the include_query_structure QueryOption is set,
the pipeline structure is returned with the answers, and each row has shared
access to it.
class ConceptRow:
    def column_names(self) -> Iterator[str] # keys
    def concepts(self) -> Iterator[Concept] # values
    def get(self, column_name: str) -> Optional[Concept] # get
    def query_structure(self) -> Optional["Pipeline"] # Shared access to the pipeline structure
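To make the dictionary analogy concrete, the sketch below copies a row into a real dict using only the column_names and get methods listed above. row_to_dict and FakeRow are hypothetical names introduced for illustration; FakeRow merely imitates the two methods so the snippet runs without a server, and its string values stand in for Concept objects.

```python
from typing import Any, Dict

def row_to_dict(row) -> Dict[str, Any]:
    """Copy a row-like object into a plain dict of column name -> concept."""
    return {name: row.get(name) for name in row.column_names()}

class FakeRow:
    """Hypothetical stand-in exposing the two ConceptRow methods used above."""
    def __init__(self, data: Dict[str, Any]):
        self._data = dict(data)
    def column_names(self):
        return iter(self._data.keys())
    def get(self, column_name: str):
        return self._data.get(column_name)

fake = FakeRow({"x": "person#0000", "n": "name:John"})
as_dict = row_to_dict(fake)
print(as_dict)
```

With a real driver connection, the same helper should work on each element of the answers list, since it relies only on the two methods shown in the signature.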
Run a query:
with driver.transaction(DB_NAME, TransactionType.READ) as tx:
    answers = list(tx.query(SIMPLE_QUERY, QUERY_OPTIONS).resolve())
    assert 2 == len(answers), "TypeDB answer count mismatch"
answers
Output
[ | $n: Attribute(name: "Jane") | $x: Entity(person: 0x1e00000000000000000001) |, | $n: Attribute(name: "John") | $x: Entity(person: 0x1e00000000000000000000) | ]
Get the pipeline stages from it:
# Every answer also has a reference to the pipeline structure
list(answers[0].query_structure().stages())
Output
[
Match { block: ConjunctionID(0) }
]
From query structure and rows to graphs
To construct the graph representing an answer, we simply have to substitute the concepts in the answer for the variables in the query graph. Since there are multiple answers, we just combine the graphs of each answer.
from typedb.driver import ConceptRow, Concept
def substitute(pipeline: Pipeline, vertex: ConstraintVertex, row: ConceptRow) -> Concept:
    if vertex.is_label():
        # Type labels are already concepts
        return vertex.as_label()
    elif vertex.is_variable():
        # Look up the concept bound to this variable in the row
        var_name = pipeline.get_variable_name(vertex.as_variable())
        return row.get(var_name) if var_name else None
    else:
        raise NotImplementedError("Not implemented in tutorial. See resolve_constraint_vertex")
answers_as_data_edges = [
    [(substitute(pipeline, u, row), label, substitute(pipeline, v, row)) for (u,label,v) in query_edges]
    for row in answers
]
answers_as_data_edges
Output
[
[
(Entity(person: 0x1e00000000000000000001), 'isa', EntityType(person)),
(Entity(person: 0x1e00000000000000000001), 'has', Attribute(name: "Jane")),
(Attribute(name: "Jane"), 'isa', AttributeType(name))
],
[
(Entity(person: 0x1e00000000000000000000), 'isa', EntityType(person)),
(Entity(person: 0x1e00000000000000000000), 'has', Attribute(name: "John")),
(Attribute(name: "John"), 'isa', AttributeType(name))
]
]
Drawing the graph
We’ll need a new node style since our vertices are now concepts rather
than ConstraintVertexes.
# We need to update our node_style
def data_node_style(node: Concept) -> Dict[str, any]:
    if node.is_type():
        label = str(node)
        color = "c"
    elif node.is_attribute():
        label = f"{node.get_type().get_label()}:{node.get_value()}"
        color = "g"
    else:
        # Entities and relations: show the type and the last digits of the IID
        label = f"{node.get_type().get_label()}#{node.get_iid()[-4:]}"
        color = "b"
    shape = "s" if (node.is_attribute() or node.is_attribute_type()) else "o"
    return {
        "color": color,
        "label": label,
        "shape": shape,
    }
# Flatten them and remove any duplicate edges:
data_edges = set(e for answer_edges in answers_as_data_edges for e in answer_edges)
# Prepare node styles
nodes = set(u for (u,_,_) in data_edges).union(set(v for (_,_,v) in data_edges))
node_styles = { n: data_node_style(n) for n in nodes}
# Draw
draw(data_edges, node_styles)

Disjunctions and optionals
TypeQL queries can be more than simple conjunctions of patterns. They can also contain disjunctions & optional patterns, meaning not every answer satisfies every constraint.
Consider the query:
BRANCHED_QUERY = """
match
$x isa person;
{ $x has name $y; } or { $x has age $y; };
"""
with driver.transaction(DB_NAME, TransactionType.READ) as tx:
    answers = list(tx.query(BRANCHED_QUERY, QUERY_OPTIONS).resolve())
    assert 4 == len(answers), "TypeDB answer count mismatch"
answers
Output
[ | $x: Entity(person: 0x1e00000000000000000000) | $y: Attribute(age: 20) |, | $x: Entity(person: 0x1e00000000000000000001) | $y: Attribute(age: 30) |, | $x: Entity(person: 0x1e00000000000000000001) | $y: Attribute(name: "Jane") |, | $x: Entity(person: 0x1e00000000000000000000) | $y: Attribute(name: "John") | ]
Get the constraints in the trunk:
pipeline = answers[0].query_structure()
stages = list(pipeline.stages())
trunk_id = stages[0].as_match().block()
trunk = list(pipeline.conjunction(trunk_id).constraints())
(trunk_id, trunk)
Output
(
2,
[
Isa { instance: Variable(Variable(0)), type: Label(EntityType(EntityType { label: "person" })), exactness: Subtypes },
Or { branches: [ConjunctionID(0), ConjunctionID(1)] }
]
)
Get the constraints by branch:
branches = { branch_id: list(pipeline.conjunction(branch_id).constraints()) for branch_id in trunk[1].as_or().branches() }
branches
Output
{
0: [
Has { owner: Variable(Variable(0)), attribute: Variable(Variable(1)), exactness: Subtypes },
Isa { instance: Variable(Variable(1)), type: Label(AttributeType(AttributeType { label: "name", value_type: Some(string) })), exactness: Subtypes }
],
1: [
Has { owner: Variable(Variable(0)), attribute: Variable(Variable(1)), exactness: Subtypes },
Isa { instance: Variable(Variable(1)), type: Label(AttributeType(AttributeType { label: "age", value_type: Some(integer) })), exactness: Subtypes }
]
}
How do we know which of these branches were satisfied by each answer? If
we were to naively draw the constraints from all branches, we’d have
some absurd edges such as name:John isa age. Each ConceptRow has
an involved_conjunctions method which tells us exactly this.
[list(answer.involved_conjunctions()) for answer in answers]
Output
[
[1, 2],
[1, 2],
[0, 2],
[0, 2]
]
As expected, the first two answers satisfy the branch with age, and the second two answers satisfy the branch with name (all answers satisfy the trunk, of course). Constructing the graph for these is still quite simple. We take the union of all constraints in each involved conjunction:
def flatten(list_of_lists: List[List[any]]) -> List[any]:
    return [x for l in list_of_lists for x in l]
answers_as_data_edges = []
for row in answers:
    # Only the constraints in conjunctions this answer actually satisfied
    involved_constraints = flatten([list(pipeline.conjunction(conj_id).constraints()) for conj_id in row.involved_conjunctions()])
    as_query_edges = [to_query_edge(c) for c in involved_constraints if not c.is_or()] # Exclude or constraint
    as_data_edges = [(substitute(pipeline, u, row), label, substitute(pipeline, v, row)) for (u,label,v) in as_query_edges]
    answers_as_data_edges.append(as_data_edges)
answers_as_data_edges
Output
[
[
(Entity(person: 0x1e00000000000000000000), 'has', Attribute(age: 20)),
(Attribute(age: 20), 'isa', AttributeType(age)),
(Entity(person: 0x1e00000000000000000000), 'isa', EntityType(person))
],
[
(Entity(person: 0x1e00000000000000000001), 'has', Attribute(age: 30)),
(Attribute(age: 30), 'isa', AttributeType(age)),
(Entity(person: 0x1e00000000000000000001), 'isa', EntityType(person))
],
[
(Entity(person: 0x1e00000000000000000001), 'has', Attribute(name: "Jane")),
(Attribute(name: "Jane"), 'isa', AttributeType(name)),
(Entity(person: 0x1e00000000000000000001), 'isa', EntityType(person))
],
[
(Entity(person: 0x1e00000000000000000000), 'has', Attribute(name: "John")),
(Attribute(name: "John"), 'isa', AttributeType(name)),
(Entity(person: 0x1e00000000000000000000), 'isa', EntityType(person))
]
]
Prepare the styles for the nodes, and draw:
# Prepare node styles
flattened_data_edges = flatten(answers_as_data_edges)
nodes = set(u for (u,_,_) in flattened_data_edges).union(set(v for (_,_,v) in flattened_data_edges))
node_styles = { n: data_node_style(n) for n in nodes }
# Draw
draw(flattened_data_edges, node_styles)

Notes on visualising constraints
We have now covered the basics of visualising TypeDB answers as graphs. In this section, we discuss a few cases where TypeDB constraints don’t directly map to a simple labelled binary edge between two concepts, and the solutions we chose when building TypeDB Studio.
Functions
A query can contain function calls. Functions have arguments and return
values - all of which are concepts. Unlike a relation, the function call
itself is not a concept. We felt the cleanest way to visualise the
many-to-many constraint between return values and arguments was to
introduce a FunctionCall vertex for each call. Two function calls
are the same vertex if they have the same (1) function name, (2) tuple
of argument concepts, and (3) tuple of returned concepts.
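The identity rule for function-call vertices can be sketched with a frozen dataclass, whose generated equality and hash compare exactly the three fields named above. FunctionCallVertex, function_call_edges, the "arg[i]" / "return[i]" edge labels and the mean_age function name are all hypothetical choices for this illustration, and plain strings stand in for concepts; this is not the TypeDB Studio implementation.

```python
from dataclasses import dataclass
from typing import Hashable, List, Tuple

@dataclass(frozen=True)
class FunctionCallVertex:
    """Two calls are the same vertex iff name, arguments and returned values match."""
    name: str
    arguments: Tuple[Hashable, ...]
    returned: Tuple[Hashable, ...]

def function_call_edges(call: FunctionCallVertex) -> List[tuple]:
    """Connect each argument to the call vertex, and the call vertex to each return value."""
    edges = [(arg, f"arg[{i}]", call) for i, arg in enumerate(call.arguments)]
    edges += [(call, f"return[{i}]", ret) for i, ret in enumerate(call.returned)]
    return edges

a = FunctionCallVertex("mean_age", ("person#0000",), ("25",))
b = FunctionCallVertex("mean_age", ("person#0000",), ("25",))  # same call as a
c = FunctionCallVertex("mean_age", ("person#0001",), ("25",))  # different argument
vertices = {a, b, c}
print(len(vertices))
```

Because the vertex is hashable, adding the same call from several answers to a set or a networkx graph collapses it into one node, which is the behaviour described above.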
Expressions
We treat expressions as we did functions, except the string representing the expression is used in place of the function name.
Links constraints
Links constraints are ternary constraints involving the relation instance, the player instance and the played role type. We choose to represent each one as a binary edge from the relation to the player, with the role serving as the label of the edge.
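As a small illustration of that choice, the sketch below folds a (relation, player, role) triple into the same (from, label, to) edge shape used earlier in the tutorial. LinksTriple and links_to_edge are hypothetical names, and the friendship relation with its friend role is an invented example that is not part of the tutorial's dataset; strings stand in for the concepts.

```python
from typing import NamedTuple, Tuple

class LinksTriple(NamedTuple):
    """Hypothetical stand-in for the three parts of a links constraint."""
    relation: str
    player: str
    role: str

def links_to_edge(links: LinksTriple) -> Tuple[str, str, str]:
    """Relation -> player, with the role type as the edge label."""
    return (links.relation, links.role, links.player)

edge = links_to_edge(LinksTriple(relation="friendship#0001", player="person#0000", role="friend"))
print(edge)
```

A relation with several role players then appears as one node with one outgoing edge per player, so no extra vertex is needed for the role.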
Named roles
TypeDB allows the use of unscoped role names in links & relates
constraints to specify the role. For example, with the schema below, the query
match $r relates subject; will return both sentence and
email as answers for $r:
define
relation sentence relates subject;
relation email relates subject;
Internally, TypeDB introduces an internal variable and constrains it to
be any of the role types with the specified name. In the structure
returned by the analyze operation, these variables and their
associated names are returned as NamedRole vertices. (Since the
variable introduced is anonymous, it is unavailable in the ConceptRow.
Since it does not uniquely determine the type, it cannot be considered a
label.)
Is & comparison constraints
Since an is constraint is always a self-edge on a vertex, we choose
not to visualise it. We also skip visualising comparison constraints to
reduce the number of edges, though they can be useful in certain cases:
the comparator symbol is available from the comparison constraint.
The typedb-graph-utils library
In this section, we introduce the typedb-graph-utils Python library developed alongside this tutorial.
It follows the structure of the TypeScript library we wrote for TypeDB Studio.
The essence remains the same as what we’ve covered in the tutorial:
find the constraints “involved” in each answer, and substitute in the concepts for the variables.
Instead of converting a constraint into query edges, and then applying the substitution to form data edges,
we apply the substitution directly on the Constraints to obtain a corresponding DataConstraint.
These constrain DataVertexes instead of ConstraintVertexes.
The library handles the conversion of answers to DataConstraints, allowing you to focus on constructing your target representation.
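The idea of substituting directly on a constraint can be sketched in a few lines of plain Python. This is an illustration of the concept only, not the library's implementation: SketchDataHas and substitute_has are hypothetical names, a dict stands in for a ConceptRow, and strings stand in for concepts. It also assumes that a variable with no binding in the row simply yields None.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)
class SketchDataHas:
    """Hypothetical 'has' constraint whose vertices are data, not variables."""
    owner: Optional[str]
    attribute: Optional[str]

def substitute_has(owner_var: str, attribute_var: str, row: Dict[str, str]) -> SketchDataHas:
    """Replace both variables of a 'has' constraint with the row's bindings in one step."""
    return SketchDataHas(owner=row.get(owner_var), attribute=row.get(attribute_var))

bound = substitute_has("x", "y", {"x": "person#0000", "y": "name:John"})
unbound = substitute_has("x", "y", {"x": "person#0000"})  # "y" has no binding here
print(bound, unbound)
```

Keeping the constraint shape intact means a consumer can still ask for the owner and the attribute by name, rather than unpacking positional edge tuples.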
Data constraints
Here’s the signature of an Isa DataConstraint from the library,
and the Isa Constraint from the driver API to compare against:
# From library
class Isa(DataConstraint):
    def instance(self) -> DataVertex
    def type(self) -> DataVertex
    def exactness(self) -> ConstraintExactness
# From driver
class Isa(Constraint, ABC):
    def instance(self) -> ConstraintVertex
    def type(self) -> ConstraintVertex
    def exactness(self) -> ConstraintExactness # isa or isa!
The TypeDBAnswerConverter interface
We introduce a TypeDBAnswerConverter[OutputType] abstract class, which defines the methods the user has to implement:
one add_ method per data constraint, and a finish(self) → OutputType method to build your final representation.
We provide a sample implementation, NetworkXBuilder, which builds a MultiDiGraph as in this tutorial.
We also provide a basic MatplotlibVisualizer with a familiar draw function for inspiration.
Building a visualiser using the library
We now recreate the previous example by implementing the TypeDBAnswerConverter class provided by the library.
from typedb_graph_utils import data_constraint, TypeDBAnswerConverter, MatplotlibVisualizer
from typedb.common.enums import ConstraintExactness
from networkx import MultiDiGraph
class MyTutorialBuilder(TypeDBAnswerConverter[MultiDiGraph]):
    def __init__(self, pipeline: Pipeline):
        super().__init__(pipeline)
        self.graph = MultiDiGraph()
    def finish(self) -> MultiDiGraph:
        return self.graph
    def add_isa(self, isa: data_constraint.Isa):
        edge_type = "isa!" if isa.exactness() == ConstraintExactness.Exact else "isa"
        # Use the edge attributes to store metadata. The visualiser uses it.
        self.graph.add_edge(isa.instance(), isa.type(), label=edge_type)
    def add_has(self, has: data_constraint.Has):
        # Skip constraints whose vertices have no concept in this answer
        if has.owner() is None or has.attribute() is None:
            return
        edge_type = "has!" if has.exactness() == ConstraintExactness.Exact else "has"
        self.graph.add_edge(has.owner(), has.attribute(), label = edge_type)
    # We leave the remaining unimplemented for brevity. Check the NetworkXConverter
    # These are instance methods, so each takes self as well as the constraint
    def add_comparison(self, c): pass
    def add_expression(self, c): pass
    def add_function_call(self, c): pass
    def add_iid(self, c): pass
    def add_is(self, c): pass
    def add_kind(self, c): pass
    def add_label(self, c): pass
    def add_links(self, c): pass
    def add_owns(self, c): pass
    def add_plays(self, c): pass
    def add_relates(self, c): pass
    def add_sub(self, c): pass
    def add_value(self, c): pass
with driver.transaction(DB_NAME, TransactionType.READ) as tx:
    answers = list(tx.query(BRANCHED_QUERY, QUERY_OPTIONS).resolve())
builder = MyTutorialBuilder(answers[0].query_structure())
for (i, answer) in enumerate(answers):
    builder.add_answer(i, answer)
graph = builder.finish()
MatplotlibVisualizer.draw(graph)
