TypeDB as a knowledge base for LLMs

In their book, Russell & Norvig describe the objective of Knowledge Representation as “expressing knowledge in a computer-tractable form” so that programs can reason over it. As Large Language Models (LLMs) are applied to tasks involving large repositories of knowledge, the interest in KR solutions such as knowledge graphs is no surprise.
What sets LLMs apart from previous machine learning models is their learnt ability to perform a wide variety of new tasks without being re-trained on task-specific examples. They can “few-shot” a task based on context provided as input during inference. This shifted the paradigm from training a new model for each task to constructing the right context & request as input.
Information retrieval & GraphRAG
To enable LLMs to perform tasks in larger domains, we naturally turned to information retrieval techniques such as vector-based search. The process of “augmenting” the LLM’s input with relevant context came to be known as RAG. GraphRAG soon followed – using an LLM to extract entities & relationships from text and build a “knowledge graph”. Given a query, a graph search identifies the relevant nodes and retrieves the associated text. This showed that organising documents by the entities and relations they contain usually yields more relevant results than full-text or vector-based RAG – especially when the relevant information is spread across multiple documents.
Knowledge, reasoning & TypeDB
GraphRAG is an improved retrieval mechanism for providing LLMs with relevant unstructured textual information. It organises text in a basic graph but does not apply the semantics of your data model, leaving the reasoning over the text to the LLM.
TypeDB was built for strong semantics in your data and complex reasoning in your queries – turning information from your domain into computable “knowledge”. Can an LLM use TypeQL queries to directly answer questions using this knowledge?
Question answering
Question answering seems like a natural choice of task for databases, whose primary purpose is “answering queries”. We considered using one of our existing schemas such as STIX, constructing the graph from articles and handwriting interesting queries (as we’ve done here), but realised it would be hard to evaluate how well our solution worked.
So we instead decided to use an existing benchmark task – the 2WikiMultiHopQA dataset, which evaluates “multi-hop reasoning”. The dataset is built from Wikipedia articles, with objective questions (from simple domains), their answers, and the sentences from the articles needed to answer each question (along with some distractor sentences). This makes it easy to understand, use and evaluate.
There are drawbacks to this choice of dataset – it was designed to evaluate an LLM’s reasoning ability given the relevant paragraphs. This removes some of the typical challenges such as entity resolution & the relevance of retrieved documents. It is also somewhat outdated, as large modern LLMs are excellent at it.
Still, we believe it to be a great example to demonstrate constructing a TypeDB database from text, and its use as a knowledge base for LLMs. It also serves as a good starting point for future evaluations on larger, more complex datasets.
Dataset format
The dataset consists of a dev, train and test set, each containing a number of examples. An example consists of a question, the expected answer & a “context”, as well as an expected set of “evidence” triples and “supporting facts” which test the system’s ability to back up its answer.
{
  "question": "Who is the mother of the director of film Polish-Russian War (Film)?",
  "context": [ /* see below */ ],
  "supporting_facts": [
    ["Polish-Russian War (film)", 1],
    ["Xawery Żuławski", 2]
  ],
  "evidences": [
    ["Polish-Russian War", "director", "Xawery Żuławski"],
    ["Xawery Żuławski", "mother", "Małgorzata Braunek"]
  ],
  "answer": "Małgorzata Braunek"
}
The context is a list of elements of the form [title, [sentence1, sentence2, …]].
The title is that of a Wikipedia page, and the sentences are relevant sentences retrieved from it, e.g.:
Polish-Russian War (film)
Polish-Russian War (Wojna polsko-ruska) is a 2009 Polish film directed by Xawery Żuławski based on the novel Polish-Russian War under the white-red flag by Dorota Masłowska.
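In code, an example can be loaded and its context flattened into per-sentence records. A minimal sketch, with an abridged (ASCII-only) example in the JSON format shown above:

```python
import json

# One abridged example in the 2WikiMultiHopQA format described above.
raw = """{
 "question": "Who is the mother of the director of film Polish-Russian War (Film)?",
 "context": [["Polish-Russian War (film)",
   ["Polish-Russian War (Wojna polsko-ruska) is a 2009 Polish film directed by Xawery Zulawski."]]],
 "supporting_facts": [["Polish-Russian War (film)", 1]],
 "evidences": [["Polish-Russian War", "director", "Xawery Zulawski"]],
 "answer": "Malgorzata Braunek"
}"""

example = json.loads(raw)

# Flatten the context into (title, sentence_index, sentence) triples -
# a convenient shape for both the GraphRAG and the KRR pipelines.
sentences = [
    (title, i, sent)
    for title, sents in example["context"]
    for i, sent in enumerate(sents)
]
```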
GraphRAG & KRR on TypeDB
The GitHub repository for this project is here: https://github.com/krishnangovindraj/typedb-kgqa
Baseline: GraphRAG
To demonstrate TypeQL’s flexibility, we implement a naive form of GraphRAG. We use a generic schema consisting mainly of entity-node and relation-node types, which have a label, an embedding based on this label, and the source from which they were built. We also have property-nodes which connect these to their attributes.
The sample sentence generates the following put statements:
put $doc-polish-russian-war-film isa meta-document, has meta-page-title "Polish-Russian War (film)";
put $film-polish-russian-war isa entity-node, has node-label "film:polish-russian-war";
put $film-polish-russian-war has embedding "<base64-encoded-embedding>";
put $_ isa meta-knowledge-source, links (knowledge: $film-polish-russian-war, source: $doc-polish-russian-war-film);
...
put $person-xawery-zulawski isa entity-node, has node-label "person:xawery-zulawski";
# (embedding and knowledge-source)
put $r-directed-xawery-zulawski-polish-russian-war isa relation-node, has node-label "directed:xawery-zulawski:polish-russian-war", links (related: $person-xawery-zulawski, related: $film-polish-russian-war);
# (embedding and knowledge-source)
Our GraphRAG query embeds the question and runs a simple TypeQL query:
match
let $query-embedding = "{query_embedding}";
let $looked-up-embedding in embeddings_by_similarity($query-embedding);
let $neighbour-doc in graph_neigbhour_documents_by_similarity($query-embedding, $looked-up-embedding, 0.0);
$neighbour-doc has text-content $text, has meta-page-title $title;
select $title, $text;
limit {max_docs};
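The retrieved (title, text) pairs are then spliced into the LLM prompt. A minimal sketch of this augmentation step – the prompt wording here is illustrative, not the repo’s actual prompt:

```python
def augment_prompt(question: str, retrieved: list[tuple[str, str]]) -> str:
    """Build the LLM prompt from the question and the (title, text) pairs
    returned by the GraphRAG query above."""
    context = "\n\n".join(f"## {title}\n{text}" for title, text in retrieved)
    return (
        "Answer the question using only the context below. "
        "Return JSON with 'answer' and 'supporting' keys.\n\n"
        f"# Context\n{context}\n\n# Question\n{question}"
    )
```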
We then add this to the prompt we pass to the LLM, which returns answers in the format:
{
  "answer": "Małgorzata Braunek",
  "supporting": {
    "Polish-Russian War (film)": "Polish-Russian War (Wojna polsko-ruska) is a 2009 Polish film directed by Xawery Żuławski based on the novel Polish-Russian War under the white-red flag by Dorota Masłowska.",
    "Xawery Żuławski": "He is the son of actress Małgorzata Braunek and director Andrzej Żuławski."
  }
}
It works, though it’s worth remembering that this is a very simple GraphRAG that does a local search of just one hop to the neighbours. Real-world implementations use a variety of techniques, including a “global search” based on community summarisation.
Note: TypeDB doesn’t have vector indexing yet, so we’ve emulated it (on this fork): our vectors are encoded as string attributes, and similarity search involves exhaustively comparing against all stored embeddings. We’re working on a very early prototype of TypeDB with true vector and vector-indexing support – get in touch if you’d like to see it in TypeDB sooner!
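As a rough illustration of the workaround, an embedding can be packed into a base64 string attribute and compared exhaustively on read. This is a sketch only – the fork’s actual encoding and similarity function may differ:

```python
import base64
import math
import struct

def encode_embedding(vec: list[float]) -> str:
    # Pack floats as little-endian float32 and base64-encode, so the
    # vector can live in a string attribute.
    return base64.b64encode(struct.pack(f"<{len(vec)}f", *vec)).decode("ascii")

def decode_embedding(s: str) -> list[float]:
    raw = base64.b64decode(s)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

def most_similar(query: list[float], stored: dict[str, str]) -> str:
    # Exhaustive scan over every stored embedding - no index involved.
    def cos(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0
    return max(stored, key=lambda k: cos(query, decode_embedding(stored[k])))
```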
TypeQL KRR
GraphRAG still relies on the LLM to reason over the text and answer the question. In the Knowledge Representation & Reasoning (KRR) approach, the challenge is to construct a graph which captures all the information relevant to our domain.
Designing the schema
As always with TypeQL (and other KG+LLM settings), the first thing to do is to write the schema. First, we looked at some existing schemas for the web – like schema.org. These felt too “bottom-up” and vast – we wanted a closed domain after all. Inspired by a talk on the Basic Formal Ontology, we decided to write our own basic, not-so-formal schema. We then asked Claude to look at the questions from the dev-set and specialise this into a domain-specific schema.
Knowledge graph construction
For each example, we use an LLM to extract all entities & relations which correspond to some type in the schema. The prompt focuses on put statements involving a minimal subset of TypeQL – isa, has and links constraints – and encodes the schema in a similar DSL. For the sample, it generates:
put $polish-russian-war isa film, has normalised-name "polish russian war", has alternate-name "wojna polsko-ruska";
put $release-polish-russian-war isa release, links (released-work: $polish-russian-war), has occured-at 2009-01-01T00:00:00;
put $xawery isa person, has normalised-name "xawery zulawski";
put $xawery has gender "male";
put $film-direction-polish-russian-war isa film-direction, links (director: $xawery, directed-film: $polish-russian-war);
Notice there is much more information here, since we need to encode all the facts relevant to our domain.
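A sketch of what this extraction step might look like in code – the prompt text and the `llm` callable are hypothetical stand-ins, not the project’s actual implementation:

```python
# Hypothetical extraction-prompt template: schema + sentences in,
# TypeQL put statements out.
EXTRACTION_PROMPT = """You are given a schema and sentences from a document.
Emit one TypeQL `put` statement per fact, using only `isa`, `has` and
`links` constraints, and only types that appear in the schema.

Schema:
{schema}

Sentences:
{sentences}"""

def extract_puts(llm, schema_dsl: str, sentences: list[str]) -> list[str]:
    prompt = EXTRACTION_PROMPT.format(
        schema=schema_dsl, sentences="\n".join(sentences)
    )
    reply = llm(prompt)  # hypothetical callable returning raw text
    # Keep only lines that look like put statements; drop any commentary.
    return [ln.strip() for ln in reply.splitlines() if ln.strip().startswith("put ")]
```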
Query generation
We use an LLM to generate the TypeQL query corresponding to a question. The prompt contains a few examples, the schema in DSL, and the question. As before, we ask the LLM to return the answer and supporting documents using a fetch clause:
match
$film isa film, has normalised-name "polish russian war";
$director isa person;
$film-direction isa film-direction, links (directed-film: $film, director: $director);
$mother isa person, has gender "female";
$parentage isa parentage, links (child: $director, parent: $mother);
fetch {
  "answer": $mother.normalised-name,
  "supporting": { "film": { $film.* }, "director": {...}, "mother": {...} }
};
We can see this approach works just as well, with intermediate entities returned too. We could also link each fact to the source it was generated from to provide sources.
{
  "answer": "malgorzata braunek",
  "supporting": {
    "film": {
      "alternate-name": ["wojna polsko-ruska"],
      "normalised-name": "polish russian war"
    },
    "mother": {
      "normalised-name": "malgorzata braunek",
      "gender": "female"
    },
    "director": {
      "normalised-name": "xawery zulawski",
      "gender": "male"
    }
  }
}
Takeaways
Schema-validated construction leads to better knowledge graphs
Having a schema is essential to inform the LLM of the kind of information we’re interested in, as well as to ensure that the types assigned to the extracted entities & relations make sense.
As an example, consider the sentence:
Cleomenes II (died 309 BC) was Agiad King of Sparta from 369 to 309 BC.
Since our schema included a nationality relation, the LLM generated the following put statements:
put $cleomenes-ii isa person, has name "cleomenes ii";
put $sparta isa city, has name "sparta";
put $nationality-cleomenes isa nationality, links (national: $cleomenes-ii, nation: $sparta);
This failed validation: Sparta was inserted as a city, while only a country can play the nation role. Sparta is (according to Wikipedia) a city-state.
This illustrates the challenges of writing schemas for open domains, but also shows how having a schema improves data quality.
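One way to exploit this at construction time is a validate-and-repair loop: when a batch of put statements fails schema validation, re-prompt the LLM with the database’s error message. A hedged sketch, with `run_puts` and `llm` as hypothetical stand-ins for the driver call and the model:

```python
def insert_with_repair(run_puts, llm, puts: list[str], max_attempts: int = 3) -> bool:
    """Try to insert `puts`; on a validation error, ask the LLM to fix
    the statements using the error message, then retry."""
    for _ in range(max_attempts):
        try:
            run_puts(puts)  # hypothetical: raises on schema violations
            return True
        except Exception as err:  # e.g. a role-player type violation
            reply = llm(
                f"These TypeQL statements failed with: {err}\n"
                "Fix them:\n" + "\n".join(puts)
            )
            puts = [ln.strip() for ln in reply.splitlines()
                    if ln.strip().startswith("put ")]
    return False
```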
TypeQL allows LLMs to reason on the schema level
GraphRAG only fetches relevant text resources. The LLM must still read through all the text and reason to arrive at the answer. With TypeQL, the LLM can do the reasoning at the schema level and write queries which directly answer the question. This gives it an advantage in queries which involve multi-hop reasoning, and saves on LLM inference.
Knowledge encoded in LLMs helps fill in gaps
Our schema has three subtypes of place: city, state and country. The LLM does a great job of classifying these from a sentence like “North Marion High School is a public high school in Aurora, Oregon, United States.”
put $aurora isa city, has normalised-name "aurora";
put $oregon isa state, has normalised-name "oregon";
put $usa isa country, has normalised-name "united states";
put $north-marion-or isa school, has normalised-name "north marion high school oregon";
# And it can construct the location hierarchy
put $location-north-marion-or isa located-in, links (located-subject: $north-marion-or, containing-location: $aurora);
put $location-aurora isa place-located-in, links (contained-location: $aurora, containing-location: $oregon);
put $location-oregon isa place-located-in, links (contained-location: $oregon, containing-location: $usa);
TypeDB functions as “tool use”?
A TypeQL schema can contain functions for common reasoning tasks. In our schema, we wrote the following function to reason about the location of an entity at different granularities:
fun located_in_transitive($contained: base-entity) -> { base-place }
This helps in questions like “Are North Marion High School (Oregon) and Seoul High School both located in the same country?”
match
$school1 isa school;
$school1 has normalised-name "north marion high school oregon";
$school2 isa school;
$school2 has normalised-name "seoul high school";
let $country1 in located_in_transitive($school1);
let $country2 in located_in_transitive($school2);
let $answer = is_same($country1, $country2);
fetch { "answer": $answer, "supporting": { ... }, };
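For intuition, here is a Python analogue of what `located_in_transitive` computes, on a toy dict-based hierarchy (the TypeDB function operates on the graph itself):

```python
def located_in_transitive(parent_of: dict[str, str], place: str) -> set[str]:
    """Walk containment edges upwards, collecting every containing place."""
    containing: set[str] = set()
    current = place
    while current in parent_of:
        current = parent_of[current]
        containing.add(current)
    return containing

# Toy hierarchy from the North Marion example above.
hierarchy = {
    "north marion high school oregon": "aurora",
    "aurora": "oregon",
    "oregon": "united states",
    "seoul high school": "seoul",
    "seoul": "south korea",
}
```

Intersecting the two result sets answers the “same country?” question directly, which is exactly what the `is_same` comparison in the query does at country granularity.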
We also emulated a ternary operator for questions like: “Which film has the director who was born later, El Extraño Viaje or Love In Pawn?”
match
$film1 isa film, has normalised-name "el extrano viaje";
$film2 isa film, has normalised-name "love in pawn";
$_ isa film-direction, links (directed-film: $film1, director: $director1);
$_ isa film-direction, links (directed-film: $film2, director: $director2);
$birth1 isa birth, links (person: $director1), has occured-at $birth-date1;
$birth2 isa birth, links (person: $director2), has occured-at $birth-date2;
let $later-born-director = ternary(less_or_equal($birth-date1, $birth-date2), $director2, $director1);
let $answer-film = ternary(less_or_equal($birth-date1, $birth-date2), $film2, $film1);
fetch {
"answer": $answer-film.normalised-name,
"supporting": { ... },
};
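The ternary logic itself is simple – in Python it amounts to the following (the dates in the test are illustrative placeholders, not the directors’ real birth dates):

```python
from datetime import date

def later_born_film(film1: str, birth1: date, film2: str, birth2: date) -> str:
    # Mirrors ternary(less_or_equal($birth-date1, $birth-date2), $film2, $film1):
    # if director 1 was born on or before director 2, director 2 was born later.
    return film2 if birth1 <= birth2 else film1
```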
Advanced agents could write their own preamble functions relevant to the question at hand.
KRR is better suited for specific domains, GraphRAG for open domains
Although the KRR approach was quite successful in our closed domain of questions, TypeDB is still an exact system with fixed semantics and no “general knowledge” about the world beyond that encoded in the schema. This makes it sensitive to ambiguity or missing information in the constructed graph (as with Sparta, which the strong typing caught). For more open domains, a hybrid approach with a “high-level” schema and more aspects of GraphRAG may be more suitable. Nevertheless, for the closed domain generated from the questions, it was very successful at extracting the relevant facts and answering queries one-shot.
Local models were sensitive to schema size in context
Before turning to Claude, we experimented with reasonably sized local coder models for our KRR task. These are typically faster (and cheaper) than Claude or a hosted service. However, the models struggled when both the paragraphs and the schema were long. We could introduce the triple extraction from GraphRAG as a first step, converting the text to triples (without looking at the schema); a second step then converts these to TypeQL using the schema. The first step removes dependencies between triples, so we can use smaller chunks. We can also use specialised models for each step – text-summarisation/entity-extraction models for the first, and coder models for the second.
To support even larger schemas, we may have to filter the schema down to only the relevant types to include in our prompt.
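The two-step pipeline can be sketched as follows; both `llm` callables and the prompt wording are hypothetical:

```python
def text_to_triples(llm, chunk: str) -> list[tuple[str, str, str]]:
    """Step 1: extract schema-free (subject, relation, object) triples
    from a small text chunk, using a lightweight extraction model."""
    reply = llm(
        "Extract (subject, relation, object) triples, one per line, "
        "with fields separated by ' | ':\n" + chunk
    )
    triples = []
    for line in reply.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:  # skip commentary / malformed lines
            triples.append((parts[0], parts[1], parts[2]))
    return triples

def triples_to_typeql(llm, schema_dsl: str, triples) -> str:
    """Step 2: convert the triples to TypeQL put statements, with the
    schema in context - a good fit for a coder model."""
    listing = "\n".join(f"({s} | {r} | {o})" for s, r, o in triples)
    return llm(
        f"Schema:\n{schema_dsl}\n\n"
        f"Convert these triples to TypeQL put statements:\n{listing}"
    )
```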
Future work
This is a POC, which could be developed into a proper library. We’ve also not spent much time tuning our prompts or experimenting with different models. With that done, it would be great to evaluate either approach (and possibly a hybrid one) on a full run of the test-set. If we do well, we could move on to more recent benchmarks such as LongBench. It would also be interesting to remodel tool-use benchmarks (such as BFCL) as “strongly-typed” to see if agents benefit from typing information.
Conclusion
Database query languages were created as an abstraction, decoupling data from its manipulation and access. As LLMs are applied to more complex, multi-step tasks involving vast amounts of knowledge, there is a need to equip them with the tools they need to scale to these tasks. This POC demonstrates TypeQL’s flexibility – both as a traditional “textual” knowledge store for GraphRAG, and as an expressive abstraction for reasoning over knowledge. It also hints at the potential of TypeQL functions as an alternate, strongly-typed representation for LLM tooling in agentic workflows.
The code is available on GitHub, where we welcome contributions on all fronts. We know a lot of you have been working on similar projects, and would love to hear your thoughts and suggestions on our Discord.