Vibe coding with TypeDB just got a whole lot better!

Let’s get the awkward part out of the way: vibe coding with TypeDB used to suck.
If you’ve pointed a coding agent at a TypeDB project and asked it to “just write the schema” or “query for all users connected to X,” you’ve probably watched it confidently generate TypeQL that is just nonsense. The problem is that frontier LLMs simply haven’t seen enough TypeQL in training to be reliably correct.
Fret not, we hear you and things are about to get a whole lot better!
How bad? Let’s quantify it
You can’t fix what you don’t measure. So we built typeql-evals, a small evals-based benchmark that measures how accurately LLMs generate valid TypeQL queries. It consists of a simple schema and a list of tasks described in plain English, such as “find these entities”, “relate those two entities”, “count the ones matching some condition” and so on. The cases are relatively simple, but span the full surface of the language: define, insert, delete, update, match, fetch, reduce, pipelines, expressions, and the rest. We hand each task to a model, pull the TypeQL out of its answer, and check for correctness.
The results show that the overwhelming majority of failures weren’t caused by bad logic or misunderstood requirements. The models generally understood what to query, but just couldn’t spell it in valid TypeQL: they used wrong keywords, fell back on stale syntax from older versions, or simply hallucinated syntax that never existed.
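To make the "pull the TypeQL out of its answer, and check" step concrete, here is a minimal Python sketch of that loop. It is not the typeql-evals implementation: the function names extract_query and pass_rate are hypothetical, it assumes the model wraps its query in a fenced code block, and the correctness check is passed in as a callable because the real checker is not described here.

```python
import re
from typing import Callable, Iterable, Optional

# Build the fence marker programmatically so this snippet can itself
# live inside a fenced block without confusing a Markdown renderer.
FENCE = "`" * 3
FENCE_RE = re.compile(FENCE + r"[\w-]*[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_query(answer: str) -> Optional[str]:
    """Return the body of the first fenced block in a model answer, or None.

    Assumption: the model was asked to put its TypeQL in a fenced block.
    """
    match = FENCE_RE.search(answer)
    return match.group(1).strip() if match else None


def pass_rate(answers: Iterable[str], is_correct: Callable[[str], bool]) -> float:
    """Percentage of answers whose extracted query passes the given check.

    An answer with no extractable query counts as a failure.
    """
    answers = list(answers)
    if not answers:
        return 0.0
    passed = 0
    for answer in answers:
        query = extract_query(answer)
        if query is not None and is_correct(query):
            passed += 1
    return 100.0 * passed / len(answers)
```

Counting an answer with no extractable query as a failure keeps the score honest: a model that rambles without ever producing a query should not be skipped.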
Acting on the numbers: typeql-check, TypeQL skill
If syntax is the bottleneck, give the agent a way to check syntax. Two pieces came out of that:
- typeql-check: a lightweight CLI that validates whether a query or schema actually parses against the current TypeQL grammar. Once it is installed, your coding agent can use it to validate the queries it generates and correct any mistakes.
- TypeQL skill: an agent skill originally made by our enthusiastic community member Luca, with a slew of contributions from our CTO Josh. It is responsible for two things: guiding LLMs to write better TypeQL, and instructing them to use typeql-check for validation, if installed. In the future we’ll grow the repertoire to include design patterns, refactoring and other things.
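The idea behind the two pieces is a generate, check, retry loop. Here is a minimal Python sketch of that pattern. Everything in it is hypothetical: generate stands in for the LLM call and check stands in for whatever validator is available (for example, a wrapper around typeql-check). The CLI's actual interface is not shown here, so the sketch deliberately avoids calling it.

```python
from typing import Callable, Optional, Tuple

# (is_valid, error_message); the message is empty when the query is valid.
CheckResult = Tuple[bool, str]


def generate_with_validation(
    task: str,
    generate: Callable[[str, Optional[str]], str],
    check: Callable[[str], CheckResult],
    max_attempts: int = 3,
) -> Optional[str]:
    """Ask for a query, validate it, and retry with the error as feedback.

    `generate(task, feedback)` is a hypothetical LLM call; `feedback` is
    None on the first attempt and the last validator error afterwards.
    Returns the first query that passes, or None if every attempt fails.
    """
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        query = generate(task, feedback)
        is_valid, error = check(query)
        if is_valid:
            return query
        feedback = error  # hand the validator's complaint back to the model
    return None
```

Capping the number of attempts matters: a model that keeps producing the same invalid syntax should fail visibly rather than loop forever.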
How much better is it now?
Here’s how vanilla Claude Sonnet and Opus compare with the same models equipped with the new tooling:
anthropic:claude-sonnet-4-6 / sp=none / tools=[] : pass-rate=23.33%
anthropic:claude-sonnet-4-6 / sp=luca / tools=['typeql_check'] : pass-rate=93.66%
anthropic:claude-opus-4-7 / sp=none / tools=[] : pass-rate=43.33%
anthropic:claude-opus-4-7 / sp=luca / tools=['typeql_check'] : pass-rate=96.43%
To put it plainly:
- Sonnet 4.6 jumps 4x from 23% to 94%
- Opus 4.7 climbs 2x from 43% to 96%
Huge gains from a simple act of letting the agent check its own work!
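If you want to check the multipliers yourself, the arithmetic is a one-liner per model. The short Python sketch below uses only the four pass rates reported above; the dictionary layout and the improvement helper are just illustrative.

```python
# Pass rates (in percent) as reported in the benchmark output above:
# (vanilla, with TypeQL skill + typeql-check).
PASS_RATES = {
    "claude-sonnet-4-6": (23.33, 93.66),
    "claude-opus-4-7": (43.33, 96.43),
}


def improvement(before: float, after: float) -> tuple:
    """Return (multiplier, percentage-point gain), both rounded to 2 places."""
    return round(after / before, 2), round(after - before, 2)


for model, (before, after) in PASS_RATES.items():
    factor, points = improvement(before, after)
    print(f"{model}: {factor}x, +{points} percentage points")
```

The multiplier flatters the weaker baseline, so the percentage-point gain is worth reading alongside it: Sonnet gains more in both senses, but Opus still ends up with the higher absolute pass rate.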
Add them to your coding agent
Now let’s make sure your coding agent reaps the benefits. Follow our coding agent configuration guide to set up the TypeQL skill and typeql-check. With both installed, you should see your coding agent using them automatically and producing much more accurate TypeQL queries.
So, if you were struggling with TypeDB earlier this year because of the agentic coding experience, this is a good moment to try again!



