TypeDB Blog


First look at TypeDB 3 benchmarks

A first look at TypeDB 3 benchmarks, analyzing performance at scale on OLTP workloads, with comparisons to previous versions of TypeDB and to Neo4j

Dmitrii Ubskii


Our benchmarks show that TypeDB now performs to industry standards at scale. It is also predictable, which is key for use cases that rely on consistent response times under variable load.

A step-change in performance and scalability

At the start of 2025, we launched our rewrite of TypeDB in Rust (if you want to learn more about that, check out this blog post). After extensive testing and evaluation, we are releasing our first official benchmarks. Compared to TypeDB 2, TypeDB 3 delivers 3-5x greater query performance, superior scalability, and more consistent latency under load.

Tested alongside Neo4j, TypeDB shows a different scaling profile. Neo4j’s query performance is very strong on small datasets, but decreases once data grows beyond memory. TypeDB maintains more stable performance as data scales, and continues to handle larger datasets effectively once they no longer fit fully in memory.

TypeDB 3.4.1 against Neo4j benchmarks using TPC-C. Note: higher is better. (See below for the data.)

This result is a strong validation of our rewrite in Rust. Beyond safety and maintainability, the architectural changes in TypeDB 3 are proving their worth in raw performance. These results are single-threaded and establish our baseline query efficiency. They are also useful for investigating future development directions beyond intra-query parallelism.

Why compare with Neo4j?

We benchmarked with Neo4j because it is the most widely adopted graph database and a frequent point of comparison we hear from the market. Many of these comparisons come from cybersecurity and information management domains, which typically feature transactional systems that use graph databases.

Neo4j focuses on property graphs and is highly optimized for in-memory performance. TypeDB (although a different category of database) is often compared with traditional graph databases because it can store graph or hypergraph data natively.

A key difference compared to property graphs is that TypeDB introduces a high-level schema that enforces relationships and connectivity more rigorously. This makes the comparison natural, while reflecting the different priorities of the two systems.

Benchmark setup

We used TPC-C, a standard OLTP benchmark, to measure real-world query performance. Our goal was to simulate realistic transactional workloads and observe how TypeDB behaves as data size increases. Note that this is an atypical use of TPC-C, which normally measures long-running throughput at a fixed data size.

We benchmarked:

  • Environment: Fresh OVH c3-16 instances (8 vCPUs, 16 GB RAM, 200 GB NVMe, Ubuntu 24.10) per run.
  • Data size: 100 warehouses = ~100k entities, 50M relations, 150M attributes, ≈24 GB on disk (TypeDB) and 17 GB (Neo4j). Up to 400 warehouses = ~400k entities, 200M relations, 600M attributes, ≈100 GB on disk (TypeDB) and 70 GB (Neo4j). The additional space used by TypeDB is due to automatic indices that support dynamic and ad-hoc query patterns; Neo4j uses manually added indexes for the expressed query patterns.
  • Scaling: Entities, relations, attributes, and storage grow linearly with warehouses (see the rough sketch after this list). Workloads touch only a single warehouse, so performance is expected to remain approximately constant if the database scales effectively.
  • Approach: Focused on single-threaded performance. We are interested in the total work performed per query, which, when optimized, translates to lower latencies.
  • TypeDB CE v2.29.1 – final release of TypeDB 2, heavily tuned.
  • TypeDB CE v3.4.1 – with first performance optimizations.
  • Neo4j CE v5.26.5 – industry standard reference point.
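As a rough illustration of the linear-scaling assumption above, the sketch below extrapolates approximate dataset sizes from the 100-warehouse figures quoted in this post; real TPC-C loaders produce slightly different exact row counts.

```rust
// Rough scaling sketch: extrapolate approximate dataset sizes from the
// 100-warehouse baseline quoted above, assuming strictly linear growth.
// Real TPC-C loaders produce slightly different exact row counts.
fn scaled_dataset(warehouses: u64) -> (u64, u64, u64, f64) {
    let factor = warehouses as f64 / 100.0;
    let entities = (100_000.0 * factor) as u64;       // ~100k entities per 100 warehouses
    let relations = (50_000_000.0 * factor) as u64;   // ~50M relations per 100 warehouses
    let attributes = (150_000_000.0 * factor) as u64; // ~150M attributes per 100 warehouses
    let typedb_disk_gb = 24.0 * factor;               // ~24 GB on disk (TypeDB) per 100 warehouses
    (entities, relations, attributes, typedb_disk_gb)
}

fn main() {
    for warehouses in [100, 200, 300, 400] {
        let (e, r, a, gb) = scaled_dataset(warehouses);
        println!("{warehouses} warehouses: ~{e} entities, ~{r} relations, ~{a} attributes, ~{gb:.0} GB");
    }
}
```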

A note on scope: these results use TPC-C, which is an OLTP benchmark. TPC-C is not a graph-specific benchmark like LDBC. TypeDB is designed to support SQL-like, graph, and document transactional workloads, and this benchmark provides a practical way to measure OLTP-style performance and mark a performance baseline.

Results

We examined the number of transactional operations (“workloads”) completed in 10 minutes at different dataset sizes. Workloads may execute multiple queries sequentially.
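For context, here is a minimal sketch of how such a fixed-window, single-threaded measurement can be driven. The `run_workload` closure is a hypothetical stand-in for executing one TPC-C workload (its queries plus a commit) through a driver; it is not our actual harness.

```rust
use std::time::{Duration, Instant};

// Minimal sketch of a fixed-window, single-threaded measurement loop.
// `run_workload` is a hypothetical stand-in for executing one TPC-C
// workload (one or more queries plus a commit) against the database.
fn measure<F: FnMut()>(mut run_workload: F, window: Duration) -> (u64, Vec<Duration>) {
    let started = Instant::now();
    let mut completed = 0u64;
    let mut latencies = Vec::new();
    while started.elapsed() < window {
        let t = Instant::now();
        run_workload();
        latencies.push(t.elapsed());
        completed += 1;
    }
    (completed, latencies)
}

fn main() {
    // Example: a dummy 5 ms workload measured over a short 2-second window.
    let (completed, latencies) =
        measure(|| std::thread::sleep(Duration::from_millis(5)), Duration::from_secs(2));
    println!("completed {completed} workloads, {} latency samples", latencies.len());
}
```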

Workloads (higher is better)

| DB | 1 warehouse | 100 warehouses | 200 warehouses | 300 warehouses | 400 warehouses |
| --- | --- | --- | --- | --- | --- |
| TypeDB CE v2.29.1* | 8,282 | 4,204 | DNF | DNF | DNF |
| Neo4j v5.26.5 | 59,846 ± 3,805 | 16,887 ± 2,977 | 14,208 ± 135 | 9,675 ± 2,049 | 8,295 ± 1,448 |
| TypeDB CE v3.4.1 | 22,886 ± 552 | 19,278 ± 313 | 17,479 ± 569 | 16,525 ± 330 | 14,495 ± 1,459 |
At 400 warehouses, TypeDB completes around 75% more workloads than Neo4j

*We did not collect enough data for the earlier version of TypeDB to provide a confidence interval. It was also unable to finish loading 200 warehouses' worth of data (hence the DNF: did not finish).

Key takeaways

  • TypeDB 3.4.1 is 3-5x faster than TypeDB 2 and scales significantly better.
  • Neo4j achieves very strong results on small datasets, but performance decreases once data exceeds available memory.
  • At 400 warehouses, TypeDB completes significantly more workloads than Neo4j, highlighting their different scaling characteristics.

Workload examination

To understand the performance differences more clearly, we compared workload latencies. Since this is a single-threaded benchmark, latency gives a sharper view than workloads alone.

  • TypeDB 3.4.1: Latencies are tightly bounded. At 400 warehouses, worst-case latency is under 200ms.
  • Neo4j 5.26.5: Latencies show notable variance and a long tail.

Takeaway: TypeDB isn’t just fast at scale; it also exhibits a predictable baseline, which matters for systems that depend on consistent response times.

Latency by workload

TPC-C executions consist of five different “workloads”, each performing one or more queries, plus a commit for those that write to the database.
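The TPC-C specification weights these workloads unevenly: roughly 45% New Order, 43% Payment, and about 4% each for Order Status, Delivery, and Stock Level, which the completed-workload counts in the tables below roughly reflect. The sketch below illustrates selecting the next workload under such a mix; the exact percentages and selection logic of our harness may differ.

```rust
// Sketch: pick the next TPC-C workload according to the (approximate)
// standard transaction mix. The exact mix used by a given harness may
// differ slightly; these percentages are an assumption for illustration.
fn pick_workload(roll: u32) -> &'static str {
    // `roll` is a uniform random integer in 0..100.
    match roll {
        0..=44 => "New Order",      // ~45%
        45..=87 => "Payment",       // ~43%
        88..=91 => "Order Status",  // ~4%
        92..=95 => "Delivery",      // ~4%
        _ => "Stock Level",         // ~4%
    }
}

fn main() {
    for roll in [3, 50, 90, 93, 97] {
        println!("roll {roll:2} -> {}", pick_workload(roll));
    }
}
```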

Mean time taken by individual workloads (400 warehouses) – lower is better

| DB | Delivery | New Order | Order status | Payment | Stock level |
| --- | --- | --- | --- | --- | --- |
| TypeDB CE v2.29.1 (100 warehouses) | 549.49 ms | 203.46 ms | 94.63 ms | 33.06 ms | 128.24 ms |
| Neo4j v5.26.5 | 467.26 ms | 75.68 ms | 10.00 ms | 17.20 ms | 1.17 ms |
| TypeDB CE v3.4.1 | 147.19 ms | 60.84 ms | 8.35 ms | 9.90 ms | 81.58 ms |

TypeDB matches or exceeds Neo4j on every workload except Stock Level. This workload counts recently ordered items with low stock, which is a common type of query. In TypeDB, it highlights a limitation in the query planner, which currently fetches more data than necessary before filtering. Addressing this is already on our optimization roadmap.

Latency distribution

Latency distributions show how response times behave under load. Before looking at the numbers, a quick note on terminology (a short sketch of how these figures are computed from raw samples follows this list):

  • p50 is the median latency (half of all queries are faster).
  • p99 is the 99th percentile (99% of queries are faster, 1% are slower).
  • max is the single highest latency measured.
  • Higher percentiles show how the “long tail” of queries behaves under load.
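As a concrete reference, here is a minimal sketch of deriving these figures from raw latency samples using the nearest-rank method; the exact interpolation behind the published numbers may differ.

```rust
// A minimal sketch of how the percentile figures in the tables below can
// be derived from raw latency samples (nearest-rank method; the exact
// interpolation used for the published numbers may differ).
fn percentile(sorted_ms: &[f64], p: f64) -> f64 {
    assert!(!sorted_ms.is_empty());
    let rank = ((p / 100.0) * sorted_ms.len() as f64).ceil() as usize;
    sorted_ms[rank.saturating_sub(1).min(sorted_ms.len() - 1)]
}

fn main() {
    // Hypothetical latency samples, in milliseconds.
    let mut samples = vec![9.9, 13.1, 23.2, 39.2, 58.2, 99.5, 8.4, 10.3];
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    for p in [50.0, 75.0, 90.0, 95.0, 99.0] {
        println!("p{p}: {:.2} ms", percentile(&samples, p));
    }
    println!("max: {:.2} ms", samples.last().unwrap());
}
```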

TypeDB CE 3.4.1

Detailed timing breakdown for a representative run with TypeDB CE v3.4.1

| Workload | Completed | p50 (ms) | p75 (ms) | p90 (ms) | p95 (ms) | p99 (ms) | max (ms) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Delivery | 599 | 147.19 | 155.05 | 160.06 | 163.10 | 170.57 | 180.33 |
| New Order | 6,537 | 60.84 | 73.95 | 87.17 | 93.97 | 103.70 | 144.15 |
| Order status | 586 | 8.35 | 9.38 | 10.27 | 11.19 | 13.59 | 36.81 |
| Payment | 6,370 | 9.90 | 13.08 | 23.23 | 39.19 | 58.20 | 99.53 |
| Stock level* | 557 | 81.58 | 90.50 | 96.11 | 99.99 | 105.14 | 121.03 |
TypeDB maintains sub-200 ms latencies at scale

*Stock level workloads are read-only, so these figures do not include the time to commit the workload operations. Every other workload latency includes the time to durably commit.

For most workloads, TypeDB’s latency stays within roughly 2x of the median up to the 99th percentile, with only Payment showing a wider spread. Even then, maximum latency remains below 200 ms.

Neo4j 5.26.5

Detailed timing breakdown for a representative run with Neo4j v5.26.5

| Workload | Completed | p50 (ms) | p75 (ms) | p90 (ms) | p95 (ms) | p99 (ms) | max (ms) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Delivery | 379 | 467.26 | 495.89 | 536.47 | 557.07 | 717.55 | 3,185.93 |
| New Order | 4,054 | 75.68 | 93.81 | 112.86 | 134.88 | 314.15 | 2,017.28 |
| Order status | 360 | 10.00 | 25.52 | 27.02 | 28.44 | 41.48 | 228.57 |
| Payment | 3,836 | 17.20 | 26.81 | 28.58 | 30.52 | 36.19 | 1,894.42 |
| Stock level | 340 | 1.17 | 1.34 | 1.83 | 3.21 | 6.58 | 329.64 |

Neo4j shows a wider latency spread. For example, maximum latencies for Delivery, New Order, and Payment extend into seconds, and Stock Level’s maximum is several hundred times its median, even though the median values remain low.

Takeaway: TypeDB provides tighter and more predictable latency bounds at scale, while Neo4j shows greater variance on larger datasets.

Why TypeDB might scale differently

Neo4j’s performance decreases more noticeably as data grows, reflecting potential in-memory design tradeoffs. We hypothesize this could be explained by Neo4j CE’s pointer-based storage, which is extremely fast in memory, but once data spills to disk, every pointer lookup risks a costly page fault. Performance then degrades further as datasets grow well beyond available memory.

TypeDB encodes data into RocksDB, grouped by type and connection. This preserves data locality and minimizes disk overhead. From 100 to 400 warehouses, TypeDB’s throughput drops by ~25%, while Neo4j’s drops by ~51% over the same range, showing the different tradeoffs in their storage models.
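To illustrate the locality argument, the sketch below uses a sorted map as a stand-in for an LSM-tree store like RocksDB: keys that share a prefix (for example, all connections of one node) are stored adjacently, so a neighbourhood lookup becomes a single range scan rather than a pointer chase. This is not TypeDB’s actual key format, just the general idea.

```rust
use std::collections::BTreeMap;

// Illustrative sketch of why prefix-ordered keys help locality. A sorted
// key-value store (here a BTreeMap standing in for an LSM tree such as
// RocksDB) keeps keys with a shared prefix physically adjacent, so
// "all edges of node X" becomes one contiguous range scan instead of a
// pointer chase. This is NOT TypeDB's actual key format, just the idea.
fn edge_key(src_node: u64, edge_type: u16, dst_node: u64) -> Vec<u8> {
    let mut key = Vec::with_capacity(18);
    key.extend_from_slice(&src_node.to_be_bytes());
    key.extend_from_slice(&edge_type.to_be_bytes());
    key.extend_from_slice(&dst_node.to_be_bytes());
    key
}

fn main() {
    let mut store: BTreeMap<Vec<u8>, ()> = BTreeMap::new();
    store.insert(edge_key(1, 7, 42), ());
    store.insert(edge_key(1, 7, 43), ());
    store.insert(edge_key(2, 7, 99), ());

    // All type-7 edges leaving node 1 share a 10-byte prefix, so they are
    // stored (and read) together.
    let prefix: Vec<u8> = edge_key(1, 7, 0)[..10].to_vec();
    let neighbours = store
        .range(prefix.clone()..)
        .take_while(|(k, _)| k.starts_with(&prefix))
        .count();
    println!("node 1 has {neighbours} outgoing type-7 edges"); // prints 2
}
```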

Future work

These benchmarks mark only the beginning. They have been invaluable for identifying general inefficiencies and guiding further improvements:

  • Multi-threaded execution: Current results are single-threaded. Adding pipelining will unlock parallelism across CPU cores.
  • Smarter query planning: Stock Level workloads expose planner limitations. Improvements will reduce unnecessary scans.
  • Bushy plans: Moving beyond linear execution trees will allow reuse of intermediate results and more efficient joins.

Each of these directions promises significant further gains. We also want to investigate other benchmarks, such as YCSB and LDBC, to broaden the scope of our optimization work.


Conclusion

These benchmarks show that TypeDB 3 is a significant step forward from TypeDB 2, both in raw performance and in predictability under load. They also highlight the different scaling characteristics between TypeDB and Neo4j. Neo4j performs extremely well in memory-bound scenarios, while TypeDB demonstrates stable performance as datasets grow beyond memory.

Combined with the safety and maintainability benefits of the Rust rewrite, TypeDB 3 is now well-positioned for demanding, production-grade workloads. We will continue to refine the system and share progress as optimizations land. In the meantime, we welcome discussion and questions from the community on our Discord.

These benchmarks show that TypeDB 3 is production-ready. For teams that want to benefit from these gains without managing infrastructure, TypeDB Cloud offers a fully managed path to deployment.
