Introduction to TypeDB
TypeDB looks beyond relational and NoSQL databases by introducing a strong type system and extending it with inference and pattern matching for simple, yet powerful querying.
TypeDB brings the benefits of strong typing to databases
We believe strong typing must extend to databases. It’s a core concept in the programming languages we love – from Java and Kotlin to Rust and TypeScript – and it lets developers write efficient and flexible code using abstraction, inheritance and polymorphism. However, because databases lack type systems, there’s a mismatch between logical and physical data models.
define
  credential sub attribute, value string;
  username sub attribute, value string;
  email sub attribute, value string;
  subject sub entity, abstract, owns credential;
  user sub subject, abstract, owns username;
  employee sub user, owns email;

insert $e isa employee,
  has credential "passwd", has username "jdoe", has email "[email protected]";
In TypeDB, developers can define entities, relations and attributes as subtypes of others in order to inherit attributes and roles (e.g., employee inherits username from user). This allows attributes and roles, and any constraints on them, to be reused and easily maintained because they are only defined once. Further, developers don't have to insert data in multiple places for subtypes (or make sure they specify every supertype of a given subtype).
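The inheritance behaviour described above can be approximated in a few lines of Python. This is only a sketch of the idea, not how TypeDB is implemented: TypeDef, owned_attributes and insert are hypothetical names invented for illustration, and the email address is a made-up placeholder.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class TypeDef:
    """Hypothetical stand-in for a schema type; not a TypeDB API."""
    name: str
    parent: Optional["TypeDef"] = None
    owns: Tuple[str, ...] = ()
    abstract: bool = False

    def owned_attributes(self) -> List[str]:
        """Attributes declared here plus everything inherited from supertypes."""
        inherited = self.parent.owned_attributes() if self.parent is not None else []
        return inherited + list(self.owns)


# Mirrors the schema above: each attribute is declared exactly once.
subject = TypeDef("subject", owns=("credential",), abstract=True)
user = TypeDef("user", parent=subject, owns=("username",), abstract=True)
employee = TypeDef("employee", parent=user, owns=("email",))


def insert(type_def: TypeDef, **attrs: str) -> Dict[str, str]:
    """Validate one instance against its type and all of its supertypes."""
    if type_def.abstract:
        raise TypeError(f"{type_def.name} is abstract and cannot be instantiated")
    unknown = set(attrs) - set(type_def.owned_attributes())
    if unknown:
        raise ValueError(f"{type_def.name} does not own {sorted(unknown)}")
    return {"isa": type_def.name, **attrs}


# A single insert covers attributes declared on subject, user and employee.
jdoe = insert(employee, credential="passwd", username="jdoe",
              email="jdoe@example.com")
print(employee.owned_attributes())
```

Because the supertype chain is resolved by the type itself, adding an attribute to subject would make it available to every subtype without touching their definitions.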
CREATE TABLE subjects (
    id integer PRIMARY KEY,
    credential text);
CREATE TABLE users (
    id integer PRIMARY KEY,
    s_id integer REFERENCES subjects(id),
    username text);
CREATE TABLE employees (
    id integer PRIMARY KEY,
    u_id integer REFERENCES users(id),
    email text);

-- three inserts required
INSERT INTO subjects VALUES (1, 'password');
INSERT INTO users VALUES (1, 1, 'jdoe');
INSERT INTO employees VALUES (1, 1, '[email protected]');
There are multiple ways to model inheritance in relational databases, all of them inadequate: create a table for the supertype and one for each subtype (which requires joins), create a table for each subtype (which requires unions) or create a single table for all subtypes (which requires filtering). Regardless of the choice, developers have to implement the logic in queries because the database has no context for the relationship behind foreign keys.
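To make the cost of the first option concrete, the sketch below loads the three tables into an in-memory SQLite database and reads one employee back. It assumes Python's built-in sqlite3 module in place of a production database, and the email address is a made-up placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subjects (id integer PRIMARY KEY, credential text);
CREATE TABLE users (id integer PRIMARY KEY,
                    s_id integer REFERENCES subjects(id), username text);
CREATE TABLE employees (id integer PRIMARY KEY,
                        u_id integer REFERENCES users(id), email text);
INSERT INTO subjects VALUES (1, 'passwd');
INSERT INTO users VALUES (1, 1, 'jdoe');
INSERT INTO employees VALUES (1, 1, 'jdoe@example.com');
""")

# Reassembling one logical employee takes a join per level of the hierarchy,
# and the query author has to know which foreign key encodes "is a".
row = conn.execute("""
    SELECT s.credential, u.username, e.email
    FROM employees e
    JOIN users u ON u.id = e.u_id
    JOIN subjects s ON s.id = u.s_id
    WHERE e.id = ?
""", (1,)).fetchone()
print(row)
```

Nothing in the schema tells the database that u_id and s_id express inheritance, so every query that needs a complete employee has to repeat these joins.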
db.createCollection("employees", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      title: "IAM employee",
      required: [ "credential", "username", "email"],
      properties: {
        credential: {
          bsonType: "string",
          description: "'credential' must be a string and is required"},
        username: {
          bsonType: "string",
          description: "'username' must be a string and is required"},
        email: {
          bsonType: "string",
          description: "'email' must be a string and is required"}
      }
    }
  }
})

// the field is named credential (singular) to satisfy the validator above
db.employees.insertOne({credential: "passwd", username: "jdoe",
  email: "[email protected]" })
When it comes to integrity constraints, MongoDB schema validation is on par with table definitions in relational databases. However, it too lacks native support for inheritance. As a result, developers are forced to create a set of validation rules, apply them to each subtype and make sure they are always in sync when fields are added or removed – all because, unlike in TypeDB, field constraints can't be inherited from a supertype.
CREATE CONSTRAINT subject_credential_string IF NOT EXISTS
FOR (n:Subject)
REQUIRE n.credential IS :: STRING;

CREATE CONSTRAINT user_username_string IF NOT EXISTS
FOR (n:User)
REQUIRE n.username IS :: STRING;

CREATE CONSTRAINT employee_email_string IF NOT EXISTS
FOR (n:Employee)
REQUIRE n.email IS :: STRING;

// every applicable supertype label has to be listed by hand
CREATE (emp:Subject:User:Employee
  {
    credential: 'passwd',
    username: 'jdoe',
    email: '[email protected]'
  })
Neo4j, like MongoDB, supports validation constraints but lacks a proper schema and inheritance support. The known workaround is for developers to add a label for every applicable supertype when creating a node. This forces developers to bear the burden of ensuring data is consistent because each node defines its own type hierarchy. There is nothing preventing nodes from being created with missing or wrong labels.
Over time, physical data models become even less logical because of database optimizations such as normalization and workarounds for limitations such as the lack of many-to-many relationships in document databases. The result is a database which doesn’t understand its own data and how it's related. For example, schemas in relational databases can define foreign key constraints, but they're unaware of the context behind the relationship (e.g., inheritance).
MongoDB is no different with its $lookup stage. These databases know next to nothing about their own data. TypeDB was built to fix this problem. It has a strong type system based on logical data models: entities, relations and attributes. And by making relations and attributes first-class citizens, TypeDB is able to understand the nature of its data and how it's all connected – making it easy for developers to find what they’re looking for.
TypeQL queries are fully declarative via composable patterns
TypeDB’s query language, TypeQL, is based on pattern matching. To find data, developers combine patterns to describe what they’re looking for. It is fundamentally different from SQL queries, which are long statements telling the database what to do and how. As a result, TypeQL is a truly declarative query language – simply describe the data to be returned, and TypeDB will figure out the rest.
# find all files in the /usr/bin directory which
# John Doe has permission to write
match
  $e isa employee, has email "[email protected]";
  $f isa file, has path "/usr/bin/";
  $a isa action, has name "WRITE";
  $ac ($f, $a) isa access;
  $p ($e, $ac) isa permission;
There are no joins, unions or even WHERE clauses in TypeDB's query language, TypeQL, because it's fully declarative and based on pattern matching. All developers have to do is describe the entities and/or relations they're looking for. TypeDB will figure out how to find them. Further, TypeQL queries are fully composable such that individual patterns can be added or removed to modify the results and each pattern is independent of the others.
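The composability claim can be illustrated outside of TypeQL. The Python sketch below treats each pattern as an independent predicate and a query as their conjunction. It is a loose analogy over flat rows rather than TypeDB's matching over entities and relations, and the helper names (has, starts_with, not_, match) and the sample data are all invented for illustration.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Pattern = Callable[[Row], bool]


def has(attr: str, value: str) -> Pattern:
    """Pattern: the row has this attribute with exactly this value."""
    return lambda row: row.get(attr) == value


def starts_with(attr: str, prefix: str) -> Pattern:
    """Pattern: the attribute value begins with the given prefix."""
    return lambda row: row.get(attr, "").startswith(prefix)


def not_(pattern: Pattern) -> Pattern:
    """Negation of a pattern."""
    return lambda row: not pattern(row)


def match(rows: List[Row], *patterns: Pattern) -> List[Row]:
    """Return the rows that satisfy every pattern (conjunction)."""
    return [row for row in rows if all(p(row) for p in patterns)]


# Made-up sample data standing in for permissions on files.
grants = [
    {"email": "jdoe@example.com", "path": "/usr/bin/ls", "action": "WRITE"},
    {"email": "jdoe@example.com", "path": "/usr/bin/cat", "action": "READ"},
    {"email": "jdoe@example.com", "path": "/etc/hosts", "action": "WRITE"},
    {"email": "asmith@example.com", "path": "/usr/bin/ls", "action": "WRITE"},
]

by_user = has("email", "jdoe@example.com")
in_usr_bin = starts_with("path", "/usr/bin/")
can_write = has("action", "WRITE")

writable = match(grants, by_user, in_usr_bin, can_write)
# Removing one pattern broadens the result; the others are untouched.
any_action = match(grants, by_user, in_usr_bin)
# Negating one pattern inverts just that condition.
read_only = match(grants, by_user, in_usr_bin, not_(can_write))
print([row["path"] for row in writable])
```

Each pattern is defined once and reused across all three queries, which is the property the paragraph above describes: a query changes by adding or dropping a pattern, not by rewriting a statement.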
-- find all files in the /usr/bin directory which
-- John Doe has permission to write
SELECT f.path
FROM permissions p
JOIN employees e ON e.id = p.subject_id
JOIN access ac ON ac.id = p.access_id
JOIN actions a ON a.id = ac.action_id
JOIN files f ON f.id = ac.object_id
WHERE e.email = '[email protected]' AND
      f.path LIKE '/usr/bin/%' AND
      a.name = 'WRITE';
Telling a database how to join tables or traverse a self-referencing table is not declarative. In fact, developers shouldn’t have to worry about primary and foreign keys – whether it's to join tables or insert rows. The problem is relationships are an afterthought in relational databases. This puts the onus on developers to ensure data is properly referenced and increases the risk of accidentally joining the wrong rows and not knowing it.
// find all files in the /usr/bin directory which
// John Doe has permission to write
// (starts from employees, assuming that is the collection holding email)
db.employees.aggregate([
  {
    $match: {
      email: "[email protected]",
    },
  },
  {
    $lookup: {
      from: "permissions",
      localField: "email",
      foreignField: "subject",
      as: "permission",
    },
  },
  {
    $unwind: {
      path: "$permission",
    },
  },
  {
    $lookup: {
      from: "accesses",
      localField: "permission.access_id",
      foreignField: "access_id",
      as: "access",
    },
  },
  {
    $lookup: {
      from: "actions",
      localField: "access.action_name",
      foreignField: "name",
      as: "action",
      pipeline: [
        {
          $match: {
            $expr: {
              $eq: ["$name", "WRITE"],
            },
          },
        },
      ],
    },
  },
  {
    $lookup: {
      from: "files",
      localField: "access.resource_id",
      foreignField: "resource_id",
      as: "resource",
      pipeline: [
        {
          $match: {
            $expr: {
              $eq: ["$path", "/usr/bin"],
            },
          },
        },
      ],
    },
  },
  {
    $match: {
      resource: {
        $ne: [],
      },
    },
  },
])
It's easy to find data in MongoDB when all related data is stored within a single document. However, things get much more complicated when it isn't. MongoDB supports left outer joins via the $lookup stage, but performing lookups across multiple collections results in large, complex aggregation pipelines that are difficult to maintain and troubleshoot. For example, it is easy to specify the wrong from, localField or foreignField value, or to forget to unwind an array.
// find all files in the /usr/bin directory which
// John Doe has permission to write
MATCH
  (e:Employee)-[:GRANTED]-(p:Permission),
  (p)-[:INCLUDES]-(x:Access),
  (x)-[:HAS]-(a:Action),
  (x)-[:ON]-(f:File)
WHERE e.email = '[email protected]' AND
      f.path STARTS WITH '/usr/bin' AND
      a.name = 'WRITE'
RETURN f
Neo4j relationships must connect exactly two nodes. Neo4j does not support n-ary relationships (i.e., relationships between multiple nodes) or nested relationships (i.e., relationships between relationships). As a result, the data model includes extraneous nodes and relationships in order to work around these limitations. Further, filtering data with property constraints beyond an exact match requires a WHERE clause (just like SQL).
Queries are created by combining discrete patterns, making TypeQL not only declarative, but composable. For developers, writing a query requires little more than coming up with a set of patterns that describe the data they're looking for. And troubleshooting queries is as easy as adding or removing patterns in order to change the results of a query. However, though easy to use, TypeQL is a powerful query language – supporting conjunction, disjunction and negation of patterns as well as grouping and aggregation of data.
Further, it does away with joins, unions and recursive CTEs so developers don't have to waste time trying to comprehend and troubleshoot long query statements. Rather, the expressiveness of TypeQL, combined with a type system and inference, allows developers to simply describe what they’re looking for – not what it is, not how to find it and not how it's all connected. As a result, developers don't have to worry about getting bogged down by complicated queries or writing code to handle database shortcomings.
TypeDB infers data on behalf of developers, simplifying data access
The age of AI is upon us, and it necessitates intelligent databases. It's not enough for databases to simply store data and provide a means to access it. The future will be built on those that are capable of reasoning over their data too. TypeDB's type system, combined with its built-in reasoning engine, allows it to infer data on behalf of developers – putting deep information in plain sight.
# find all of Jane Doe's permissions on restricted objects
match
  $e isa employee, has email "[email protected]";
  $p (subject: $e) isa permission, has restricted true;
TypeDB can infer attributes and relations for developers. In effect, its reasoning engine uses logic and rules to materialize abstractions that hide the complexity of interconnected data – making them easy to query. For example, restricted permissions can be inferred with business rules so developers can simply query for them rather than having to traverse the relations between subjects and objects and then check whether those objects are restricted.
-- find all of Jane Doe's permissions
WITH RECURSIVE user_groups AS (
    SELECT member_id, group_id
    FROM user_group_members
    WHERE email = '[email protected]'
    UNION ALL
    SELECT member_ug.member_id, member_ug.group_id
    FROM user_group_members member_ug
    JOIN user_groups group_ug ON group_ug.group_id = member_ug.member_id
)
SELECT p.id
FROM user_groups ugs
JOIN permissions p ON p.subject_id = ugs.group_id
UNION
SELECT p.id
FROM permissions p
WHERE p.subject_id = 1;
It's difficult to query interconnected data in relational databases because developers have to define every potential path with a crude set of joins, unions and recursive common table expressions (CTEs) – and with every path, branch and relation that has to be traversed, query complexity grows. To make matters worse, the paths might not be known due to a lack of domain expertise, might have changed, or might simply have too many permutations to express in SQL.
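For readers who want to see a recursive CTE of this shape run end to end, here is a minimal sketch using Python's built-in sqlite3 module. The schema is a simplified assumption (text identifiers, a member looked up by id rather than email) and the rows are made up: jane belongs to dev, and dev is nested inside eng.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_group_members (member_id TEXT, group_id TEXT);
CREATE TABLE permissions (id INTEGER PRIMARY KEY, subject_id TEXT);
INSERT INTO user_group_members VALUES ('jane', 'dev'), ('dev', 'eng');
INSERT INTO permissions VALUES (1, 'jane'), (2, 'dev'), (3, 'eng'), (4, 'ops');
""")

# The traversal path (member -> group -> parent group) is spelled out by hand,
# and direct permissions need their own branch of the UNION.
query = """
WITH RECURSIVE user_groups(member_id, group_id) AS (
    SELECT member_id, group_id
    FROM user_group_members
    WHERE member_id = :member
    UNION ALL
    SELECT m.member_id, m.group_id
    FROM user_group_members m
    JOIN user_groups g ON g.group_id = m.member_id
)
SELECT p.id
FROM user_groups ug
JOIN permissions p ON p.subject_id = ug.group_id
UNION
SELECT p.id
FROM permissions p
WHERE p.subject_id = :member
ORDER BY 1
"""

permission_ids = [row[0] for row in conn.execute(query, {"member": "jane"})]
print(permission_ids)
```

Even in this small case the query has to encode two separate paths to a permission. Any new way of reaching one, such as roles or delegated access, would mean another branch.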
// find all of Jane Doe's permissions
// (getCollection is needed because the collection name contains a hyphen)
db.getCollection("group-memberships").aggregate([
  {
    $match: {
      email: "[email protected]",
    },
  },
  {
    $graphLookup: {
      from: "group-memberships",
      startWith: "$group_name",
      connectFromField: "group_name",
      connectToField: "member_id",
      as: "groups",
    },
  },
  {
    $set: {
      group_names: {
        $concatArrays: [
          ["$group_name"],
          "$groups.group_name",
        ],
      },
    },
  },
  {
    $lookup: {
      from: "permissions",
      localField: "group_names",
      foreignField: "subject_id",
      as: "group_permissions",
    },
  },
  {
    $lookup: {
      from: "permissions",
      localField: "email",
      foreignField: "subject_id",
      as: "member_permissions",
    },
  },
  {
    $set: {
      permissions: {
        $concatArrays: [
          "$group_permissions",
          "$member_permissions",
        ],
      },
    },
  },
])
MongoDB's $graphLookup stage is similar to recursive CTEs in relational databases in that it can be used to traverse self-referencing collections. However, it's not very helpful when documents are indirectly connected across multiple collections. In such cases, MongoDB's aggregation pipeline requires developers to explicitly identify and traverse every potential path between documents. This leads to brittle aggregation pipelines that require a large number of stages.
// find all of Jane Doe's permissions on restricted objects
MATCH
  (e:Employee)-[:GRANTED_TO]-(p:Permission),
  (p)-[:FOR_ACCESS|ON_OBJECT*]-(r:Resource)
WHERE
  e.email = '[email protected]' AND
  (
    r.hostname = 'db001' OR              // restricted server
    r.path STARTS WITH '/opt/secure' OR  // restricted directory
    r.name = 'account_numbers' OR        // restricted table
    r.pk = '100'                         // restricted row
  )
RETURN p
Neo4j can query transitive relationships in Cypher almost as easily as TypeDB does in TypeQL. However, because Neo4j lacks rule-based inference, it can't materialize properties on a node or relationship at query time by evaluating the properties of other nodes and relationships – forcing developers to specify all potential combinations of nodes, relationships and properties in their queries, and moving what should be centralized business logic into application-specific code.
TypeDB is able to perform type inference, allowing developers to find data based on a shared supertype rather than specifying individual types. However, the real power of its reasoning engine lies in its ability to derive new information from existing data (i.e., to infer data that doesn't physically exist). TypeDB applies user-defined rules to existing data at query time. By embedding logic in these rules, developers don't have to incorporate it in queries – or even know about it.
The reasoning engine can materialize new attributes on an entity not only based on its other attributes, but on any relations it has to other entities (directly or indirectly) – and their attributes too. Further, it can materialize new relations between entities. The result is a layer of abstraction that hides the complexity of highly interconnected data so developers don't have to worry about specifying all of the potential permutations of connections to find what they're looking for.
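The idea of deriving an attribute at query time from rules defined in one place can be sketched in plain Python. This illustrates the concept only and is not TypeDB's reasoning engine: Resource, Permission, RESOURCE_RULES and is_restricted are hypothetical names, the rule conditions are borrowed from the Cypher example, and the data is made up.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Resource:
    name: str
    hostname: Optional[str] = None
    path: Optional[str] = None


@dataclass(frozen=True)
class Permission:
    subject_email: str
    action: str
    resource: Resource


# Hypothetical business rules, defined once instead of in every query.
RESOURCE_RULES: List[Callable[[Resource], bool]] = [
    lambda r: r.hostname == "db001",                                     # restricted server
    lambda r: r.path is not None and r.path.startswith("/opt/secure"),  # restricted directory
    lambda r: r.name == "account_numbers",                              # restricted table
]


def is_restricted(permission: Permission) -> bool:
    """Derived when queried; never stored on the permission itself."""
    return any(rule(permission.resource) for rule in RESOURCE_RULES)


def restricted_permissions(permissions: List[Permission], email: str) -> List[Permission]:
    """The query only asks for 'restricted'; it does not know why."""
    return [p for p in permissions
            if p.subject_email == email and is_restricted(p)]


permissions = [
    Permission("jane@example.com", "READ", Resource("q3.txt", path="/opt/secure/q3.txt")),
    Permission("jane@example.com", "READ", Resource("motd", path="/etc/motd")),
    Permission("jane@example.com", "WRITE", Resource("account_numbers")),
    Permission("john@example.com", "READ", Resource("primary", hostname="db001")),
]

found = restricted_permissions(permissions, "jane@example.com")
print([p.resource.name for p in found])
```

Adding a new kind of restricted resource means appending one rule. The query function and every caller stay unchanged, which is the separation between business logic and queries that the paragraphs above describe.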
Start building today
Cloud or containers, a database with strong typing, logical abstractions and declarative expressions is minutes away.