
TypeDB Blog


The age of AI is upon us — where are the smart databases?

In the same way AI-powered code generation tools can improve developer productivity, so can an AI-powered database.

Shane Johnson


AI is arriving, and it’s coming in hot.

I hear about a new use case for ChatGPT every day, and every product I come across is being rebranded as an AI product (whether it’s true or not). Still, it’s hard to deny AI is going to prove useful in many ways.

Consider developer productivity. You’ve got tools such as SpellBox, GitHub Copilot, Tabnine, Amazon CodeWhisperer, AskCodi, Replit and others for code generation. Then you have tools such as DeepCode AI for identifying and resolving security bugs. It makes sense to infuse AI into our favorite IDEs so we can code better and/or faster, especially when it’s easy to use and helps us be more productive, and maybe even learn a little something along the way.

So, why is AI missing from databases? Vector databases are cool, but that’s not what I’m getting at. What if a database used AI to help developers access data without having to know a great deal about it? What if it could put an end to complex queries?

What if you could ask more sophisticated questions without having to tell the database how to answer them? In the same way AI-powered code generation tools can improve developer productivity, so can an AI-powered database.

Let’s start with two of the most popular open source databases.

Postgres has long been, and still is, one of the most beloved databases. And MongoDB? It fundamentally changed how developers build applications. They’re both great at what they do.

And they’re both as dumb as a box of rocks.

I’m not attacking Postgres and MongoDB. They wouldn’t be loved by millions of developers if they weren’t doing something right.

But, when it comes to interpreting data, they don’t have the slightest idea.

How about an example? I have a CI/CD platform which produces build artifacts. There are different types of artifacts. In a relational database, I’d create a table for artifacts and separate tables for each type of artifact. In an application, I’d create an abstract class for artifacts, and for each type of artifact, a separate class which extends it. We know archives, executables and scripts are types of artifacts, and through inheritance, our application would too.

The database? It doesn’t have a clue.

INSERT INTO artifacts VALUES (100, 'config.sh');
INSERT INTO executables VALUES (200, 100, 'Linux');
INSERT INTO scripts VALUES (300, 200, 'Bash');

Why are multiple inserts required? Shouldn’t the database know where to put the data?
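
For contrast, here’s a minimal sketch of what the application side of the artifact example might look like. The class and field names are my own assumptions for illustration (the archive’s `format` field and the `site.tar.gz` artifact in particular don’t appear in the inserts above). One object carries its whole classification, and asking for a supertype returns its subtypes too.

```python
from abc import ABC
from dataclasses import dataclass


@dataclass
class Artifact(ABC):
    """Base type for everything the CI/CD platform builds (hypothetical)."""
    name: str


@dataclass
class Archive(Artifact):
    format: str  # assumed attribute, for illustration only


@dataclass
class Executable(Artifact):
    platform: str


@dataclass
class Script(Executable):
    interpreter: str


# One object carries the whole classification: artifact, executable, script.
config = Script(name="config.sh", platform="Linux", interpreter="Bash")
builds = [config, Archive(name="site.tar.gz", format="tar.gz")]

# Asking for a supertype returns instances of its subtypes too.
executables = [a.name for a in builds if isinstance(a, Executable)]
artifacts = [a.name for a in builds if isinstance(a, Artifact)]
```

The application never has to be told that a script is also an executable and an artifact. The three inserts above exist only because the database has to be.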

The same is true for MongoDB and Neo4j. They can’t begin to understand their own data because, without classification, they can’t interpret the abstractions we use to communicate the meanings of things.

Here’s another example. I have an internal developer platform (IDP) which empowers developers to deploy applications and services. There are three things involved in a deployment: the developer, the artifact and the environment (e.g., PROD). I’d create a deployment table with foreign key references to the artifact and environment tables. The environment table would have a foreign key reference to the server table. You and I know developers launch deployments which ship artifacts to servers which run in environments.

The database? Yeah, no.

SELECT a.path
FROM deployments d
JOIN artifacts a ON a.id = d.artifact_id
JOIN environments e ON e.id = d.env_id
JOIN servers s ON s.id = e.server_id
WHERE d.result = 'SUCCESS' AND
   e.name = 'PROD';

Why do I have to tell the database which tables to join and how? Shouldn’t it be able to figure it out by itself?

Postgres and MongoDB can’t interpret the relationships between data. Yes, you can have a foreign key, but it’s nothing more than a constraint.

Neo4j is better insofar as it’s a graph database. But because it’s limited to binary relationships, what we think of as a relationship must often be represented as a node. A deployment, though, isn’t a set of independent relationships. It’s a single relationship. If the deployment-artifact relationship is removed, but the deployment-developer and deployment-environment relationships remain, does the deployment node still represent a deployment?

MATCH
  (a:Artifact {name: 'config.sh'}),
  (e:Environment {name: 'PROD'}),
  (u:Developer {name: 'Neo'})
CREATE
  (d:Deployment),
  (d)-[r1:LAUNCHED_BY]->(u),
  (d)-[r2:TARGETS]->(e),
  (d)-[r3:DEPLOYS]->(a)
RETURN d

Why do I have to create multiple relationships to insert a single deployment?
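
To make the distinction concrete, here’s a rough sketch of a deployment treated as one relationship with three roles rather than a node with three edges. It’s plain Python, not any database’s API, and the `Deployment` class and `deploy` helper are hypothetical names of my own. The point is that every role has to be filled or there’s no deployment at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """One relationship with three roles, instead of three binary edges."""
    developer: str
    artifact: str
    environment: str


def deploy(developer, artifact, environment):
    # Every role must be filled, or the deployment does not exist at all.
    roles = {"developer": developer, "artifact": artifact, "environment": environment}
    missing = [role for role, player in roles.items() if not player]
    if missing:
        raise ValueError(f"deployment is missing role(s): {', '.join(missing)}")
    return Deployment(developer, artifact, environment)


d = deploy("Neo", "config.sh", "PROD")
```

With this shape, the question of whether a deployment without an artifact is still a deployment never comes up, because it can’t be created in the first place.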

The bottom line is databases can’t interpret their own data unless they have context for it, and context requires classification and relationships.

No classification, no relationships, no context. No context, no intelligence.

No intelligence, box of rocks.

Ok. Let’s assume we have a database which supports classification and relationships. It’d be a little smart. After all, it doesn’t simply store data, it stores its context.

Given a model with the artifact types, developers, deployments, environments and servers described above, we could ask questions like:

  • Find all artifacts, or find all archives
  • Find all developers and the executables they’ve deployed
  • Find all production servers and the scripts they’re running

The great thing is the queries would look a lot like the sentences above. You wouldn’t have to write a SQL statement with joins and recursive CTEs or create a MongoDB aggregation pipeline with multiple stages. We wouldn’t have to tell the database what to do because it’d have enough context to figure it out on its own.

We could write a query for the second question above like this:

match
  $u isa developer;
  $e isa executable;
  $d ($u, $e) isa deployment;
return $d;

A database with context is smarter than one without it, but what if a database had intelligence too?

If only there was a way to infuse a database with AI. Oh wait, there is. All we have to do is add an inference engine. Once the data is classified and connected, we can use logic and rules to infer implicit knowledge from explicit data. Boom!

Now we can ask questions like these too:

  • Find all servers running critical services which may have an issue
  • Find all services whose failure would impact core business operations
  • Find all infrastructure exposed by a compromised build
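
Here’s a toy sketch of how rules can answer the last question. It’s a few lines of forward chaining in Python, not TypeDB’s inference engine, and the fact names, rule functions and sample data (`build-42`, `web-01` and so on) are all made up for illustration. Two rules run over the explicit facts until nothing new can be derived.

```python
def forward_chain(facts, rules):
    """Apply every rule repeatedly until no new facts appear (a fixpoint)."""
    known = set(facts)
    while True:
        derived = set()
        for rule in rules:
            derived |= rule(known)
        if derived <= known:
            return known
        known |= derived


def tainted_artifacts(facts):
    # compromised(build) and produced(build, artifact) => tainted(artifact)
    compromised = {f[1] for f in facts if f[0] == "compromised"}
    return {("tainted", f[2]) for f in facts if f[0] == "produced" and f[1] in compromised}


def exposed_servers(facts):
    # tainted(artifact) and deployed_to(artifact, server) => exposed(server)
    tainted = {f[1] for f in facts if f[0] == "tainted"}
    return {("exposed", f[2]) for f in facts if f[0] == "deployed_to" and f[1] in tainted}


# Explicit data only: nothing here says which servers are exposed.
explicit = {
    ("compromised", "build-42"),
    ("produced", "build-42", "config.sh"),
    ("produced", "build-43", "report.sh"),
    ("deployed_to", "config.sh", "web-01"),
    ("deployed_to", "report.sh", "web-02"),
}

inferred = forward_chain(explicit, [tainted_artifacts, exposed_servers]) - explicit
exposed = sorted(f[1] for f in inferred if f[0] == "exposed")
```

Nobody stored the fact that `web-01` is exposed. It falls out of the rules, which is the whole idea: the question gets asked in terms of what we want to know, not how to compute it.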

You may be thinking, isn’t this a knowledge graph? Yes and no.

The problem with knowledge graphs is their obsession with the semantic web. If I’m building an application, I couldn’t care less about being part of a web of data. I don’t want to work with RDF, OWL and SPARQL. I mean really, who the hell wants to go back to XML? Not to mention, I probably shouldn’t be exposing company data on the web.

The applications you run on MongoDB and Postgres, you wouldn’t run them on a knowledge base, would you? Neither would I.

But… what if MongoDB and Postgres incorporated AI via an inference engine? Would you call them knowledge graphs now? Or, would you be excited to build your application on them, and take advantage of new AI capabilities?

This is the approach we’re taking with TypeDB. Yes, it has context for its data, and yes, it has an inference engine – but it’s designed to help developers build applications which take advantage of AI in the database so they can operate intelligently and meet rising customer expectations.

In my next blog post, I’ll dive a little deeper into TypeDB and how it does all the things I’ve mentioned above. In the meantime, register for my upcoming webinar: Introduction to TypeDB: Beyond relational and NoSQL.

Then head on over to the website and check out the Introduction and Features pages. It won’t take long to see why life is better with TypeDB.

If you just want to see some code, check out our example projects here and here on GitHub.
