Alexandr Chibilyaev explains why AACFlow runs on a single PostgreSQL database with pgvector — no separate vector DB, no Pinecone, no Weaviate. Relational data and embeddings in one system with transactional consistency.
Every AI platform has an opinion about its database. Most have the same opinion: "You need PostgreSQL for your relational data and Pinecone for your vectors." Or Weaviate. Or Qdrant. Or Milvus. Pick your flavor of separate vector database.
AACFlow has a different opinion: you need PostgreSQL. Period.
We run one database: PostgreSQL with the pgvector extension. Relational data (users, workspaces, workflows, executions, triggers, connectors) and vector data (embeddings for RAG, semantic search, agent memory) live in the same system. No separate vector database. No synchronization between two databases. No "which database has the latest data?" questions at 2 AM.
This is not a compromise. It's an architectural choice that has paid off in reliability, simplicity, and performance.
The standard "PostgreSQL + Pinecone" architecture looks reasonable on a whiteboard. In production, it creates problems:
Two systems to operate. Backups, migrations, monitoring, scaling — all doubled. Two sets of credentials. Two connection pools. Two failure modes. When something breaks at 3 AM, you have to figure out which database broke.
No transactional consistency. You insert a document into PostgreSQL. You embed it and store the vector in Pinecone. The PostgreSQL insert succeeds but the Pinecone insert fails. Now you have a document without a vector. Or you delete a document from PostgreSQL but the Pinecone delete times out. Now you have a ghost vector that returns in search results. Either way, your data is inconsistent.
Query complexity. "Show me all documents in workspace X, created in the last 7 days, that are semantically similar to this query." In a split architecture, you write two queries: one against PostgreSQL (for the relational filters) and one against Pinecone (for the vector similarity). Then you merge the results in application code. In a unified architecture, you write one query. The database handles both the relational filter and the vector similarity search in a single execution plan.
Latency. Every split query adds a network round trip: application → PostgreSQL, application → Pinecone. In a single-database setup, the query stays inside PostgreSQL. One network hop. One query planner. One result set.
Cost. Pinecone isn't cheap. For production workloads with millions of vectors, you're paying hundreds or thousands of dollars per month for a service that PostgreSQL + pgvector can handle for the cost of a slightly larger PG instance you're already running.
pgvector adds a vector data type to PostgreSQL. A vector is just an array of floats with a fixed dimension. For OpenAI embeddings, that's 1536 dimensions. For Cohere, 1024. For our own embeddings, configurable.
1
-- Create a documents table with a vector column
2
CREATETABLEdocuments(
3
id UUIDPRIMARYKEYDEFAULTgen_random_uuid(),
4
workspace_id UUIDNOTNULLREFERENCESworkspaces(id),
5
external_id TEXTNOTNULL,
6
title TEXTNOTNULL,
7
content TEXTNOTNULL,
8
embedding vector(1536),-- pgvector column
9
metadata JSONB,
10
created_at TIMESTAMPTZDEFAULTnow(),
11
updated_at TIMESTAMPTZDEFAULTnow()
12
);
13
14
-- Create an HNSW index for fast similarity search
15
CREATEINDEXON documents
16
USINGhnsw(embedding vector_cosine_ops)
17
WITH(m =16, ef_construction =200);
That's it. You now have a table that stores relational data, JSON metadata, and vector embeddings — all in one row. One insert writes everything. One delete removes everything. Transactional consistency by default.
The magic that makes vector search fast in PostgreSQL is the HNSW (Hierarchical Navigable Small World) index. It's an approximate nearest neighbor algorithm that trades a tiny amount of recall precision for massive speed gains.
With an HNSW index on our documents table, a similarity search across 10 million vectors takes under 10 milliseconds. Without the index, it would take seconds. The index is the difference between "vector search is a gimmick" and "vector search is production infrastructure."
Configuration matters. The m parameter controls the number of connections per layer in the graph. Higher m = better recall but slower builds and more memory. The ef_construction parameter controls the search depth during index construction. Higher ef_construction = better index quality but slower builds.
Our default settings (m=16, ef_construction=200) produce indexes with >99% recall on 1536-dimensional embeddings. For most use cases, the difference between 99.0% recall and 99.5% is invisible. You'll never notice.
The real power of single-database architecture is hybrid search: combining traditional SQL filters with vector similarity in one query.
1
-- Hybrid search:SQL filter + vector similarity
2
SELECT
3
id,
4
title,
5
content,
6
1-(embedding <=> $query_embedding)AS similarity
7
FROM documents
8
WHERE
9
workspace_id = $workspace_id
10
AND metadata->>'category'='support_ticket'
11
AND updated_at >now()-INTERVAL'7 days'
12
ORDERBY embedding <=> $query_embedding
13
LIMIT10;
This query does three things simultaneously:
Filters by workspace (tenant isolation — critical for our multi-tenant platform)
Filters by document category (only support tickets, not CRM deals)
Filters by recency (only last 7 days, because old tickets are less relevant)
Sorts by vector similarity (most semantically relevant results first)
In a split architecture, you'd run the SQL filter in PostgreSQL, get 500 matching document IDs, send them to Pinecone, do a vector search constrained to those IDs, and return results. Two queries. Two network hops. Application-level merge logic.
In AACFlow, it's one query. The PostgreSQL query planner builds an execution plan that uses the HNSW index for the similarity sort and the B-tree indexes for the SQL filters. The database does what databases are good at: query optimization.
AND expires_at >now()-- Don't return expired memories
7
ORDERBY embedding <=> $embedding
8
LIMIT20;
Semantic workflow discovery:
1
SELECT wf.id, wf.name, wf.description,
2
1-(wf.embedding <=> $embedding)AS score
3
FROM workflows wf
4
WHERE wf.workspace_id = $workspace_id
5
AND wf.is_public =true
6
AND wf.embedding ISNOTNULL
7
ORDERBY wf.embedding <=> $embedding
8
LIMIT20;
Three different use cases. Three different tables. Three different filter combinations. All using the same <=> operator (cosine distance) and the same HNSW index infrastructure. No special SDK. No separate API. Just SQL.
The vector type is first-class in our Drizzle schema. The <=> operator is exposed as raw SQL (Drizzle doesn't natively support it — yet). But the result is fully type-safe: results is typed as { id: string; title: string; similarity: number }[]. Autocomplete works. Refactoring works. Compile-time errors instead of runtime surprises.
We didn't start with pgvector. Like many platforms, we initially considered a separate vector database. We looked at Pinecone, Weaviate, and Qdrant. Each had strengths. Each had serious weaknesses for our use case.
The deciding factors:
Multi-tenant isolation. AACFlow is multi-tenant. Every query must be scoped by workspace_id. In a separate vector database, this requires either separate namespaces (which have scaling limits) or application-level filtering (which breaks vector index efficiency). In PostgreSQL, it's just a WHERE clause on a standard B-tree index.
Transactional RAG. When a connector syncs new documents, we need to insert the document AND its embedding atomically. If the embedding generation fails, the document insert must roll back. If the document insert succeeds but the embedding insert fails, we have an orphan. PostgreSQL's transactional guarantees solve this. Separate databases require two-phase commits or eventual consistency — both of which add complexity.
Operational simplicity. We're a small team building a complex platform. Every piece of infrastructure we add is a piece we have to monitor, backup, scale, and debug at 3 AM. pgvector removes an entire category of operational burden.
The migration from "considering separate vector DB" to "all-in on PostgreSQL + pgvector" was simple: we added a vector column to the documents table, created an HNSW index, and started writing embeddings. No data migration. No API changes. No downtime. Just an ALTER TABLE and a CREATE INDEX.
At 10 million document embeddings, with HNSW indexes, our 99th percentile query latency for similarity search is under 15ms. For hybrid queries with both relational filters and vector similarity, the 99th percentile is under 40ms.
Memory is the primary constraint. HNSW indexes live in shared buffers. For 10 million 1536-dimensional vectors, the index requires roughly 5-8 GB of memory. This fits comfortably in a mid-range PostgreSQL instance. For 100 million vectors, you'd need 50-80 GB of memory — which starts to push towards dedicated hardware. But that's a good problem to have, and one we'll solve when we get there (partitioning by workspace, tiered storage, read replicas).
The key insight: for the vast majority of AI agent workloads — knowledge bases with thousands to millions of documents — a single PostgreSQL instance with pgvector handles everything comfortably. You don't need a distributed vector database until you're at internet scale. And by the time you're at internet scale, you have the resources to solve that problem.
Pinecone. Proprietary. Expensive. Another system to operate. No transactional consistency with our relational data. Vendor lock-in if Pinecone changes pricing or deprecates features.
Weaviate. Open-source but complex to operate. Separate deployment. Separate backup strategy. Separate monitoring. All for functionality that pgvector provides with an extension.
Qdrant. Fast. Well-designed. But again: separate system. And when your vector database is separate from your relational database, you inevitably have consistency problems. We've seen it. We've debugged it. We avoid it.
Milvus. Powerful but heavy. Requires etcd, MinIO, Pulsar — a whole distributed system just to store and search vectors. For our scale, massive overkill. And again: separate database.
MongoDB Atlas Vector Search. If we used MongoDB as our primary database, this would be compelling. But we don't. And we're not adding MongoDB just for vector search when PostgreSQL already does everything we need.
This isn't just about databases. It's about an engineering philosophy that runs through all of AACFlow: fewer moving parts is better.
Every system you add is a system that can fail. Every service boundary is a place where consistency can break. Every new technology in your stack is something your team has to learn, debug, and maintain.
PostgreSQL has been battle-tested for 30 years. It handles tens of thousands of transactions per second. It's ACID-compliant. It has a query planner that understands both B-tree indexes and HNSW indexes. It runs on every cloud, on every VPS, in Docker, on bare metal. It's not exciting. It's not new. It's reliable — and reliability is exactly what AI agent infrastructure needs.
pgvector is the extension that makes PostgreSQL a vector database. Not a separate system with a separate API, but a column type and an index type inside the database you're already using. You write SQL. You get vectors back. Your data is consistent. Your ops are simple. Your team sleeps well.
We chose PostgreSQL + pgvector because it's the boring, reliable choice. And in infrastructure, boring is a compliment. If you're building an AI platform and evaluating databases, ask yourself: do you really need a separate vector database? Or do you just need PostgreSQL with one extra extension?