Production-Ready RAG Pipeline with pgvector and AACFlow Knowledge Base

Most RAG implementations never make it to production. They work fine with 10,000 documents in a demo, then collapse under the weight of a real corpus: slow queries, irrelevant results, and operational overhead from a separate vector database. This post explains how AACFlow's Knowledge Base, backed by pgvector 0.7, addresses these problems and walks through building a production-grade RAG pipeline with a median end-to-end response time under 200ms over 10 million documents.

Why pgvector Instead of a Dedicated Vector Database

The instinct when building RAG is to reach for a dedicated vector database — Pinecone, Qdrant, Weaviate. These are strong products, but they add infrastructure complexity that most teams do not need and cannot maintain well.

pgvector 0.7 changes the calculus. It ships two mature index types: IVFFlat and HNSW. HNSW (Hierarchical Navigable Small World) is a graph-based approximate nearest-neighbor index whose query time grows roughly logarithmically with the number of vectors, so a search over 10M vectors completes in about 4ms at the median (9ms at p99) on commodity hardware. The trade-off between speed and recall is configurable at query time via the hnsw.ef_search parameter.
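As a rough illustration of the two knobs mentioned above, the following sketch builds the SQL for creating an HNSW index and for adjusting hnsw.ef_search in a session. The helper names, the table name kb_chunks, and the column name embedding are hypothetical and not taken from AACFlow; the values m = 16 and ef_construction = 64 are pgvector's documented defaults, not tuned recommendations.

```python
def hnsw_index_sql(table: str, column: str, m: int = 16, ef_construction: int = 64) -> str:
    """Build a CREATE INDEX statement for a pgvector HNSW index (cosine distance).

    The identifiers are interpolated directly, so only pass trusted,
    hard-coded table and column names (hypothetical ones are used below).
    """
    return (
        f"CREATE INDEX ON {table} USING hnsw ({column} vector_cosine_ops) "
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )


def ef_search_sql(ef_search: int) -> str:
    """Build the session setting that trades recall for speed at query time.

    Higher values scan more graph candidates: better recall, slower queries.
    """
    if ef_search < 1:
        raise ValueError("ef_search must be a positive integer")
    return f"SET hnsw.ef_search = {ef_search};"


if __name__ == "__main__":
    print(hnsw_index_sql("kb_chunks", "embedding"))
    print(ef_search_sql(100))
```

Generating the statements as strings keeps the example runnable without a database; in a real deployment they would be executed once at migration time (the index) and per session or transaction (the setting).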

The decisive advantage of pgvector is that embeddings live in the same database as the rest of your application data. This means:

SQL joins work: filter by document owner, workspace, tag, or any metadata field with a standard WHERE clause.
Transactions work: embed and store in a single atomic operation — no risk of embedding succeeding while document storage fails.
EXPLAIN ANALYZE works: you can profile and optimize vector queries with the same tools you use for everything else.
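To make the first point concrete, here is a minimal sketch of a similarity search that is restricted to one workspace by an ordinary join and WHERE clause. The schema (kb_chunks, documents, workspace_id) and the function names are assumptions for illustration, not AACFlow's actual tables. The `<=>` operator is pgvector's cosine distance, and the `%s` placeholders follow the style used by psycopg.

```python
from typing import Sequence, Tuple


def to_vector_literal(embedding: Sequence[float]) -> str:
    """Format a list of floats as pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def filtered_search_query(
    embedding: Sequence[float], workspace_id: str, limit: int = 20
) -> Tuple[str, tuple]:
    """Return (sql, params) for a workspace-scoped nearest-neighbor search.

    Table and column names are hypothetical. The query vector is passed
    twice: once for the returned distance and once for the ORDER BY.
    """
    sql = (
        "SELECT c.id, c.content, c.embedding <=> %s::vector AS distance "
        "FROM kb_chunks AS c "
        "JOIN documents AS d ON d.id = c.document_id "
        "WHERE d.workspace_id = %s "
        "ORDER BY c.embedding <=> %s::vector "
        "LIMIT %s"
    )
    vec = to_vector_literal(embedding)
    return sql, (vec, workspace_id, vec, limit)


if __name__ == "__main__":
    query, params = filtered_search_query([0.1, 0.2, 0.3], "ws_1", limit=5)
    print(query)
    print(params)
```

Prefixing the generated statement with EXPLAIN ANALYZE is one way to check whether the planner uses the vector index once the metadata filter is applied.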

Stage	Latency (p50)	Latency (p99)
Query embedding	18ms	28ms
HNSW vector search	4ms	9ms
BM25 full-text search	3ms	7ms
RRF merge	1ms	2ms
Cohere Rerank (top-20→5)	45ms	80ms
Claude Sonnet 4.6 generation	95ms	180ms
Total end-to-end	166ms	306ms
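
The RRF merge stage in the table combines the vector-search and BM25 result lists into one ranking. The post does not show how AACFlow implements it, so the following is only a sketch of standard reciprocal rank fusion, where each document scores 1 / (k + rank) in every list it appears in. The function name, the constant k = 60 (a commonly used value), and the top_n = 20 cutoff (chosen to match the rerank input size in the table) are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_merge(rankings: Sequence[Sequence[str]], k: int = 60, top_n: int = 20) -> List[str]:
    """Fuse several ranked lists of document ids with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) per document, with ranks starting
    at 1. Documents found by several retrievers accumulate a higher score.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id so the output is deterministic.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ordered[:top_n]]


if __name__ == "__main__":
    vector_hits = ["a", "b", "c"]  # hypothetical ids from the HNSW search
    bm25_hits = ["b", "d", "a"]    # hypothetical ids from full-text search
    print(rrf_merge([vector_hits, bm25_hits]))
```

Because the fusion uses only rank positions, it does not need the vector distances and BM25 scores to be on comparable scales.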
