GraphRAG vs Vector RAG: Why We Stopped Retrieving Chunks and Started Retrieving Entities

Vanilla RAG fails the moment a question has more than one hop.

Ask "what does the contract say about refunds?" and cosine similarity over chunked PDFs works fine — there is exactly one paragraph that answers it. Ask "which customers mentioned a refund AND are integrated with Stripe?" and the retriever returns two unrelated piles of chunks: some talk about refunds, some mention Stripe, and the LLM is left to guess whether any of them overlap. They usually don't. The model hallucinates the intersection and ships a confident wrong answer.

We hit this in the third week of running AACFlow in production. The fix wasn't a bigger embedding model. The fix was to stop retrieving chunks and start retrieving entities.

What Vanilla RAG Actually Retrieves

A standard pipeline chunks documents, embeds each chunk, stores the vectors in pgvector, and ranks them by cosine similarity to the query embedding. Every chunk is independent. The retriever has no idea that the paragraph about "Acme Corp" on page 4 is the same Acme Corp mentioned in the support ticket from last Tuesday. To the index, they are two strings with similar embeddings, and "similar" is doing all the work.
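
To make the ranking step concrete, here is a minimal sketch of chunk-level retrieval in plain Python. It is an illustration, not AACFlow's code: the in-memory list stands in for pgvector, and the chunk texts, the hand-made 3-dimensional vectors, and the names `cosine` and `top_k` are all assumptions made up for this example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunks, k=2):
    """Score every chunk independently against the query and keep the best k."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Hypothetical toy data. Vector axes are (refunds, Stripe, shipping);
# a real pipeline would get these from an embedding model.
chunks = [
    ("Acme Corp asked for a refund.", [0.9, 0.1, 0.0]),
    ("Globex is integrated with Stripe.", [0.1, 0.9, 0.0]),
    ("Shipping takes a few business days.", [0.0, 0.0, 1.0]),
]

# A query that is "about" refunds and Stripe at the same time.
query = [0.7, 0.7, 0.0]
results = top_k(query, chunks, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

Both the refund chunk and the Stripe chunk come back, but they concern different customers, and nothing in the result says so. Each score is computed per chunk, so the retriever cannot express "the same entity satisfies both conditions".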

The result: a top-k that looks plausible in the eval set and falls apart on real multi-hop queries. The bottleneck is not embedding quality. It is the flat, entity-blind structure of the index.
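
One way to see what "entity-blind" costs is to look at the same kind of question against an index keyed by entity instead of by chunk. The sketch below is a toy under stated assumptions, not the schema AACFlow uses: the `mentions` data, the fact labels, and the helper names `build_entity_index` and `entities_with_all` are hypothetical, and the entity extraction step that would produce such pairs is skipped entirely.

```python
from collections import defaultdict

# Hypothetical (entity, fact) pairs, as an extraction step might emit them.
mentions = [
    ("Acme Corp", "mentioned_refund"),
    ("Acme Corp", "integrated_with_stripe"),
    ("Globex", "integrated_with_stripe"),
    ("Initech", "mentioned_refund"),
]

def build_entity_index(pairs):
    """Map each fact to the set of entities it has been observed for."""
    index = defaultdict(set)
    for entity, fact in pairs:
        index[fact].add(entity)
    return index

def entities_with_all(index, facts):
    """Return the entities that appear under every requested fact."""
    sets = [index.get(fact, set()) for fact in facts]
    if not sets:
        return set()
    return set.intersection(*sets)

index = build_entity_index(mentions)
both = entities_with_all(index, ["mentioned_refund", "integrated_with_stripe"])
print(both)
```

Once mentions are resolved to a shared entity, the multi-hop question becomes a set intersection rather than something the LLM has to infer from two piles of text. A flat chunk index has no key to intersect on.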

How AACFlow's GraphRAG Layer Works

We did not throw out pgvector. We layered a graph on top of it. The knowledge pipeline now writes to three tables instead of one:
