Vanilla RAG fails the moment a question has more than one hop.
Ask "what does the contract say about refunds?" and cosine similarity over chunked PDFs works fine: there is exactly one paragraph that answers it. Ask "which customers mentioned a refund AND are integrated with Stripe?" and the retriever returns two unrelated piles of chunks. Some talk about refunds, some mention Stripe, and the LLM is left to guess which customers, if any, appear in both. Usually none of the retrieved chunks connect the two. The model hallucinates the intersection and ships a confident wrong answer.
We hit this in the third week of running AACFlow in production. The fix wasn't a bigger embedding model. The fix was to stop retrieving chunks and start retrieving entities.
What Vanilla RAG Actually Retrieves
A standard pipeline chunks documents, embeds each chunk, stores the vectors in pgvector, and ranks chunks by cosine similarity to the query embedding. Every chunk is independent. The retriever has no idea that the paragraph about "Acme Corp" on page 4 is the same Acme Corp mentioned in the support ticket from last Tuesday. To the index, they are two strings with similar embeddings, and "similar" is doing all the work.
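To make the failure concrete, here is a toy version of chunk-level retrieval. It is a sketch under loud assumptions: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, the three chunks are made up, and the scoring runs in memory rather than in pgvector. What it preserves is the structure described above: each chunk is scored against the query on its own, with no link between chunks that mention the same customer.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Hypothetical stand-in for an embedding model: a bag-of-words count vector.
    return Counter(text.lower().replace(",", "").replace(".", "").split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    q = embed(query)
    # Each chunk is scored independently; nothing ties together chunks about the same entity.
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "Acme Corp asked for a refund on the annual plan.",
    "Globex integrated with Stripe for billing last month.",
    "Initech praised the onboarding flow.",
]
print(top_k("which customers mentioned a refund and use Stripe", chunks))
```

The top two results are the refund chunk (Acme Corp) and the Stripe chunk (Globex). Both look relevant, yet no single customer satisfies both conditions, and nothing in the retrieved set tells the LLM that.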
The result: a top-k that looks plausible on the eval set and falls apart on real multi-hop queries. The bottleneck is not embedding quality. It is the flat, entity-blind structure of the index.
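For contrast, here is a toy illustration of what an entity-aware index buys you. It is not AACFlow's implementation: the chunk ids, entity names, and attribute labels are all hypothetical, and the chunk-to-entity links are hand-written where a real pipeline would need entity extraction and resolution. Once chunks point at shared entity nodes, the multi-hop question becomes a set intersection, and each answer carries the chunks that support it.

```python
from collections import defaultdict

# Hypothetical links from chunk id to (entity, attribute) pairs.
# In a real pipeline these would come from an entity extractor plus deduplication.
chunk_links = {
    "ticket-17": [("Acme Corp", "mentioned_refund")],
    "contract-4": [("Acme Corp", "integrated_with:Stripe")],
    "ticket-22": [("Globex", "mentioned_refund")],
    "crm-9": [("Initech", "integrated_with:Stripe")],
}


def build_entity_index(links):
    index = defaultdict(set)      # attribute -> set of entities that have it
    evidence = defaultdict(list)  # (entity, attribute) -> chunk ids supporting it
    for chunk_id, pairs in links.items():
        for entity, attr in pairs:
            index[attr].add(entity)
            evidence[(entity, attr)].append(chunk_id)
    return index, evidence


index, evidence = build_entity_index(chunk_links)

# The multi-hop question is now an exact intersection over entities, not a guess.
matches = index["mentioned_refund"] & index["integrated_with:Stripe"]
print(sorted(matches))  # ['Acme Corp']
for entity in sorted(matches):
    print(entity, evidence[(entity, "mentioned_refund")], evidence[(entity, "integrated_with:Stripe")])
```

The ticket and the contract page are different chunks, but because both resolve to the same "Acme Corp" node, the intersection finds it and cites both sources.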
How AACFlow's GraphRAG Layer Works
We did not throw out pgvector. We layered a graph on top of it. The knowledge pipeline now writes to three tables instead of one:



