Alexandr Chibilyaev explains the three-layer memory architecture in AACFlow: session memory, persistent conversation memory via conversationId, sliding window for cost control, plus Mem0 semantic facts and Zep episodic history — and how to use the Memory block for manual control.
An agent without memory is not an agent. It's a stateless function. Every interaction starts from zero — no context, no history, no learning. You wouldn't hire an employee who forgets everything at the end of each day. Why accept that from your AI?
At AACFlow, we've engineered memory not as an afterthought bolted onto the LLM, but as a first-class architectural layer with three distinct tiers. Each tier solves a different problem, and together they make agents that actually improve over time.
Session memory is the agent's working memory. It holds everything that happens within a single workflow execution: which tools were called, what parameters were passed, what results came back, and what intermediate decisions the agent made. It lives in the LLM's context window and disappears when the execution ends.
Session memory is automatic. You don't configure it — every Agent block maintains it while running. It's what allows an agent to use the output of one tool call as input to the next, or to adjust its approach based on intermediate results.
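As a rough illustration of that run-scoped behavior, here is a minimal Python sketch. It is not AACFlow's implementation: `SessionMemory`, `run_once`, and the two tools are hypothetical names invented for this example. It only shows one tool's output feeding the next call, with the scratchpad discarded when the run ends.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class SessionMemory:
    """Hypothetical run-scoped scratchpad: lives for one execution only."""
    steps: list[dict[str, Any]] = field(default_factory=list)

    def record(self, tool: str, params: dict[str, Any], result: Any) -> Any:
        # Remember which tool ran, with which parameters, and what came back.
        self.steps.append({"tool": tool, "params": params, "result": result})
        return result


def run_once(tools: dict[str, Callable[..., Any]]) -> tuple[Any, int]:
    session = SessionMemory()  # created when the execution starts
    user = session.record(
        "lookup_user", {"email": "a@example.com"},
        tools["lookup_user"](email="a@example.com"),
    )
    # The second call consumes the output of the first call.
    plan = session.record(
        "get_plan", {"user_id": user["id"]},
        tools["get_plan"](user_id=user["id"]),
    )
    return plan, len(session.steps)  # `session` is dropped on return


# Made-up tools, used only for this illustration.
TOOLS = {
    "lookup_user": lambda email: {"id": 42, "email": email},
    "get_plan": lambda user_id: "enterprise" if user_id == 42 else "free",
}
```

Calling `run_once(TOOLS)` twice creates two unrelated `SessionMemory` objects, which is the limitation the persistent layers address.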
Limitation: the context window is finite and expensive. At scale, you can't stuff every past interaction into it. That's where the next two layers come in.
The memoryType dropdown in the Agent block offers four modes, and the most powerful for long-running agents is Conversation mode.
When you set memoryType: conversation and provide a conversationId (e.g., user-123, deal-ABC, ticket-4567), the agent persists its entire interaction history keyed to that ID. Every subsequent invocation with the same conversationId picks up exactly where the last one left off.
This is not a simple "append to a log file." The conversation history is stored server-side with proper role attribution (user, assistant, system), timestamps, and tool call metadata. When the agent runs again, the full history is injected into the system prompt as context — including what the user asked, how the agent responded, which tools were called, and what data was retrieved.
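To make the shape of such a record concrete, here is a small in-memory sketch of a history keyed by conversationId. The class and field names (`ConversationStore`, `Message`, `tool_calls`) are assumptions made for illustration and do not describe AACFlow's actual server-side schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Message:
    role: str                      # "user", "assistant", or "system"
    content: str
    timestamp: datetime
    tool_calls: list[dict] = field(default_factory=list)


class ConversationStore:
    """In-memory stand-in for server-side storage keyed by conversationId."""

    def __init__(self) -> None:
        self._by_id: dict[str, list[Message]] = {}

    def append(self, conversation_id: str, role: str, content: str,
               tool_calls: Optional[list[dict]] = None) -> None:
        if role not in {"user", "assistant", "system"}:
            raise ValueError(f"unknown role: {role}")
        msg = Message(role, content, datetime.now(timezone.utc), tool_calls or [])
        self._by_id.setdefault(conversation_id, []).append(msg)

    def history(self, conversation_id: str) -> list[Message]:
        # Return a copy so callers cannot mutate the stored history.
        return list(self._by_id.get(conversation_id, []))


store = ConversationStore()
store.append("ticket-2847", "user", "login not working on mobile")
store.append("ticket-2847", "assistant", "Likely a Safari cookie issue.",
             tool_calls=[{"name": "check_cookies", "args": {"browser": "safari"}}])
store.append("ticket-9001", "user", "How do I export my data?")
```

Here `store.history("ticket-2847")` returns two messages and never sees the message stored under `ticket-9001`, which is the isolation property the ID provides.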
Practical example — Customer Support Agent:
conversationId: "ticket-2847"

Day 1: User reports "login not working on mobile"
  → Agent diagnoses: cookie issue on Safari iOS
  → Agent sends fix instructions

Day 7: User returns: "same problem again"
  → Agent sees full Day 1 context
  → Instead of starting from scratch: "Last time this was a Safari cookie issue.
    Has anything changed since our fix? Let me check if there's a new iOS update..."
  → Resolution time: seconds instead of minutes

Day 30: User asks: "any tips for my account?"
  → Agent recalls the login issues, Safari usage pattern
  → Proactively suggests: "Since you use Safari on iOS, here's how to enable
    biometric login for faster access..."
The conversationId pattern is what makes single-purpose agents behave like long-term employees, not one-off calculators. And it scales: a single Agent block can handle thousands of conversations simultaneously, each isolated by its ID.
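In Conversation mode, the stored history grows without bound. For high-volume agents, that's a cost problem: every new interaction adds to the token count of every subsequent interaction, so the total tokens sent grow roughly quadratically with the number of turns.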
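A back-of-the-envelope calculation shows how quickly this adds up. The sketch below assumes, purely for illustration, that every turn adds the same number of tokens and that each invocation resends the whole history. Real conversations vary, and both function names are made up.

```python
def tokens_sent(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens over `turns` invocations with full history resent."""
    # Invocation k carries k turns of history:
    # sum_{k=1..n} k * t = t * n * (n + 1) / 2
    return tokens_per_turn * turns * (turns + 1) // 2


def tokens_sent_windowed(turns: int, tokens_per_turn: int, window: int) -> int:
    """Same total, but each invocation carries at most `window` turns."""
    return sum(min(k, window) * tokens_per_turn for k in range(1, turns + 1))
```

Under these assumptions, 100 turns of 200 tokens each cost 1,010,000 input tokens with full history, against 191,000 when each call is capped at the last 10 turns.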
The two sliding window modes solve this:
Sliding Window (messages): keeps only the last N messages. Set memoryType: sliding_window and slidingWindowSize: 10, and the agent has at most the last 10 messages in context (fewer while the conversation is still shorter than that). The message count is deterministic and the cost predictable. Best for agents where recent context matters more than ancient history, such as a sales negotiation agent tracking the last few exchanges.
Sliding Window (tokens): keeps messages up to a token budget. Set memoryType: sliding_window_tokens and slidingWindowTokens: 4000, and the agent maintains context up to 4,000 tokens — roughly 3,000 words of history. This is more precise for cost control because different messages have different token counts. Best for agents on tight per-execution budgets.
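The two trimming strategies can be sketched in a few lines of Python. This is an illustration under stated assumptions, not AACFlow's code: `estimate_tokens` is a crude stand-in for a real tokenizer (about four characters per token), and the function names are invented.

```python
def window_by_messages(messages: list[str], size: int) -> list[str]:
    """Keep only the last `size` messages."""
    return messages[-size:] if size > 0 else []


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token. Use a real tokenizer
    # for billing-grade numbers.
    return max(1, len(text) // 4)


def window_by_tokens(messages: list[str], budget: int) -> list[str]:
    """Keep the newest messages whose combined estimate fits in `budget`."""
    kept: list[str] = []
    used = 0
    for msg in reversed(messages):       # walk from newest to oldest
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break                        # stop at the first message that overflows
        kept.append(msg)
        used += cost
    kept.reverse()                       # restore chronological order
    return kept
```

The message-based window bounds the number of messages but not their size, while the token-based window bounds the size but lets the message count float. That is the trade-off between the two modes.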
The critical insight: these modes are not mutually exclusive with persistent storage. The sliding window controls what goes into the LLM's context window, but the full conversation is still persisted on the server. You can switch modes without losing history — the window just changes what's recalled.
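One way to picture this is to treat the window as a read-time view over an untouched history. In the sketch below, `context_for` and `FULL_HISTORY` are hypothetical names. Only the memoryType values mentioned above are borrowed from the text, and the token-based mode is left out for brevity.

```python
from typing import Optional

# Stand-in for the persisted conversation; it is never trimmed.
FULL_HISTORY = [f"message {i}" for i in range(1, 51)]


def context_for(history: list[str], memory_type: str,
                sliding_window_size: Optional[int] = None) -> list[str]:
    """Select what is sent to the model without modifying stored history."""
    if memory_type == "conversation":
        return list(history)
    if memory_type == "sliding_window":
        if not sliding_window_size or sliding_window_size < 1:
            raise ValueError("sliding_window needs a positive window size")
        return history[-sliding_window_size:]
    raise ValueError(f"unsupported memory type in this sketch: {memory_type}")


windowed = context_for(FULL_HISTORY, "sliding_window", 10)   # last 10 messages
everything = context_for(FULL_HISTORY, "conversation")        # all 50 messages
```

Because the selection happens when the context is built, switching from the windowed mode back to the full conversation recovers all 50 messages.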
The Agent block's built-in memory handles natural conversation continuity. But sometimes you need explicit, programmatic control over what gets remembered.
The Memory block provides CRUD operations on a conversation store:
Add Memory (operation: add): manually store a message with a specific role (user/assistant/system), conversationId, and content. Use this to inject context that didn't come from a conversation — like "this user is on the enterprise plan" or "this deal has a hard deadline of Friday."
Get Memory (operation: get): retrieve a specific memory by ID. Use this for conditional logic — "if previous_interaction_sentiment is negative, escalate to human."
Get All Memories (operation: getAll): retrieve the full memory store for a conversation. Use this for analysis, export, or migration.
Delete Memory (operation: delete): remove stale or incorrect memories. Essential for compliance (GDPR right to erasure) and data hygiene.
The Memory block is designed to be used in conjunction with Agent blocks, not instead of them. The guidance is explicit: "Do not use this block unless the user explicitly asks for it. Used in conjunction with agent blocks to inject artificial memory into the conversation. For natural conversations, use the agent block memory modes directly instead."
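The four operations map naturally onto a small store interface. The following toy version is only a sketch: `MemoryStore`, its method names, and the `mem-N` ID format are assumptions for illustration, not the Memory block's real interface or storage format.

```python
import itertools
from typing import Optional


class MemoryStore:
    """Toy conversation store exposing add / get / getAll / delete."""

    def __init__(self) -> None:
        self._items: dict[str, dict] = {}
        self._ids = itertools.count(1)

    def add(self, conversation_id: str, role: str, content: str) -> str:
        memory_id = f"mem-{next(self._ids)}"   # hypothetical ID scheme
        self._items[memory_id] = {
            "id": memory_id,
            "conversationId": conversation_id,
            "role": role,
            "content": content,
        }
        return memory_id

    def get(self, memory_id: str) -> Optional[dict]:
        return self._items.get(memory_id)

    def get_all(self, conversation_id: str) -> list[dict]:
        return [m for m in self._items.values()
                if m["conversationId"] == conversation_id]

    def delete(self, memory_id: str) -> bool:
        # True if something was removed, False if the ID was unknown.
        return self._items.pop(memory_id, None) is not None


memories = MemoryStore()
plan_id = memories.add("deal-ABC", "system", "This user is on the enterprise plan.")
deadline_id = memories.add("deal-ABC", "system", "This deal has a hard deadline of Friday.")
memories.delete(deadline_id)   # e.g. the deadline has passed and is now stale
```

After the delete, `memories.get_all("deal-ABC")` contains only the enterprise-plan note, which is what an agent reading that conversation would then see.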
For agents that need to learn beyond simple conversation history — to extract semantic facts, recognize patterns, and build a knowledge graph — AACFlow integrates with two specialized memory services:
Mem0 handles semantic memory. Instead of storing raw conversation text, Mem0 extracts factual assertions: "User prefers communication via Telegram," "Company budget cycle resets in Q1," "Product X has a known issue with Safari." These are stored as structured facts that can be queried semantically — "what do we know about this user's preferences?" — rather than via exact keyword match.
Zep handles episodic memory. It stores the narrative of what happened: "On March 12, the user reported a billing error. The agent investigated and found a duplicate charge. The issue was resolved on March 14." Zep's temporal understanding allows queries like "what issues has this user had in the last 90 days?" with proper chronological context.
These are not replacements for the built-in memory system. They're complementary layers for agents that need to build genuine institutional knowledge over months and years of operation.
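To show what a time-bounded episodic question looks like in practice, here is a plain-Python sketch of the "last 90 days" query. It does not use Zep's API. `Episode` and `recent_episodes` are invented for this example, and the years in the sample data are made up because the text gives only month and day.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Episode:
    user_id: str
    occurred_on: date
    summary: str


def recent_episodes(episodes: list[Episode], user_id: str,
                    today: date, days: int = 90) -> list[Episode]:
    """Episodes for one user within the last `days` days, oldest first."""
    cutoff = today - timedelta(days=days)
    matches = [e for e in episodes
               if e.user_id == user_id and cutoff <= e.occurred_on <= today]
    return sorted(matches, key=lambda e: e.occurred_on)


# Illustrative data only; the years are arbitrary.
EPISODES = [
    Episode("user-123", date(2025, 3, 14), "Duplicate charge refunded."),
    Episode("user-123", date(2025, 3, 12), "User reported a billing error."),
    Episode("user-123", date(2024, 11, 2), "Password reset requested."),
    Episode("user-456", date(2025, 3, 13), "Asked about invoices."),
]

recent = recent_episodes(EPISODES, "user-123", today=date(2025, 4, 1))
```

With these sample records, `recent` holds the two March episodes in chronological order and leaves out both the older password reset and the other user's entry.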
The conversationId is not just an internal key. It's a multi-tenancy primitive. By parameterizing the conversationId with a workflow variable — <trigger.userId>, <input.customerId>, <webhook.accountId> — a single Agent block serves thousands of users simultaneously, each with isolated memory.
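The substitution step can be sketched as follows. This is an assumption about how such placeholders might be resolved, not a description of AACFlow's template engine. `resolve`, the regular expression, and the `support-` prefix are all illustrative.

```python
import re

# Matches dotted placeholders such as <trigger.userId>.
PLACEHOLDER = re.compile(r"<([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)+)>")


def resolve(template: str, context: dict) -> str:
    """Replace <a.b> placeholders with values looked up in nested dicts."""
    def lookup(match: re.Match) -> str:
        value = context
        for key in match.group(1).split("."):
            value = value[key]           # raises KeyError if the path is missing
        return str(value)
    return PLACEHOLDER.sub(lookup, template)


# Two triggers for the same Agent block yield two isolated conversation IDs.
id_a = resolve("support-<trigger.userId>", {"trigger": {"userId": "u-17"}})
id_b = resolve("support-<trigger.userId>", {"trigger": {"userId": "u-42"}})
```

Here `id_a` is `support-u-17` and `id_b` is `support-u-42`, so the two users never share a history even though the block configuration is identical.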
A support agent deployed once handles all customers. A sales agent deployed once tracks all deals. A CEO briefing agent deployed once serves the entire executive team, with each executive getting a personalized conversationId that only surfaces their context.
This is how you build agents that scale. Not by deploying N instances for N users, but by deploying one agent with N conversation contexts.
The difference between a demo agent and a production agent is memory. A demo agent answers a question well once. A production agent answers it better the second time, and better still the tenth, because it remembers every previous interaction and builds on them.
At AACFlow, we've made memory a configuration choice, not an engineering project. Choose your mode, set your conversation ID, and the agent remembers. Add the Memory block when you need explicit control. Layer on Mem0 and Zep when you need semantic understanding. The infrastructure is there — you focus on what the agent should do with what it remembers.