Alexandr Chibilyaev shares the public AACFlow roadmap for 2026-2027 — Block SDK, agent marketplace, streaming outputs, conditional branching, AI trace analysis, distributed tracing, air-gapped deployment, and the philosophy that guides every feature decision.
Roadmaps are promises. And in software, promises are dangerous — the industry is littered with "coming soon" pages that never materialized and feature announcements that quietly disappeared.
So let me be clear about what follows: this is our current plan, based on what we know today, funded by revenue from 60,000+ developers, shaped by the feature requests and pain points we hear every day. It will change. Some things will ship earlier. Some will take longer. Some will be replaced by better ideas we haven't had yet.
But the direction — the philosophy — won't change. That's what this post is really about.
Build infrastructure, not demos. Every feature we ship must work in production at scale. It must handle errors, edge cases, and the messy reality of real-world automation. If a feature only works in a carefully controlled demo environment, it doesn't ship.
Open protocols over walled gardens. The AI industry is fragmenting into competing ecosystems. OpenAI has its agent SDK. Anthropic has its MCP. Google has its Vertex AI Agent Builder. Each wants you to build exclusively on their platform.
We believe the opposite: agents should be portable. They should work across models, across providers, across deployment environments. That's why we invest in open protocols — A2A (Agent-to-Agent) for inter-agent communication, MCP (Model Context Protocol) for tool integration. These aren't AACFlow proprietary standards. They're open specifications that any platform can implement.
Reliability over hype. It's tempting to ship flashy features that generate Twitter buzz. AI-generated workflow suggestions! Autonomous agent swarms! But buzz doesn't help when a workflow fails silently in production at 3 AM. Reliability does. We prioritize features that make agents more dependable — observability, error recovery, state management — over features that make demos more impressive.
Practical examples over abstract capabilities. A feature isn't done when the code works. It's done when we can show a real business using it to solve a real problem. Every roadmap item below is paired with the concrete use case it enables.
Today, AACFlow's 213 blocks are built by our team. That's a bottleneck — we can't build every block every user needs. The Block SDK changes that.
The SDK will let any developer build and publish a custom block. A block is a self-contained module with:
Configuration schema: what parameters does the block accept? What are their types, defaults, and validation rules?
Input/output schema: what data does the block consume and produce? Type-safe, Zod-validated schemas.
Execution logic: the actual code that runs when the block executes. It can call external APIs, transform data, and interact with databases.
Error handling: declared error types with recovery strategies (retryable, non-retryable, requires human intervention).
UI component: how does the block appear in the visual editor? Icon, color, configuration panel.
Blocks built with the SDK will be first-class citizens in the platform. They'll appear in the block library alongside native blocks. They'll work with the DAG executor, the observability layer, the retry logic — everything. There's no "custom block ghetto" where third-party blocks have fewer capabilities than native ones.
What this enables: A logistics company builds a custom block for their proprietary routing algorithm. An accounting firm builds a block for a specific tax calculation that only applies in their jurisdiction. A marketing agency builds blocks for their clients' unique analytics pipelines. The platform becomes extensible without our team being the bottleneck.
Technical approach: The SDK will be a TypeScript package (@aacflow/block-sdk) with a CLI for scaffolding, testing, and publishing blocks. Published blocks will be versioned with semver. The visual editor will display compatibility information — "This block requires AACFlow v2.4+." We're considering a block registry (npm-like) where developers can publish blocks publicly or keep them private to their workspace.
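To make the block anatomy concrete, here is a minimal sketch in TypeScript. The SDK has not shipped, so every name in it (BlockDefinition, runBlock, isCompatible, the acme/route-cost block) is hypothetical, and a hand-written type guard stands in for the Zod schemas to keep the example dependency-free.

```typescript
// Hypothetical shapes: the real @aacflow/block-sdk API is not published yet.
type RecoveryStrategy = "retryable" | "non-retryable" | "human-intervention";

interface BlockError {
  code: string;
  recovery: RecoveryStrategy;
  message: string;
}

type BlockResult<O> =
  | { ok: true; output: O }
  | { ok: false; error: BlockError };

interface BlockDefinition<C, I, O> {
  id: string;
  version: string;          // semver of the block itself
  requiresPlatform: string; // minimum platform version, e.g. "2.4.0"
  ui: { icon: string; color: string };
  defaultConfig: C;
  validateInput(input: unknown): input is I;
  execute(config: C, input: I): BlockResult<O>;
}

// Example block: converts a distance in kilometres to a delivery cost.
interface RouteConfig { ratePerKm: number }
interface RouteInput { distanceKm: number }
interface RouteOutput { cost: number }

const routeCostBlock: BlockDefinition<RouteConfig, RouteInput, RouteOutput> = {
  id: "acme/route-cost",
  version: "1.0.0",
  requiresPlatform: "2.4.0",
  ui: { icon: "truck", color: "#2f80ed" },
  defaultConfig: { ratePerKm: 1.5 },
  validateInput(input: unknown): input is RouteInput {
    return typeof input === "object" && input !== null &&
      typeof (input as { distanceKm?: unknown }).distanceKm === "number";
  },
  execute(config, input) {
    if (input.distanceKm < 0) {
      return {
        ok: false,
        error: { code: "NEGATIVE_DISTANCE", recovery: "non-retryable", message: "distanceKm must be >= 0" },
      };
    }
    return { ok: true, output: { cost: input.distanceKm * config.ratePerKm } };
  },
};

// Runs a block the way an executor might: validate first, then execute.
function runBlock<C, I, O>(block: BlockDefinition<C, I, O>, input: unknown): BlockResult<O> {
  if (!block.validateInput(input)) {
    return {
      ok: false,
      error: { code: "INVALID_INPUT", recovery: "non-retryable", message: "input failed schema validation" },
    };
  }
  return block.execute(block.defaultConfig, input);
}

// Compares "major.minor.patch" strings; true when `platform` >= `required`.
function isCompatible(platform: string, required: string): boolean {
  const a = platform.split(".").map(Number);
  const b = required.split(".").map(Number);
  for (let i = 0; i < 3; i++) {
    const x = a[i] || 0;
    const y = b[i] || 0;
    if (x !== y) return x > y;
  }
  return true;
}
```

Returning errors as data with a declared recovery strategy, rather than throwing, is one way to let the executor decide between retrying and escalating. The compatibility check ignores pre-release tags and is only meant to illustrate the "requires AACFlow v2.4+" message.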
Once third-party blocks exist, there needs to be a place to discover, install, and share them. The agent marketplace is that place.
Think of it as an app store for AI agent capabilities:
Search and discovery. Browse by category (CRM, e-commerce, finance, communication), popularity, rating, and recency.
One-click install. Add a marketplace block to your workspace. It appears in your block library. It works.
Ratings and reviews. Users rate blocks on reliability, documentation quality, and usefulness. Reviews surface edge cases and integration tips.
Verified publishers. Trusted organizations (Stripe, AmoCRM, 1C) can publish official blocks with a verification badge.
Pricing. Free blocks. Paid blocks (one-time purchase or subscription). Revenue share with developers.
Workspace-level visibility. Blocks can be public (visible to all AACFlow users) or private (visible only to your workspace members).
The marketplace turns AACFlow from a platform with 213 blocks into a platform with unlimited blocks — built by the community, for the community, covering the long tail of integrations no single company could build alone.
What this enables: A developer in Novosibirsk builds a block for an obscure Russian regional bank's API. 50 other businesses in the same region discover it, install it, and automate their bank reconciliation. The developer earns recurring revenue. The businesses save weeks of custom development. AACFlow becomes more useful for everyone.
Today, when a workflow executes, you see nodes light up as they complete. But you don't see what's happening inside a long-running node until it finishes.
Streaming node outputs change that. When an LLM node is generating a response, you'll see the tokens appear in real-time in the node's output panel. When an HTTP request node is downloading a large file, you'll see the progress bar. When a data transformation node is processing 50,000 records, you'll see the counter increment.
This matters for two reasons:
Debugging. If an LLM node is producing a low-quality response, you see it immediately — not after the full response is generated and the workflow completes. You can stop the execution, fix the prompt, and rerun. Debugging cycles go from minutes to seconds.
User experience. For human-in-the-loop workflows, seeing partial results lets the human start evaluating while the agent is still working. An agent that's summarizing 100 support tickets can show the first 10 summaries while it processes the remaining 90.
Technical approach: Streaming will use Server-Sent Events (SSE) over the existing Socket.IO infrastructure. Each node can emit progress events with structured data: tokens for LLM nodes, bytes downloaded for HTTP nodes, records processed for data transformation nodes. The visual editor renders these events as they arrive.
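As a rough illustration of the structured progress events, the sketch below models them as a TypeScript discriminated union and folds them into the state of a node's output panel. The event and field names (NodeProgressEvent, kind, received, processed) are assumptions made for this example, not a published wire format, and the transport is deliberately left out.

```typescript
// Hypothetical event shapes; field names are illustrative only.
type NodeProgressEvent =
  | { kind: "tokens"; nodeId: string; text: string }
  | { kind: "bytes"; nodeId: string; received: number; total: number }
  | { kind: "records"; nodeId: string; processed: number };

interface NodePanelState {
  text: string;    // tokens streamed so far (LLM nodes)
  percent: number; // download progress, 0-100 (HTTP nodes)
  records: number; // records processed so far (data transformation nodes)
}

function emptyPanel(): NodePanelState {
  return { text: "", percent: 0, records: 0 };
}

// Folds one event into the panel state and returns the new state.
function applyProgress(state: NodePanelState, ev: NodeProgressEvent): NodePanelState {
  switch (ev.kind) {
    case "tokens":
      return { text: state.text + ev.text, percent: state.percent, records: state.records };
    case "bytes":
      return {
        text: state.text,
        percent: ev.total > 0 ? Math.floor((ev.received / ev.total) * 100) : 0,
        records: state.records,
      };
    case "records":
      return { text: state.text, percent: state.percent, records: ev.processed };
  }
}
```

Keeping the reducer pure means the editor can replay a stored event stream to reconstruct what a panel looked like at any point during the run.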
Today, conditional logic in AACFlow is expressed through block configuration: an "If/Else" block that evaluates a condition and routes execution down one of two paths. This works, but it's visual noise — what should be a simple "if this, do that" becomes three blocks (condition check + two branches) with configuration panels to open.
Conditional branching on canvas removes that overhead. You'll be able to draw branches directly on the canvas:
                    ┌──→ [Send SMS notification]
[Check priority] ───┤
                    └──→ [Send email notification]
The result is a single node with multiple output edges, each labeled with a condition: priority === "high" goes to SMS, and priority !== "high" goes to email. Conditions are written in a simple expression language.
This makes workflows more readable — the branching logic is visible on the canvas, not hidden in configuration panels. It's the visual equivalent of if/else in code, and it's how most users intuitively expect conditional logic to work.
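The expression language itself is not specified here, so the following is only a sketch of how labeled edges could be evaluated. It assumes a deliberately tiny grammar (a field name, === or !==, and a double-quoted string), and the names ConditionalEdge, evaluateCondition, and route are hypothetical.

```typescript
interface ConditionalEdge {
  condition: string; // e.g. priority === "high"
  target: string;    // label of the node the edge leads to
}

// Evaluates conditions of the form: field === "value" or field !== "value".
function evaluateCondition(condition: string, data: Record<string, string>): boolean {
  const match = /^\s*(\w+)\s*(===|!==)\s*"([^"]*)"\s*$/.exec(condition);
  if (match === null) throw new Error("Unsupported condition: " + condition);
  const actual = data[match[1]];
  return match[2] === "===" ? actual === match[3] : actual !== match[3];
}

// Returns the target of the first edge whose condition holds, or null if none does.
function route(edges: ConditionalEdge[], data: Record<string, string>): string | null {
  for (const edge of edges) {
    if (evaluateCondition(edge.condition, data)) return edge.target;
  }
  return null;
}

const priorityEdges: ConditionalEdge[] = [
  { condition: 'priority === "high"', target: "Send SMS notification" },
  { condition: 'priority !== "high"', target: "Send email notification" },
];
```

Parsing conditions against a fixed grammar instead of evaluating them as arbitrary code keeps user-authored expressions from executing anything unexpected.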
Sub-workflows let you encapsulate a sequence of blocks into a reusable unit. Think of it as a function in programming: define it once, call it from multiple places, pass parameters in, get results out.
What this enables:
Reusability. Define a "Verify Counterparty" sub-workflow that checks EGRUL, FNS, and FSSP. Use it in every workflow that deals with counterparties. When the verification logic changes, update it once — every parent workflow gets the update.
Abstraction. A complex workflow becomes readable when implementation details are hidden behind named sub-workflows. "Onboard New Client" might contain "Verify Counterparty," "Create 1C Record," "Send Welcome Email" — each a sub-workflow, each understandable at a glance.
Team collaboration. A senior engineer builds the "Fraud Detection" sub-workflow. Junior team members use it in their customer service workflows without needing to understand the fraud detection logic.
Marketplace potential. Sub-workflows can be published to the agent marketplace. A tax accounting firm could publish a "Russian Tax Calculation Q4 2026" sub-workflow that other businesses install and use.
Sub-workflows are parameterized: the parent workflow passes data in, the sub-workflow processes it, and returns results. The sub-workflow has its own execution trace, nested inside the parent's trace. Observability is preserved at every level.
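The function analogy can be sketched in a few lines of TypeScript. This is an illustration under assumptions, not the executor's design: WorkflowStep, simpleStep, and subWorkflow are hypothetical names, and the verification steps are stubs that call no real registry.

```typescript
interface TraceEntry { step: string; children: TraceEntry[] }
type Payload = Record<string, unknown>;

interface WorkflowStep {
  label: string;
  run(input: Payload, trace: TraceEntry): Payload;
}

// A plain block: records itself in the trace and transforms the payload.
function simpleStep(label: string, fn: (input: Payload) => Payload): WorkflowStep {
  return {
    label,
    run(input, trace) {
      trace.children.push({ step: label, children: [] });
      return fn(input);
    },
  };
}

// A sub-workflow is itself a step: it runs its own steps under a nested trace entry.
function subWorkflow(label: string, steps: WorkflowStep[]): WorkflowStep {
  return {
    label,
    run(input, trace) {
      const nested: TraceEntry = { step: label, children: [] };
      trace.children.push(nested);
      return steps.reduce((data, s) => s.run(data, nested), input);
    },
  };
}

// Stubbed example: the checks only tag the payload.
const verifyCounterparty = subWorkflow("Verify Counterparty", [
  simpleStep("Check EGRUL", (d) => ({ ...d, egrulChecked: true })),
  simpleStep("Check FNS", (d) => ({ ...d, fnsChecked: true })),
]);

const onboardClient = subWorkflow("Onboard New Client", [
  verifyCounterparty,
  simpleStep("Send Welcome Email", (d) => ({ ...d, emailed: true })),
]);

const rootTrace: TraceEntry = { step: "root", children: [] };
const onboardResult = onboardClient.run({ client: "example" }, rootTrace);
```

Because a sub-workflow satisfies the same interface as a block, it can be nested to any depth, and each level adds its own entry to the parent's trace.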
Today, when a workflow fails, you look at the execution trace and diagnose the problem yourself. This works — the traces are detailed — but it requires the user to (a) notice the failure, (b) open the trace, (c) understand what went wrong, and (d) know how to fix it.
AI-powered trace analysis automates steps (b) through (d). When a workflow execution fails, an analysis agent automatically:
Examines the execution trace — which node failed, what was the error, what were the inputs and outputs leading up to the failure
Identifies the root cause — "The Stripe API returned a 401 Unauthorized error on node charge_card, suggesting the API key has expired or been revoked."
Suggests a fix — "Update your Stripe API key in the connector configuration. The current key was issued on March 12, 2026 and may have been rotated."
Optionally applies the fix — for simple issues (retry the execution, adjust a timeout, switch to a fallback provider), the agent can fix and rerun automatically with user approval.
The analysis is presented as a natural language summary alongside the technical trace. A non-technical user sees: "Your payment workflow failed because the Stripe connection needs to be refreshed. Click here to update your API key." A technical user can drill down into the full trace for deeper investigation.
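The gating around automatic fixes can be illustrated without any AI at all. In the sketch below, a few hand-written rules stand in for the analysis agent, and the names FailedNode, Diagnosis, diagnose, and mayApply are hypothetical. The point is the shape of the output and the approval check, not the diagnosis logic.

```typescript
interface FailedNode { nodeId: string; connector: string; httpStatus: number }

type FixKind = "retry" | "adjust-timeout" | "update-credentials";

interface Diagnosis {
  rootCause: string;
  suggestedFix: FixKind;
  autoApplicable: boolean; // true only for simple, low-risk fixes
}

// Rule-based stand-in for the analysis agent: maps an HTTP status to a diagnosis.
function diagnose(node: FailedNode): Diagnosis {
  if (node.httpStatus === 401) {
    return {
      rootCause: node.connector + " rejected the credentials on node " + node.nodeId,
      suggestedFix: "update-credentials",
      autoApplicable: false, // rotating a key needs a human
    };
  }
  if (node.httpStatus === 408 || node.httpStatus === 504) {
    return {
      rootCause: node.connector + " timed out on node " + node.nodeId,
      suggestedFix: "adjust-timeout",
      autoApplicable: true,
    };
  }
  return {
    rootCause: node.connector + " returned HTTP " + node.httpStatus + " on node " + node.nodeId,
    suggestedFix: "retry",
    autoApplicable: true,
  };
}

// A fix is applied only when it is simple enough and the user has approved it.
function mayApply(diagnosis: Diagnosis, userApproved: boolean): boolean {
  return diagnosis.autoApplicable && userApproved;
}
```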
Today, you can see workflow failures if you open the dashboard and check. That's fine for workflows you run manually. It's useless for always-on agents that run unattended.
The alerting system lets you configure:
Thresholds. "Alert me if the error rate for any always-on agent exceeds 5% in the last hour." "Alert me if the p95 execution latency exceeds 30 seconds."
Channels. Slack, email, Telegram, in-app notification, webhook (so you can integrate with PagerDuty, OpsGenie, or your own alerting infrastructure).
Severity levels. info (something interesting happened), warning (something might be wrong), critical (something is definitely wrong and needs immediate attention).
Silence windows. "Don't alert me about the nightly data sync workflow failures between 2 AM and 6 AM — I know it's flaky and I'm fixing it next sprint."
Alert grouping. If 50 workflows fail within 30 seconds because of the same root cause (e.g., OpenAI outage), you get one grouped alert — not 50 individual ones.
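Three of these rules are simple enough to sketch directly: the error-rate threshold, the silence window, and alert grouping. The TypeScript below is a minimal illustration. The type and function names are hypothetical, and it assumes the caller has already selected the samples for the relevant time window.

```typescript
interface ExecutionSample { agentId: string; failed: boolean }

// Returns the agents whose error rate over the supplied samples exceeds maxErrorRate.
function agentsOverThreshold(samples: ExecutionSample[], maxErrorRate: number): string[] {
  const totals: Record<string, { runs: number; failures: number }> = {};
  for (const s of samples) {
    const t = totals[s.agentId] || (totals[s.agentId] = { runs: 0, failures: 0 });
    t.runs += 1;
    if (s.failed) t.failures += 1;
  }
  return Object.keys(totals).filter((id) => totals[id].failures / totals[id].runs > maxErrorRate);
}

// True when `hour` (0-23) falls inside the silence window [startHour, endHour).
function isSilenced(hour: number, startHour: number, endHour: number): boolean {
  return startHour <= endHour
    ? hour >= startHour && hour < endHour
    : hour >= startHour || hour < endHour; // window wraps past midnight
}

interface WorkflowFailure { workflowId: string; rootCause: string; atMs: number }
interface GroupedAlert { rootCause: string; workflowIds: string[]; firstAtMs: number }

// Groups failures that share a root cause and occur within windowMs of the group's first failure.
function groupFailures(failures: WorkflowFailure[], windowMs: number): GroupedAlert[] {
  const groups: GroupedAlert[] = [];
  const sorted = failures.slice().sort((a, b) => a.atMs - b.atMs);
  for (const f of sorted) {
    const existing = groups.filter(
      (g) => g.rootCause === f.rootCause && f.atMs - g.firstAtMs <= windowMs
    )[0];
    if (existing) {
      existing.workflowIds.push(f.workflowId);
    } else {
      groups.push({ rootCause: f.rootCause, workflowIds: [f.workflowId], firstAtMs: f.atMs });
    }
  }
  return groups;
}
```

With a 30-second window, 50 failures caused by the same provider outage collapse into a single GroupedAlert that lists all 50 workflow IDs.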
Today, an AACFlow workflow is a single execution graph. But as agents become more sophisticated, they need to communicate with other agents — an agent in one workspace calling an agent in another, an AACFlow agent calling an external agent via A2A, or a hierarchy of agents delegating sub-tasks to specialized sub-agents.
Distributed tracing follows an execution across agent boundaries:
[Workspace A: Order Processing Agent]
├── Node: Classify Order → "premium_customer"
├── Node: Check Inventory → {in_stock: true}
├── Node: [Sub-call: Workspace B - Fraud Detection Agent]
│   ├── [Workspace B] Check Transaction History → "clean"
│   └── [Workspace B] Verify Address → "verified"
├── Node: Process Payment → success
└── Node: Generate Shipping Label → "LP-982341"
The trace spans workspaces, services, and even external agent calls. Every step is correlated with a single traceId that propagates across boundaries. This is infrastructure-level tracing — the same pattern used by microservice observability tools like Jaeger and Zipkin, applied to AI agents.
What this enables: You can debug an agent ecosystem the same way you debug a microservice architecture. If an order processing agent fails, you can trace through to the fraud detection agent it called, find the failing node, and diagnose the issue — even if the two agents were built by different teams and run on different infrastructure.
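The core mechanism, a single traceId carried across boundaries together with the caller's span ID, fits in a short TypeScript sketch. The names TraceSpan, TraceContext, startSpan, and renderTrace are hypothetical, IDs come from a simple counter, and the spans are collected in memory rather than sent to a tracing backend.

```typescript
interface TraceSpan {
  traceId: string;
  spanId: string;
  parentSpanId: string | null;
  workspace: string;
  label: string;
}

// The context a caller would send along with a cross-agent call.
interface TraceContext { traceId: string; parentSpanId: string }

let spanCounter = 0;

// Starts a span. With no context it is a root span and opens a new trace;
// with a context it joins the caller's trace as a child span.
function startSpan(
  workspace: string,
  label: string,
  ctx: TraceContext | null,
  sink: TraceSpan[]
): TraceContext {
  spanCounter += 1;
  const spanId = "span-" + spanCounter;
  const traceId = ctx ? ctx.traceId : "trace-" + spanCounter;
  sink.push({ traceId, spanId, parentSpanId: ctx ? ctx.parentSpanId : null, workspace, label });
  return { traceId, parentSpanId: spanId };
}

// Renders the spans of a trace as an indented tree, one line per span.
function renderTrace(spans: TraceSpan[], parentSpanId: string | null, depth: number): string[] {
  let lines: string[] = [];
  for (const s of spans) {
    if (s.parentSpanId !== parentSpanId) continue;
    let indent = "";
    for (let i = 0; i < depth; i++) indent += "  ";
    lines.push(indent + "[" + s.workspace + "] " + s.label);
    lines = lines.concat(renderTrace(spans, s.spanId, depth + 1));
  }
  return lines;
}

const collected: TraceSpan[] = [];
const orderCtx = startSpan("Workspace A", "Order Processing Agent", null, collected);
startSpan("Workspace A", "Classify Order", orderCtx, collected);
const fraudCtx = startSpan("Workspace B", "Fraud Detection Agent", orderCtx, collected);
startSpan("Workspace B", "Verify Address", fraudCtx, collected);
```

Calling renderTrace(collected, null, 0) produces a four-line tree in which the Workspace B spans sit under the Workspace A root, all sharing one traceId.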
Today's metrics dashboard shows platform-wide aggregates: total executions, success rate, credit consumption. Custom dashboards let you build focused views:
"Show me the error rate for all always-on agents in the Logistics workspace, broken down by connector"
"Show me credit consumption by provider for the last 30 days, with a running total"
"Show me the 10 slowest workflows this week, with their p95 latency"
"Show me Chestny Znak reporting compliance: which products missed their reporting deadline this month?"
Widgets are configurable: line charts, bar charts, pie charts, single-stat panels, tables. Dashboards are workspace-scoped and shareable. They're the operational command center for teams running agent fleets at scale.
Today, self-hosted AACFlow runs via Docker Compose. That's fine for single-server deployments. It's not adequate for organizations that need:
High availability — multiple replicas behind a load balancer, with automatic failover
Horizontal scaling — scale the executor tier independently from the API tier
Rolling updates — deploy new versions without downtime
Infrastructure as code — manage AACFlow alongside other services in a GitOps workflow
The Helm chart will provide a production-grade Kubernetes deployment: separate deployments for the web server, executor workers, and background job processors; PostgreSQL with replication; Redis with Sentinel for high availability; ingress configuration with TLS termination; resource limits and autoscaling policies.
The Terraform module will provision the cloud infrastructure: VPC, subnets, security groups, Kubernetes cluster (EKS/GKE), database instance, Redis instance, DNS records, SSL certificates. A single terraform apply provisions a production-ready AACFlow deployment.
Some organizations operate in environments with no internet connectivity: defense contractors, government agencies, financial institutions with strict network security policies. These organizations still need AI agents — but the agents must run entirely within the air-gapped network.
Air-gapped deployment mode enables:
Offline model serving. Run LLMs locally on the air-gapped infrastructure (via Ollama, vLLM, or custom model servers). No external API calls.
Local connector execution. Connectors that don't require internet access (PostgreSQL, internal APIs, file system) work normally. Connectors that require external APIs are disabled or configured with proxy access through the security boundary.
Offline updates. Platform updates and block/connector updates are delivered as signed archive files that can be transferred into the air-gapped environment via approved media.
No telemetry. Zero data leaves the air-gapped environment. No usage analytics. No crash reports. No license validation calls. The platform operates entirely self-contained.
This is a specialized deployment mode for a specific set of customers. But for those customers, it's not optional — it's the difference between "we can use AACFlow" and "we can't."
When we add a feature or refactor a component, we need to know: does this break any existing workflows? For 60,000+ users with thousands of unique workflow configurations, manual testing is impossible.
Automated migration testing will run representative workflows from the user base (anonymized, with user consent) against new platform versions before release. If a workflow that succeeded on v2.3.0 fails on v2.4.0, the release is blocked until the regression is understood and fixed.
This is a massive investment in reliability. It means slower releases (each release must pass the migration test suite), but dramatically fewer "I upgraded and my workflows broke" incidents. For a platform handling production business automation, that trade-off is worth it.
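The release gate reduces to a comparison between two sets of results. Here is a minimal TypeScript sketch. WorkflowRun, findRegressions, and releaseAllowed are hypothetical names, and the sketch assumes each workflow has already been run once on the baseline version and once on the candidate.

```typescript
type RunStatus = "success" | "failure";

interface WorkflowRun { workflowId: string; status: RunStatus }

// A regression is a workflow that succeeded on the baseline and fails on the candidate.
function findRegressions(baseline: WorkflowRun[], candidate: WorkflowRun[]): string[] {
  const candidateStatus: Record<string, RunStatus> = {};
  for (const r of candidate) candidateStatus[r.workflowId] = r.status;
  return baseline
    .filter((r) => r.status === "success" && candidateStatus[r.workflowId] === "failure")
    .map((r) => r.workflowId);
}

// The release is blocked as long as any regression remains.
function releaseAllowed(baseline: WorkflowRun[], candidate: WorkflowRun[]): boolean {
  return findRegressions(baseline, candidate).length === 0;
}
```

Workflows that already failed on the baseline are ignored on purpose: they are existing problems, not regressions introduced by the new version.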
Today, each agent execution is stateless — it runs, produces a result, and terminates. The next execution starts fresh. This is intentional (state is the enemy of reliability), but it's limiting. Agents can't learn from past executions. They can't maintain context across interactions.
Team-level agent memory gives agents a persistent, shared memory store:
Execution history. "Last week, this same type of support ticket was resolved by escalating to the billing team. Try that first."
User preferences. "When generating reports for the CFO, use the formal template, not the casual one."
Learned corrections. "The Stripe API has been returning amount_decimal as a string since their June update. Parse it before passing to the invoice generator."
Team knowledge. "The warehouse team prefers Telegram notifications for inventory alerts, not email."
Memory is scoped to the workspace — agents in the Sales workspace don't have access to the Support workspace's memory. Memory entries have TTLs (time-to-live) — "this Stripe API quirk" might be irrelevant after Stripe fixes their API. Memory is queryable and editable by humans — if the agent learned something wrong, you can correct it.
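Workspace scoping, TTLs, and human edits can be shown with a small in-memory store. This TypeScript sketch is illustrative only: WorkspaceMemory and its methods are hypothetical names, storage is a plain array, and the current time is passed in explicitly so the expiry behavior is easy to follow.

```typescript
interface MemoryEntry {
  workspace: string;
  key: string;
  value: string;
  expiresAtMs: number;
}

class WorkspaceMemory {
  private entries: MemoryEntry[] = [];

  // Stores or overwrites an entry; ttlMs is how long it stays valid.
  remember(workspace: string, key: string, value: string, ttlMs: number, nowMs: number): void {
    this.forget(workspace, key);
    this.entries.push({ workspace, key, value, expiresAtMs: nowMs + ttlMs });
  }

  // Returns the value only if it belongs to this workspace and has not expired.
  recall(workspace: string, key: string, nowMs: number): string | null {
    const hit = this.entries.filter(
      (e) => e.workspace === workspace && e.key === key && e.expiresAtMs > nowMs
    )[0];
    return hit ? hit.value : null;
  }

  // Lets a human delete something the agent learned incorrectly.
  forget(workspace: string, key: string): void {
    this.entries = this.entries.filter((e) => !(e.workspace === workspace && e.key === key));
  }
}
```

Because every read takes the workspace as an argument, an agent in the Support workspace cannot see an entry stored by the Sales workspace even when the keys match.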
General-purpose AI models are powerful but generic. An agent using GPT-4o knows about business in general, but it doesn't know about your business. Your product catalog. Your pricing rules. Your return policy. Your communication style.
Agent training lets you fine-tune an agent's behavior for your specific business context:
Knowledge base fine-tuning. Upload your product catalog, support documentation, internal wikis. The agent uses this as its primary knowledge source.
Example-driven training. "Here are 50 examples of how we handle refund requests." The agent learns the pattern and applies it to new requests.
Style calibration. "Here are 20 examples of our brand's communication tone." The agent matches the tone in all generated content.
Decision boundary training. "These types of orders should always be flagged for manual review." The agent learns the boundary between "handle automatically" and "escalate to human."
Training is workspace-scoped. The Sales team's agent learns the sales playbook. The Support team's agent learns the support knowledge base. The Logistics team's agent learns the warehouse procedures. Each agent gets smarter about its specific domain without making other agents worse at theirs.
Today, choosing a model for a task is based on reputation and pricing: "Claude is good at writing, GPT-4o is good at reasoning, Haiku is fast and cheap." This is directionally correct but imprecise.
Automated benchmarking will evaluate models against your actual use cases:
"For classifying support tickets into categories, GPT-4o-mini achieves 94% accuracy at 0.5 credits. Claude Haiku achieves 92% at 0.5 credits. DeepSeek V3 achieves 91% at 2.0 credits."
"For generating marketing copy, Claude Sonnet scores highest on our human evaluation criteria."
"For extracting structured data from invoices, GPT-4o with vision is the clear winner."
Benchmarks run continuously as new models are released. When Anthropic releases Claude 4, within 24 hours you'll have data on whether it's better than Claude 3.5 for your specific use cases — not just generic benchmarks from a leaderboard.
The routing layer will use benchmark data to make smarter model selection decisions: not just "pick the cheapest model," but "pick the model that balances cost and quality for this specific task type."
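One simple way to balance cost and quality is to set a minimum acceptable accuracy for the task, then pick the cheapest model that clears it, breaking ties by accuracy. The TypeScript sketch below applies that rule. The selection policy and the names BenchmarkRow and pickModel are assumptions made for illustration, not the routing layer's actual algorithm.

```typescript
interface BenchmarkRow {
  model: string;
  accuracy: number; // fraction between 0 and 1
  credits: number;  // cost per task in credits
}

// Cheapest model that meets the accuracy floor; ties go to the more accurate model.
function pickModel(rows: BenchmarkRow[], minAccuracy: number): BenchmarkRow | null {
  const eligible = rows.filter((r) => r.accuracy >= minAccuracy);
  if (eligible.length === 0) return null;
  return eligible.reduce((best, r) =>
    r.credits < best.credits || (r.credits === best.credits && r.accuracy > best.accuracy)
      ? r
      : best
  );
}

// Figures taken from the ticket-classification example in this post.
const ticketClassification: BenchmarkRow[] = [
  { model: "GPT-4o-mini", accuracy: 0.94, credits: 0.5 },
  { model: "Claude Haiku", accuracy: 0.92, credits: 0.5 },
  { model: "DeepSeek V3", accuracy: 0.91, credits: 2.0 },
];
```

With a floor of 0.90, all three models qualify and GPT-4o-mini wins the tie on cost. With a floor of 0.95 nothing qualifies, so the caller has to relax the floor or escalate.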
For some use cases, even the best general-purpose model isn't good enough. You need a model fine-tuned on your data: your product catalog, your support transcripts, your industry's terminology.
Fine-tuned model hosting lets you:
Upload a fine-tuned model (OpenAI fine-tuning output, LoRA adapters, full model weights)
Deploy it as a private endpoint within AACFlow
Route traffic to it through the same AI API layer as any other provider
Benchmark it against general-purpose models to measure the improvement
The hosted model is private to your workspace. Other workspaces can't access it. Usage is tracked through the credit system (credits cover the inference compute, not the training).
When the stakes are high — drafting a legal contract, generating a financial report, making a medical recommendation — one model's output isn't enough. You want multiple independent models to produce answers and a consensus mechanism to select the best one.
Multi-model ensembles will support:
Parallel execution. Send the same prompt to 3-5 different models simultaneously.
Consensus mechanisms. Majority voting (for classification tasks). Best-of-N with a scoring model. Human review of diverging outputs.
Debate mode. Model A generates a response. Model B critiques it. Model A revises. Model C scores the final output. For complex reasoning tasks, this can produce better results than a single model working alone.
Ensemble traces. See what each model produced, how the consensus was reached, and why the final output was selected.
This is power-user functionality. Most workflows don't need it. But for high-stakes decisions where accuracy matters more than cost, it's transformative.
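Majority voting for classification tasks is the easiest consensus mechanism to show. In this TypeScript sketch, the names ModelAnswer, VoteResult, and majorityVote are hypothetical, and the rule that a missing strict majority triggers human review is an assumption chosen to match the "human review of diverging outputs" option.

```typescript
interface ModelAnswer { model: string; label: string }

interface VoteResult {
  winner: string | null;         // null when no label has a strict majority
  votes: Record<string, number>; // tally per label, useful for the ensemble trace
  needsHumanReview: boolean;
}

function majorityVote(answers: ModelAnswer[]): VoteResult {
  const votes: Record<string, number> = {};
  for (const a of answers) votes[a.label] = (votes[a.label] || 0) + 1;

  let winner: string | null = null;
  let best = 0;
  for (const label of Object.keys(votes)) {
    if (votes[label] > best) {
      best = votes[label];
      winner = label;
    }
  }

  // Without a strict majority, escalate instead of guessing.
  const hasMajority = best * 2 > answers.length;
  return { winner: hasMajority ? winner : null, votes, needsHumanReview: !hasMajority };
}
```

Returning the full tally alongside the winner gives the ensemble trace what it needs to explain how the consensus was reached.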
Today, you see your credit consumption after it happens. Cost forecasting shows you what's coming:
"Based on your usage patterns over the last 30 days, your projected monthly credit spend is 85,000 credits (€170)."
"The new always-on agent you added this week will add approximately 2,400 credits/month based on its configured polling interval."
"If you switch your bulk summarization tasks from GPT-4o to GPT-4o-mini, you'll save approximately 18,000 credits/month with an estimated quality impact of less than 2%."
Forecasting uses historical usage patterns, configured trigger intervals, and benchmark data to predict future costs. It's not exact (usage patterns change, model pricing changes), but it's significantly better than "wait for the bill and hope."
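The arithmetic behind these projections is straightforward. The TypeScript sketch below shows two of the inputs, a run-rate projection from recent daily usage and the monthly cost of a polling agent, under stated assumptions: a 30-day month, and illustrative function names that are not part of any published API. The price per credit follows from the example above (€170 / 85,000 credits = €0.002), while the 18-minute interval and 1 credit per run are made-up inputs chosen to reproduce the 2,400 credits/month figure.

```typescript
const DAYS_PER_MONTH = 30; // simplifying assumption

// Projects monthly spend from the average of recent daily credit totals.
function projectMonthlyCredits(dailyCredits: number[]): number {
  if (dailyCredits.length === 0) return 0;
  const total = dailyCredits.reduce((sum, d) => sum + d, 0);
  return Math.round((total / dailyCredits.length) * DAYS_PER_MONTH);
}

// Monthly credits for an always-on agent that runs once per polling interval.
function pollingAgentCredits(intervalMinutes: number, creditsPerRun: number): number {
  const runsPerMonth = (DAYS_PER_MONTH * 24 * 60) / intervalMinutes;
  return Math.round(runsPerMonth * creditsPerRun);
}

// Converts credits to euros, rounded to cents.
function creditsToEuros(credits: number, eurosPerCredit: number): number {
  return Math.round(credits * eurosPerCredit * 100) / 100;
}
```

For example, pollingAgentCredits(18, 1) gives 2,400 credits per month (43,200 minutes / 18 minutes per run), and creditsToEuros(85000, 0.002) gives 170.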
Just as important as what we're building is what we're not building:
We're not building our own LLM. The world has enough foundation model companies. We're building the infrastructure that makes those models useful — the operating system, not the CPU.
We're not building a no-code platform for non-technical users. AACFlow is for developers, technical operators, and power users who understand the systems they're automating. We're not dumbing down the product for a hypothetical "citizen developer" who doesn't understand what an API is.
We're not pivoting to "enterprise-only." Our free tier stays. Our self-serve pricing stays. The enterprise features are for teams that need them, but the platform remains accessible to solo developers and small teams.
We're not chasing AI hype cycles. When "agent swarms" are trending on Twitter, we won't ship a half-baked implementation. When "autonomous recursive self-improving agents" become the buzzword, we won't pretend our product does that. We ship features when they're production-ready, not when they're tweetable.
The roadmap is shaped by our users. Here's how to influence it:
Feature requests on GitHub. We track everything. Upvotes matter.
Support conversations. When you describe a workflow that's painful to build, we hear a feature gap.
Cancellation reasons. When someone leaves, we ask why. Those answers shape priorities more than feature requests from active users.
Enterprise customer calls. Our largest customers have dedicated calls where they tell us exactly what they need.
The most influential feedback is specific: "I need to do X, and currently AACFlow makes it painful because Y. If Z existed, I could do X in 10 minutes instead of 2 hours."
Two years ago, AACFlow was a single developer's tool for automating his own work. Today, it's infrastructure for 60,000+ developers running 10 million agent executions per month. The trajectory is clear: AI agents are becoming a standard part of the software stack, and AACFlow is becoming the standard platform for running them.
The roadmap above represents our best understanding of what our users need to make agents more capable, more reliable, and easier to build. It will evolve. Some items will be replaced by better ideas. New items will be added as the AI landscape shifts.
What won't change: our commitment to building infrastructure, not demos. To open protocols over walled gardens. To reliability over hype. And to building the platform we'd want to use ourselves.
If that sounds like the kind of platform you want to build on — join us. The roadmap is ambitious. The work is hard. And it's just getting started.