Alexandr Chibilyaev on the connector architecture that syncs 170+ external services into the AACFlow knowledge base — turning CRM records, support tickets, wiki pages and financial documents into AI agent memory.
An AI agent is only as smart as the data it can access. Give an agent access to your CRM, your support tickets, your wiki, your financial documents — and it transforms from a generic chatbot into a domain expert that knows your business better than most employees.
This is the promise of the AACFlow Knowledge Base. And it's powered by 170+ connectors — each one a bridge between an external service and your agent's brain.
Most AI platforms offer "RAG" — Retrieval-Augmented Generation. You upload a PDF, it gets chunked, embedded, and stored in a vector database. The agent searches it and uses relevant chunks in its responses.
This works for static documents. It fails for living business data.
Your CRM changes every hour. New leads come in. Deals move through pipelines. Support tickets get resolved. A PDF uploaded last week is already stale. An agent working with stale data makes wrong decisions — and wrong decisions cost money.
AACFlow solves this with a living knowledge base powered by connectors that continuously sync external data:
Connect — pick a service from 170+ options: AmoCRM, Bitrix24, Gmail, Confluence, 1C, Wildberries...
Configure — set filters: which deals, which date range, which document types
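Filters like these could be expressed as a small typed settings object. The sketch below is hypothetical: SyncFilters, SourceRecord and matchesFilters are illustrative names rather than AACFlow's schema. A real connector would more likely push these filters into the source API's query so that excluded records are never fetched; the predicate just makes the semantics explicit.

```typescript
// Hypothetical filter settings for the "Configure" step. Field names are
// illustrative; they are not AACFlow's actual schema.
interface SyncFilters {
  stages?: string[]; // which deals: only these pipeline stages
  modifiedFrom?: Date; // start of the date range (inclusive)
  modifiedTo?: Date; // end of the date range (inclusive)
  documentTypes?: string[]; // e.g. ["deal", "invoice"]
}

// Minimal view of a source record, enough to evaluate the filters.
interface SourceRecord {
  type: string;
  stage?: string;
  modifiedAt: Date;
}

// True when the record passes every filter that is set; unset filters
// place no restriction on the record.
function matchesFilters(record: SourceRecord, filters: SyncFilters): boolean {
  const { stages, modifiedFrom, modifiedTo, documentTypes } = filters;
  if (documentTypes && !documentTypes.includes(record.type)) return false;
  if (stages && (record.stage === undefined || !stages.includes(record.stage))) return false;
  const t = record.modifiedAt.getTime();
  if (modifiedFrom && t < modifiedFrom.getTime()) return false;
  if (modifiedTo && t > modifiedTo.getTime()) return false;
  return true;
}
```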
The sync engine doesn't know or care what service it's talking to. It calls listDocuments with a cursor, gets a page of results, persists them, and requests the next page. When incremental sync runs, it passes lastSyncAt so the connector only returns changed documents.
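To make the contract concrete, here is a minimal sketch. Only the names ConnectorConfig, listDocuments, lastSyncAt, accessToken, sourceConfig and the { documents, nextCursor, hasMore } result come from the text; the other type names, the exact parameter order and the in-memory connector are assumptions for illustration, not AACFlow's published API.

```typescript
// Hypothetical shapes inferred from the description: listDocuments receives a
// cursor and, on incremental runs, lastSyncAt, and returns
// { documents, nextCursor, hasMore }. Parameter order is an assumption.
interface ExternalDocument {
  externalId: string;
  title: string;
  content: string;
  modifiedAt: Date;
  metadata: Record<string, unknown>;
}

interface ListResult {
  documents: ExternalDocument[];
  nextCursor?: string;
  hasMore: boolean;
}

interface ConnectorConfig {
  id: string;
  listDocuments(
    accessToken: string,
    sourceConfig: Record<string, unknown>,
    cursor?: string,
    lastSyncAt?: Date,
  ): Promise<ListResult>;
}

// Toy connector backed by an array. Its cursor is an offset encoded as a
// string; a real connector would pass through the source API's own cursor.
function inMemoryConnector(docs: ExternalDocument[], pageSize = 2): ConnectorConfig {
  return {
    id: "in-memory",
    async listDocuments(_accessToken, _sourceConfig, cursor, lastSyncAt) {
      // Incremental run: only documents modified after lastSyncAt.
      const since = lastSyncAt?.getTime();
      const candidates = since === undefined ? docs : docs.filter((d) => d.modifiedAt.getTime() > since);
      const start = cursor === undefined ? 0 : Number(cursor);
      const page = candidates.slice(start, start + pageSize);
      const next = start + page.length;
      const hasMore = next < candidates.length;
      return { documents: page, nextCursor: hasMore ? String(next) : undefined, hasMore };
    },
  };
}
```

Because the engine only ever sees ConnectorConfig, swapping the toy connector for a CRM or wiki connector changes nothing on the engine side.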
This normalization is the secret sauce. The agent doesn't need to understand "this is a CRM deal with custom field X." It sees: "here's a document titled 'Deal with Company Y', with this content, last modified on this date." The metadata field preserves source-specific data for filtering and display — but the agent's RAG pipeline works on the uniform title + content.
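A hypothetical sketch of that normalization for a CRM deal follows. CrmDeal and its fields, ExternalDocument and dealToDocument are illustrative names, not AACFlow's code. The point is the split the paragraph describes: a uniform title and content for retrieval, with source-specific fields kept in metadata for filtering and display.

```typescript
// Hypothetical raw CRM deal; field names are illustrative only.
interface CrmDeal {
  id: number;
  name: string;
  companyName: string;
  stage: string;
  amount: number;
  notes: string;
  updatedAt: string; // ISO 8601 timestamp
  customFields: Record<string, string>;
}

interface ExternalDocument {
  externalId: string;
  title: string;
  content: string;
  modifiedAt: Date;
  metadata: Record<string, unknown>;
}

// Flatten a source-specific record into the uniform document shape.
// Retrieval works on title + content; metadata keeps source-specific
// fields for filtering and display.
function dealToDocument(deal: CrmDeal): ExternalDocument {
  const customLines = Object.keys(deal.customFields).map((k) => `${k}: ${deal.customFields[k]}`);
  return {
    externalId: `deal-${deal.id}`,
    title: `Deal with ${deal.companyName}`,
    content: [
      `Deal: ${deal.name}`,
      `Stage: ${deal.stage}`,
      `Amount: ${deal.amount}`,
      ...customLines, // custom fields become plain text the agent can read
      "",
      deal.notes,
    ].join("\n"),
    modifiedAt: new Date(deal.updatedAt),
    metadata: { stage: deal.stage, amount: deal.amount, source: "crm" },
  };
}
```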
Re-syncing every document on every sync run would be wasteful and slow. The connector architecture uses content hashing to detect changes:
On first sync: every document is fetched, hashed, and stored
On incremental sync: the connector returns documents modified after lastSyncAt
For each returned document: the engine compares its content hash with the hash stored from the previous sync
If the hash matches → document is unchanged, skip processing
If the hash differs → document was modified, re-embed and update
This means incremental syncs process only what actually changed. A knowledge base with 50,000 documents where 200 changed today processes exactly 200 documents — not 50,000.
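A minimal sketch of the hash comparison, assuming SHA-256 over title and content via Node's built-in crypto module. The actual hash function and the exact fields hashed are not stated above; contentHash, selectChanged and the map standing in for stored hashes are hypothetical.

```typescript
import { createHash } from "node:crypto";

interface IncomingDoc {
  externalId: string;
  title: string;
  content: string;
}

// SHA-256 over title and content, with a NUL separator so the boundary
// between the two fields is part of the hashed input.
function contentHash(doc: IncomingDoc): string {
  return createHash("sha256").update(doc.title).update("\u0000").update(doc.content).digest("hex");
}

// Compare each returned document against the hash stored from the previous run.
// Returns only new or modified documents, paired with their new hash; the caller
// re-embeds them and persists the hash once embedding succeeds.
function selectChanged(
  incoming: IncomingDoc[],
  storedHashes: ReadonlyMap<string, string>,
): { doc: IncomingDoc; hash: string }[] {
  const changed: { doc: IncomingDoc; hash: string }[] = [];
  for (const doc of incoming) {
    const hash = contentHash(doc);
    if (storedHashes.get(doc.externalId) === hash) continue; // unchanged: skip
    changed.push({ doc, hash }); // new or modified: needs re-embedding
  }
  return changed;
}
```

Returning the new hash instead of writing it immediately means a failed embedding leaves the old hash in place, so the document is retried on the next run.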
Some services have millions of documents, far too many to fetch in a single request, so every connector's listDocuments supports cursor-based pagination.
The sync engine handles the pagination loop automatically. The connector just returns { documents, nextCursor, hasMore }. The engine manages rate limits, retries, and progress tracking across pages.
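The engine-side loop could look roughly like this. syncAllPages, fetchWithRetry and the retry parameters are hypothetical, and rate limiting is left out to keep the sketch short. The connector side only has to honor the { documents, nextCursor, hasMore } contract.

```typescript
interface Page<T> {
  documents: T[];
  nextCursor?: string;
  hasMore: boolean;
}

// One call to the connector's listDocuments, already bound to credentials,
// config and lastSyncAt; only the cursor varies between pages.
type ListPage<T> = (cursor?: string) => Promise<Page<T>>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry one page fetch with exponential backoff: 100 ms, 200 ms, ...
async function fetchWithRetry<T>(
  listPage: ListPage<T>,
  cursor: string | undefined,
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<Page<T>> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await listPage(cursor);
    } catch (err) {
      if (attempt >= maxAttempts) throw err; // give up and surface the error
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}

// Fetch a page, persist it, report progress, and follow nextCursor until the
// connector says there is nothing more. Returns the number of documents persisted.
async function syncAllPages<T>(
  listPage: ListPage<T>,
  persist: (docs: T[]) => Promise<void>,
  onProgress?: (persisted: number) => void,
): Promise<number> {
  let cursor: string | undefined = undefined;
  let persisted = 0;
  for (;;) {
    const page = await fetchWithRetry(listPage, cursor);
    await persist(page.documents);
    persisted += page.documents.length;
    onProgress?.(persisted);
    if (!page.hasMore) return persisted;
    if (page.nextCursor === undefined) throw new Error("connector reported hasMore without a nextCursor");
    cursor = page.nextCursor;
  }
}
```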
Raw document content is useful for semantic search. But sometimes you need structured filtering: "show me all deals worth more than $10,000" or "only documents from the 'Legal' space."
Connectors implement tagDefinitions and mapTags to extract structured metadata from each document.
These tags populate dedicated columns in the document table. Users can filter the knowledge base by tag values. Agents can use tag filters in their queries: "search for documents about refunds, but only from deals in the 'negotiation' stage."
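A hypothetical sketch of what tagDefinitions and mapTags might look like for CRM deals, plus the kind of structured filter the tags enable. Only the names tagDefinitions and mapTags come from the text; the TagDefinition shape, the mapTags signature, the tag keys and filterByTags are assumptions.

```typescript
// Hypothetical tag schema; only the names tagDefinitions and mapTags come from the text.
type TagType = "text" | "number" | "date";
type TagValue = string | number | Date;

interface TagDefinition {
  key: string; // column the value lands in
  label: string; // shown in the UI filter
  type: TagType;
}

const tagDefinitions: TagDefinition[] = [
  { key: "stage", label: "Pipeline stage", type: "text" },
  { key: "amount", label: "Deal amount", type: "number" },
  { key: "closeDate", label: "Expected close date", type: "date" },
];

// Pull typed tag values out of a document's source-specific metadata.
// Missing or wrongly typed values are dropped rather than coerced.
function mapTags(metadata: Record<string, unknown>, defs: TagDefinition[] = tagDefinitions): Record<string, TagValue> {
  const tags: Record<string, TagValue> = {};
  for (const def of defs) {
    const raw = metadata[def.key];
    if (def.type === "text" && typeof raw === "string") tags[def.key] = raw;
    else if (def.type === "number" && typeof raw === "number") tags[def.key] = raw;
    else if (def.type === "date" && typeof raw === "string" && !isNaN(Date.parse(raw))) tags[def.key] = new Date(raw);
  }
  return tags;
}

interface TaggedDoc {
  title: string;
  tags: Record<string, TagValue>;
}

// Structured filter that could run alongside semantic search, e.g.
// "only deals in the 'negotiation' stage worth more than 10,000".
function filterByTags(docs: TaggedDoc[], stage: string, minAmount: number): TaggedDoc[] {
  return docs.filter((d) => {
    const amount = d.tags["amount"];
    return d.tags["stage"] === stage && typeof amount === "number" && amount > minAmount;
  });
}
```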
OAuth 2.0 — for services like Gmail, Google Drive, Confluence. The user goes through a standard OAuth flow. Tokens are stored encrypted and automatically refreshed. The connector receives a fresh accessToken on every call.
API Key — for services like AmoCRM, Bitrix24, 1C. The user provides an API key or credentials in the connector config. Keys are stored encrypted at rest. The connector receives them via sourceConfig.
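The two methods can converge on what a connector actually receives per call: a fresh accessToken for OAuth, or the API key inside sourceConfig. The sketch below is hypothetical: Credential, RefreshFn, resolveAuth and the one-minute refresh margin are illustrative, and the provider-specific refresh request is injected rather than implemented.

```typescript
// Hypothetical credential shapes for the two auth methods.
interface OAuthCredential {
  kind: "oauth";
  accessToken: string;
  refreshToken: string;
  expiresAt: number; // epoch milliseconds
}

interface ApiKeyCredential {
  kind: "apiKey";
  apiKey: string;
}

type Credential = OAuthCredential | ApiKeyCredential;

// Provider-specific token refresh, injected so the sketch stays transport-agnostic.
type RefreshFn = (refreshToken: string) => Promise<{ accessToken: string; expiresInSec: number }>;

interface ConnectorAuth {
  accessToken?: string;
  sourceConfig: Record<string, string>;
}

// Refresh a little before expiry so a token does not lapse in the middle of a sync.
const REFRESH_MARGIN_MS = 60_000;

// Produce what the connector receives on each call.
async function resolveAuth(cred: Credential, refresh: RefreshFn, now: number = Date.now()): Promise<ConnectorAuth> {
  if (cred.kind === "apiKey") {
    return { sourceConfig: { apiKey: cred.apiKey } }; // API keys travel via sourceConfig
  }
  if (cred.expiresAt - now > REFRESH_MARGIN_MS) {
    return { accessToken: cred.accessToken, sourceConfig: {} }; // still fresh
  }
  const fresh = await refresh(cred.refreshToken);
  // A real system would re-encrypt and persist the updated token here.
  cred.accessToken = fresh.accessToken;
  cred.expiresAt = now + fresh.expiresInSec * 1000;
  return { accessToken: cred.accessToken, sourceConfig: {} };
}
```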
The auth system is shared with the rest of the platform. A user who connected Gmail for the knowledge base doesn't need to re-authenticate to use the Gmail block in a workflow. One connection, many uses.
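One way to get "one connection, many uses" is a single encrypted credential record per user and provider that both the knowledge-base connector and the workflow block read. This is a sketch only: CredentialStore is hypothetical, AES-256-GCM from Node's built-in crypto module is an assumed choice of cipher, and a real deployment would load the key from a secrets manager rather than generate it in process.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical shared credential store: one encrypted record per (user, provider),
// read by both knowledge-base connectors and workflow blocks.
class CredentialStore {
  private records = new Map<string, { iv: Buffer; tag: Buffer; data: Buffer }>();
  private readonly key: Buffer;

  constructor(key: Buffer) {
    if (key.length !== 32) throw new Error("AES-256 needs a 32-byte key");
    this.key = key;
  }

  private id(userId: string, provider: string): string {
    return `${userId}:${provider}`;
  }

  save(userId: string, provider: string, secret: string): void {
    const iv = randomBytes(12); // 96-bit nonce, fresh for every encryption
    const cipher = createCipheriv("aes-256-gcm", this.key, iv);
    const data = Buffer.concat([cipher.update(secret, "utf8"), cipher.final()]);
    this.records.set(this.id(userId, provider), { iv, tag: cipher.getAuthTag(), data });
  }

  load(userId: string, provider: string): string | undefined {
    const rec = this.records.get(this.id(userId, provider));
    if (!rec) return undefined;
    const decipher = createDecipheriv("aes-256-gcm", this.key, rec.iv);
    decipher.setAuthTag(rec.tag); // final() throws if data or tag was tampered with
    return Buffer.concat([decipher.update(rec.data), decipher.final()]).toString("utf8");
  }
}
```

A Gmail knowledge-base connector and a Gmail workflow block would both call load(userId, "gmail") and receive the same credential, so the user authorizes once.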
Validating the connection as soon as a connector is configured catches problems immediately: wrong API key, insufficient permissions, network issues. The user sees a clear error in the UI instead of a mysterious sync failure hours later.
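A hypothetical sketch of such a check: make one cheap authenticated request when the connector is saved and translate the outcome into a message the UI can display. validateConnection and Probe are illustrative names, and mapping 401 to bad credentials and 403 to missing permissions follows standard HTTP semantics rather than any specific service's documentation.

```typescript
type ValidationResult = { ok: true } | { ok: false; message: string };

// Lightweight authenticated call against the source API (for example, fetching
// the current user). Resolves to the HTTP status; rejects on network failure.
type Probe = () => Promise<number>;

async function validateConnection(probe: Probe): Promise<ValidationResult> {
  let status: number;
  try {
    status = await probe();
  } catch {
    return { ok: false, message: "Could not reach the service. Check the URL or network access." };
  }
  if (status >= 200 && status < 300) return { ok: true };
  if (status === 401) return { ok: false, message: "Authentication failed. Check the API key or reconnect the account." };
  if (status === 403) return { ok: false, message: "The credentials lack the permissions needed to read this data." };
  return { ok: false, message: `Unexpected response from the service (HTTP ${status}).` };
}
```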
Every API has undocumented behavior. Gmail's pagination behaves differently for labels with >10,000 threads. Confluence's CQL has undocumented reserved words that break queries. Wildberries' API returns different field names in different endpoints. You discover these by building, testing, and encountering real-world edge cases.
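Inconsistencies like the Wildberries one are usually absorbed at the connector's edge, before documents reach the engine. The sketch below is hypothetical: pickField, normalizeProduct and the alias lists are illustrative and are not real Wildberries field names.

```typescript
// Hypothetical helper: return the value under the first alias that holds a
// value of the expected type.
function pickField<T>(record: Record<string, unknown>, aliases: string[], isType: (v: unknown) => v is T): T | undefined {
  for (const name of aliases) {
    const value = record[name];
    if (isType(value)) return value;
  }
  return undefined;
}

const isString = (v: unknown): v is string => typeof v === "string";
const isNumber = (v: unknown): v is number => typeof v === "number";

interface Product {
  id: number;
  title: string;
}

// Normalize records from two endpoints that name the same fields differently.
// The alias lists are illustrative, not real Wildberries field names.
function normalizeProduct(raw: Record<string, unknown>): Product {
  const id = pickField(raw, ["productId", "product_id"], isNumber);
  const title = pickField(raw, ["title", "name"], isString);
  if (id === undefined || title === undefined) {
    // Fail loudly so an unexpected shape surfaces instead of producing empty documents.
    throw new Error(`Unrecognized record shape: ${Object.keys(raw).join(", ")}`);
  }
  return { id, title };
}
```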
Rate limiting is the universal challenge. Every service has limits, and in practice they're often lower than documented. We built an adaptive rate limiter that starts conservative, gradually increases throughput, and backs off when it hits limits. This "learn by doing" approach works better than trusting the docs.
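The behavior described matches AIMD (additive increase, multiplicative decrease), the rule TCP congestion control uses. The sketch assumes AIMD; whether AACFlow's limiter follows exactly this rule, and the option names and values, are assumptions.

```typescript
interface LimiterOptions {
  initialRate: number; // requests per second to start with (conservative)
  minRate: number; // never back off below this
  maxRate: number; // never ramp above this
  increaseStep: number; // added after each successful request
  backoffFactor: number; // multiplied in on a rate-limit response, e.g. 0.5
}

class AdaptiveRateLimiter {
  private rate: number;
  private readonly opts: LimiterOptions;

  constructor(opts: LimiterOptions) {
    this.opts = opts;
    this.rate = opts.initialRate;
  }

  get currentRate(): number {
    return this.rate;
  }

  // Delay to wait before the next request at the current rate.
  delayMs(): number {
    return 1000 / this.rate;
  }

  // Additive increase: nudge throughput up while requests keep succeeding.
  onSuccess(): void {
    this.rate = Math.min(this.opts.maxRate, this.rate + this.opts.increaseStep);
  }

  // Multiplicative decrease: cut throughput sharply when the service signals a limit.
  onRateLimited(): void {
    this.rate = Math.max(this.opts.minRate, this.rate * this.opts.backoffFactor);
  }
}
```

In a sync loop, the connector would wait delayMs() before each request, then call onSuccess() or, on an HTTP 429 response, onRateLimited(). Cutting fast and growing slowly keeps the limiter hovering just under a service's real limit.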
Incremental sync is harder than full sync. Detecting what changed since last sync requires the source API to support modifiedAfter filtering — and many don't, or implement it incorrectly. For those, we fall back to full sync with content hashing. It's less efficient but always correct.
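The fallback decision can be sketched as a small planning function. SourceCapabilities, reliableModifiedAfter and planSync are hypothetical names; in the full-sync branch, the content-hash comparison is what keeps re-embedding limited to documents that really changed.

```typescript
// Hypothetical capability flag: can this source reliably filter by modification time?
interface SourceCapabilities {
  reliableModifiedAfter: boolean;
}

type SyncPlan =
  | { mode: "full"; reason: string }
  | { mode: "incremental"; lastSyncAt: Date; reason: string };

function planSync(caps: SourceCapabilities, lastSyncAt: Date | null): SyncPlan {
  if (lastSyncAt === null) {
    return { mode: "full", reason: "first sync: nothing stored yet" };
  }
  if (!caps.reliableModifiedAfter) {
    // Less efficient, but correct: list everything and let the content-hash
    // comparison skip documents that did not change.
    return { mode: "full", reason: "source cannot filter by modification time" };
  }
  return { mode: "incremental", lastSyncAt, reason: "source supports modifiedAfter filtering" };
}
```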
Text extraction is an art. CRM deals are structured JSON and easy to convert. Confluence pages are HTML with macros, tables, and embedded content, which makes clean text hard to extract. PDFs from Diadoc are often scanned documents that require OCR. Each format needs its own extraction pipeline.
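A hypothetical sketch of per-format dispatch. RawContent, extractText and the helpers are illustrative names. The HTML branch is a deliberately naive regex stripper that would not handle Confluence macros or nested tables well, and OCR is injected as a function rather than implemented.

```typescript
// Hypothetical input shapes for the three formats mentioned in the text.
type RawContent =
  | { format: "json"; value: unknown } // e.g. a CRM deal
  | { format: "html"; value: string } // e.g. a Confluence page
  | { format: "scanned-pdf"; value: Uint8Array }; // e.g. a scanned Diadoc document

// OCR engine, injected; not implemented here.
type Ocr = (bytes: Uint8Array) => Promise<string>;

// Structured JSON: flatten to "path: value" lines so field names survive as text.
function jsonToText(value: unknown, path = ""): string[] {
  if (value === null || typeof value !== "object") return [`${path}: ${String(value)}`];
  const obj = value as Record<string, unknown>;
  const lines: string[] = [];
  for (const key of Object.keys(obj)) {
    lines.push(...jsonToText(obj[key], path ? `${path}.${key}` : key));
  }
  return lines;
}

// Deliberately naive HTML stripping: fine for simple markup, not for macros or complex tables.
function htmlToText(html: string): string {
  return html
    .replace(/<(script|style)[\s\S]*?<\/\1>/gi, " ") // drop script and style bodies
    .replace(/<br\s*\/?>|<\/(p|div|li|tr|h[1-6])>/gi, "\n") // block boundaries become line breaks
    .replace(/<[^>]+>/g, " ") // strip remaining tags
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&") // decode &amp; last so "&amp;lt;" becomes "&lt;", not "<"
    .replace(/[ \t]+/g, " ")
    .replace(/ *\n */g, "\n")
    .trim();
}

async function extractText(raw: RawContent, ocr: Ocr): Promise<string> {
  switch (raw.format) {
    case "json":
      return jsonToText(raw.value).join("\n");
    case "html":
      return htmlToText(raw.value);
    case "scanned-pdf":
      return ocr(raw.value);
    default: {
      const unsupported: never = raw;
      throw new Error(`Unsupported format: ${JSON.stringify(unsupported)}`);
    }
  }
}
```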
Every one of the 170+ connectors follows the same ConnectorConfig contract. They're tested, documented, and ready to sync your business data into your agents' knowledge.