Building Enterprise RAG Framework
Why I Stopped Building Bespoke AI Systems for Every Client
A behind-the-scenes look at the reusable Enterprise RAG Framework I deploy across client engagements — and why the architecture decisions that seem over-engineered are the ones that matter most in production.
Early in my AI consulting work, I made the same mistake most architects make: I built bespoke AI systems for each client. New client, new ingestion pipeline. New client, new retrieval logic. New client, new governance layer. Three months of work, repeated from scratch, every engagement.
The breaking point came when a second financial services client asked for almost exactly what I had built for the first — a system to query regulatory documents in natural language, with citations, confidence scoring, and a full audit trail. The underlying problem was identical. The technology was identical. The only thing different was the document corpus and the cloud provider.
That’s when I stopped building projects and started building a framework.
What the Framework Actually Is
The Enterprise RAG Architecture Framework is a production-grade, configurable retrieval-augmented generation platform I deploy and adapt for each client engagement. It handles the foundational concerns that every enterprise AI deployment shares — secure ingestion, governed retrieval, auditable LLM orchestration, and a REST API with access control baked in.
Think of it the way a systems integrator brings a reference architecture to an engagement, rather than designing from first principles every time. The framework is the reference architecture. Each deployment is a configured instance of it.
What makes it a framework rather than a project is the deliberate abstraction at every layer. The LLM provider, vector store, document corpus, agent workflows, and governance configuration are all externalised. Changing them requires configuration, not code.
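To make "configuration, not code" concrete, here is a minimal sketch of what an externalised deployment definition could look like. The `DeploymentConfig` class, its field names, and the placeholder values are assumptions made for illustration; they are not taken from the framework itself.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DeploymentConfig:
    """Hypothetical per-engagement settings; field names are illustrative."""
    llm_provider: str
    vector_store: str
    corpus_path: str
    agent_workflows: tuple[str, ...] = ()
    governance: dict = field(default_factory=dict)


def load_config(raw: dict) -> DeploymentConfig:
    """Validate a raw mapping (e.g. parsed from YAML or JSON) into a config."""
    required = ("llm_provider", "vector_store", "corpus_path")
    missing = [key for key in required if key not in raw]
    if missing:
        raise ValueError(f"missing config keys: {missing}")
    return DeploymentConfig(
        llm_provider=raw["llm_provider"],
        vector_store=raw["vector_store"],
        corpus_path=raw["corpus_path"],
        agent_workflows=tuple(raw.get("agent_workflows", ())),
        governance=dict(raw.get("governance", {})),
    )


# Placeholder values only; a real deployment would name actual providers.
financial = load_config({
    "llm_provider": "provider-a",
    "vector_store": "store-a",
    "corpus_path": "corpora/financial-services",
    "agent_workflows": ["gap_analysis", "comparison"],
    "governance": {"audit": "append_only", "rbac": "jurisdiction"},
})
```

Under this sketch, a second client on a different cloud would differ only in the mapping passed to `load_config`, while the code that consumes `DeploymentConfig` stays unchanged.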
The Four Layers — and Why Each One Exists
Ingestion: PDF loading, text cleaning, semantic chunking, metadata tagging, and embedding generation. The pipeline is idempotent, so it is safe to re-run when documents are updated.
Retrieval: Vector search plus BM25 keyword search, fused using Reciprocal Rank Fusion. A query analyser classifies intent and applies metadata filters before retrieval.
Orchestration: A LangChain RAG chain for factual queries, and a LangGraph multi-agent graph for complex workflows such as comparison, gap analysis, and multi-document synthesis.
Governance: RBAC with role-based jurisdiction restrictions, a prompt injection guard, an append-only audit log, confidence scoring, and citation enforcement. Ships as standard.
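The idempotency property of the ingestion layer can be illustrated with content-derived chunk IDs. This is a sketch under stated assumptions: the `chunk_id` and `ingest` helpers and the in-memory `store` dictionary are hypothetical stand-ins for a real vector store, and the framework may achieve idempotency differently.

```python
import hashlib


def chunk_id(doc_name: str, chunk_text: str) -> str:
    """Deterministic ID: the same document name and text always give the same key."""
    digest = hashlib.sha256(f"{doc_name}\n{chunk_text}".encode("utf-8")).hexdigest()
    return digest[:16]


def ingest(store: dict, doc_name: str, chunks: list[str]) -> int:
    """Write chunks that are not already stored; return how many were new."""
    written = 0
    for text in chunks:
        key = chunk_id(doc_name, text)
        if key not in store:
            store[key] = {"doc": doc_name, "text": text}
            written += 1
    return written


store: dict = {}
first_run = ingest(store, "basel.pdf", ["paragraph one", "paragraph two"])
second_run = ingest(store, "basel.pdf", ["paragraph one", "paragraph two"])
print(first_run, second_run, len(store))  # 2 0 2
```

Re-running over unchanged documents writes nothing new. A fuller version would also remove chunks that no longer appear in an updated document, which this sketch leaves out.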
Governance is a first-class framework component — not a client-specific add-on. Every enterprise deployment, regardless of sector, needs identity and access control, an audit trail, prompt safety controls, and confidence signalling. Building these once and configuring them per engagement is the only sensible approach at consulting scale.
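Two of these controls are easy to sketch in a few lines: a jurisdiction filter driven by role, and an append-only audit log. The role names, jurisdiction codes, and the hash-chaining scheme below are assumptions chosen for illustration, not a description of the framework's actual implementation.

```python
import hashlib
import json

# Hypothetical role-to-jurisdiction mapping; real deployments would load this from config.
ROLE_JURISDICTIONS = {
    "analyst_ca": {"CA"},
    "group_compliance": {"CA", "INTL"},
}


def allowed_chunks(role: str, chunks: list[dict]) -> list[dict]:
    """Keep only chunks whose jurisdiction tag the role may see (unknown roles see nothing)."""
    permitted = ROLE_JURISDICTIONS.get(role, set())
    return [c for c in chunks if c["jurisdiction"] in permitted]


def _entry_hash(user: str, query: str, prev: str) -> str:
    payload = json.dumps({"user": user, "query": query, "prev": prev}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only log; each entry stores the previous entry's hash (assumed scheme)."""

    def __init__(self) -> None:
        self._entries: list[dict] = []

    def append(self, user: str, query: str) -> dict:
        prev = self._entries[-1]["hash"] if self._entries else ""
        entry = {"user": user, "query": query, "prev": prev,
                 "hash": _entry_hash(user, query, prev)}
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = ""
        for e in self._entries:
            if e["prev"] != prev or e["hash"] != _entry_hash(e["user"], e["query"], prev):
                return False
            prev = e["hash"]
        return True


chunks = [{"id": 1, "jurisdiction": "CA"}, {"id": 2, "jurisdiction": "INTL"}]
log = AuditLog()
log.append("analyst_ca", "first query")
log.append("analyst_ca", "second query")
```

Timestamps are left out to keep the example deterministic; a production log would record them and persist entries to write-once storage.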
The Financial Services Configuration
The deployment I’ll walk through here is the financial services configuration — built for regulatory compliance Q&A. The corpus is publicly available regulatory documents: Basel Framework publications from the Bank for International Settlements and OSFI guidelines from Canada’s federal banking regulator.
The use case is concrete: compliance analysts at a universal bank spend 60–80% of their research time manually searching regulatory PDFs to answer questions like “What are the minimum CET1 capital requirements under Basel III?” or “Does our current liquidity framework satisfy OSFI B-10 obligations?” The framework answers both — with citations, confidence scores, and a full audit trail of who asked what and when.
Here is what a live response looks like:
Every factual claim is cited. Every response carries a confidence assessment. Every query is logged to an immutable audit trail. These aren’t features — they’re requirements in a regulated environment, and they’re delivered by the framework, not built per engagement.
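One way citation enforcement might be checked before a response is returned is sketched below. The `[n]` marker format, the sentence splitting rule, and the `uncited_sentences` helper are all assumptions for illustration; the framework's own response format is not reproduced here.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")


def uncited_sentences(answer: str, num_sources: int) -> list[str]:
    """Return sentences that lack a valid [n] marker, where 1 <= n <= num_sources."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    problems = []
    for sentence in sentences:
        refs = [int(n) for n in CITATION.findall(sentence)]
        if not refs or any(r < 1 or r > num_sources for r in refs):
            problems.append(sentence)
    return problems


answer = "The requirement is set out in the source document [1]. This sentence has no citation."
print(uncited_sentences(answer, num_sources=2))
# ['This sentence has no citation.']
```

A response with a non-empty result could be regenerated, or returned with a lowered confidence rating, depending on how strict the deployment needs to be.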
The Decision That Changes Everything — Hybrid Retrieval
Most RAG implementations use pure vector search. It works well for conceptual queries. It fails badly for one specific class of query that is extremely common in enterprise contexts: exact citation lookup.
A question like “What is the spirit of Basel capital requirements?” is conceptual, and vector search handles it well. A question like “Article 147(2)(b) counterparty credit risk weighting” requires exact term matching. Vector search can miss it, because an embedding captures the general meaning of a chunk rather than the exact citation string, so the chunk containing that precise reference may not rank among the nearest neighbours.
The framework uses Reciprocal Rank Fusion — a rank-based merging algorithm that combines vector search candidates and BM25 keyword search candidates without requiring score normalisation. Documents appearing in both result sets receive a compounding boost. The result is a 15–25% improvement in retrieval precision over pure vector search, and a 35 percentage point improvement in citation recall.
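Reciprocal Rank Fusion scores each document as the sum of 1 / (k + rank) over every result list it appears in, which is why no score normalisation is needed. The sketch below is a generic implementation of that formula. The constant `k = 60` is the value commonly used in the literature and is an assumption here, as are the function name and the toy document IDs.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document IDs; best-ranked document first."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score, breaking ties by ID for a deterministic order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["a", "b", "c"]   # toy output of a vector search
bm25_hits = ["c", "a", "d"]     # toy output of a BM25 keyword search
print(reciprocal_rank_fusion([vector_hits, bm25_hits]))
# ['a', 'c', 'b', 'd']
```

Documents `a` and `c` appear in both lists, so their contributions add up and they outrank `b` and `d`, which each appear in only one list.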
The Agentic Layer — When a Single Retrieval Pass Isn’t Enough
Simple factual queries need a single retrieval pass and a single generation step. But enterprise use cases rarely stay simple. Gap analysis between an internal policy and a regulatory requirement needs multiple retrieval passes against different document sets, independent reasoning over each, and a synthesis step that produces a risk-rated output.
The framework’s LangGraph multi-agent layer handles exactly this. A supervisor node classifies the query and routes it through the appropriate agent sequence:
| Query Type | Example | Agent Path |
|---|---|---|
| Factual | What is the CET1 minimum? | RAG Chain only |
| Procedural | How do I implement an ICAAP? | RAG Chain only |
| Comparative | Basel III vs Basel IV liquidity rules | Retrieval → Comparison → Summary |
| Gap Analysis | Does our policy satisfy OSFI B-10? | Retrieval → Gap Analysis → Summary |
The supervisor routing is rule-based, not LLM-driven. Routing decisions consume no LLM tokens, add negligible latency, and are fully auditable: I can explain every routing decision without inspecting LLM outputs. LLM tokens are spent on the agents that the query actually requires, not on routing logic.
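A rule-based supervisor of this kind can be as simple as an ordered list of patterns. The sketch below reproduces the four query types and agent paths from the table above; the specific regular expressions, the rule order, and the `classify` and `route` helpers are assumptions for illustration, and a real rule set would be broader.

```python
import re

# Agent paths per query type, mirroring the routing table.
ROUTES = {
    "factual": ["rag_chain"],
    "procedural": ["rag_chain"],
    "comparative": ["retrieval", "comparison", "summary"],
    "gap_analysis": ["retrieval", "gap_analysis", "summary"],
}

# Hypothetical rules, checked in order; the first match wins.
RULES = [
    ("gap_analysis", re.compile(r"\b(satisf(y|ies)|comply|compliant|gap)\b", re.I)),
    ("comparative", re.compile(r"\b(vs\.?|versus|compare|difference between)\b", re.I)),
    ("procedural", re.compile(r"^\s*how (do|can|should)\b", re.I)),
]


def classify(query: str) -> str:
    """Return the first matching query type, defaulting to 'factual'."""
    for label, pattern in RULES:
        if pattern.search(query):
            return label
    return "factual"


def route(query: str) -> tuple[str, list[str]]:
    """Return the query type and the ordered agent path for it."""
    label = classify(query)
    return label, ROUTES[label]


print(route("Does our policy satisfy OSFI B-10?"))
# ('gap_analysis', ['retrieval', 'gap_analysis', 'summary'])
```

Because each decision comes from a named rule, the matched label can be written straight to the audit log as the explanation for the route taken.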
The Architecture Decisions That Made This a Framework
Three decisions separate a reusable framework from a one-time project. Each is documented in an Architecture Decision Record — a discipline I apply across every engagement so decisions are portable, reversible, and explainable.
The Tech Stack
[Tech stack figure: panels for Orchestration, Cloud (Local / Production), and Governance]
What Makes It Reusable Across Industries
The same framework deployed against a different corpus becomes a different product. The architecture doesn’t change — the configuration does.
| Industry | Corpus | Agent Workflows | Governance Config |
|---|---|---|---|
| Financial Services | Basel, OSFI, FCA rulebooks | Gap analysis, comparison | Jurisdiction RBAC, immutable audit |
| Healthcare | Clinical guidelines, formularies | Treatment comparison, protocol Q&A | HIPAA audit controls, role restrictions |
| Legal | Case law, contracts, statutes | Precedent search, clause analysis | Matter-level access control |
| Enterprise Internal | Policies, SOPs, knowledge base | Factual Q&A, procedural guidance | Department-level RBAC |
A client engagement that would take 12 weeks to deliver from scratch takes 4–6 weeks with the framework. The first two weeks are corpus preparation and client-specific agent workflow design. The governance layer, retrieval architecture, and API layer are already built, tested, and documented. That time saving is the commercial case for framework thinking over project thinking.
The Lesson I’d Give Every AI Architect
The temptation in AI consulting is to let the technology lead. A new client arrives with a new problem, a new vector store is trending on Hacker News, a new agent framework just dropped — and the instinct is to start fresh, incorporate everything new, build something impressive.
The more durable instinct is to ask: what part of this problem have I solved before? What decision did I make three engagements ago that I should be able to reuse today? What governance control did I build last year that every client since then has needed?
The RAG framework I’ve described here took time to build. But it now represents a compounding asset — each engagement makes it more capable, more documented, and more configurable. Each ADR I write makes the next vendor evaluation faster. Each governance component I build makes the next regulated deployment safer.
That’s the difference between an AI project and an AI practice.
Check the Details
The financial services configuration is open source — architecture diagrams, ADRs, and working code available on GitHub.