When I set out to build a serious body of work in enterprise GenAI architecture, I faced a design question: build one bespoke demonstration of how it should be done, or build something reusable that encodes the patterns once and configures them per context. The instinct most architects start with is the former — new problem, new ingestion pipeline, new retrieval logic, new governance layer. Three months of work, every time.
But the second enterprise GenAI engagement an architect imagines is almost identical to the first — a system to query regulated documents in natural language, with citations, confidence scoring, and a full audit trail. The underlying problem is the same. The technology is the same. The only things that change are the document corpus and the cloud provider.
So I built a framework instead of a one-time project.
What the Framework Actually Is
The Enterprise RAG Architecture Framework is a production-grade, configurable retrieval-augmented generation platform — built as a reference implementation designed to be deployed and adapted per engagement. It handles the foundational concerns that every enterprise AI deployment shares — secure ingestion, governed retrieval, auditable LLM orchestration, and a REST API with access control baked in.
Think of it the way a systems integrator brings a reference architecture to an engagement, rather than designing from first principles every time. The framework is the reference architecture. Each deployment is a configured instance of it.
What makes it a framework rather than a project is the deliberate abstraction at every layer. The LLM provider, vector store, document corpus, agent workflows, and governance configuration are all externalised. Changing them requires configuration, not code.
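To make "configuration, not code" concrete, here is a minimal sketch of what an externalised deployment configuration could look like. The class name, field names, and placeholder values are all hypothetical and chosen for illustration; they are not the framework's actual schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DeploymentConfig:
    """One configured instance of the framework (all field names are illustrative)."""
    llm_provider: str
    vector_store: str
    corpus_path: str
    agent_workflows: tuple[str, ...] = ("factual",)
    governance: dict[str, object] = field(default_factory=dict)


def load_config(raw: dict) -> DeploymentConfig:
    """Build a config from a plain dict, e.g. one parsed from a YAML or JSON file."""
    return DeploymentConfig(
        llm_provider=raw["llm_provider"],
        vector_store=raw["vector_store"],
        corpus_path=raw["corpus_path"],
        agent_workflows=tuple(raw.get("agent_workflows", ["factual"])),
        governance=dict(raw.get("governance", {})),
    )


# Placeholder values: swapping provider or corpus means editing this dict, not the code.
financial_services = load_config({
    "llm_provider": "example-llm",
    "vector_store": "example-vector-store",
    "corpus_path": "corpora/basel_osfi",
    "agent_workflows": ["factual", "comparison", "gap_analysis"],
    "governance": {"rbac": "jurisdiction", "audit_log": "append_only"},
})
```

A second deployment would pass a different dict to `load_config` and reuse everything else unchanged.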
The Four Layers — and Why Each One Exists
Ingestion: PDF loading, text cleaning, semantic chunking, metadata tagging, and embedding generation. Idempotent, so it is safe to re-run when documents are updated.
Retrieval: Vector search plus BM25 keyword search, fused using Reciprocal Rank Fusion. A query analyser classifies intent and applies metadata filters before retrieval.
Orchestration: A LangChain RAG chain for factual queries, and a LangGraph multi-agent graph for complex workflows such as comparison, gap analysis, and multi-document synthesis.
Governance: RBAC with role-based jurisdiction restrictions, a prompt injection guard, an append-only audit log, confidence scoring, and citation enforcement. Ships standard.
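The idempotency claim for ingestion is worth illustrating. One common way to get it, sketched here as an assumption rather than as the framework's actual mechanism, is to derive each chunk's ID from its content so that a re-run skips anything already stored. The `store` dict stands in for a vector store, and `chunk_id` and `ingest` are hypothetical names.

```python
import hashlib


def chunk_id(doc_id: str, chunk_text: str) -> str:
    """Derive a stable ID from the document ID and the chunk's content."""
    digest = hashlib.sha256(f"{doc_id}\n{chunk_text}".encode("utf-8")).hexdigest()
    return digest[:16]


def ingest(doc_id: str, chunks: list[str], store: dict[str, dict]) -> int:
    """Upsert chunks into `store`; return how many were newly written.

    `store` stands in for a vector store keyed by chunk ID. Chunks whose
    ID already exists are skipped, so re-running ingestion is a no-op.
    """
    written = 0
    for position, text in enumerate(chunks):
        cid = chunk_id(doc_id, text)
        if cid in store:
            continue  # unchanged chunk: already embedded and stored
        store[cid] = {"doc_id": doc_id, "position": position, "text": text}
        written += 1
    return written
```

Running `ingest` twice on the same document writes nothing the second time, and an updated document only pays for the chunks that changed. This sketch does not remove chunks that disappeared from an updated document; a real pipeline would need a cleanup step for those.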
Governance is a first-class framework component — not a client-specific add-on. Every enterprise deployment, regardless of sector, needs identity and access control, an audit trail, prompt safety controls, and confidence signalling. Building these once and configuring them per engagement is the only sensible approach at consulting scale.
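Two of these controls are easy to sketch. The example below pairs a role-to-jurisdiction lookup with an in-memory append-only log in which each entry hashes the previous one, so tampering is detectable. Hash chaining is my illustrative choice for making "append-only" verifiable, not a description of the framework's storage; the role names, jurisdiction codes, and class names are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical role-to-jurisdiction mapping; a real deployment would load this from config.
ROLE_JURISDICTIONS = {
    "analyst_ca": {"CA"},
    "analyst_global": {"CA", "UK", "INTL"},
}


def allowed_jurisdictions(role: str) -> set[str]:
    """Return the jurisdictions a role may query; unknown roles get none."""
    return ROLE_JURISDICTIONS.get(role, set())


def _digest(body: dict) -> str:
    """Hash an entry body deterministically (keys sorted before serialising)."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class AuditLog:
    """In-memory append-only log; each entry hashes the previous one."""

    def __init__(self) -> None:
        self._entries: list[dict] = []

    def append(self, user: str, role: str, query: str) -> dict:
        prev_hash = self._entries[-1]["hash"] if self._entries else "0" * 64
        body = {
            "user": user,
            "role": role,
            "query": query,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "prev_hash": prev_hash,
        }
        entry = {**body, "hash": _digest(body)}
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; False means an entry was altered or reordered."""
        prev_hash = "0" * 64
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
                return False
            prev_hash = entry["hash"]
        return True
```

In use, the retrieval layer would pass `allowed_jurisdictions(role)` as a metadata filter, and every query would call `append` before any LLM call is made.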
The Financial Services Configuration
The configuration I'll walk through here is the financial services one — built for regulatory compliance Q&A as a reference implementation. The corpus is publicly available regulatory documents: Basel Framework publications from the Bank for International Settlements and OSFI guidelines from Canada's federal banking regulator.
The use case is concrete: compliance analysts at a universal bank spend 60–80% of their research time manually searching regulatory PDFs to answer questions like "What are the minimum CET1 capital requirements under Basel III?" or "Does our current third-party risk management framework satisfy OSFI B-10 obligations?" The framework answers both, with citations, confidence scores, and a full audit trail of who asked what and when.
Here is a sample response from the prototype against the public Basel corpus:
Every factual claim is cited. Every response carries a confidence assessment. Every query is logged to an immutable audit trail. These aren't features — they're requirements in a regulated environment, and they're delivered by the framework, not built per engagement.
The Decision That Changes Everything — Hybrid Retrieval
Most RAG implementations use pure vector search. It works well for conceptual queries. It fails badly for one specific class of query that is extremely common in enterprise contexts: exact citation lookup.
A question like "What is the spirit of Basel capital requirements?" is conceptual — vector search handles it well. A question like "Article 147(2)(b) counterparty credit risk weighting" requires exact term matching. Vector search will miss it if the semantically similar chunks don't happen to embed that precise citation.
The framework uses Reciprocal Rank Fusion, a rank-based merging algorithm that combines vector search candidates and BM25 keyword search candidates without requiring score normalisation. Each document's fused score is the sum of its reciprocal-rank contributions from every list it appears in, so documents found by both retrievers rise toward the top. The result is a 15–25% improvement in retrieval precision over pure vector search, and a 35 percentage point improvement in citation recall.
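Reciprocal Rank Fusion is small enough to show in full. In this sketch each document earns 1 / (k + rank) from every ranked list it appears in, and the contributions are summed. The constant k = 60 is the value commonly used in the literature; whether the framework uses the same value is an assumption, and the chunk IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one scored ranking.

    Only ranks are used, never raw scores, so vector similarities and
    BM25 scores need no normalisation before merging.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vector_hits = ["chunk_a", "chunk_b", "chunk_c"]  # illustrative vector search results
bm25_hits = ["chunk_d", "chunk_b", "chunk_a"]    # illustrative BM25 results
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print([doc_id for doc_id, _ in fused])
# ['chunk_a', 'chunk_b', 'chunk_d', 'chunk_c']
```

`chunk_a` and `chunk_b` appear in both lists and outrank `chunk_d`, even though `chunk_d` was the top BM25 hit. That is the behaviour that rescues exact-citation queries without hurting conceptual ones.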
The Agentic Layer — When a Single Retrieval Pass Isn't Enough
Simple factual queries need a single retrieval pass and a single generation step. But enterprise use cases rarely stay simple. Gap analysis between an internal policy and a regulatory requirement needs multiple retrieval passes against different document sets, independent reasoning over each, and a synthesis step that produces a risk-rated output.
The framework's LangGraph multi-agent layer handles exactly this. A supervisor node classifies the query and routes it through the appropriate agent sequence:
| Query Type | Example | Agent Path |
|---|---|---|
| Factual | What is the CET1 minimum? | RAG Chain only |
| Procedural | How do I implement an ICAAP? | RAG Chain only |
| Comparative | Basel III vs Basel IV liquidity rules | Retrieval → Comparison → Summary |
| Gap Analysis | Does our policy satisfy OSFI B-10? | Retrieval → Gap Analysis → Summary |
The supervisor routing is rule-based, not LLM-driven. Routing decisions cost no LLM tokens, add negligible latency, and are fully auditable: I can explain every routing decision without inspecting LLM outputs. LLM tokens are spent on agents that the query actually requires, not on routing logic.
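A rule-based supervisor can be as simple as an ordered list of patterns. The sketch below is plain Python rather than LangGraph code, and the keyword rules and node names are illustrative assumptions, not the framework's actual rule set. It returns the matched text alongside the route, which is what makes each decision explainable.

```python
import re

# Illustrative keyword rules, checked in order; the first match wins.
ROUTING_RULES: list[tuple[str, re.Pattern, list[str]]] = [
    ("gap_analysis", re.compile(r"\b(satisfy|comply|compliant|gap)\b", re.I),
     ["retrieval", "gap_analysis", "summary"]),
    ("comparative", re.compile(r"\b(vs\.?|versus|compare|difference)\b", re.I),
     ["retrieval", "comparison", "summary"]),
    ("procedural", re.compile(r"^\s*how (do|should|can)\b", re.I),
     ["rag_chain"]),
]


def route(query: str) -> dict:
    """Classify a query and return its agent path plus the text that triggered it."""
    for query_type, pattern, path in ROUTING_RULES:
        match = pattern.search(query)
        if match:
            return {
                "query_type": query_type,
                "agent_path": path,
                "matched": match.group(0),  # recorded so the decision can be audited
            }
    # No rule fired: treat the query as factual and use the plain RAG chain.
    return {"query_type": "factual", "agent_path": ["rag_chain"], "matched": None}


print(route("Basel III vs Basel IV liquidity rules")["agent_path"])
# ['retrieval', 'comparison', 'summary']
```

In a graph-based orchestrator, a function like `route` would typically sit behind the supervisor node and decide which edge to follow.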
The Architecture Decisions That Made This a Framework
Three decisions separate a reusable framework from a one-time project. Each is documented in an Architecture Decision Record — a discipline that keeps decisions portable, reversible, and explainable.
The metrics shown below are measured against this prototype, running on the public Basel and OSFI corpus.
The Tech Stack
The stack is grouped under three headings: Orchestration, Cloud (Local / Production), and Governance.
What Makes It Reusable Across Industries
The same framework deployed against a different corpus becomes a different product. The architecture doesn't change — the configuration does.
| Industry | Corpus | Agent Workflows | Governance Config |
|---|---|---|---|
| Financial Services | Basel, OSFI, FCA rulebooks | Gap analysis, comparison | Jurisdiction RBAC, immutable audit |
| Healthcare | Clinical guidelines, formularies | Treatment comparison, protocol Q&A | HIPAA audit controls, role restrictions |
| Legal | Case law, contracts, statutes | Precedent search, clause analysis | Matter-level access control |
| Enterprise Internal | Policies, SOPs, knowledge base | Factual Q&A, procedural guidance | Department-level RBAC |
An engagement that would take 12 weeks to deliver from scratch should take 4–6 weeks with this framework as the starting point. The first two weeks would be corpus preparation and engagement-specific agent workflow design. The governance layer, retrieval architecture, and API layer are already built, tested, and documented. That time saving is the commercial case for framework thinking over project thinking.
The Lesson I'd Give Every AI Architect
The temptation in AI consulting is to let the technology lead. A new client arrives with a new problem, a new vector store is trending on Hacker News, a new agent framework just dropped — and the instinct is to start fresh, incorporate everything new, build something impressive.
The more durable instinct is to ask: what part of this problem looks like one I've reasoned through before? What decision did I make in an earlier piece of work that should be reusable today? What governance control could be built once and applied to every regulated use case after?
The RAG framework I've described here took time to build. But it now represents a compounding asset — each new context it's applied to makes it more capable, more documented, and more configurable. Each ADR I write makes the next vendor evaluation faster. Each governance component built makes the next regulated implementation safer.
That's the difference between an AI project and an AI practice.
Check the Details
The financial services configuration is open source — architecture diagrams, ADRs, and working code available on GitHub.