When I set out to build a serious body of work in enterprise GenAI architecture, I faced a design question: build one bespoke demonstration of how it should be done, or build something reusable that encodes the patterns once and configures them per context. The instinct most architects start with is the former — new problem, new ingestion pipeline, new retrieval logic, new governance layer. Three months of work, every time.

But the second enterprise GenAI engagement an architect imagines is almost identical to the first — a system to query regulated documents in natural language, with citations, confidence scoring, and a full audit trail. The underlying problem is the same. The technology is the same. The only things that change are the document corpus and the cloud provider.

So I built a framework instead of a one-time project.

What the Framework Actually Is

The Enterprise RAG Architecture Framework is a production-grade, configurable retrieval-augmented generation platform — built as a reference implementation designed to be deployed and adapted per engagement. It handles the foundational concerns that every enterprise AI deployment shares — secure ingestion, governed retrieval, auditable LLM orchestration, and a REST API with access control baked in.

Think of it the way a systems integrator brings a reference architecture to an engagement, rather than designing from first principles every time. The framework is the reference architecture. Each deployment is a configured instance of it.

Fig 1 — C4 Context Diagram: Regulatory Intelligence Platform Architecture

What makes it a framework rather than a project is the deliberate abstraction at every layer. The LLM provider, vector store, document corpus, agent workflows, and governance configuration are all externalised. Changing them requires configuration, not code.
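A minimal sketch of what "configuration, not code" could look like. The `DeploymentConfig` fields, the `VECTOR_STORES` registry, and the `build_vector_store` helper are all hypothetical names chosen for illustration; the framework's real configuration schema is not shown in this article.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DeploymentConfig:
    """Hypothetical per-deployment settings; field names are illustrative."""
    llm_provider: str
    vector_store: str
    corpus_path: str
    agent_workflows: tuple[str, ...] = ("factual",)
    audit_enabled: bool = True


# Hypothetical registry mapping a configured name to a factory.
# Real factories would return client objects; strings keep the sketch runnable.
VECTOR_STORES = {
    "qdrant": lambda: "QdrantStore(local)",
    "azure_ai_search": lambda: "AzureAISearchStore(prod)",
}


def build_vector_store(config: DeploymentConfig) -> str:
    """Resolve the configured vector store name to a concrete backend."""
    try:
        return VECTOR_STORES[config.vector_store]()
    except KeyError:
        raise ValueError(f"Unknown vector store: {config.vector_store!r}") from None


dev = DeploymentConfig(
    llm_provider="openrouter",
    vector_store="qdrant",
    corpus_path="corpus/financial_services",
    agent_workflows=("factual", "comparison", "gap_analysis"),
)

# Moving to production changes two values and no code paths.
prod = replace(dev, llm_provider="azure_openai", vector_store="azure_ai_search")
```

The point of the registry is that calling code only ever sees `build_vector_store(config)`, so adding a backend means adding one entry rather than touching retrieval logic.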

The Four Layers — and Why Each One Exists

LAYER 01: Ingestion Pipeline (CONFIGURABLE)

PDF loading, text cleaning, semantic chunking, metadata tagging, and embedding generation. Idempotent: safe to re-run when documents are updated.

LAYER 02: Hybrid Retrieval (TUNABLE)

Vector search plus BM25 keyword search, fused using Reciprocal Rank Fusion. Query analyser classifies intent and applies metadata filters before retrieval.

LAYER 03: LLM Orchestration (EXTENSIBLE)

LangChain RAG chain for factual queries. LangGraph multi-agent graph for complex workflows: comparison, gap analysis, multi-document synthesis.

LAYER 04: Governance Layer (STANDARD)

RBAC with role-based jurisdiction restrictions, prompt injection guard, append-only audit log, confidence scoring, and citation enforcement. Ships standard.

Design Principle

Governance is a first-class framework component — not a client-specific add-on. Every enterprise deployment, regardless of sector, needs identity and access control, an audit trail, prompt safety controls, and confidence signalling. Building these once and configuring them per engagement is the only sensible approach at consulting scale.
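To make two of those controls concrete, here is a small sketch of a jurisdiction check and an append-only audit log. The role names, the `ROLE_JURISDICTIONS` map, and the `AuditLog` class are hypothetical. Chaining each entry to the hash of the previous one is my assumption about one way to make tampering detectable; it is not a description of how the framework stores its log.

```python
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical role-to-jurisdiction map; a deployment would load this from config.
ROLE_JURISDICTIONS = {
    "analyst_ca": {"CA"},
    "analyst_global": {"CA", "INTL"},
}


def is_allowed(role: str, jurisdiction: str) -> bool:
    """Return True if the role may query documents from the jurisdiction."""
    return jurisdiction in ROLE_JURISDICTIONS.get(role, set())


def _digest(body: dict) -> str:
    """Stable SHA-256 over an entry's fields (excluding its own hash)."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditLog:
    """In-memory append-only log; each entry stores the hash of its predecessor."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self._entries: list[dict] = []

    def append(self, user: str, role: str, query: str, allowed: bool) -> dict:
        prev_hash = self._entries[-1]["hash"] if self._entries else self.GENESIS
        body = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "role": role,
            "query": query,
            "allowed": allowed,
            "prev_hash": prev_hash,
        }
        entry = {**body, "hash": _digest(body)}
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; False means an entry was altered or reordered."""
        prev_hash = self.GENESIS
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
                return False
            prev_hash = entry["hash"]
        return True
```

Denied requests are logged as well as allowed ones, since "who was refused what" is usually part of what an auditor asks for.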

The Financial Services Configuration

The configuration I'll walk through here is the financial services one — built for regulatory compliance Q&A as a reference implementation. The corpus is publicly available regulatory documents: Basel Framework publications from the Bank for International Settlements and OSFI guidelines from Canada's federal banking regulator.

The use case is concrete: compliance analysts at a universal bank spend 60–80% of their research time manually searching regulatory PDFs to answer questions like "What are the minimum CET1 capital requirements under Basel III?" or "Does our current third-party risk framework satisfy OSFI B-10 obligations?" The framework answers both, with citations, confidence scores, and a full audit trail of who asked what and when.

Here is a sample response from the prototype against the public Basel corpus:

Query: "What are the minimum capital requirements under Basel?"

Response:
The minimum capital requirements under Basel III are:
· CET1 ≥ 4.5% of risk-weighted assets [SOURCE: basel_capital_requirements.pdf, Page 20]
· Tier 1 Capital ≥ 6.0% of risk-weighted assets
· Total Capital ≥ 8.0% of risk-weighted assets [SOURCE: basel_capital_requirements.pdf, Page 20]

Capital conservation buffer: +2.5% brings CET1 minimum to 7.0% including buffer requirements. [SOURCE: basel_capital_requirements.pdf, Page 72]

Confidence: HIGH
Latency: 3,603ms
Chunks used: 6

Every factual claim is cited. Every response carries a confidence assessment. Every query is logged to an immutable audit trail. These aren't features — they're requirements in a regulated environment, and they're delivered by the framework, not built per engagement.
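Citation enforcement can be checked mechanically. The sketch below assumes the `[SOURCE: file, Page N]` format from the sample response and a set of `(file, page)` pairs for the chunks that were actually retrieved. The function names and the pass/fail rules are hypothetical, not the framework's own implementation.

```python
import re

# Matches the citation format used in the sample response above.
CITATION_RE = re.compile(r"\[SOURCE:\s*([^,\]]+),\s*Page\s+(\d+)\]")


def extract_citations(answer: str) -> list[tuple[str, int]]:
    """Return every (file name, page number) pair cited in the answer."""
    return [(name.strip(), int(page)) for name, page in CITATION_RE.findall(answer)]


def enforce_citations(answer: str, retrieved: set[tuple[str, int]]) -> list[str]:
    """Return a list of problems; an empty list means the answer passes."""
    citations = extract_citations(answer)
    problems = []
    if not citations:
        problems.append("answer contains no citations")
    for cite in citations:
        # A citation pointing outside the retrieved context cannot be grounded.
        if cite not in retrieved:
            problems.append(f"citation not in retrieved context: {cite}")
    return problems
```

A response with a non-empty problem list could be regenerated or returned with a lowered confidence rating; which of those the pipeline does is a policy choice per deployment.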

The Decision That Changes Everything — Hybrid Retrieval

Most RAG implementations use pure vector search. It works well for conceptual queries. It fails badly for one specific class of query that is extremely common in enterprise contexts: exact citation lookup.

A question like "What is the spirit of Basel capital requirements?" is conceptual, and vector search handles it well. A question like "Article 147(2)(b) counterparty credit risk weighting" requires exact term matching. Vector search can miss it when the chunks nearest in meaning do not contain that precise citation string.

The framework uses Reciprocal Rank Fusion, a rank-based merging algorithm that combines vector search candidates and BM25 keyword search candidates without requiring score normalisation. Each retriever contributes a score based only on a document's rank, and the scores are summed, so documents appearing in both result sets rise above those found by only one. On this prototype, the result is a 15–25% relative improvement in retrieval precision over pure vector search, and a 35 percentage point improvement in citation recall.
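Reciprocal Rank Fusion is short enough to sketch in full. Each document scores the sum of 1 / (k + rank) over every ranking it appears in. The constant `k = 60` is a commonly used default and an assumption here, as are the chunk IDs; the framework's tuned value is not stated in this article.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs into one list sorted by RRF score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks start at 1, so the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical candidate lists from the two retrievers, best first.
vector_hits = ["chunk_a", "chunk_b", "chunk_c"]
bm25_hits = ["chunk_b", "chunk_d", "chunk_a"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
fused_ids = [doc_id for doc_id, _ in fused]
# fused_ids == ["chunk_b", "chunk_a", "chunk_d", "chunk_c"]
```

`chunk_b` and `chunk_a` appear in both lists and so outrank `chunk_d` and `chunk_c`, which each appear in only one. Only ranks are used, so the raw cosine similarities and BM25 scores never need to be put on a common scale.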

The Agentic Layer — When a Single Retrieval Pass Isn't Enough

Simple factual queries need a single retrieval pass and a single generation. But enterprise use cases rarely stay simple. Gap analysis between an internal policy and a regulatory requirement needs multiple retrieval passes against different document sets, independent reasoning over each, and a synthesis step that produces a risk-rated output.

The framework's LangGraph multi-agent layer handles exactly this. A supervisor node classifies the query and routes it through the appropriate agent sequence:

Query Type | Example | Agent Path
Factual | What is the CET1 minimum? | RAG Chain only
Procedural | How do I implement an ICAAP? | RAG Chain only
Comparative | Basel III vs Basel IV liquidity rules | Retrieval → Comparison → Summary
Gap Analysis | Does our policy satisfy OSFI B-10? | Retrieval → Gap Analysis → Summary

The supervisor routing is rule-based, not LLM-driven. Routing decisions cost no tokens, add negligible latency, and are fully auditable: I can explain every routing decision without inspecting LLM outputs. LLM tokens are spent on the agents a query actually requires, not on routing logic.
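A rule-based supervisor can be as simple as an ordered list of patterns. The keyword rules and node names below are hypothetical, written only to reproduce the four rows of the routing table; the framework's actual rules are not listed in this article.

```python
import re

# Hypothetical keyword rules, checked in order; the first match wins.
ROUTING_RULES = [
    ("gap_analysis", re.compile(r"\b(satisf(y|ies)|comply|complies|compliant|gap)\b", re.I)),
    ("comparative", re.compile(r"\b(vs\.?|versus|compare[ds]?|difference between)\b", re.I)),
    ("procedural", re.compile(r"\bhow (do|should|can|to)\b", re.I)),
]

# Agent sequences taken from the routing table; node names are illustrative.
AGENT_PATHS = {
    "factual": ["rag_chain"],
    "procedural": ["rag_chain"],
    "comparative": ["retrieval", "comparison", "summary"],
    "gap_analysis": ["retrieval", "gap_analysis", "summary"],
}


def route(query: str) -> tuple[str, list[str]]:
    """Classify a query by the first matching rule; default to factual."""
    for query_type, pattern in ROUTING_RULES:
        if pattern.search(query):
            return query_type, AGENT_PATHS[query_type]
    return "factual", AGENT_PATHS["factual"]
```

Because the decision is a pattern match, the audit log can record exactly which rule fired for each query.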

The Architecture Decisions That Made This a Framework

Three decisions separate a reusable framework from a one-time project. Each is documented in an Architecture Decision Record — a discipline that keeps decisions portable, reversible, and explainable.

The metrics shown below are measured against this prototype, running on the public Basel and OSFI corpus.

ADR-001: Vector Store Selection
Decision: Azure AI Search over Pinecone/Qdrant
Why it matters: Data residency, Entra ID RBAC integration, and native hybrid search, all in one service.
Portability: Provider abstraction means a swap is one config change. Qdrant in dev, Azure AI Search in prod.

ADR-002: Chunking Strategy
Decision: Semantic over fixed-size
Why it matters: Fixed-size splits regulatory clauses mid-sentence. A half-clause retrieved in isolation is a compliance risk.
Result: 94% clause integrity vs 67% with fixed-size. Measurable. Documented. Repeatable.

ADR-003: Retrieval Strategy
Decision: Hybrid RRF over pure vector
Why it matters: Enterprise text has two retrieval modes, semantic and exact citation. One retriever cannot handle both optimally.
Result: Precision@6 of 0.84 vs 0.71 vector-only. Citation recall +35 percentage points.
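For readers unfamiliar with the metric in ADR-003, Precision@k is the fraction of the top k retrieved chunks that a human has labelled relevant, averaged over a set of test queries. The sketch below shows only the definition. The function names, chunk IDs, and labels are hypothetical, and the evaluation set behind the reported figures is not reproduced here.

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int = 6) -> float:
    """Fraction of the top-k retrieved chunk IDs that are labelled relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = retrieved[:k]
    # Dividing by k (not len(top_k)) penalises retrievers that return too few hits.
    return sum(1 for doc_id in top_k if doc_id in relevant) / k


def mean_precision_at_k(
    runs: list[tuple[list[str], set[str]]], k: int = 6
) -> float:
    """Average Precision@k over (retrieved, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(precision_at_k(ret, rel, k) for ret, rel in runs) / len(runs)


# Hypothetical labelled results for two test queries.
runs = [
    (["c1", "c2", "c3", "c4", "c5", "c6"], {"c1", "c2", "c3", "c4", "c5"}),
    (["c7", "c8", "c9", "c10", "c11", "c12"], {"c7", "c8", "c9", "c10"}),
]
score = mean_precision_at_k(runs, k=6)  # (5/6 + 4/6) / 2 = 0.75
```

Running the same labelled queries through both retrievers is what makes a comparison such as 0.84 vs 0.71 repeatable.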

The Tech Stack

Orchestration: LangChain 0.3, LangGraph 0.2, LangSmith, FastAPI, Python 3.11

Cloud (Local / Production): Qdrant (local), Azure AI Search, Azure OpenAI (GPT-4o), Azure AI Foundry, OpenRouter (dev)

Governance: RBAC (5 roles), Audit Logger, Prompt Guard, Citation Enforcement, Confidence Scoring

What Makes It Reusable Across Industries

The same framework deployed against a different corpus becomes a different product. The architecture doesn't change — the configuration does.

Industry | Corpus | Agent Workflows | Governance Config
Financial Services | Basel, OSFI, FCA rulebooks | Gap analysis, comparison | Jurisdiction RBAC, immutable audit
Healthcare | Clinical guidelines, formularies | Treatment comparison, protocol Q&A | HIPAA audit controls, role restrictions
Legal | Case law, contracts, statutes | Precedent search, clause analysis | Matter-level access control
Enterprise Internal | Policies, SOPs, knowledge base | Factual Q&A, procedural guidance | Department-level RBAC

The Consulting Advantage

An engagement that would take 12 weeks to deliver from scratch should take 4–6 weeks with this framework as the starting point. The first two weeks would be corpus preparation and engagement-specific agent workflow design. The governance layer, retrieval architecture, and API layer are already built, tested, and documented. That time saving is the commercial case for framework thinking over project thinking.

The Lesson I'd Give Every AI Architect

The temptation in AI consulting is to let the technology lead. A new client arrives with a new problem, a new vector store is trending on Hacker News, a new agent framework just dropped — and the instinct is to start fresh, incorporate everything new, build something impressive.

The more durable instinct is to ask: what part of this problem looks like one I've reasoned through before? What decision did I make in an earlier piece of work that should be reusable today? What governance control could be built once and applied to every regulated use case after?

The RAG framework I've described here took time to build. But it now represents a compounding asset — each new context it's applied to makes it more capable, more documented, and more configurable. Each ADR I write makes the next vendor evaluation faster. Each governance component built makes the next regulated implementation safer.

That's the difference between an AI project and an AI practice.

Check the Details

The financial services configuration is open source — architecture diagrams, ADRs, and working code available on GitHub.