Faiz Faruqi · Enterprise AI Architecture

Enterprise RAG · Knowledge Architecture

Building the Institutional Brain — A RAG Reference Architecture for the AI-Native Enterprise

McKinsey estimates knowledge workers spend 19% of their time searching for information that already exists inside their own organisation. This is the architecture that closes that gap — without autonomous agents, hallucinated answers, or uncitable claims.

RAG MCP Weaviate Enterprise AI

For a 10,000-person enterprise, knowledge workers spending 19% of their time searching for information is not an inefficiency. It is structural value destruction. Every new engineer who joins inherits the same fragmented knowledge landscape. Every architectural decision made without institutional context risks repeating mistakes that were documented, debated, and resolved two years ago — in an ADR that nobody could find.

The promise of Generative AI in the enterprise is usually framed around chatbots, code generation, or content summarisation. That framing is too narrow. The deeper opportunity is the institutional brain — an AI system that holds the collective memory of your engineering organisation and makes it conversationally accessible, cited, and auditable.

The Enterprise Knowledge Navigator is a reference architecture for exactly that system. It combines Retrieval-Augmented Generation, the Model Context Protocol, and a vector-native knowledge layer into a five-layer design that scales from prototype to production without re-engineering.

Why RAG — and Why Not the Alternatives

Two alternatives to RAG are common in early enterprise AI discussions. Both fail at scale for different reasons.

Fine-tuning embeds knowledge into model weights. Every knowledge update requires a retraining cycle, costs significant compute, and — critically — produces a model that cannot cite its sources. For regulated industries, an answer that cannot be traced to a specific document is not an answer. It is a liability.

Context-window stuffing hits scale limits immediately. A mid-size engineering organisation with three years of ADRs, runbooks, and design documents will far exceed any model’s context window. Latency and cost per query become prohibitive well before you reach meaningful coverage.

RAG threads this needle. Knowledge lives in a vector database — fresh, versionable, and source-traceable. Each query retrieves only the relevant documents. Context stays tight, costs stay predictable, and every generated answer is grounded in specific source documents with explicit citations. This is the architecture that survives contact with enterprise scale.

The institutional brain is not a chatbot. It is a knowledge infrastructure — one that makes your organisation’s collective memory conversationally accessible, while keeping the human firmly in the decision seat.

The Reference Architecture — Five Layers

Figure: Enterprise Knowledge Navigator, five-layer reference design
Layer 5 · Next.js Operator Interface: Knowledge Search, Recommendations, Compliance Check
Layer 4 · API & Capability Surface (Express / FastAPI): /api/search, /api/recommendations, /api/compliance-check
Layer 3 · RAG Orchestration Pipeline: Query Embed → Vector Retrieve → Context Rank → LLM Synthesise + Cite
Layer 2 · Vector Knowledge Store: Weaviate 1.24, cosine similarity, all-MiniLM-L6-v2 (384-dim, local); ADRs, Docs, Patterns, Standards
Layer 1 (partial) · MCP Server Network: GitHub MCP, Confluence MCP, Jira MCP; normalised doc schema → embedding pipeline
Layer 1 · Knowledge Sources: GitHub Repos, Confluence Wikis, Jira Histories, ADR Libraries, Internal Runbooks
Cross-cutting: LLM Gateway (OpenRouter, provider-agnostic, env-var swap) · PostgreSQL audit log

Layer 1 — Where Enterprise Knowledge Actually Lives

The architecture begins not with AI, but with reality: enterprise knowledge is distributed across GitHub repositories, Confluence wikis, Jira project histories, and internal runbooks. Each source speaks a different API dialect, uses different authentication mechanisms, and structures documents differently.

Each knowledge source connects through a dedicated Model Context Protocol (MCP) server — a lightweight Node.js service that abstracts source-specific APIs into a normalised document schema: { title, content, source, type, tags[] }. This normalisation step is unglamorous but load-bearing; without it, the embedding pipeline would need to handle source-specific quirks, and every new knowledge source would require changes to the core system.
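A minimal TypeScript sketch of that normalisation contract. The NormalisedDoc interface mirrors the schema named above; the raw Confluence-style and GitHub-style input shapes, the fromConfluence and fromGithub functions, and the ADR-detection rules are hypothetical illustrations, not the real Confluence or GitHub API payloads.

```typescript
// Normalised contract every MCP server emits: { title, content, source, type, tags[] }.
interface NormalisedDoc {
  title: string;
  content: string;
  source: string; // originating system, e.g. "confluence" or "github"
  type: string; // e.g. "adr", "wiki-page", "repo-file"
  tags: string[];
}

// Hypothetical raw shapes; field names are illustrative, not the real Confluence/GitHub APIs.
interface RawConfluencePage {
  title: string;
  body: string;
  labels?: string[];
}

interface RawGithubFile {
  path: string;
  text: string;
}

// Map a Confluence-style page onto the shared schema.
function fromConfluence(page: RawConfluencePage): NormalisedDoc {
  const labels = (page.labels ?? []).map((l) => l.toLowerCase());
  return {
    title: page.title.trim(),
    content: page.body,
    source: "confluence",
    // Assumption: an "adr" label marks architecture decision records.
    type: labels.includes("adr") ? "adr" : "wiki-page",
    tags: labels,
  };
}

// Map a GitHub-style file onto the same schema; directory names become tags.
function fromGithub(file: RawGithubFile): NormalisedDoc {
  const segments = file.path.split("/");
  return {
    title: segments[segments.length - 1] ?? file.path,
    content: file.text,
    source: "github",
    // Assumption: ADRs live under a docs/adr/ directory.
    type: file.path.includes("docs/adr/") ? "adr" : "repo-file",
    tags: segments.slice(0, -1),
  };
}
```

Because both mappers return the same shape, the embedding pipeline downstream never needs to know which source a document came from; adding a new source means adding one more mapper.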

The choice of MCP as the integration protocol is a deliberate long-term bet. MCP is moving quickly towards becoming the standard for agent-to-tool communication, and an MCP-first knowledge layer means your enterprise knowledge is already reachable by any future AI agent or autonomous workflow that speaks the protocol. You are not building a knowledge base for today’s system. You are building the data plane for your entire future AI estate.

Layer 2 — The Vector Knowledge Store

Documents from MCP sources pass through an embedding pipeline that converts text into 384-dimensional vectors using sentence-transformers/all-MiniLM-L6-v2, running locally via @xenova/transformers. The local embedding decision is deliberate on two axes: it eliminates per-query API cost for embedding generation, and it removes a third-party dependency from a security-sensitive data path. Sending enterprise documents to an external embedding API is a data governance conversation most CISOs would rather not have.
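With @xenova/transformers, the embedding call is typically `pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2")` followed by `extractor(text, { pooling: "mean", normalize: true })`, which yields a 384-dimensional vector. To keep the sketch runnable without downloading the model, the block below reimplements only the two post-processing steps those options request, mean pooling over token embeddings (skipping padding via the attention mask) and L2 normalisation, on toy dimensions. The helper names and the assertDim guard are illustrative assumptions.

```typescript
// Real library call this mirrors (requires @xenova/transformers and a model download):
//   const extractor = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");
//   const out = await extractor(text, { pooling: "mean", normalize: true });
//   const vector = Array.from(out.data as Float32Array); // length 384

// Mean-pool token embeddings, ignoring padding positions (mask value 0).
function meanPool(tokenEmbeddings: number[][], attentionMask: number[]): number[] {
  const dim = tokenEmbeddings[0]?.length ?? 0;
  const sum = new Array<number>(dim).fill(0);
  let count = 0;
  tokenEmbeddings.forEach((token, i) => {
    if (attentionMask[i] === 0) return;
    count += 1;
    for (let d = 0; d < dim; d++) sum[d] += token[d];
  });
  return count === 0 ? sum : sum.map((v) => v / count);
}

// Scale a vector to unit length so a dot product equals cosine similarity.
function l2Normalise(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((acc, x) => acc + x * x, 0));
  return norm === 0 ? v : v.map((x) => x / norm);
}

const EXPECTED_DIM = 384; // all-MiniLM-L6-v2 output size

// Fail fast if a vector from a different model slips into the pipeline.
function assertDim(v: number[], expected: number = EXPECTED_DIM): void {
  if (v.length !== expected) {
    throw new Error(`expected ${expected}-dim embedding, got ${v.length}`);
  }
}
```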

Weaviate 1.24 serves as the vector database, with the schema configured to accept externally provided vectors rather than delegating to Weaviate’s module system. This gives the architecture full control over the embedding model, allows model swaps without reconfiguring Weaviate modules (the corpus must still be re-embedded and re-indexed with the new model), and keeps the vector layer portable across cloud and on-premise deployments.
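A sketch of what "externally provided vectors" can look like against Weaviate's REST API: a class definition with `vectorizer: "none"` and cosine distance, plus the body for an object insert that carries its own vector. The class name KnowledgeDocument, the embeddingModel property, and the toWeaviateObject helper are assumptions for illustration. The dimension guard matters because a vector index expects every vector it holds to share one length.

```typescript
interface NormalisedDoc { title: string; content: string; source: string; type: string; tags: string[]; }

// Class definition for POST /v1/schema. vectorizer "none" means Weaviate stores the
// vectors we send and never calls an embedding module itself.
const knowledgeDocumentClass = {
  class: "KnowledgeDocument", // hypothetical class name
  vectorizer: "none",
  vectorIndexConfig: { distance: "cosine" },
  properties: [
    { name: "title", dataType: ["text"] },
    { name: "content", dataType: ["text"] },
    { name: "source", dataType: ["text"] },
    { name: "type", dataType: ["text"] },
    { name: "tags", dataType: ["text[]"] },
    // Hypothetical extra property recording which model produced the vector.
    { name: "embeddingModel", dataType: ["text"] },
  ],
};

interface WeaviateObject {
  class: string;
  properties: NormalisedDoc & { embeddingModel: string };
  vector: number[];
}

// Body for POST /v1/objects with an externally computed vector.
function toWeaviateObject(
  doc: NormalisedDoc,
  vector: number[],
  embeddingModel: string,
  expectedDim = 384,
): WeaviateObject {
  if (vector.length !== expectedDim) {
    throw new Error(`vector has ${vector.length} dims, index expects ${expectedDim}`);
  }
  return { class: knowledgeDocumentClass.class, properties: { ...doc, embeddingModel }, vector };
}
```

Recording the model name on each object gives the query path something to check against, which supports the ingestion/query symmetry requirement described in the RAG pipeline section.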

Similarity search uses cosine distance with a configurable certainty threshold, returning the top-k most semantically relevant documents per query. In production, threshold and retrieval count should be tuned per domain: legal and compliance use cases typically benefit from higher certainty thresholds, while exploratory architectural research benefits from broader retrieval.
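The sketch below spells out the scoring Weaviate applies for the cosine metric (distance is 1 minus cosine similarity; certainty is 1 minus distance divided by 2) and an in-memory equivalent of a thresholded top-k search. The per-domain values in RETRIEVAL_CONFIG are placeholders, not tuned recommendations, and nearVectorQuery only builds the GraphQL string; in a real deployment Weaviate performs the search server-side.

```typescript
// Cosine distance as Weaviate reports it: 1 - cosine similarity, in [0, 2].
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// For the cosine metric, Weaviate's certainty is 1 - distance / 2, in [0, 1].
function certaintyFromDistance(distance: number): number {
  return 1 - distance / 2;
}

interface RetrievalConfig {
  certainty: number; // minimum certainty a result must reach
  limit: number; // top-k
}

// Hypothetical per-domain settings; placeholder values to tune, not recommendations.
const RETRIEVAL_CONFIG: Record<"compliance" | "exploratory", RetrievalConfig> = {
  compliance: { certainty: 0.8, limit: 5 },
  exploratory: { certainty: 0.6, limit: 15 },
};

interface Candidate { id: string; vector: number[]; }

// In-memory equivalent of a thresholded nearVector search, for illustration only.
function retrieve(query: number[], docs: Candidate[], cfg: RetrievalConfig): Array<{ id: string; certainty: number }> {
  return docs
    .map((d) => ({ id: d.id, certainty: certaintyFromDistance(cosineDistance(query, d.vector)) }))
    .filter((r) => r.certainty >= cfg.certainty)
    .sort((x, y) => y.certainty - x.certainty)
    .slice(0, cfg.limit);
}

// The same request expressed as a Weaviate GraphQL Get query.
function nearVectorQuery(className: string, vector: number[], cfg: RetrievalConfig): string {
  return `{ Get { ${className}(nearVector: { vector: ${JSON.stringify(vector)}, certainty: ${cfg.certainty} }, limit: ${cfg.limit}) { title source type _additional { certainty distance } } } }`;
}
```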

Layer 3 — The RAG Pipeline

When a query arrives, the pipeline runs four steps in sequence.
1. Embed. The query is embedded with the same model used by the ingestion pipeline. Semantic symmetry between ingestion and query embeddings is critical: mismatched models produce degraded retrieval that is difficult to diagnose.
2. Retrieve. Weaviate returns the top semantically relevant documents with certainty scores.
3. Assemble. A structured context block is built, preserving document titles, types, and relevance rankings.
4. Generate. A prompt instructs the LLM to synthesise a grounded answer, cite document numbers explicitly, acknowledge gaps where context is absent, and surface competing approaches rather than collapsing them into a single recommendation.
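A minimal sketch of the context-assembly and generation-prompt steps, assuming retrieved documents arrive with a certainty score. The system prompt text is an illustrative paraphrase of the behaviour described above, not the reference implementation's actual prompt, and the message format is the common chat-completions role/content shape.

```typescript
interface RetrievedDoc { title: string; type: string; source: string; content: string; certainty: number; }
interface ChatMessage { role: "system" | "user"; content: string; }

// Number documents by relevance so the model can cite them as [1], [2], ...
function buildContextBlock(docs: RetrievedDoc[]): string {
  return docs
    .slice()
    .sort((a, b) => b.certainty - a.certainty)
    .map(
      (d, i) =>
        `[${i + 1}] ${d.title} (type: ${d.type}, source: ${d.source}, relevance: ${d.certainty.toFixed(2)})\n${d.content}`,
    )
    .join("\n\n");
}

// Illustrative instructions mirroring the described behaviour; not the author's prompt.
const SYSTEM_PROMPT = [
  "Answer using only the numbered documents in the context.",
  "Cite supporting documents inline by number, e.g. [2].",
  "If the context does not cover part of the question, say so explicitly.",
  "If documents describe competing approaches, present each with its trade-offs rather than choosing one.",
].join("\n");

function buildMessages(question: string, docs: RetrievedDoc[]): ChatMessage[] {
  return [
    { role: "system", content: SYSTEM_PROMPT },
    { role: "user", content: `Context:\n${buildContextBlock(docs)}\n\nQuestion: ${question}` },
  ];
}
```

Keeping the document numbers stable between the context block and the answer is what makes the citations checkable after generation.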

The LLM layer is provider-agnostic by design. The RAG service abstracts across OpenAI, Anthropic Claude, and any OpenRouter-compatible model. Switching providers requires a single environment variable change with no code modification. This is deliberate: LLM commoditisation is accelerating, and an architecture that hard-codes a specific provider is likely to need re-engineering within eighteen months.
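One way to get the single-variable swap, sketched under the assumption that every provider is reached through an OpenAI-compatible chat-completions endpoint (OpenRouter exposes one at https://openrouter.ai/api/v1). The variable names LLM_MODEL, LLM_API_KEY, and LLM_BASE_URL are hypothetical. The function only builds the request, so it can be handed to fetch and tested without network access.

```typescript
type Env = Record<string, string | undefined>;

interface ChatMessage { role: "system" | "user" | "assistant"; content: string; }

interface ChatRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

const DEFAULT_BASE_URL = "https://openrouter.ai/api/v1";

// Build an OpenAI-compatible chat-completions request from environment variables.
// Changing LLM_MODEL (e.g. "anthropic/claude-3.5-sonnet" to "openai/gpt-4o") swaps the provider.
function buildChatRequest(env: Env, messages: ChatMessage[]): ChatRequest {
  const model = env.LLM_MODEL;
  const apiKey = env.LLM_API_KEY;
  if (!model || !apiKey) throw new Error("LLM_MODEL and LLM_API_KEY must be set");
  // Trim trailing slashes so both ".../v1" and ".../v1/" work.
  const baseUrl = (env.LLM_BASE_URL ?? DEFAULT_BASE_URL).replace(/\/+$/, "");
  return {
    url: `${baseUrl}/chat/completions`,
    init: {
      method: "POST",
      headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
      body: JSON.stringify({ model, messages }),
    },
  };
}
```

In a Node service this would be called as `buildChatRequest(process.env, messages)` and passed to `fetch(url, init)`. Routing Claude through OpenRouter keeps one request shape, since Anthropic's native API uses a different endpoint and payload.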

Layer 4 — Three Distinct API Surfaces

POST /api/search
Knowledge Search
Natural language query returns a cited AI answer, retrieved sources with relevance scores, and a breakdown by originating MCP system. The primary interface for architects doing exploratory discovery.
GET /api/recommendations
Pattern Recommendations
Returns patterns extracted from historical project data, ranked by cross-project frequency. Answers “what have teams like mine actually done” rather than “what should I do in theory.”
POST /api/compliance-check
Compliance Evaluation
Evaluates a plain-English architecture description against enterprise standards covering security, scalability, observability, and operational readiness. Produces an itemised, auditable gap report.
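The ranking behind /api/recommendations can be sketched as counting distinct projects per pattern, so a pattern repeated within one project counts once. This assumes pattern extraction has already produced (project, pattern) pairs; the article does not specify how extraction works, and the types and rankPatterns function here are hypothetical.

```typescript
interface PatternObservation { project: string; pattern: string; }
interface PatternRecommendation { pattern: string; projectCount: number; projects: string[]; }

// Rank patterns by how many distinct projects used them (ties broken alphabetically).
function rankPatterns(observations: PatternObservation[], limit = 10): PatternRecommendation[] {
  const projectsByPattern = new Map<string, Set<string>>();
  for (const { project, pattern } of observations) {
    const key = pattern.trim().toLowerCase(); // normalise naming variants like "CQRS" / "cqrs"
    const projects = projectsByPattern.get(key) ?? new Set<string>();
    projects.add(project);
    projectsByPattern.set(key, projects);
  }
  return Array.from(projectsByPattern.entries())
    .map(([pattern, projects]) => ({
      pattern,
      projectCount: projects.size,
      projects: Array.from(projects).sort(),
    }))
    .sort((a, b) => b.projectCount - a.projectCount || a.pattern.localeCompare(b.pattern))
    .slice(0, limit);
}
```

Counting projects rather than raw mentions is what makes the answer "what teams like mine actually did": one team documenting a pattern ten times does not outrank three teams each adopting it once.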

The compliance-check surface deserves particular attention. Architecture Review Boards are a common bottleneck in enterprise delivery — senior architects spending time on designs that have not been pre-screened for basic structural gaps. This endpoint allows teams to self-evaluate before formal review, surfacing common gaps without consuming senior architect time. The output is an itemised report that can be attached to a design document, making the governance process faster and the evidence trail explicit.
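The itemised gap report could be rendered from structured findings like those below. The four categories come from the endpoint description; the ComplianceFinding shape, the met/gap/unclear statuses, and renderGapReport are hypothetical, and the LLM evaluation that would produce the findings is not shown. Bracketed numbers are assumed to refer to the enterprise-standards documents retrieved for the check.

```typescript
type Category = "security" | "scalability" | "observability" | "operational-readiness";
type Status = "met" | "gap" | "unclear";

interface ComplianceFinding {
  category: Category;
  requirement: string;
  status: Status;
  evidence: string;
  citations: number[]; // numbers of the standards documents supporting the finding
}

// Render findings as a plain-text report that can be attached to a design document.
function renderGapReport(designName: string, findings: ComplianceFinding[]): string {
  const order: Category[] = ["security", "scalability", "observability", "operational-readiness"];
  const openItems = findings.filter((f) => f.status !== "met").length;
  const lines = [`Compliance pre-check: ${designName}`, `Open items: ${openItems} of ${findings.length}`];
  for (const category of order) {
    const items = findings.filter((f) => f.category === category);
    if (items.length === 0) continue;
    lines.push("", category.toUpperCase());
    items.forEach((f, i) => {
      const refs = f.citations.length > 0 ? ` ${f.citations.map((n) => `[${n}]`).join("")}` : "";
      lines.push(`${i + 1}. [${f.status.toUpperCase()}] ${f.requirement}: ${f.evidence}${refs}`);
    });
  }
  return lines.join("\n");
}
```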

Layer 5 — Three Personas, One Interface

The Next.js frontend is designed around three primary user personas with distinct information needs. Architects use Knowledge Search to explore decision history before proposing new solutions — surfacing relevant ADRs, prior design decisions, and pattern documentation, with the AI synthesising a coherent cited answer. Tech Leads use Recommendations to identify proven patterns for new projects without starting from first principles. Architecture Review Boards use Compliance to pre-evaluate designs, reducing review cycle time and catching structural gaps before the formal session.

The Infrastructure Topology

Component	Technology	Key Rationale
Vector DB	Weaviate 1.24	Production-grade, cloud-native, GraphQL query API, schema portability
Embeddings	@xenova/transformers (local)	Zero API cost, no data egress, model portability
LLM Gateway	OpenRouter	Multi-provider abstraction; swap models via single env var
MCP Servers	Node.js / Express	Lightweight, protocol-native, independently deployable per source
Relational DB	PostgreSQL 15	Audit logs, user context, structured metadata — immutable query history
Frontend	Next.js 14	SSR, production build, TypeScript throughout
Orchestration	Docker Compose	Dev and staging; Kubernetes-ready manifest pattern

The architecture is deliberately cloud-portable. There are no hard dependencies on any specific provider’s managed services. The entire stack can run on-premise, on AWS, on Azure, or in a hybrid topology with storage and compute separated along data-classification boundaries, a requirement that appears frequently in financial services and public sector engagements.

Why This Is Deliberately Not an Agent

This reference architecture does not write code, execute workflows, or take actions on behalf of users. Every output is grounded in source documents. Every answer includes citations. The human remains the decision-maker. This is not a limitation — it is a strategy.

The Adoption Principle

The fastest path to organisational trust in AI systems is demonstrating that they make humans more effective — not that they replace human judgement. A system that returns a well-cited answer with identified trade-offs, and then gets out of the way, will see faster adoption in 2025 than one that tries to close the loop autonomously. Trust compounds; so does resistance.

The agentic layer comes next. Once the knowledge infrastructure is trusted and in production, it becomes the memory system for more sophisticated agents. The MCP server topology already provides the agent-to-tool interface. The vector knowledge layer already provides the retrieval. Adding an orchestration layer — LangGraph, Claude Agents, or a custom workflow engine — becomes an integration exercise rather than a re-architecture. The foundation is already built to support it.

What Agents Built on This Infrastructure Inherit

The deliberate non-agentic design of this reference architecture is a phase decision, not a permanent constraint. The more important point is what an agent built on top of this knowledge infrastructure would inherit automatically — without re-engineering.

An agent that uses the MCP server topology as its tool-calling layer already has normalised access to GitHub, Confluence, and Jira through a defined JSON-RPC interface. It does not need to learn each source’s API. It does not need to handle authentication per source. The normalisation contract, { title, content, source, type, tags[] }, is already the schema every MCP server outputs. An agent calling confluence_mcp.search(query) gets the same structured document shape as one calling github_mcp.get_file(path). That consistency is what makes multi-source agent reasoning tractable at enterprise scale.
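At the wire level, MCP tool invocations are JSON-RPC 2.0 requests with method `tools/call` and params `{ name, arguments }`, and results carry a `content` array. The sketch below builds such a request and parses a result into NormalisedDoc values. It assumes, hypothetically, that each server exposes a tool named `search` and returns its documents as JSON in a text content item; the transport (stdio or HTTP) is omitted.

```typescript
interface NormalisedDoc { title: string; content: string; source: string; type: string; tags: string[]; }

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

interface ToolCallResponse {
  jsonrpc: "2.0";
  id: number;
  result?: { content: Array<{ type: string; text?: string }>; isError?: boolean };
  error?: { code: number; message: string };
}

let nextRequestId = 1;

// Build an MCP tools/call request, e.g. the equivalent of confluence_mcp.search(query).
function toolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id: nextRequestId++, method: "tools/call", params: { name, arguments: args } };
}

// Runtime check that a parsed value matches the normalised contract.
function isNormalisedDoc(value: unknown): value is NormalisedDoc {
  if (typeof value !== "object" || value === null) return false;
  const d = value as Record<string, unknown>;
  const tags = d.tags;
  return (
    typeof d.title === "string" &&
    typeof d.content === "string" &&
    typeof d.source === "string" &&
    typeof d.type === "string" &&
    Array.isArray(tags) &&
    tags.every((t) => typeof t === "string")
  );
}

// Unwrap a tools/call response into normalised documents, rejecting malformed payloads.
function parseDocs(response: ToolCallResponse): NormalisedDoc[] {
  if (response.error) throw new Error(`MCP error ${response.error.code}: ${response.error.message}`);
  if (!response.result || response.result.isError) throw new Error("MCP tool call failed");
  const docs: NormalisedDoc[] = [];
  for (const item of response.result.content) {
    if (item.type !== "text" || item.text === undefined) continue;
    const parsed: unknown = JSON.parse(item.text);
    const candidates: unknown[] = Array.isArray(parsed) ? parsed : [parsed];
    for (const candidate of candidates) {
      if (!isNormalisedDoc(candidate)) throw new Error("tool result does not match NormalisedDoc");
      docs.push(candidate);
    }
  }
  return docs;
}
```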

The vector knowledge layer is the agent’s long-term memory. Every document ever ingested (ADRs, runbooks, design decisions, past incident reports) is semantically indexed and retrievable with one query embedding and one vector search. An agent planning an architecture change does not start from scratch. It queries the knowledge layer first, surfaces the prior decisions most relevant to its task, and reasons over that context before acting. The retrieval is grounded, cited, and auditable, and the same guarantees the human-facing system provides carry over to the agent automatically.

The Agent Readiness Principle

The fastest path to trustworthy enterprise agents is not building agents first — it is building the knowledge infrastructure they need to operate reliably, then adding the orchestration layer on top. An agent without a structured, cited, auditable knowledge layer is guessing. An agent with one is reasoning. This architecture is the difference between those two outcomes.

The Governance Layer Agents Inherit

Governance is the overlooked dimension of agent readiness. Most discussions of enterprise agents focus on capability — what tasks can the agent complete? The harder question for a regulated enterprise is: what does the agent know, when did it know it, and who authorised it to act?

An agent built on this infrastructure inherits three governance properties by construction. First, immutable query history: every retrieval call the agent makes is logged to the PostgreSQL audit trail with the agent identity, query text, documents retrieved, and timestamp. If an agent-generated architecture recommendation is later disputed, the full retrieval trace is recoverable. This is not a feature added for compliance — it is a property of the retrieval service itself.
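A sketch of the audit write, assuming node-postgres, whose `query()` accepts a `{ text, values }` object with `$n` placeholders and converts JavaScript arrays to PostgreSQL arrays. The retrieval_audit table, its columns, and the auditInsert helper are hypothetical. Append-only behaviour is assumed to come from privileges (the application role gets INSERT and SELECT but not UPDATE or DELETE), as the DDL comment notes.

```typescript
interface RetrievalAuditEntry {
  actorId: string; // user or agent identity
  actorKind: "human" | "agent";
  queryText: string;
  documentIds: string[];
  retrievedAt: Date;
}

// Hypothetical table. Append-only is assumed to be enforced by privileges, e.g.
//   GRANT INSERT, SELECT ON retrieval_audit TO app_role;  (no UPDATE/DELETE)
//   plus USAGE on the id sequence so inserts can use the BIGSERIAL default.
const AUDIT_DDL = `
CREATE TABLE IF NOT EXISTS retrieval_audit (
  id           BIGSERIAL PRIMARY KEY,
  actor_id     TEXT        NOT NULL,
  actor_kind   TEXT        NOT NULL CHECK (actor_kind IN ('human', 'agent')),
  query_text   TEXT        NOT NULL,
  document_ids TEXT[]      NOT NULL,
  retrieved_at TIMESTAMPTZ NOT NULL
);`;

// Shape accepted by node-postgres's client.query() / pool.query().
interface QueryConfig { text: string; values: unknown[]; }

// Parameterised insert: query text never gets concatenated into SQL.
function auditInsert(entry: RetrievalAuditEntry): QueryConfig {
  return {
    text:
      "INSERT INTO retrieval_audit (actor_id, actor_kind, query_text, document_ids, retrieved_at) " +
      "VALUES ($1, $2, $3, $4, $5)",
    values: [entry.actorId, entry.actorKind, entry.queryText, entry.documentIds, entry.retrievedAt.toISOString()],
  };
}
```

In the retrieval service this would run as `await pool.query(auditInsert(entry))` on every retrieval, whether the caller is a person or an agent.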

Second, mandatory citation: the RAG pipeline’s generation prompt requires explicit source attribution for every factual claim. An agent that synthesises a design recommendation through this pipeline is held to the same citation requirement as a human user’s query. In a regulated environment where “the AI said so” is not an acceptable audit response, this constraint is not optional.
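A prompt can require citations but cannot strictly guarantee them, so a post-generation check is one way to make the requirement verifiable. This is an added technique, not something the article describes: the sketch treats every sentence as a claim that must carry at least one [n] marker, and rejects markers that do not point at a retrieved document. The sentence splitter is deliberately naive (it would break on decimals such as 2.5), and a production version would likely need to exempt explicit gap acknowledgements.

```typescript
interface CitationCheck { ok: boolean; uncitedSentences: string[]; unknownCitations: number[]; }

// Extract every [n] marker from a piece of text.
function citedNumbers(text: string): number[] {
  const pattern = /\[(\d+)\]/g;
  const found: number[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) found.push(Number(match[1]));
  return found;
}

// Naive check: every sentence needs a citation, and every citation must
// point at a retrieved document numbered 1..retrievedCount.
function checkCitations(answer: string, retrievedCount: number): CitationCheck {
  const sentences = (answer.match(/[^.!?]+[.!?]*/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  const uncitedSentences = sentences.filter((s) => citedNumbers(s).length === 0);
  const unknownCitations = Array.from(
    new Set(citedNumbers(answer).filter((n) => n < 1 || n > retrievedCount)),
  );
  return { ok: uncitedSentences.length === 0 && unknownCitations.length === 0, uncitedSentences, unknownCitations };
}
```

A failed check could trigger a regeneration or return the answer flagged for human review, so an uncited claim never leaves the pipeline silently.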

Third, access boundary enforcement: because all knowledge retrieval flows through the MCP server topology and the retrieval service, access control is centralised. An agent operating under a restricted role cannot retrieve documents outside its authorised scope — not because the agent was programmed to respect boundaries, but because the infrastructure enforces them at the retrieval layer. This is the architectural difference between a system where governance is a rule the agent follows and one where governance is a constraint the agent cannot violate.
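Enforcement at the retrieval layer can be sketched as a deny-by-default filter applied inside the retrieval service before results reach any caller, human or agent. The role names and the role-to-source policy are hypothetical, and filtering on the schema's existing `source` field is a simplification; a real deployment would more likely push an equivalent `where` filter into the Weaviate query so out-of-scope documents never consume top-k slots.

```typescript
// Hypothetical policy: which knowledge sources each role may retrieve from.
// A Map (not a plain object) avoids accidental matches on keys like "constructor".
const ROLE_SOURCES = new Map<string, ReadonlySet<string>>([
  ["architect", new Set(["github", "confluence", "jira"])],
  ["restricted-agent", new Set(["confluence"])],
]);

// Deny by default: unknown roles receive an empty scope.
function allowedSources(role: string): ReadonlySet<string> {
  return ROLE_SOURCES.get(role) ?? new Set<string>();
}

// Runs inside the retrieval service, so callers cannot opt out of it.
function enforceScope<T extends { source: string }>(role: string, results: T[]): T[] {
  const allowed = allowedSources(role);
  return results.filter((r) => allowed.has(r.source));
}
```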

These three properties — audit trail, citation enforcement, and retrieval-layer access control — are what allow a regulated enterprise to move from human-in-the-loop AI to supervised agentic workflows without rebuilding their governance posture from scratch. The foundation carries the governance forward. The agentic layer inherits it.

The Tech Stack

Knowledge & Retrieval

Weaviate 1.24 all-MiniLM-L6-v2 @xenova/transformers OpenRouter

MCP Integration

GitHub MCP Confluence MCP Jira MCP JSON-RPC

Platform

Next.js 14 TypeScript PostgreSQL 15 Docker Compose

Governance

Immutable Audit Log JWT Auth Mandatory Citations Rate Limiting

Fork the Reference Architecture

Full implementation — MCP servers, RAG pipeline, vector store configuration, and the complete frontend — available on GitHub. This is not a whiteboard architecture. It runs.
