The Institutional Brain
Building the Institutional Brain — A RAG Reference Architecture for the AI-Native Enterprise
McKinsey estimates knowledge workers spend 19% of their time searching for information that already exists inside their own organisation. This is the architecture that closes that gap — without autonomous agents, hallucinated answers, or uncitable claims.
For a 10,000-person enterprise, knowledge workers spending 19% of their time searching for information is not an inefficiency. It is structural value destruction. Every new engineer who joins inherits the same fragmented knowledge landscape. Every architectural decision made without institutional context risks repeating mistakes that were documented, debated, and resolved two years ago — in an ADR that nobody could find.
The promise of Generative AI in the enterprise is usually framed around chatbots, code generation, or content summarisation. That framing is too narrow. The deeper opportunity is the institutional brain — an AI system that holds the collective memory of your engineering organisation and makes it conversationally accessible, cited, and auditable.
The Enterprise Knowledge Navigator is a reference architecture for exactly that system. It combines Retrieval-Augmented Generation, the Model Context Protocol, and a vector-native knowledge layer into a five-layer design that scales from prototype to production without re-engineering.
Why RAG — and Why Not the Alternatives
Two alternatives to RAG are common in early enterprise AI discussions. Both fail at scale for different reasons.
Fine-tuning embeds knowledge into model weights. Every knowledge update requires a retraining cycle, costs significant compute, and — critically — produces a model that cannot cite its sources. For regulated industries, an answer that cannot be traced to a specific document is not an answer. It is a liability.
Context-window stuffing hits scale limits immediately. A mid-size engineering organisation with three years of ADRs, runbooks, and design documents will far exceed any model’s context window. Latency and cost per query become prohibitive well before you reach meaningful coverage.
RAG threads this needle. Knowledge lives in a vector database — fresh, versionable, and source-traceable. Each query retrieves only the relevant documents. Context stays tight, costs stay predictable, and every generated answer is grounded in specific source documents with explicit citations. This is the architecture that survives contact with enterprise scale.
The institutional brain is not a chatbot. It is a knowledge infrastructure — one that makes your organisation’s collective memory conversationally accessible, while keeping the human firmly in the decision seat.
The Reference Architecture — Five Layers
ARCHITECTURE · Enterprise Knowledge Navigator · Five-Layer Reference Design
- Layer 5, Next.js Operator Interface: Knowledge Search, Recommendations, Compliance Check
- Layer 4, API & Capability Surface (Express / FastAPI): /api/search, /api/recommendations, /api/compliance-check
- Layer 3, RAG Orchestration Pipeline: Query Embed → Vector Retrieve → Context Rank → LLM Synthesise + Cite
- Layer 2, Vector Knowledge Store: Weaviate 1.24 with cosine similarity; all-MiniLM-L6-v2, 384-dim, local; ADRs, Docs, Patterns, Standards
- Layer 1 (partial), MCP Server Network: GitHub MCP, Confluence MCP, Jira MCP; normalised doc schema → embedding pipeline
- Layer 1, Knowledge Sources: GitHub Repos, Confluence Wikis, Jira Histories, ADR Libraries, Internal Runbooks
- Cross-cutting: LLM Gateway (OpenRouter, provider-agnostic, env-var swap); PostgreSQL audit log
Layer 1 — Where Enterprise Knowledge Actually Lives
The architecture begins not with AI, but with reality: enterprise knowledge is distributed across GitHub repositories, Confluence wikis, Jira project histories, and internal runbooks. Each source speaks a different API dialect, uses different authentication mechanisms, and structures documents differently.
Each knowledge source connects through a dedicated Model Context Protocol (MCP) server — a lightweight Node.js service that abstracts source-specific APIs into a normalised document schema: { title, content, source, type, tags[] }. This normalisation step is unglamorous but load-bearing; without it, the embedding pipeline would need to handle source-specific quirks, and every new knowledge source would require changes to the core system.
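A minimal TypeScript sketch of the normalisation contract. The `NormalisedDocument` interface mirrors the `{ title, content, source, type, tags[] }` schema; the `ConfluencePage` input shape, the `normaliseConfluencePage` function and the "adr label means ADR" convention are hypothetical stand-ins for whatever a real Confluence MCP server receives, not the Confluence API itself.

```typescript
// The normalised schema every MCP server emits: { title, content, source, type, tags[] }.
type SourceSystem = "github" | "confluence" | "jira";

interface NormalisedDocument {
  title: string;
  content: string;
  source: SourceSystem;
  type: string; // e.g. "adr", "runbook", "page"
  tags: string[];
}

// Hypothetical raw shape a Confluence MCP server might receive; NOT the real Confluence payload.
interface ConfluencePage {
  pageTitle: string;
  bodyHtml: string;
  labels: { name: string }[];
}

// Crude HTML-to-text step: replace tags with spaces, collapse whitespace.
// A production pipeline would use a proper HTML parser.
function stripHtml(html: string): string {
  return html.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();
}

// Map a source-specific page into the shared schema so the embedding
// pipeline never has to know about Confluence quirks.
function normaliseConfluencePage(page: ConfluencePage): NormalisedDocument {
  const tags = page.labels.map((l) => l.name.toLowerCase());
  return {
    title: page.pageTitle.trim(),
    content: stripHtml(page.bodyHtml),
    source: "confluence",
    // Hypothetical convention: an "adr" label marks an Architecture Decision Record.
    type: tags.includes("adr") ? "adr" : "page",
    tags,
  };
}
```

Under this design, adding a knowledge source means writing one more normaliser with the same output type; the embedding pipeline is untouched.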
The choice of MCP as the integration protocol is a deliberate long-term bet. As MCP becomes the standard for agent-to-tool communication, and it is moving in that direction quickly, an MCP-first knowledge layer means your enterprise knowledge sources are automatically compatible with any future AI agent or autonomous workflow that speaks the protocol. You are not building a knowledge base for today's system. You are building the data plane for your entire future AI estate.
Layer 2 — The Vector Knowledge Store
Documents from MCP sources pass through an embedding pipeline that converts text into 384-dimensional vectors using sentence-transformers/all-MiniLM-L6-v2, running locally via @xenova/transformers. The local embedding decision is deliberate on two axes: it eliminates per-query API cost for embedding generation, and it removes a third-party dependency from a security-sensitive data path. Enterprise documents going to an external embedding API is a data governance conversation most CISOs would rather not have.
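In @xenova/transformers, the sentence embedding is typically obtained with `pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2')` and called with the options `{ pooling: 'mean', normalize: true }`, which yields one 384-dimensional vector per input. To keep this sketch dependency-free, it shows what those two options do to the model's per-token outputs: average the token vectors (skipping padding via the attention mask) and scale the result to unit length. The `assertDimension` guard is an illustrative addition, not part of the library.

```typescript
// Mean pooling: average token embeddings, counting only positions where the mask is 1.
function meanPool(tokenEmbeddings: number[][], attentionMask: number[]): number[] {
  const dim = tokenEmbeddings[0].length;
  const sum = new Array<number>(dim).fill(0);
  let count = 0;
  tokenEmbeddings.forEach((vec, i) => {
    if (attentionMask[i] !== 1) return; // skip padding tokens
    for (let d = 0; d < dim; d++) sum[d] += vec[d];
    count++;
  });
  if (count === 0) throw new Error("attention mask selects no tokens");
  return sum.map((v) => v / count);
}

// L2 normalisation: unit-length vectors make cosine similarity equal to a dot product.
function l2Normalise(vec: number[]): number[] {
  const norm = Math.sqrt(vec.reduce((acc, v) => acc + v * v, 0));
  return norm === 0 ? vec.slice() : vec.map((v) => v / norm);
}

const EXPECTED_DIM = 384; // output size of all-MiniLM-L6-v2

// Hypothetical guard: fail loudly if a model swap changes the vector size
// that the vector store was populated with.
function assertDimension(vec: number[], expected: number = EXPECTED_DIM): void {
  if (vec.length !== expected) {
    throw new Error(`expected ${expected}-dim embedding, got ${vec.length}`);
  }
}
```

Running the same pooling and normalisation at ingestion and at query time is what keeps the two sets of vectors in one comparable space.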
Weaviate 1.24 serves as the vector database, with the schema configured to accept externally provided vectors rather than delegating to Weaviate's module system. This gives the architecture full control over the embedding model, allows a model swap without a schema migration (the corpus must still be re-embedded, since vectors from different models are not comparable), and keeps the vector layer portable across cloud and on-premise deployments.
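A sketch of the class definition this implies, in the JSON shape Weaviate's REST schema endpoint (`POST /v1/schema`) accepts. Setting `vectorizer: "none"` tells Weaviate to expect client-supplied vectors, and `vectorIndexConfig.distance` selects cosine. The class name `KnowledgeDocument` and the property names, including storing the normalised `type` field as `docType`, are assumptions for illustration.

```typescript
interface WeaviateProperty {
  name: string;
  dataType: string[];
}

interface WeaviateClass {
  class: string;
  vectorizer: string;
  vectorIndexConfig: { distance: string };
  properties: WeaviateProperty[];
}

// Hypothetical class mirroring the normalised document schema.
// vectorizer "none": Weaviate stores the vectors the client sends and calls no embedding module.
function knowledgeDocumentClass(className: string = "KnowledgeDocument"): WeaviateClass {
  return {
    class: className,
    vectorizer: "none",
    vectorIndexConfig: { distance: "cosine" },
    properties: [
      { name: "title", dataType: ["text"] },
      { name: "content", dataType: ["text"] },
      { name: "source", dataType: ["text"] },
      { name: "docType", dataType: ["text"] }, // holds the normalised `type` field
      { name: "tags", dataType: ["text[]"] },
    ],
  };
}

interface NormalisedDocument {
  title: string;
  content: string;
  source: string;
  type: string;
  tags: string[];
}

// Body for POST /v1/objects: the document's properties plus its externally computed vector.
function toWeaviateObject(className: string, doc: NormalisedDocument, vector: number[]) {
  return {
    class: className,
    properties: {
      title: doc.title,
      content: doc.content,
      source: doc.source,
      docType: doc.type,
      tags: doc.tags,
    },
    vector,
  };
}
```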
Similarity search uses cosine distance with a configurable certainty threshold, returning the top semantically relevant documents per query. In production, threshold and retrieval count should be tuned per domain — legal and compliance use cases typically benefit from higher certainty thresholds; exploratory architectural research from broader retrieval.
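In Weaviate, `certainty` is only defined for cosine distance and is a rescaling of it: certainty = 1 − distance / 2, so a certainty threshold of 0.8 corresponds to a maximum cosine distance of 0.4. The sketch below converts between the two and builds a GraphQL `nearVector` query with a threshold and a result limit. The class name `KnowledgeDocument`, the returned fields and the per-domain defaults in `SETTINGS` are placeholders, not tuned values.

```typescript
// Weaviate reports cosine distance d in [0, 2]; certainty rescales it to [0, 1].
function certaintyFromDistance(distance: number): number {
  return 1 - distance / 2;
}

function distanceFromCertainty(certainty: number): number {
  return 2 * (1 - certainty);
}

interface RetrievalSettings {
  certainty: number; // minimum certainty a result must reach
  limit: number; // top-k documents per query
}

// Placeholder defaults: stricter for compliance, broader for exploratory research.
const SETTINGS: Record<"compliance" | "exploratory", RetrievalSettings> = {
  compliance: { certainty: 0.8, limit: 5 },
  exploratory: { certainty: 0.6, limit: 12 },
};

// Build a GraphQL Get query that searches by an externally computed query vector.
function buildNearVectorQuery(className: string, vector: number[], s: RetrievalSettings): string {
  return `{
  Get {
    ${className}(
      nearVector: { vector: ${JSON.stringify(vector)}, certainty: ${s.certainty} }
      limit: ${s.limit}
    ) {
      title
      source
      _additional { certainty distance }
    }
  }
}`;
}
```

Requesting both `certainty` and `distance` under `_additional` keeps the scores available for the relevance rankings the pipeline later passes to the LLM.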
Layer 3 — The RAG Pipeline
When a query arrives, the pipeline runs four steps in sequence:
1. The query is embedded using the same model as the ingestion pipeline. Semantic symmetry between ingestion and query embeddings is critical; mismatched models produce degraded retrieval that is difficult to diagnose.
2. Weaviate returns the top semantically relevant documents with certainty scores.
3. A structured context block is assembled, preserving document titles, types, and relevance rankings.
4. A prompt instructs the LLM to synthesise a grounded answer, cite document numbers explicitly, acknowledge gaps where context is absent, and surface competing approaches rather than collapsing them into a single recommendation.
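Steps three and four might look like the sketch below: retrieved documents are numbered in rank order so the model can cite them as [1], [2], and so on, and the prompt carries the four instructions described above. The `RetrievedDoc` shape, the numbering format and the prompt wording are assumptions, not the reference implementation's actual prompt.

```typescript
interface RetrievedDoc {
  title: string;
  type: string; // e.g. "adr", "runbook"
  source: string; // originating MCP system
  content: string;
  certainty: number; // Weaviate similarity score in [0, 1]
}

// Assemble the context block: highest certainty first, each document numbered for citation.
function buildContext(docs: RetrievedDoc[]): string {
  return [...docs]
    .sort((a, b) => b.certainty - a.certainty)
    .map(
      (d, i) =>
        `[${i + 1}] ${d.title} (type: ${d.type}, source: ${d.source}, relevance: ${d.certainty.toFixed(2)})\n${d.content}`,
    )
    .join("\n\n");
}

// Wrap the context in grounding instructions: cite, admit gaps, keep competing approaches.
function buildPrompt(question: string, docs: RetrievedDoc[]): string {
  return [
    "Answer the question using ONLY the numbered documents below.",
    "Cite the supporting document number in square brackets, e.g. [2], after every factual claim.",
    "If the documents do not cover part of the question, say so explicitly instead of guessing.",
    "If documents describe competing approaches, present each with its trade-offs rather than picking one.",
    "",
    "Documents:",
    buildContext(docs),
    "",
    `Question: ${question}`,
  ].join("\n");
}
```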
The LLM layer is provider-agnostic by design. The RAG service abstracts across OpenAI, Anthropic Claude, and any OpenRouter-compatible model, and switching providers requires a single environment variable change with no code modification. This is deliberate: LLM commoditisation is accelerating, and an architecture that hard-codes a specific provider is likely to need re-engineering as the model market shifts.
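One way the single-variable swap can work is to route every call through OpenRouter's OpenAI-compatible chat-completions endpoint and read the model identifier from configuration. The variable names `LLM_MODEL` and `OPENROUTER_API_KEY` and the temperature value are assumptions; the function only builds the request and leaves the HTTP call (for example with `fetch`) to the caller.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  headers: Record<string, string>;
  body: { model: string; messages: ChatMessage[]; temperature: number };
}

const OPENROUTER_CHAT_URL = "https://openrouter.ai/api/v1/chat/completions";

// The provider is encoded only in the model ID (e.g. "openai/...", "anthropic/..."),
// which comes from configuration rather than code.
function buildChatRequest(
  env: Record<string, string | undefined>, // typically process.env
  messages: ChatMessage[],
): ChatRequest {
  const model = env.LLM_MODEL; // hypothetical variable name
  const apiKey = env.OPENROUTER_API_KEY; // hypothetical variable name
  if (!model || !apiKey) {
    throw new Error("LLM_MODEL and OPENROUTER_API_KEY must be set");
  }
  return {
    url: OPENROUTER_CHAT_URL,
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    // Low temperature reduces run-to-run variation in answers (value is an assumption).
    body: { model, messages, temperature: 0.1 },
  };
}
```

Moving from one provider to another is then a change to `LLM_MODEL` alone, provided the chosen model is offered through OpenRouter.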
Layer 4 — Three Distinct API Surfaces
POST /api/search
Knowledge Search
Natural language query returns a cited AI answer, retrieved sources with relevance scores, and a breakdown by originating MCP system. The primary interface for architects doing exploratory discovery.
GET /api/recommendations
Pattern Recommendations
Returns patterns extracted from historical project data, ranked by cross-project frequency. Answers “what have teams like mine actually done” rather than “what should I do in theory.”
POST /api/compliance-check
Compliance Evaluation
Evaluates a plain-English architecture description against enterprise standards covering security, scalability, observability, and operational readiness. Produces an itemised, auditable gap report.
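Two small transformations sit behind the first two endpoints. The sketch below shows a per-source breakdown of retrieved documents, as returned by `/api/search`, and a ranking of patterns by the number of distinct projects that used them, which is one plausible reading of "cross-project frequency" for `/api/recommendations`. The `PatternUsage` record shape is hypothetical.

```typescript
// Count retrieved sources per originating MCP system (github, confluence, jira, ...).
function breakdownBySource(sources: { source: string }[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const s of sources) counts[s.source] = (counts[s.source] ?? 0) + 1;
  return counts;
}

// Hypothetical record: one row per (project, pattern) observation from project history.
interface PatternUsage {
  project: string;
  pattern: string;
}

interface RankedPattern {
  pattern: string;
  projectCount: number;
}

// Rank by DISTINCT projects, so one project mentioning a pattern many times
// cannot outrank a pattern adopted across several teams. Ties sort alphabetically.
function rankPatterns(usages: PatternUsage[]): RankedPattern[] {
  const projects = new Map<string, Set<string>>();
  for (const u of usages) {
    if (!projects.has(u.pattern)) projects.set(u.pattern, new Set<string>());
    projects.get(u.pattern)!.add(u.project);
  }
  return Array.from(projects.entries())
    .map(([pattern, set]) => ({ pattern, projectCount: set.size }))
    .sort((a, b) => b.projectCount - a.projectCount || a.pattern.localeCompare(b.pattern));
}
```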
The compliance-check surface deserves particular attention. Architecture Review Boards are a common bottleneck in enterprise delivery — senior architects spending time on designs that have not been pre-screened for basic structural gaps. This endpoint allows teams to self-evaluate before formal review, surfacing common gaps without consuming senior architect time. The output is an itemised report that can be attached to a design document, making the governance process faster and the evidence trail explicit.
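What makes the report attachable is its itemised structure. The sketch below shows one possible report shape and a plain-text renderer; it assumes the per-item verdicts come from the LLM evaluation step and only counts and formats them. The four category names follow the text; the `ComplianceFinding` fields, status values and standard identifiers are assumptions.

```typescript
type Category = "security" | "scalability" | "observability" | "operational-readiness";
type Status = "met" | "gap" | "unclear";

interface ComplianceFinding {
  category: Category;
  standard: string; // hypothetical standard identifier, e.g. "SEC-04"
  status: Status;
  evidence: string; // quoted design text, or why it is missing
}

interface ComplianceReport {
  findings: ComplianceFinding[];
  gapCount: number;
  passed: boolean; // true only when no finding is a gap
}

function buildReport(findings: ComplianceFinding[]): ComplianceReport {
  const gapCount = findings.filter((f) => f.status === "gap").length;
  return { findings, gapCount, passed: gapCount === 0 };
}

// Render an itemised report with gaps first, then unclear items, then passes.
function renderReport(report: ComplianceReport): string {
  const order: Record<Status, number> = { gap: 0, unclear: 1, met: 2 };
  const lines = [...report.findings]
    .sort((a, b) => order[a.status] - order[b.status])
    .map((f) => `- [${f.status.toUpperCase()}] ${f.category} / ${f.standard}: ${f.evidence}`);
  return [`Gaps: ${report.gapCount} (${report.passed ? "PASS" : "NEEDS WORK"})`, ...lines].join("\n");
}
```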
Layer 5 — Three Personas, One Interface
The Next.js frontend is designed around three primary user personas with distinct information needs. Architects use Knowledge Search to explore decision history before proposing new solutions — surfacing relevant ADRs, prior design decisions, and pattern documentation, with the AI synthesising a coherent cited answer. Tech Leads use Recommendations to identify proven patterns for new projects without starting from first principles. Architecture Review Boards use Compliance to pre-evaluate designs, reducing review cycle time and catching structural gaps before the formal session.
The Infrastructure Topology
| Component | Technology | Key Rationale |
|---|---|---|
| Vector DB | Weaviate 1.24 | Production-grade, cloud-native, GraphQL query API, schema portability |
| Embeddings | @xenova/transformers (local) | Zero API cost, no data egress, model portability |
| LLM Gateway | OpenRouter | Multi-provider abstraction; swap models via single env var |
| MCP Servers | Node.js / Express | Lightweight, protocol-native, independently deployable per source |
| Relational DB | PostgreSQL 15 | Audit logs, user context, structured metadata — immutable query history |
| Frontend | Next.js 14 | SSR, production build, TypeScript throughout |
| Orchestration | Docker Compose | Dev and staging; Kubernetes-ready manifest pattern |
The architecture is deliberately cloud-portable. There are no hard dependencies on any specific provider’s managed services. The entire stack can run on-premise, on AWS, on Azure, or in a hybrid topology with storage and compute separated by data classification boundary — a requirement that appears frequently in financial services and public sector engagements.
Why This Is Deliberately Not an Agent
This reference architecture does not write code, execute workflows, or take actions on behalf of users. Every output is grounded in source documents. Every answer includes citations. The human remains the decision-maker. This is not a limitation — it is a strategy.
The Adoption Principle
The fastest path to organisational trust in AI systems is demonstrating that they make humans more effective — not that they replace human judgement. A system that returns a well-cited answer with identified trade-offs, and then gets out of the way, will see faster adoption in 2025 than one that tries to close the loop autonomously. Trust compounds; so does resistance.
The agentic layer comes next. Once the knowledge infrastructure is trusted and in production, it becomes the memory system for more sophisticated agents. The MCP server topology already provides the agent-to-tool interface. The vector knowledge layer already provides the retrieval. Adding an orchestration layer — LangGraph, Claude Agents, or a custom workflow engine — becomes an integration exercise rather than a re-architecture. The foundation is already built to support it.
What Agents Built on This Infrastructure Inherit
The deliberate non-agentic design of this reference architecture is a phase decision, not a permanent constraint. The more important point is what an agent built on top of this knowledge infrastructure would inherit automatically — without re-engineering.
An agent that uses the MCP server topology as its tool-calling layer already has normalised access to GitHub, Confluence, and Jira through a defined JSON-RPC interface. It does not need to learn each source’s API. It does not need to handle authentication per-source. The normalisation contract — { title, content, source, type, tags[] } — is already the schema every MCP server outputs. An agent calling confluence_mcp.search(query) gets the same structured document shape as one calling github_mcp.get_file(path). That consistency is what makes multi-source agent reasoning tractable at enterprise scale.
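Under MCP, invoking a tool is a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool `name` and its `arguments`; the result comes back as a list of content items. The sketch below assumes, hypothetically, that each knowledge MCP server returns its normalised documents as a JSON array inside one text content item. The tool names `search` and `get_file` echo the examples in the text but are otherwise assumptions.

```typescript
interface NormalisedDocument {
  title: string;
  content: string;
  source: string;
  type: string;
  tags: string[];
}

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

interface ToolCallResult {
  content: { type: string; text?: string }[];
  isError?: boolean;
}

let nextId = 1;

// Same request shape for every source; only the tool name and arguments differ.
function toolCall(name: string, args: Record<string, unknown>): JsonRpcRequest {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/call", params: { name, arguments: args } };
}

// Hypothetical convention: documents arrive as a JSON array in the first text content item.
function parseDocuments(result: ToolCallResult): NormalisedDocument[] {
  if (result.isError) throw new Error("MCP tool reported an error");
  const text = result.content.find((c) => c.type === "text")?.text;
  if (text === undefined) return [];
  const docs = JSON.parse(text) as NormalisedDocument[];
  // Reject anything that does not honour the normalisation contract.
  for (const d of docs) {
    if (typeof d.title !== "string" || typeof d.content !== "string" || !Array.isArray(d.tags)) {
      throw new Error("document does not match the normalised schema");
    }
  }
  return docs;
}
```

A `toolCall("search", { query: "event sourcing" })` sent to the Confluence server and a `toolCall("get_file", { path: "docs/adr/0007.md" })` sent to the GitHub server are then parsed by the same `parseDocuments`, which is the consistency the text describes.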
The vector knowledge layer is the agent's long-term memory. Every document ever ingested (ADRs, runbooks, design decisions, past incident reports) is semantically indexed and retrievable with one query embedding and a vector search. An agent planning an architecture change does not start from scratch. It queries the knowledge layer first, surfaces the prior decisions relevant to its task, and reasons over that context before acting. The retrieval is grounded, cited, and auditable, and the same guarantees the human-facing system provides carry over to the agent automatically.
The Agent Readiness Principle
The fastest path to trustworthy enterprise agents is not building agents first — it is building the knowledge infrastructure they need to operate reliably, then adding the orchestration layer on top. An agent without a structured, cited, auditable knowledge layer is guessing. An agent with one is reasoning. This architecture is the difference between those two outcomes.
The Governance Layer Agents Inherit
Governance is the overlooked dimension of agent readiness. Most discussions of enterprise agents focus on capability — what tasks can the agent complete? The harder question for a regulated enterprise is: what does the agent know, when did it know it, and who authorised it to act?
An agent built on this infrastructure inherits three governance properties by construction. First, immutable query history: every retrieval call the agent makes is logged to the PostgreSQL audit trail with the agent identity, query text, documents retrieved, and timestamp. If an agent-generated architecture recommendation is later disputed, the full retrieval trace is recoverable. This is not a feature added for compliance — it is a property of the retrieval service itself.
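A sketch of what one audit entry might contain, written as a parameterised `INSERT` for a PostgreSQL client that accepts `$1`-style placeholders (node-postgres's `client.query(text, values)` does). The table `retrieval_audit`, its columns and the role `retrieval_service` are assumptions. Immutability is not automatic in PostgreSQL; one common approach, shown in the comment, is to let the service connect as a role that does not own the table and holds only `INSERT` and `SELECT`.

```typescript
interface RetrievalAuditEntry {
  actorId: string; // human user or agent identity
  actorKind: "user" | "agent";
  queryText: string;
  documentIds: string[]; // IDs of documents returned to the caller
  occurredAt: Date;
}

// Hypothetical table and grants (assumption):
//   CREATE TABLE retrieval_audit (
//     id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
//     actor_id text NOT NULL, actor_kind text NOT NULL, query_text text NOT NULL,
//     document_ids text[] NOT NULL, occurred_at timestamptz NOT NULL);
//   GRANT INSERT, SELECT ON retrieval_audit TO retrieval_service;  -- no UPDATE or DELETE
const INSERT_AUDIT_SQL =
  "INSERT INTO retrieval_audit (actor_id, actor_kind, query_text, document_ids, occurred_at) " +
  "VALUES ($1, $2, $3, $4, $5)";

// Return SQL text and positional values; the client maps the string array to text[]
// and the Date to timestamptz.
function auditInsert(entry: RetrievalAuditEntry): { text: string; values: unknown[] } {
  return {
    text: INSERT_AUDIT_SQL,
    values: [entry.actorId, entry.actorKind, entry.queryText, entry.documentIds, entry.occurredAt],
  };
}
```

Writing this row inside the retrieval service, before results are returned, is what makes the log a property of retrieval rather than something each caller must remember to do.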
Second, mandatory citation: the RAG pipeline's generation prompt requires explicit source attribution for every factual claim. A prompt alone cannot guarantee compliance, so the requirement becomes a constraint only when generated answers are also checked and uncited claims are rejected before they reach the agent or the user. In a regulated environment where "the AI said so" is not an acceptable audit response, this constraint is not optional.
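A minimal post-generation citation check, assuming the answer cites numbered context documents as `[n]`: every sentence must cite at least one document, and every cited number must refer to a document that was actually supplied. The function name and result shape are hypothetical, and the sentence splitter is deliberately naive (abbreviations and version numbers will confuse it).

```typescript
interface CitationCheck {
  ok: boolean;
  uncitedSentences: string[];
  invalidCitations: number[];
}

// Validate [n]-style citations against the number of documents in the context block.
function checkCitations(answer: string, documentCount: number): CitationCheck {
  // Naive sentence split: runs of text ending in ., ! or ?
  const sentences = (answer.match(/[^.!?]+[.!?]*/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  const uncitedSentences = sentences.filter((s) => !/\[\d+\]/.test(s));

  // Collect every cited number that does not correspond to a supplied document.
  const invalid: number[] = [];
  const citation = /\[(\d+)\]/g;
  let m: RegExpExecArray | null;
  while ((m = citation.exec(answer)) !== null) {
    const n = Number(m[1]);
    if ((n < 1 || n > documentCount) && !invalid.includes(n)) invalid.push(n);
  }
  invalid.sort((a, b) => a - b);

  return {
    ok: uncitedSentences.length === 0 && invalid.length === 0,
    uncitedSentences,
    invalidCitations: invalid,
  };
}
```

A failed check can trigger a regeneration or be returned to the caller as an explicit "unsupported" result, rather than passing an uncited answer through.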
Third, access boundary enforcement: because all knowledge retrieval flows through the MCP server topology and the retrieval service, access control is centralised. An agent operating under a restricted role cannot retrieve documents outside its authorised scope — not because the agent was programmed to respect boundaries, but because the infrastructure enforces them at the retrieval layer. This is the architectural difference between a system where governance is a rule the agent follows and one where governance is a constraint the agent cannot violate.
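A sketch of retrieval-layer scoping, assuming each document carries a hypothetical `classification` label and each caller (human or agent) has a role mapped to the labels it may read. Filtering is shown inside the service for clarity; expressing the same constraint as a filter in the vector query itself, so out-of-scope documents never leave the store, is the stronger variant. Role names, labels and the policy table are illustrative.

```typescript
type Classification = "public" | "internal" | "confidential" | "restricted";

interface Principal {
  id: string;
  role: string;
}

interface ScopedDocument {
  title: string;
  source: string;
  classification: Classification;
}

// Hypothetical role-to-scope policy, held by the retrieval service rather than the caller.
const ROLE_SCOPES = new Map<string, Classification[]>([
  ["agent-readonly", ["public", "internal"]],
  ["architect", ["public", "internal", "confidential"]],
  ["security-lead", ["public", "internal", "confidential", "restricted"]],
]);

// Deny by default: an unknown role gets nothing, whatever the caller requests.
function allowedScopes(p: Principal): Classification[] {
  return ROLE_SCOPES.get(p.role) ?? [];
}

// Applied to every retrieval result before it is returned to any caller.
function enforceScope<T extends ScopedDocument>(p: Principal, docs: T[]): T[] {
  const allowed = allowedScopes(p);
  return docs.filter((d) => allowed.includes(d.classification));
}
```

Because the policy lives in the service and defaults to deny, an agent's own code has no path to widen its scope.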
These three properties — audit trail, citation enforcement, and retrieval-layer access control — are what allow a regulated enterprise to move from human-in-the-loop AI to supervised agentic workflows without rebuilding their governance posture from scratch. The foundation carries the governance forward. The agentic layer inherits it.
The Tech Stack
Knowledge & Retrieval
Weaviate 1.24 · all-MiniLM-L6-v2 · @xenova/transformers · OpenRouter
MCP Integration
GitHub MCP · Confluence MCP · Jira MCP · JSON-RPC
Platform
Next.js 14 · TypeScript · PostgreSQL 15 · Docker Compose
Governance
Immutable Audit Log · JWT Auth · Mandatory Citations · Rate Limiting
Fork the Reference Architecture
Full implementation — MCP servers, RAG pipeline, vector store configuration, and the complete frontend — available on GitHub. This is not a whiteboard architecture. It runs.