The Institutional Brain

This is the second in a three-part governed AI series. The Enterprise RAG Framework established the retrieval and citation patterns. This build extends that foundation with MCP integration into developer tooling (GitHub, Confluence, Jira) while deliberately stopping short of an agent decision layer. Non-agentic by design: the system makes humans more effective at finding and using institutional knowledge; it does not act for them. The multi-agent orchestration layer, with human approval gates and a full audit trail, comes next in the Regulatory Intelligence Agent.

For a 10,000-person enterprise, knowledge workers spending 19% of their time searching for information is not an inefficiency. It is structural value destruction. Every new engineer who joins inherits the same fragmented knowledge landscape. Every architectural decision made without institutional context risks repeating mistakes that were documented, debated, and resolved two years ago — in an ADR that nobody could find.

The promise of Generative AI in the enterprise is usually framed around chatbots, code generation, or content summarisation. That framing is too narrow. The deeper opportunity is the institutional brain — an AI system that holds the collective memory of your engineering organisation and makes it conversationally accessible, cited, and auditable.

The Enterprise Knowledge Navigator is a reference architecture for exactly that system. It combines Retrieval-Augmented Generation, the Model Context Protocol, and a vector-native knowledge layer into a five-layer design that scales from prototype to production without re-engineering.

The institutional brain is not a chatbot. It is a knowledge infrastructure — one that makes your organisation's collective memory conversationally accessible, while keeping the human firmly in the decision seat.

Why RAG — and Why Not the Alternatives

Two alternatives to RAG are common in early enterprise AI discussions. Both fail at scale for different reasons.

Fine-tuning embeds knowledge into model weights. Every knowledge update requires a retraining cycle, costs significant compute, and — critically — produces a model that cannot cite its sources. For regulated industries, an answer that cannot be traced to a specific document is not an answer. It is a liability.

Context-window stuffing hits scale limits immediately. A mid-size engineering organisation with three years of ADRs, runbooks, and design documents will far exceed any model's context window. Latency and cost per query become prohibitive well before you reach meaningful coverage.

RAG threads this needle. Knowledge lives in a vector database — fresh, versionable, and source-traceable. Each query retrieves only the relevant documents. Context stays tight, costs stay predictable, and every generated answer is grounded in specific source documents with explicit citations. This is the architecture that survives contact with enterprise scale.

The Reference Architecture — Five Layers

Enterprise Knowledge Navigator: Five-Layer Architecture

Layer	Name	Description	Tag
01	Knowledge Sources	GitHub repos, Confluence wikis, Jira histories, internal runbooks: the distributed institutional memory.	Source
02	Vector Knowledge Store	Weaviate 1.24 with cosine similarity, local embeddings via @xenova/transformers, normalized document schema.	Index
03	RAG Orchestration	Queries embedded, vector-retrieved, context-ranked, then synthesised by an LLM with mandatory citations.	Pipeline
04	API Surface	Three distinct endpoints: /search, /recommendations, /compliance-check, each serving a specific persona.	Interface
05	Operator Interface	Next.js frontend tailored for architects, tech leads, and architecture review boards.	

Layer 1 — Where Enterprise Knowledge Actually Lives

The architecture begins not with AI, but with reality: enterprise knowledge is distributed across GitHub repositories, Confluence wikis, Jira project histories, and internal runbooks. Each source speaks a different API dialect, uses different authentication mechanisms, and structures documents differently.

Each knowledge source connects through a dedicated Model Context Protocol (MCP) server — a lightweight Node.js service that abstracts source-specific APIs into a normalised document schema: { title, content, source, type, tags[] }. This normalisation step is unglamorous but load-bearing; without it, the embedding pipeline would need to handle source-specific quirks, and every new knowledge source would require changes to the core system.
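As a minimal sketch of that normalisation contract, the TypeScript below maps one source-specific payload onto the shared shape. Only the five output fields come from the schema above; the `KnowledgeDocument` type name, the `RawConfluencePage` input shape, the `"wiki-page"` type label, and the `normaliseConfluencePage` helper are assumptions made for illustration.

```typescript
// The five fields follow the normalised schema; the type name is illustrative.
interface KnowledgeDocument {
  title: string;
  content: string;
  source: string;
  type: string;
  tags: string[];
}

// Hypothetical input shape; a real Confluence API response looks different.
interface RawConfluencePage {
  pageTitle: string;
  bodyText: string;
  labels?: string[];
}

function normaliseConfluencePage(page: RawConfluencePage): KnowledgeDocument {
  return {
    title: page.pageTitle.trim(),
    content: page.bodyText.trim(),
    source: "confluence",
    type: "wiki-page", // assumed type label
    // Lower-case and de-duplicate labels so tags compare consistently.
    tags: Array.from(new Set((page.labels ?? []).map((l) => l.toLowerCase()))),
  };
}

const exampleDoc = normaliseConfluencePage({
  pageTitle: " ADR-012: Event Sourcing ",
  bodyText: "Decision: adopt event sourcing for the ledger service.",
  labels: ["ADR", "adr", "Architecture"],
});
```

Because every MCP server emits this one shape, the embedding pipeline only ever needs to understand `KnowledgeDocument`, whatever the originating system.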

The choice of MCP as the integration protocol is a deliberate long-term bet. As MCP becomes the standard for agent-to-tool communication — and it is moving in that direction quickly — an MCP-first knowledge layer means your enterprise knowledge graph is automatically compatible with any future AI agent or autonomous workflow that speaks the protocol.

Layer 2 — The Vector Knowledge Store

Documents from MCP sources pass through an embedding pipeline that converts text into 384-dimensional vectors using sentence-transformers/all-MiniLM-L6-v2, running locally via @xenova/transformers. The local embedding decision is deliberate on two axes: it eliminates per-query API cost for embedding generation, and it removes a third-party dependency from a security-sensitive data path.

Weaviate 1.24 serves as the vector database, with the schema configured to accept externally provided vectors rather than delegating to Weaviate's module system. This gives the architecture full control over the embedding model, enables model swaps without schema migrations, and keeps the vector layer portable across cloud and on-premise deployments.
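A class definition along the following lines would express that choice. This is a sketch rather than the project's actual schema: the class name `KnowledgeDocument` is hypothetical, and the property list simply mirrors the normalised document fields. Setting `vectorizer` to `"none"` is what tells Weaviate that vectors are supplied by the client.

```typescript
// Hypothetical class definition, written as a plain object that could be sent
// to Weaviate's schema endpoint. Names other than the Weaviate keys are assumed.
const knowledgeDocumentClass = {
  class: "KnowledgeDocument",
  vectorizer: "none", // vectors are provided externally at import time
  vectorIndexConfig: { distance: "cosine" },
  properties: [
    { name: "title", dataType: ["text"] },
    { name: "content", dataType: ["text"] },
    { name: "source", dataType: ["text"] },
    { name: "type", dataType: ["text"] },
    { name: "tags", dataType: ["text[]"] },
  ],
};
```

Nothing in this definition names an embedding model, which is why the model can be swapped without touching the schema (although existing vectors would still need re-embedding).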

Similarity search uses cosine distance with a configurable certainty threshold, returning the top semantically relevant documents per query. In production, threshold and retrieval count should be tuned per domain: legal and compliance use cases typically benefit from higher certainty thresholds, while exploratory architectural research typically benefits from broader retrieval.
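To make the threshold and retrieval count concrete, here is an in-memory sketch of the same selection logic. In the architecture this work happens inside Weaviate; the function names and the `minCertainty` and `limit` parameters are illustrative. The sketch assumes certainty relates to cosine similarity as (1 + cos) / 2, which is how Weaviate defines certainty for the cosine metric.

```typescript
interface ScoredDoc {
  id: string;
  certainty: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Keep documents at or above the threshold, best first, capped at `limit`.
function retrieveTop(
  query: number[],
  docs: { id: string; vector: number[] }[],
  minCertainty: number,
  limit: number
): ScoredDoc[] {
  return docs
    .map((d) => ({ id: d.id, certainty: (1 + cosineSimilarity(query, d.vector)) / 2 }))
    .filter((d) => d.certainty >= minCertainty)
    .sort((x, y) => y.certainty - x.certainty)
    .slice(0, limit);
}
```

A compliance-style query might call `retrieveTop(q, docs, 0.85, 3)`, while exploratory research might use a lower threshold and a larger limit; those numbers are examples, not recommendations.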

Layer 3 — The RAG Pipeline

When a query arrives, the pipeline runs four steps in sequence. First, the query is embedded using the same model as the ingestion pipeline. Semantic symmetry between ingestion and query embeddings is critical; mismatched models produce degraded retrieval that is difficult to diagnose. Second, Weaviate returns the top semantically relevant documents with certainty scores. Third, a structured context block is assembled, preserving document titles, types, and relevance rankings. Fourth, a prompt instructs the LLM to synthesise a grounded answer, cite document numbers explicitly, acknowledge gaps where context is absent, and surface competing approaches rather than collapsing them into a single recommendation.
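The context assembly and prompt construction steps can be sketched as follows. The function names, the bracketed numbering format, and the exact instruction wording are assumptions for illustration; the source describes what the prompt must ask for, not its literal text.

```typescript
interface RetrievedDoc {
  title: string;
  type: string;
  content: string;
  certainty: number;
}

// Number documents by relevance so the LLM can cite them as [1], [2], ...
function buildContextBlock(docs: RetrievedDoc[]): string {
  return [...docs]
    .sort((a, b) => b.certainty - a.certainty)
    .map(
      (d, i) =>
        `[${i + 1}] ${d.title} (${d.type}, certainty ${d.certainty.toFixed(2)})\n${d.content}`
    )
    .join("\n\n");
}

function buildPrompt(question: string, docs: RetrievedDoc[]): string {
  return [
    "Answer using only the numbered documents below.",
    "Cite document numbers like [1] for every factual claim.",
    "If the documents do not cover part of the question, say so.",
    "If the documents describe competing approaches, present each one.",
    "",
    buildContextBlock(docs),
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

Keeping the numbering stable between the context block and the returned source list is what lets a reader follow a citation such as [2] back to a specific document.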

The LLM layer is provider-agnostic by design. The RAG service abstracts across OpenAI, Anthropic Claude, and any OpenRouter-compatible model, so switching providers requires a single environment variable change with no code modification. This is deliberate: LLM commoditisation is accelerating, and an architecture that hard-codes a specific provider is likely to need re-engineering within eighteen months.
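One way to get that single-variable switch is to resolve the provider once at startup. The sketch below is an assumption about how this could look, not the project's code: the `LLM_PROVIDER` variable name, the `resolveLlmConfig` helper, and the default of `openrouter` are all hypothetical.

```typescript
type Provider = "openai" | "anthropic" | "openrouter";

interface LlmConfig {
  provider: Provider;
  baseUrl: string;
}

const PROVIDER_BASE_URLS: Record<Provider, string> = {
  openai: "https://api.openai.com/v1",
  anthropic: "https://api.anthropic.com/v1",
  openrouter: "https://openrouter.ai/api/v1",
};

function isProvider(value: string): value is Provider {
  return Object.prototype.hasOwnProperty.call(PROVIDER_BASE_URLS, value);
}

// `env` is passed in (for example process.env) so the function stays testable.
function resolveLlmConfig(env: Record<string, string | undefined>): LlmConfig {
  const raw = (env["LLM_PROVIDER"] ?? "openrouter").toLowerCase();
  if (!isProvider(raw)) {
    throw new Error(`Unsupported LLM_PROVIDER: ${raw}`);
  }
  return { provider: raw, baseUrl: PROVIDER_BASE_URLS[raw] };
}
```

Failing fast on an unknown value keeps a typo in the environment from silently routing traffic to the default provider.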

Layer 4 — Three Distinct API Surfaces

Endpoint	Purpose	Persona
POST /api/search	Natural language query returns cited AI answer, retrieved sources with relevance scores, and breakdown by originating MCP system	Architects doing exploratory discovery
GET /api/recommendations	Returns patterns extracted from historical project data, ranked by cross-project frequency	Tech Leads seeking proven patterns
POST /api/compliance-check	Evaluates plain-English architecture description against enterprise standards, produces auditable gap report	Architecture Review Boards
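
Ranking by cross-project frequency, as the recommendations endpoint does, could be sketched like this. The `PatternUsage` record shape and the `rankPatterns` helper are hypothetical, and the sketch assumes "frequency" means the number of distinct projects in which a pattern appears.

```typescript
interface PatternUsage {
  project: string;
  pattern: string;
}

interface RankedPattern {
  pattern: string;
  projectCount: number;
}

function rankPatterns(usages: PatternUsage[]): RankedPattern[] {
  // Collect the distinct projects per pattern so repeats within one
  // project do not inflate the count.
  const projectsByPattern = new Map<string, Set<string>>();
  for (const u of usages) {
    const projects = projectsByPattern.get(u.pattern) ?? new Set<string>();
    projects.add(u.project);
    projectsByPattern.set(u.pattern, projects);
  }
  return Array.from(projectsByPattern.entries())
    .map(([pattern, projects]) => ({ pattern, projectCount: projects.size }))
    // Most widely used first; ties broken alphabetically for stable output.
    .sort((a, b) => b.projectCount - a.projectCount || a.pattern.localeCompare(b.pattern));
}
```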

Layer 5 — Three Personas, One Interface

The Next.js frontend is designed around three primary user personas with distinct information needs. Architects use Knowledge Search to explore decision history before proposing new solutions — surfacing relevant ADRs, prior design decisions, and pattern documentation, with the AI synthesising a coherent cited answer. Tech Leads use Recommendations to identify proven patterns for new projects without starting from first principles. Architecture Review Boards use Compliance to pre-evaluate designs, reducing review cycle time and catching structural gaps before the formal session.

The Infrastructure Topology

Component	Technology	Key Rationale
Vector DB	Weaviate 1.24	Production-grade, cloud-native, GraphQL query API, schema portability
Embeddings	@xenova/transformers (local)	Zero API cost, no data egress, model portability
LLM Gateway	OpenRouter	Multi-provider abstraction; swap models via single env var
MCP Servers	Node.js / Express	Lightweight, protocol-native, independently deployable per source
Relational DB	PostgreSQL 15	Audit logs, user context, structured metadata — immutable query history
Frontend	Next.js 14	SSR, production build, TypeScript throughout
Orchestration	Docker Compose	Dev and staging; Kubernetes-ready manifest pattern

The architecture is deliberately cloud-portable. There are no hard dependencies on any specific provider's managed services. The entire stack can run on-premise, on AWS, on Azure, or in a hybrid topology with storage and compute separated by data classification boundary.

Design Principle

Governance is a first-class framework component — not an add-on. Immutable query history, mandatory citation enforcement, and retrieval-layer access control are built into the architecture, making supervised agentic workflows possible without rebuilding governance.

Why This Is Deliberately Not an Agent

This reference architecture does not write code, execute workflows, or take actions on behalf of users. Every output is grounded in source documents. Every answer includes citations. The human remains the decision-maker. This is not a limitation — it is a strategy.

The fastest path to organisational trust in AI systems is demonstrating that they make humans more effective — not that they replace human judgement. A system that returns a well-cited answer with identified trade-offs, and then gets out of the way, will see faster adoption than one that tries to close the loop autonomously. Trust compounds; so does resistance.

The agentic layer comes next. Once the knowledge infrastructure is trusted and in production, it becomes the memory system for more sophisticated agents. The MCP server topology already provides the agent-to-tool interface. The vector knowledge layer already provides the retrieval. Adding an orchestration layer becomes an integration exercise rather than a re-architecture.

What Agents Built on This Infrastructure Inherit

The deliberate non-agentic design of this reference architecture is a phase decision, not a permanent constraint. The more important point is what an agent built on top of this knowledge infrastructure would inherit automatically.

An agent that uses the MCP server topology as its tool-calling layer already has normalised access to GitHub, Confluence, and Jira through a defined JSON-RPC interface. The normalisation contract — { title, content, source, type, tags[] } — is already the schema every MCP server outputs. That consistency is what makes multi-source agent reasoning tractable at enterprise scale.
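For reference, a tool invocation over that JSON-RPC interface has roughly the following shape. The `tools/call` method and the `name` and `arguments` parameters follow the MCP specification; the tool name `search_documents` and its arguments are hypothetical, as is the `buildToolCall` helper.

```typescript
interface JsonRpcRequest<P> {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: P;
}

interface ToolCallParams {
  name: string;
  arguments: Record<string, unknown>;
}

let nextRequestId = 1;

// Build a JSON-RPC 2.0 request that asks an MCP server to run one tool.
function buildToolCall(
  toolName: string,
  args: Record<string, unknown>
): JsonRpcRequest<ToolCallParams> {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// "search_documents" is an illustrative tool name, not one defined by the source.
const exampleRequest = buildToolCall("search_documents", {
  query: "event sourcing ADR",
  limit: 5,
});
```

The same request shape works against the GitHub, Confluence, and Jira servers; only the tool name and arguments change.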

The vector knowledge layer is the agent's long-term memory. Every document ever ingested (ADRs, runbooks, design decisions, past incident reports) is semantically indexed and retrievable with one query embedding and one vector search. An agent planning an architecture change does not start from scratch. It queries the knowledge layer first, surfaces every prior decision relevant to its task, and reasons over that context.

The Governance Layer Agents Inherit

Governance is the overlooked dimension of agent readiness. Most discussions of enterprise agents focus on capability — what tasks can the agent complete? The harder question for a regulated enterprise is: what does the agent know, when did it know it, and who authorised it to act?

An agent built on this infrastructure inherits three governance properties by construction. First, immutable query history: every retrieval call the agent makes is logged to the PostgreSQL audit trail with the agent identity, query text, documents retrieved, and timestamp. If an agent-generated architecture recommendation is later disputed, the full retrieval trace is recoverable.
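The shape of such an audit record can be sketched with an in-memory, append-only log. This stands in for the PostgreSQL table, whose actual schema the source does not show; the `AuditEntry` field names and the `AuditLog` class are assumptions based on the four logged items named above.

```typescript
interface AuditEntry {
  readonly agentId: string;
  readonly queryText: string;
  readonly documentIds: readonly string[];
  readonly timestamp: string; // ISO 8601
}

class AuditLog {
  private readonly entries: AuditEntry[] = [];

  // Append-only: there is deliberately no update or delete method.
  append(
    agentId: string,
    queryText: string,
    documentIds: string[],
    now: Date = new Date()
  ): AuditEntry {
    const entry: AuditEntry = Object.freeze({
      agentId,
      queryText,
      documentIds: Object.freeze([...documentIds]),
      timestamp: now.toISOString(),
    });
    this.entries.push(entry);
    return entry;
  }

  // Return a copy so callers cannot reorder or truncate the log.
  all(): readonly AuditEntry[] {
    return [...this.entries];
  }
}
```

In a real deployment the same guarantee would more likely come from database permissions (insert-only grants) than from application code; the sketch only shows what gets recorded.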

Second, mandatory citation: the RAG pipeline's generation prompt requires explicit source attribution for every factual claim. An agent that synthesises a design recommendation cannot produce an uncited assertion — the prompt architecture enforces it.

Third, access boundary enforcement: because all knowledge retrieval flows through the MCP server topology and the retrieval service, access control is centralised. An agent operating under a restricted role cannot retrieve documents outside its authorised scope.
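A retrieval-layer scope check might look like the following. The classification levels, the `RoleScope` shape, and the `filterByScope` helper are hypothetical; the source states that access control is centralised in the retrieval path but does not describe the policy model.

```typescript
type Classification = "public" | "internal" | "restricted";

interface ScopedDocument {
  id: string;
  source: string;
  classification: Classification;
}

interface RoleScope {
  allowedSources: string[];
  maxClassification: Classification;
}

const CLASSIFICATION_RANK: Record<Classification, number> = {
  public: 0,
  internal: 1,
  restricted: 2,
};

// Drop any retrieved document the caller's role is not authorised to see,
// before it can reach the prompt or the response.
function filterByScope(docs: ScopedDocument[], scope: RoleScope): ScopedDocument[] {
  const ceiling = CLASSIFICATION_RANK[scope.maxClassification];
  return docs.filter(
    (d) =>
      scope.allowedSources.indexOf(d.source) !== -1 &&
      CLASSIFICATION_RANK[d.classification] <= ceiling
  );
}
```

Applying the filter inside the retrieval service, rather than in each client, is what keeps a restricted agent from ever seeing out-of-scope content in its context window.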

The Tech Stack

Knowledge & Retrieval

Weaviate 1.24, all-MiniLM-L6-v2, @xenova/transformers, OpenRouter

MCP Integration

GitHub MCP, Confluence MCP, Jira MCP, JSON-RPC

Platform

Next.js 14, TypeScript, PostgreSQL 15, Docker Compose

Governance

Immutable Audit Log, JWT Auth, Mandatory Citations, Rate Limiting

Fork the Reference Architecture

Full implementation — MCP servers, RAG pipeline, vector store configuration, and the complete frontend — available on GitHub. This is not a whiteboard architecture. It runs.
