AI-Powered Enterprise Architecture Advisor

A comprehensive enterprise architecture recommendation — one that spans business requirements, solution design, security risk, and infrastructure cost — typically requires three to four specialist consultants, a discovery workshop, and anywhere from two to eight weeks of structured analysis. This system compresses the first-pass synthesis into under five minutes, then hands the output to the human architect for validation and refinement.

The EA Advisor Agent is a live, deployed multi-agent AI system built with production engineering discipline. Four GPT-4o-mini-powered specialists each apply a distinct analytical lens to the same set of requirements, and their outputs are then assembled into a unified advisory report. It is not a replacement for human architects. It is a compression tool, one that shifts the architect’s effort from discovery to high-judgement decisions.

Why Multi-Agent, Not a Single Prompt

The obvious starting point is a single prompt: “You are a senior enterprise architect. Design a system for these requirements.” It works — up to a point. Large language models are capable of producing reasonable architecture recommendations when given structured input. The problem is cognitive role conflict.

When a single model is asked to simultaneously optimise for requirements coverage, solution elegance, security posture, and cost efficiency, those objectives pull against each other. A business analyst and a CISO approach the same system very differently, and that productive tension, in human teams, is where the best architecture emerges. Collapsing all four perspectives into one generation pass flattens that tension. You get a compromise, not a synthesis.

The multi-agent approach preserves the tension. Each agent has a defined scope, a specialised system prompt, and no mandate to resolve trade-offs outside its domain, even though it can read what earlier agents produced. The orchestration layer, not any single agent, is where the synthesis happens.

The value of multi-agent architecture is not raw intelligence — it’s structured role separation. Each agent optimises for a different objective, and the tension between them produces better outputs than any single agent averaging across all four.

The Four-Agent Pipeline

The system runs a sequential pipeline using the CrewAI framework. Sequentiality is a deliberate design choice: each agent receives not just the original user requirements, but the full output of every preceding agent. Context accumulates. By the time the Cost Optimisation Specialist runs, it has access to the business requirements, the proposed architecture, and the risk assessment — allowing it to price trade-offs, not just components.

EA Advisor Agent · Sequential Pipeline

User Requirements → 4 Specialist Agents → Unified Report

Agent 01 · GPT-4o-mini

Senior Business Analyst

Extracts core objectives, functional and non-functional requirements
Maps stakeholder groups: operations, IT, compliance, executive leadership
Surfaces constraints, third-party dependencies, and regulatory exposure
Defines measurable success criteria before any architecture is proposed

↓

Agent 02 · GPT-4o-mini

Lead Solution Architect

Selects architecture pattern (microservices, event-driven, serverless, hybrid) with explicit rationale
Recommends named technology stack: specific products, not categories
Defines service boundaries, integration patterns, and data flow
Generates a C4-style architecture diagram in Mermaid, rendered live in the UI

↓

Agent 03 · GPT-4o-mini

Enterprise Risk Assessment Specialist

Evaluates authentication, data protection, and PII handling posture
Identifies single points of failure and scalability bottlenecks
Names domain-specific regulations: PCI DSS, HIPAA/PHIPA, GDPR, NERC CIP, ISO 20022
Rates each risk High / Medium / Low with specific mitigation controls

↓

Agent 04 · GPT-4o-mini

Cloud Cost Optimisation Specialist

Dollar estimates: component-level monthly breakdown based on published AWS/Azure/GCP list pricing
Annual TCO and scaling projections at 2× and 5× current volume
Reserved vs on-demand recommendation with break-even analysis
Top 3 cost risks with mitigation strategies
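
The context accumulation described for the sequential pipeline can be sketched without any framework. This is a minimal illustration, not CrewAI's API: `run_pipeline`, `make_stub`, and the stub agents are hypothetical names, and each "agent" is a plain function standing in for an LLM call.

```python
from typing import Callable, List, Tuple

# Each "agent" is a plain function standing in for an LLM call (hypothetical stubs).
Agent = Callable[[str, List[Tuple[str, str]]], str]


def run_pipeline(requirements: str, agents: List[Tuple[str, Agent]]) -> List[Tuple[str, str]]:
    """Run agents in order; each one sees the requirements plus all earlier outputs."""
    outputs: List[Tuple[str, str]] = []
    for role, agent in agents:
        # Pass a copy so an agent cannot mutate the accumulated context.
        result = agent(requirements, list(outputs))
        outputs.append((role, result))
    return outputs


def make_stub(role: str) -> Agent:
    """Build a stub agent that reports which earlier roles it could see."""
    def agent(requirements: str, context: List[Tuple[str, str]]) -> str:
        seen = ", ".join(r for r, _ in context) or "nothing"
        return f"{role} analysed '{requirements}' with context from: {seen}"
    return agent


ROLES = ["Business Analyst", "Solution Architect", "Risk Specialist", "Cost Specialist"]
report = run_pipeline("real-time payments", [(r, make_stub(r)) for r in ROLES])
for _, text in report:
    print(text)
```

The last stub reports context from all three earlier roles, which is the property that lets the cost stage price trade-offs rather than isolated components.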

The system ships with four pre-built prompts spanning different domains and compliance regimes, demonstrating that the agents adapt their analysis to the specific regulatory and technical context of each industry.

Finance

Real-Time Payments Modernisation

800 TPS baseline, 3,000 TPS peak. ISO 20022 rails, fraud detection under 50ms, PCI DSS Level 1, active-active across two regions.

Healthcare

Clinical Data Platform

12 hospitals, 6 siloed EHR systems. HL7 FHIR R4 aggregation, 2-minute care-gap alerts, HIPAA + PHIPA + provincial data residency.

Utility

Smart Grid Meter Platform

2 million customers, 15-minute smart meter ingest, 5-minute outage detection, SAP billing integration, NERC CIP compliance.

Retail

Omnichannel Commerce

200 stores, 1.5M orders/year. Black Friday peak scaling, sub-1-second BOPIS checkout, unified customer profile for personalisation.

The Engineering Decisions That Matter

Building a multi-agent system that actually completes reliably in production — rather than timing out, looping indefinitely, or silently returning the wrong output — required a set of specific engineering decisions. These are not configuration details; they are the difference between a demo that works once and a system that runs for every user.

The Timeout Problem — And How to Solve It

A four-agent sequential LLM chain takes two to four minutes to complete. Holding a single HTTP connection open for that duration risks a timeout on most hosted infrastructure, because reverse proxies, load balancers, and API gateways commonly enforce idle or request time limits. The naive solution (tuning timeout thresholds) papers over the symptom without solving the architecture problem.

The correct solution is an async job pattern: decouple the browser from the long-running process entirely.

POST /analyze/start → FastAPI → background thread started
← { job_id: "uuid", status: "pending" } # returns in <1 second

# browser polls every 4 seconds
GET /analyze/status/{job_id} → { status: "running" }
GET /analyze/status/{job_id} → { status: "running" }
GET /analyze/status/{job_id} → { status: "done", result: "## Business Analysis..." }

Each poll is a short-lived GET request. The analysis runs in a daemon thread on the Railway container. No connection is held open for more than a second, regardless of how long the LLM chain takes. This pattern also makes the loading state meaningful — the UI shows which agent is currently running, advancing through four steps as time elapses.
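
The start-then-poll flow above can be sketched with the standard library alone. This is an illustration under assumptions, not the deployed code: `start_job`, `get_status`, and `wait_for` are hypothetical names, the job store is an in-memory dict, and the `error` and `not_found` statuses are additions that do not appear in the flow above. In the real service, FastAPI routes would wrap the first two functions.

```python
import threading
import time
import uuid
from typing import Any, Callable, Dict

# In-memory job store; a multi-instance deployment would need shared storage (assumption).
_jobs: Dict[str, Dict[str, Any]] = {}
_lock = threading.Lock()


def start_job(work: Callable[[], str]) -> str:
    """Register a job, run it in a daemon thread, and return its id immediately."""
    job_id = str(uuid.uuid4())
    with _lock:
        _jobs[job_id] = {"status": "pending", "result": None}

    def runner() -> None:
        with _lock:
            _jobs[job_id]["status"] = "running"
        try:
            result = work()
            with _lock:
                _jobs[job_id].update(status="done", result=result)
        except Exception as exc:  # report failures to the poller instead of hanging
            with _lock:
                _jobs[job_id].update(status="error", result=str(exc))

    threading.Thread(target=runner, daemon=True).start()
    return job_id


def get_status(job_id: str) -> Dict[str, Any]:
    """Return a snapshot of the job state; this is what each poll would read."""
    with _lock:
        job = _jobs.get(job_id)
        return dict(job) if job else {"status": "not_found", "result": None}


def wait_for(job_id: str, interval: float = 0.01, timeout: float = 5.0) -> Dict[str, Any]:
    """Poll until the job finishes, mimicking the browser's polling loop."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = get_status(job_id)
        if state["status"] in ("done", "error"):
            return state
        time.sleep(interval)
    return get_status(job_id)
```

Catching exceptions inside the worker matters here: without it, a failed LLM call would leave the job in `running` forever and the browser would poll indefinitely.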

The Output Problem — Collecting All Four Agents

CrewAI’s crew.kickoff() return value contains only the last agent’s output. In a naive implementation, the C4 architecture diagram generated by Agent 02, the risk register from Agent 03, and the requirements analysis from Agent 01 are all silently discarded — only the cost analysis reaches the user.

The fix is to access task.output.raw_output on each task object after kickoff, and assemble all four sections into a single markdown document. This required identifying the correct attribute name: in CrewAI 0.28.x, the content lives in raw_output, not the raw alias that appears in documentation for other versions.
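
A sketch of the collection step follows. It is duck-typed so it runs without CrewAI installed: `assemble_report` and `fake_task` are hypothetical names, the section titles are illustrative, and the fallback to a `raw` attribute is an assumption for versions other than 0.28.x. The only attribute taken from the text above is `task.output.raw_output`.

```python
from types import SimpleNamespace
from typing import Iterable, Tuple


def assemble_report(sections: Iterable[Tuple[str, object]]) -> str:
    """Join every task's output into one markdown document, in pipeline order."""
    parts = []
    for title, task in sections:
        output = getattr(task, "output", None)
        # raw_output is the attribute named above for CrewAI 0.28.x;
        # the fallback to `raw` is an assumption for other versions.
        text = getattr(output, "raw_output", None) or getattr(output, "raw", None)
        if not text:
            text = "_No output captured for this section._"
        parts.append(f"## {title}\n\n{text.strip()}")
    return "\n\n".join(parts)


def fake_task(text: str) -> SimpleNamespace:
    """Stand-in for a task object after kickoff (hypothetical test double)."""
    return SimpleNamespace(output=SimpleNamespace(raw_output=text))


demo = assemble_report([
    ("Business Analysis", fake_task("Objectives and constraints")),
    ("Solution Architecture", fake_task("Pattern and diagram")),
    ("Risk Assessment", fake_task("Risk register")),
    ("Cost Analysis", fake_task("Monthly breakdown")),
])
print(demo)
```

Emitting a visible placeholder for an empty section, rather than skipping it, makes a silently dropped agent output obvious in the final report.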

ADR-001

Async Job Pattern

Start-then-poll instead of a single synchronous HTTP request

Why it matters

Proxy timeouts kill long-running LLM chains. A 4-minute analysis cannot reliably survive in a synchronous request. A background thread plus polling removes the dependency on a long-lived connection entirely.

Result

Zero timeout failures in production. SSE or WebSocket streaming remains an upgrade path when needed.

ADR-002

Output Collection

task.output.raw_output × 4

Why it matters

kickoff() returns only the last task. Without explicit collection, the C4 diagram, risk register, and business analysis are silently discarded before reaching the user.

Result

All four sections, including the Mermaid diagram, appear in the final report.

ADR-003

Delegation

allow_delegation=False on all agents

Why it matters

With delegation enabled, CrewAI agents can route tasks to each other repeatedly, and the pipeline may never terminate. This is not a misconfiguration: in the CrewAI version used here, delegation is the default behaviour.

Result

Hard scope per agent. Each runs once, produces its output, and passes context to the next.

ADR-004

LLM Gateway

OpenRouter / OpenAI-compatible

Why it matters

All four agents use a ChatOpenAI client pointed at OpenRouter. Switching to direct OpenAI, or to Azure OpenAI or Amazon Bedrock through an OpenAI-compatible endpoint, is a single environment variable change with no application code changes.

Result

Provider-agnostic from day one. The demo runs on GPT-4o-mini at low cost; a production swap to Claude or Bedrock through the same OpenAI-compatible interface requires no refactoring.
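
One way to keep the provider choice in configuration is sketched below. The environment variable names (`LLM_BASE_URL`, `LLM_MODEL`, `LLM_API_KEY`) and the `llm_settings` helper are hypothetical, not taken from the project; the defaults assume OpenRouter's OpenAI-compatible endpoint and its model identifier for GPT-4o-mini.

```python
import os
from typing import Dict, Mapping


def llm_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Resolve client settings from the environment (variable names are hypothetical)."""
    return {
        # Default assumes OpenRouter's OpenAI-compatible endpoint.
        "base_url": env.get("LLM_BASE_URL", "https://openrouter.ai/api/v1"),
        # OpenRouter prefixes model names with the provider.
        "model": env.get("LLM_MODEL", "openai/gpt-4o-mini"),
        "api_key": env.get("LLM_API_KEY", ""),
    }


# The same settings would be passed to every agent's OpenAI-compatible client.
default_settings = llm_settings({})
direct_openai = llm_settings({
    "LLM_BASE_URL": "https://api.openai.com/v1",
    "LLM_MODEL": "gpt-4o-mini",
})
print(default_settings["base_url"], direct_openai["base_url"])
```

Because the agents only ever see the resolved settings, changing provider is a deployment concern rather than a code change, provided the target speaks the OpenAI-compatible protocol.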

The Tech Stack

Agent Orchestration & API

CrewAI 0.28.8, GPT-4o-mini, OpenRouter, FastAPI, Python 3.11

Frontend

Next.js 14 App Router, TypeScript, Tailwind CSS, NextAuth.js, Mermaid.js, ReactMarkdown

Deployment & Infrastructure

Vercel (frontend), Railway (backend), custom domain (cloudkraft.com), JWT auth + access code gate, Formspree lead capture

Architecture Diagram — Generated by the Agent

One of the more interesting product decisions was asking the Solution Architect agent to include a C4-style container diagram in its output, written in Mermaid syntax. The frontend renders this live using Mermaid.js — so the architecture diagram the user sees is not a static template. It is generated specifically for their requirements, with the correct component names and technology labels.

Implementation Detail

LLMs reliably generate the Mermaid diagram body but inconsistently include the opening code fence. A backend regex — re.sub(r'(?<!`)\nmermaid\n', '\n```mermaid\n', raw) — adds the fence when missing, ensuring the frontend parser always receives a valid fenced code block regardless of model output variation.
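
The fence repair can be exercised in isolation. The sketch below wraps the regex quoted above in a hypothetical helper, `ensure_mermaid_fence`; the fence string is built indirectly only so the example can sit inside a fenced block. It repairs the opening fence only, so a missing closing fence would need separate handling.

```python
import re

FENCE = "`" * 3  # three backticks, built indirectly so this example nests cleanly


def ensure_mermaid_fence(raw: str) -> str:
    """Add the opening mermaid fence when the model emitted a bare 'mermaid' line."""
    # The lookbehind skips a 'mermaid' line that directly follows a line ending in a backtick.
    return re.sub(r"(?<!`)\nmermaid\n", f"\n{FENCE}mermaid\n", raw)


missing = "Diagram:\nmermaid\ngraph TD\n  A-->B\n"
already = f"Diagram:\n{FENCE}mermaid\ngraph TD\n  A-->B\n"

print(ensure_mermaid_fence(missing))
print(ensure_mermaid_fence(already) == already)
```

Output that already carries the fence passes through unchanged, so the repair is safe to apply to every response.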

What a Production Deployment Would Add

This system is a working portfolio demonstration. For an enterprise deployment where the recommendations influence real infrastructure spend or real compliance posture, capability would need to be added in evaluation, drift detection, and cost governance, alongside security and audit controls.

Evaluation

Ground truth scoring against validated architecture recommendations. LLM-as-judge for qualitative dimensions. Human-in-the-loop review gates before output is acted upon. LangSmith or Arize Phoenix for tracing.

Drift Detection

Pin model versions explicitly. Monitor output quality across LLM updates. Embed input queries and track distribution shifts — when queries move out-of-distribution, quality degrades silently without this signal.

Cost Governance

Token budgets per agent. Output caching on input hash. Model tiering: lighter models for structured extraction, stronger models for nuanced reasoning. At 1,000 queries/day, ungoverned GPT-4 spend can reach roughly $30k/month, which implies about $1 per query across the four-agent chain.

Control	Implementation	Estimated Saving
Output Caching	Redis on input hash; 24–72 hour TTL for repeated queries	30–50%
Model Tiering	GPT-4o-mini for BA and Cost; GPT-4o for Architect and Risk	40–60%
Token Budgets	Hard max_tokens per agent with graceful truncation	15–25%
Prompt Injection Guards	Pre-processing classification step before requirements enter the pipeline	Security control, not a cost saving
Audit Trail	Immutable log of inputs, intermediate outputs, and final recommendations with user identity	Governance control, not a cost saving
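
The output caching control in the table can be sketched as follows. This is an in-memory stand-in rather than Redis: `ReportCache` and `fake_pipeline` are hypothetical names, and normalising case and whitespace before hashing is an assumption about what should count as a repeated query.

```python
import hashlib
import time
from typing import Callable, Dict, Tuple


class ReportCache:
    """Cache pipeline reports by a hash of the input, with a time-to-live."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, str]] = {}

    @staticmethod
    def key_for(requirements: str) -> str:
        # Assumption: case and whitespace differences should map to the same key.
        normalised = " ".join(requirements.lower().split())
        return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

    def get_or_compute(self, requirements: str, compute: Callable[[str], str]) -> str:
        key = self.key_for(requirements)
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh entry: skip the LLM chain entirely
        report = compute(requirements)
        self._store[key] = (now, report)
        return report


calls = []


def fake_pipeline(requirements: str) -> str:
    """Stand-in for the four-agent run; records how often it is invoked."""
    calls.append(requirements)
    return f"report for {requirements}"


now = [0.0]
cache = ReportCache(ttl_seconds=24 * 3600, clock=lambda: now[0])
first = cache.get_or_compute("Payments  platform", fake_pipeline)
second = cache.get_or_compute("payments platform", fake_pipeline)
print(first == second, len(calls))
```

The saving depends entirely on how often queries actually repeat, so the hit rate is worth measuring before relying on the estimate in the table.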

The Honest Value Proposition

Multi-agent architecture analysis works. For early-stage feasibility work, internal innovation reviews, or rapid client discovery, a system like this compresses a meaningful amount of structured expert thinking into a time frame that human teams cannot match. The Business Analyst agent does not get tired. The Risk Assessor does not forget to check the compliance section. The Cost Specialist does not assume the client has negotiated enterprise pricing.

The value is not in replacing the architect. It is in ensuring that the architect starts every engagement with a comprehensive first-pass analysis that covers all four domains — so they can spend their time on the decisions that genuinely require human judgement: organisational politics, existing technology debt, vendor relationships, team capability. That is a commercially real use case, and the production readiness work described above is the gap between demonstrating it and deploying it.

Try It or Read the Code

Live demo at ea-advisor-agent.cloudkraft.com. Access credentials on the sign-in page. Source code, agent prompts, and deployment configuration on GitHub.

