AegisAgent — MCP Defense Architecture
Issue: #1338
Read first (general trust boundaries): security-model.md, specifically boundary B5 (Gateway → External Tool / MCP Server).
This document describes how AegisAgent treats the Model Context Protocol (MCP) as an untrusted-supply-chain surface: every MCP server is registered, every tool it advertises is pinned to a hashed manifest, drift is detected and auto-contained, and unknown or unapproved tools fail closed. There is no separate "MCP proxy" process — MCP defense is built directly into the gateway's /v1/authorize hot path and the mcp_servers/mcp_tools tables (gateway/src/db.rs, gateway/src/routes.rs).
1. Threat model: why MCP needs its own gate
MCP servers are typically third-party or community-maintained processes an agent's runtime connects to for extra tools (filesystem, GitHub, databases, SaaS APIs). Unlike a tool AegisAgent's operator hand-registers and risk-tiers themselves, an MCP server's tool manifest — the list of tools, their names, descriptions, and risk metadata — is supplied by that external process and can change between deployments, versions, or compromises.
This creates a supply-chain attack surface distinct from the confused-deputy / indirect-prompt-injection class the trust-provenance gate defends against (see security-model.md §1, boundary B5):
- Manifest drift: an MCP server silently adds a new tool, widens an existing tool's capability, or removes a tool. Any of these can happen unnoticed unless an operator re-runs discovery and reviews the result.
- Unknown tool invocation: an agent (or a confused/compromised agent) calls an MCP tool that was never discovered/registered at all.
- Unapproved tool invocation: a tool was discovered but an operator hasn't yet reviewed and approved it.
- Compromised server: the MCP server itself starts returning a different, malicious manifest on a later discovery call.
AegisAgent's answer: pin the manifest hash on first discovery, fail closed on every gap, and require an explicit operator action to re-trust a server after drift.
2. Lifecycle: register → discover → pin → gate
1. POST /v1/mcp/servers Operator registers the MCP server (trust_level, endpoint, transport)
2. POST /v1/mcp/servers/:key/tools Discovery: server's tool manifest is submitted
→ upserted into mcp_tools (one row per tool)
→ manifest hash computed + snapshotted
→ first discovery: hash pinned, no drift possible yet
→ later discovery: hash compared against the pin
3. POST .../tools/:tool_key/approve Operator reviews and approves each discovered tool
(tools default to NOT approved — fail closed)
4. POST /v1/authorize (tool="mcp:<key>") Every call gated inline: server status, tool status, manifest pin
Steps 1–3 are administrative (gateway/src/routes.rs: register_mcp_server, discover_mcp_tools, approve_mcp_tool/disable_mcp_tool). Step 4 is the enforcement hot path every /v1/authorize call for an MCP tool passes through, described in §4.
2.1 Manifest hashing and pinning
discover_mcp_tools computes compute_mcp_manifest_hash(&payload.tools) — a deterministic hash over the submitted tool list — on every discovery call:
- First discovery for a server: db::get_mcp_server_manifest_hash returns an empty pin, so the new hash is simply pinned via db::set_mcp_server_manifest_hash. No drift event fires (there is nothing to drift from yet).
- Subsequent discovery: the new hash is compared against the pin. A mismatch fires drift handling (§3). Whether or not drift fired, the new hash always becomes the pin afterward, so each distinct manifest change alerts exactly once, not on every poll.
- Every discovery call additionally writes a full manifest snapshot (db::insert_mcp_manifest_snapshot), not just the hash, so a drift event can be diffed against the prior version's actual tool list rather than just being told "something changed."
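The pin-then-compare behaviour above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the gateway's code: the ToolEntry fields, manifest_hash, PinOutcome, and check_and_repin are hypothetical names, and FNV-1a stands in for whatever algorithm compute_mcp_manifest_hash actually uses (a real pin would want a cryptographic hash).

```rust
/// One advertised tool, reduced to the fields this document mentions.
/// Field names are assumptions, not the gateway's actual schema.
#[derive(Clone, Debug, PartialEq)]
pub struct ToolEntry {
    pub tool_key: String,
    pub description: String,
    pub risk: String,
    pub mutates_state: bool,
}

/// FNV-1a (64-bit), chosen only because it needs no external crate.
/// It is NOT collision-resistant and is a stand-in for the real hash.
fn fnv1a(state: u64, bytes: &[u8]) -> u64 {
    bytes
        .iter()
        .fold(state, |h, b| (h ^ u64::from(*b)).wrapping_mul(0x0000_0100_0000_01b3))
}

/// Deterministic hash over a tool list: tools are sorted by key so submission
/// order does not matter, and each string field is length-prefixed so that
/// adjacent fields cannot blur into each other.
pub fn manifest_hash(tools: &[ToolEntry]) -> String {
    let mut sorted: Vec<&ToolEntry> = tools.iter().collect();
    sorted.sort_by(|a, b| a.tool_key.cmp(&b.tool_key));
    let mut h: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a offset basis
    for t in sorted {
        for field in [t.tool_key.as_str(), t.description.as_str(), t.risk.as_str()] {
            h = fnv1a(h, &(field.len() as u64).to_le_bytes());
            h = fnv1a(h, field.as_bytes());
        }
        h = fnv1a(h, &[u8::from(t.mutates_state)]);
    }
    format!("{:016x}", h)
}

#[derive(Debug, PartialEq)]
pub enum PinOutcome {
    /// No pin existed yet: nothing to drift from.
    FirstPin,
    /// The new hash equals the pin.
    Unchanged,
    /// The new hash differs from the pin: drift handling should fire.
    Drift { old: String, new: String },
}

/// Compares the new hash against the stored pin, then always re-pins,
/// so a given manifest change is reported once rather than on every poll.
pub fn check_and_repin(pin: &mut Option<String>, new_hash: String) -> PinOutcome {
    let outcome = match pin.as_deref() {
        None | Some("") => PinOutcome::FirstPin,
        Some(old) if old == new_hash => PinOutcome::Unchanged,
        Some(old) => PinOutcome::Drift { old: old.to_string(), new: new_hash.clone() },
    };
    *pin = Some(new_hash);
    outcome
}
```

Sorting before hashing is a design choice of this sketch: it makes a pure reordering of the manifest hash identically. Whether the gateway's hash is order-sensitive is not stated here.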
2.2 Tools default to unapproved
discover_mcp_tools registers each discovered tool with default_decision = "require_approval" (if the manifest marks it approval_required) or "policy" otherwise, but the tool's status itself starts unapproved. approve_mcp_tool is the only way to move it to "approved"; disable_mcp_tool moves it to "disabled". A discovered-but-unreviewed tool is denied at the gate (§4.2), not silently allowed because it parsed successfully.
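As a rough model of the rule above, the sketch below keeps default_decision and status as two independent fields, which is the point of this section: a tool's policy default says nothing about whether it has been reviewed. The type and function names (ToolStatus, DiscoveredTool, register_discovered, approve, disable, callable) are hypothetical, not taken from the gateway.

```rust
/// Review state of a discovered tool. Only `Approved` passes the gate.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ToolStatus {
    Unapproved,
    Approved,
    Disabled,
}

/// Hypothetical in-memory stand-in for one mcp_tools row.
pub struct DiscoveredTool {
    pub tool_key: String,
    pub default_decision: &'static str,
    pub status: ToolStatus,
}

/// Models what discovery does for a tool: pick the default decision from the
/// manifest's approval_required flag, and always start in `Unapproved`.
pub fn register_discovered(tool_key: &str, approval_required: bool) -> DiscoveredTool {
    DiscoveredTool {
        tool_key: tool_key.to_string(),
        default_decision: if approval_required { "require_approval" } else { "policy" },
        status: ToolStatus::Unapproved,
    }
}

/// Explicit operator actions are the only status transitions in this sketch.
pub fn approve(tool: &mut DiscoveredTool) {
    tool.status = ToolStatus::Approved;
}

pub fn disable(tool: &mut DiscoveredTool) {
    tool.status = ToolStatus::Disabled;
}

/// Fail closed: anything other than `Approved` is not callable.
pub fn callable(tool: &DiscoveredTool) -> bool {
    tool.status == ToolStatus::Approved
}
```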
3. Drift detection and auto-containment
When a discovery call's computed manifest hash differs from the pinned value, discover_mcp_tools (gateway/src/routes.rs):
- Classifies the drift via classify_manifest_drift(old_tools, new_tools) (#1336), diffing the two most recent manifest snapshots:
  - tool_added / tool_removed → high severity (a tool appearing or disappearing is the strongest hijack signal)
  - tool_modified (description/risk/mutates_state changed on an existing tool) → medium severity
  - anything else that still hashes differently → low severity
- Emits an SOC event (kind: "mcp_manifest_drift", decision: "flag"; not "deny", since drift is a server-integrity signal kept out of the deny-storm correlation engine, design law 1) carrying the server key, old/new hash, classification, and diff, never the raw tool payload.
- Auto-quarantines the server (db::set_mcp_server_status(... "quarantined")) and writes a dedicated mcp_server_auto_quarantined audit event, distinct from a manually triggered mcp_server_quarantined event, so an operator reviewing the audit trail can tell why a server went into quarantine.
- Re-pins the new hash regardless of the drift outcome, so the next discovery call diffs against this version, not the stale one.
This is fail-closed by construction: quarantining happens at discovery time, and the gate in §4.1 denies every tool call from a quarantined server on the very next /v1/authorize call — there is no window where a drifted manifest is silently trusted pending review. An operator must explicitly call POST /v1/mcp/servers/:server_key/restore after investigating out-of-band.
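The classification step can be illustrated with a small diff over two snapshots. This is a sketch under assumptions rather than the real classify_manifest_drift: the ToolEntry fields, the DriftReport shape, and the classify_drift name are invented for illustration; only the severity rules (added/removed is high, modified is medium, anything else is low) come from the description above.

```rust
use std::collections::BTreeMap;

/// Hypothetical snapshot row; field names are assumptions.
#[derive(Clone, Debug, PartialEq)]
pub struct ToolEntry {
    pub tool_key: String,
    pub description: String,
    pub risk: String,
    pub mutates_state: bool,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Severity {
    Low,
    Medium,
    High,
}

#[derive(Debug, PartialEq)]
pub struct DriftReport {
    pub added: Vec<String>,
    pub removed: Vec<String>,
    pub modified: Vec<String>,
    pub severity: Severity,
}

/// Diffs two manifest snapshots by tool key and assigns a severity band.
pub fn classify_drift(old: &[ToolEntry], new: &[ToolEntry]) -> DriftReport {
    // BTreeMap keeps keys sorted, so the report is deterministic.
    let old_by_key: BTreeMap<&str, &ToolEntry> =
        old.iter().map(|t| (t.tool_key.as_str(), t)).collect();
    let new_by_key: BTreeMap<&str, &ToolEntry> =
        new.iter().map(|t| (t.tool_key.as_str(), t)).collect();

    let mut added = Vec::new();
    let mut modified = Vec::new();
    for (key, new_tool) in &new_by_key {
        match old_by_key.get(key) {
            None => added.push(key.to_string()),
            Some(old_tool) if old_tool != new_tool => modified.push(key.to_string()),
            Some(_) => {}
        }
    }
    let removed: Vec<String> = old_by_key
        .keys()
        .filter(|key| !new_by_key.contains_key(*key))
        .map(|key| key.to_string())
        .collect();

    let severity = if !added.is_empty() || !removed.is_empty() {
        Severity::High // a tool appeared or disappeared
    } else if !modified.is_empty() {
        Severity::Medium // an existing tool's metadata changed
    } else {
        Severity::Low // hash differed, but no per-tool difference was found
    };
    DriftReport { added, removed, modified, severity }
}
```

In this sketch a report carries only tool keys, which matches the stated intent that the SOC event includes the diff but never the raw tool payload.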
The default YAML detection rules (rule_dsl::default_rules(), see event-schema.md) include three severity-banded rules keyed on this event — mcp_manifest_drift_high (min_risk_score: 75), _medium (40-74), _low (<40) — so the live SOC pipeline surfaces drift at the right urgency without an operator having to read raw event payloads.
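The three bands partition the risk-score range with no gaps or overlaps, which the following sketch makes explicit. The function name drift_rule_for_score is hypothetical, the rule names are the expanded forms of the abbreviations above, and how a drift classification is translated into a risk score is not specified here.

```rust
/// Returns the name of the severity-banded drift rule whose range contains
/// the given risk score: high is 75 and up, medium is 40-74, low is below 40.
pub fn drift_rule_for_score(risk_score: u8) -> &'static str {
    match risk_score {
        75..=255 => "mcp_manifest_drift_high",
        40..=74 => "mcp_manifest_drift_medium",
        _ => "mcp_manifest_drift_low",
    }
}
```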
4. Enforcement: the /v1/authorize gate for MCP calls
Every /v1/authorize call whose tool_call.tool resolves to an MCP server key (mcp_server_key_from_tool, matching the mcp:<server_key> convention) passes through three fail-closed checks, in order, before Cedar policy evaluation ever runs:
4.1 Server-status gate
db::get_mcp_server_by_key(...).status == "quarantined"
→ deny, matched_policies = ["mcp_server_quarantined"], risk_score = 100 (critical)
A quarantined server — whether quarantined manually or auto-quarantined on drift (§3) — denies every tool call it advertises, regardless of any individual tool's approved status. Without this server-level gate, a quarantine event would be recorded but not actually enforced on the hot path.
4.2 Tool-status gate
A tool that was discovered but not yet approved (or has been explicitly disabled) is denied — fail closed on "haven't reviewed it yet," not fail open.
4.3 Unknown-tool gate
db::get_mcp_tool_by_key(...) == None
→ deny, matched_policies = ["mcp_unknown_tool"], risk_score = 100 (critical)
A tool call naming an action that was never discovered for this server at all — including an attempt to disguise the mcp: prefix to dodge the mcp_server_key_from_tool match — is denied at maximum risk score, distinct from the merely-unapproved case in §4.2.
Only after all three checks pass does the call proceed to Cedar policy evaluation and the rest of the normal /v1/authorize pipeline (trust-provenance gate, risk scoring, approval routing). The critical_deny_policy default detection rule additionally watches for matched_policy_contains: [mcp_unknown_tool, critical] so an unknown-MCP-tool attempt also surfaces on the live SOC feed, not just as a single denied decision.
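Taken together, the three checks behave roughly like the pure function below. It is a simplified model, not the gateway's handler: the database lookups are replaced by plain arguments, the GateOutcome type and mcp_gate name are hypothetical, this version of mcp_server_key_from_tool is a guess at the mcp:<server_key> parsing, and the "mcp_tool_not_approved" label is a placeholder because the policy name and risk score for the unapproved case are not given in this document.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ServerStatus {
    Active,
    Quarantined,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ToolStatus {
    Unapproved,
    Approved,
    Disabled,
}

#[derive(Debug, PartialEq)]
pub enum GateOutcome {
    /// The tool name does not follow the mcp:<server_key> convention.
    NotMcp,
    /// All MCP checks passed; continue to Cedar policy evaluation.
    Proceed,
    /// Denied before policy evaluation.
    Deny { matched_policy: &'static str, risk_score: Option<u8> },
}

/// Assumed parsing of the mcp:<server_key> convention.
pub fn mcp_server_key_from_tool(tool: &str) -> Option<&str> {
    tool.strip_prefix("mcp:").filter(|key| !key.is_empty())
}

/// `tool_row` is None when the tool was never discovered for this server.
pub fn mcp_gate(tool: &str, server: ServerStatus, tool_row: Option<ToolStatus>) -> GateOutcome {
    if mcp_server_key_from_tool(tool).is_none() {
        return GateOutcome::NotMcp;
    }
    // Server-status gate: quarantine overrides any per-tool approval.
    if server == ServerStatus::Quarantined {
        return GateOutcome::Deny { matched_policy: "mcp_server_quarantined", risk_score: Some(100) };
    }
    match tool_row {
        // Unknown-tool gate: never discovered at all.
        None => GateOutcome::Deny { matched_policy: "mcp_unknown_tool", risk_score: Some(100) },
        Some(ToolStatus::Approved) => GateOutcome::Proceed,
        // Tool-status gate: discovered but unapproved or disabled.
        // Placeholder policy label; the real name and score are not documented here.
        Some(_) => GateOutcome::Deny { matched_policy: "mcp_tool_not_approved", risk_score: None },
    }
}
```

Because a single lookup answers both "does the tool exist" and "what is its status", the sketch folds the tool-status and unknown-tool checks into one match; the server-status check still runs first, so an approved tool on a quarantined server is denied.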
5. API surface
| Endpoint | Purpose |
|---|---|
| POST /v1/mcp/servers (register) | Register an MCP server: server_key, name, transport, trust_level, endpoint |
| GET /v1/mcp/servers | List servers with status and pinned manifest_hash |
| GET\|PUT /v1/mcp/servers/:server_key | Read/update server metadata |
| POST /v1/mcp/servers/:server_key/quarantine | Manually quarantine (denies all of that server's tool calls immediately) |
| POST /v1/mcp/servers/:server_key/restore | Reactivate after operator review (manual or post-drift) |
| POST /v1/mcp/servers/:server_key/tools (discover) | Submit the tool manifest; triggers hashing, pinning, and drift detection |
| GET /v1/mcp/servers/:server_key/tools | Read the current discovered tool manifest |
| POST .../tools/:tool_key/approve | Move a discovered tool to "approved" |
| POST .../tools/:tool_key/disable | Move a tool to "disabled" (denied at the gate, same as never-approved) |
All routes are tenant-scoped (TenantId extractor, parameterized SQLx) — see security-model.md boundary B7.
6. Audit trail
Every state transition in this lifecycle writes a distinct, queryable audit_events row (GET /v1/audit/events):
| event_type | Written by | When |
|---|---|---|
| mcp_tool_discovered | discover_mcp_tools | Once per tool, every discovery call |
| mcp_server_quarantined / mcp_server_active | update_mcp_server_quarantine | Manual quarantine/restore via the API |
| mcp_server_auto_quarantined | discover_mcp_tools (drift handling) | Automatic, on manifest drift, carrying classification + diff |
Combined with the mcp_manifest_drift SOC event stream (§3) and the evidence graph's mcp_server node type (graph.rs, #1271), an operator can reconstruct the full chain — which manifest changed, what changed, when it was auto-quarantined, and which denied /v1/authorize calls happened while it was quarantined — without re-deriving it from raw event payloads.
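Because auto-quarantine and manual quarantine write different event types, the reason a server is currently quarantined can be read straight off the audit log. The sketch below assumes events arrive oldest first and models each row with just two fields; AuditEvent, QuarantineCause, and current_quarantine_cause are hypothetical names, and the actual audit_events schema may differ.

```rust
#[derive(Debug, PartialEq)]
pub enum QuarantineCause {
    /// Latest state change was mcp_server_quarantined (operator action).
    Manual,
    /// Latest state change was mcp_server_auto_quarantined (manifest drift).
    AutoDrift,
}

/// Minimal stand-in for an audit_events row; field names are assumptions.
pub struct AuditEvent {
    pub event_type: String,
    pub server_key: String,
}

/// Walks the log newest-first and returns why the server is quarantined,
/// or None if its latest state change was a restore or it has no state events.
pub fn current_quarantine_cause(events: &[AuditEvent], server_key: &str) -> Option<QuarantineCause> {
    events
        .iter()
        .rev()
        .filter(|e| e.server_key == server_key)
        .find_map(|e| match e.event_type.as_str() {
            "mcp_server_auto_quarantined" => Some(Some(QuarantineCause::AutoDrift)),
            "mcp_server_quarantined" => Some(Some(QuarantineCause::Manual)),
            "mcp_server_active" => Some(None), // restored: not quarantined now
            _ => None,                         // unrelated event: keep looking
        })
        .flatten()
}
```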
7. What this does not cover
- MCP response inspection (validating tool output, not just gating the call) is a separate, not-yet-built concern (#1333) — AegisAgent is pre-execution-only today; it gates whether a call happens, not what a tool returns.
- A standalone MCP proxy process does not exist. All MCP defense lives in the gateway's existing /v1/authorize path and mcp_servers/mcp_tools tables; there is nothing to deploy or operate separately.
- Transport-level security (TLS to the MCP server, credential management for the endpoint) is the operator's responsibility; AegisAgent's gate operates on the manifest and call metadata, not the wire transport.