AegisAgent — Security Model¶
Version: v1.0
Date: 2026-06-16
Issue: #1404
Read first (detailed threats + assurance): AegisAgent_Threat_Model.md
This document states what AegisAgent guarantees, what it does not guarantee, where every trust boundary sits, which threat actors it models, and how those map to OWASP LLM Top 10 and MITRE ATLAS. The companion threat model enumerates specific threat IDs (T-A through T-D) with mitigations; this document is the structured policy reference a deployer or security reviewer reads first.
1. Trust boundaries¶
Every request crosses one or more of the following trust boundaries. Each boundary has a specific security control.
| ID | Boundary | Control |
|---|---|---|
| B1 | User / App → Agent Runtime | Out of scope for AegisAgent; the caller is untrusted by assumption |
| B2 | Agent Runtime → AegisAgent SDK | SDK intercepts every tool call before execution; agent process is assumed potentially compromised (indirect prompt injection) |
| B3 | SDK → Gateway (POST /v1/authorize) |
Mutual authentication via hashed agent tokens + tenant ID header; TLS in production; SDK enforces fail-closed action_hash check locally before execution |
| B4 | Gateway → SQLite / PostgreSQL | Private localhost bind (127.0.0.1); parameterized SQLx queries; tenant_id on every row and every query; WAL isolation |
| B5 | Gateway → External Tool / MCP Server | Tool output is untrusted data; MCP manifest pinned and hashed; drift triggers downgrade + SOC event |
| B6 | Gateway → Approval channel (Slack / Teams / dashboard) | HMAC-SHA256 callback signature; approval bound to action_hash; approver identity verified via SSO/OIDC |
| B7 | Tenant A data → Tenant B data | tenant_id filter on every SQL query; parameterized SQLx only; middleware-layer scoping; per-tenant SOC indices and receipt partitions |
| B8 | SOC evidence → RCA narrator (LLM) | Evidence passed as inert structured data, never as instructions; narrator has no tools and no enforcement authority; output is display-only |
| B9 | Response Engine → Gateway control API (/freeze, /revoke, /quarantine) |
Authenticated; tenant-scoped; every containment action is audited + reversible; optional two-person confirm for revoke |
How boundaries compose¶
[User / external content]
↓ B1 (untrusted)
[Agent Runtime]
↓ B2 (SDK intercept — fail-closed action_hash check)
[Gateway — authorize / policy / risk]
↙ B4 (DB — tenant-scoped)
[SQLite / PG]
↘ B5 (MCP / tools — manifest-pinned)
[External Tool]
↓ B6 (approval callback — HMAC-verified)
[Human Approver]
↑ B9 (response engine — freeze / revoke / quarantine)
[SOC plane] ← B7 (tenant isolation) → B8 (LLM narrator — inert data only)
2. Threat actors¶
| Actor | Description | Primary threat class |
|---|---|---|
| Malicious agent | An agent that has been prompt-injected or is intentionally trying to execute unauthorized actions (approve-then-swap, replay, parameter tampering) | T-A: Approval manipulation |
| Indirect prompt injection attacker | External content (GitHub issue, email, webpage, MCP tool output) that hijacks an honest agent to perform a privileged action on the attacker's behalf | T-B: Confused deputy via provenance |
| Compromised MCP server | A third-party MCP server whose manifest has been tampered with or whose tool definitions have been silently swapped to new, more-permissive ones | T-B5: MCP manifest drift / tool poisoning |
| Insider / tenant admin | A user with legitimate access who attempts to tamper with receipts, alter audit logs, approve their own actions, or access cross-tenant data | T-C: Evidence tampering; T-A7: Self-approval; B7: Tenant isolation |
| SOC poisoner | An attacker who crafts evidence (event bodies, alert payloads, RCA inputs) to manipulate the detection/response plane — either to evade detection, trigger false containment, or inject into the RCA LLM | T-D: Attacks on the SOC |
| External network attacker | An attacker with network access to the gateway who tries to replay approvals, forge callback signatures, or brute-force agent tokens | T-A3: Replay; T-A5: Callback forgery |
3. Security guarantees¶
These are properties AegisAgent enforces by construction — not by configuration or policy best-effort.
G1 — Fail closed¶
Every unrecoverable uncertainty produces a deny:
- Unknown agent → deny (no registration row)
- Unknown tool or MCP tool → deny (not in tool registry)
- Gateway unreachable → SDK refuses to execute mutating / high-risk actions
action_hashmismatch → SDK refuses to execute (approve-then-swap blocked)- Approval expired → deny (gateway returns
EXPIRED) - Approval already consumed → deny (gateway returns 409; SDK fails closed)
- Audit writer unavailable → high-risk action denied with
audit_writer_unavailablereason source_trust∈ {untrusted_external,malicious_suspected} +mutates_state→ deterministic Cedarforbid
G2 — Hash-bound approval integrity¶
Every human approval is cryptographically bound to the exact canonical action that was presented to the approver (aegis-jcs-1 JCS — keys sorted by Unicode code point, compact separators, raw UTF-8, rejects non-finite floats). The SHA-256 hash is stored at approval time and re-verified by the SDK before execution. The hash cannot be stripped, replaced, or forged without breaking the approval chain.
Three additional sub-guarantees:
| Sub-guarantee | Mechanism |
|---|---|
| No approve-then-swap | SDK recomputes action_hash at execution time; mismatch → fail closed |
| No replay | Approvals are single-use; atomic consume with consumed_at guard; 409 on re-use |
| No expiry bypass | expires_at enforced at both gateway (GET /v1/approvals/:id) and SDK |
Canonicalization must remain byte-identical across all SDKs and the gateway. This is locked by the shared corpus (tests/canonical_action_vectors.json) and a CI byte-equality gate.
G3 — Deterministic trust-provenance gating¶
Authorization is gated on the source trust level of the triggering content — a 6-level deterministic label, not a score:
trusted_internal_signed → trusted_internal_unsigned → semi_trusted_customer
→ untrusted_external → malicious_suspected → unknown
Classifiers may only tighten a label, never loosen it. Cedar policies enforce at the policy level: mutating actions from untrusted_external or malicious_suspected are forbid-ed regardless of any ML score or advisory metadata. This is the confused-deputy defense — it closes the Invariant Labs class of attacks by construction.
G4 — Verifiable, hash-chained receipts¶
Every authorization decision that executes produces a receipt: receipt_id, action_hash, decision_hash, prev_hash, timestamp, optional Ed25519 signature. Receipts form a per-tenant hash chain. Alteration, gap, or forgery is detectable via GET /v1/receipts/:id/verify and POST /v1/receipts/verify-chain. This provides compliance evidence for SOC 2 and EU AI Act Article 14.
G5 — SOC is out-of-band and deterministic¶
The detection / correlation / response pipeline is strictly asynchronous (tokio::mpsc fire-and-forget). Four design laws are enforced by construction:
| Law | Statement |
|---|---|
| Law 1 | Advisory scores (composite_risk_score, graph data, incident metadata) are display/audit metadata only — they never gate the allow / deny / require_approval decision. Cedar and trust-provenance decide. |
| Law 2 | Only one LLM exists in the system: the post-incident RCA narrator. It is sandboxed, has no tools or enforcement authority, and receives evidence as inert structured data only. All triage, correlation, and response are deterministic code. |
| Law 3 | SOC processing is always out-of-band. An SOC outage degrades monitoring; it can never make the action path fail-open. |
| Law 4 | Every agent containment action (freeze / revoke / quarantine) is tenant-scoped, authenticated, audited, and reversible. |
G6 — Multi-tenant isolation¶
All data is partitioned by tenant_id. Every SQL query in the gateway binds tenant_id as a parameterized argument. There are no raw string concatenation queries. Middleware extracts and validates tenant_id from authenticated headers before any data access. Cross-tenant requests return 404, not 403 (no information leakage about existence).
4. Non-guarantees¶
AegisAgent explicitly does not guarantee the following:
| What AegisAgent does NOT protect against | Why / notes |
|---|---|
| Agent intent or correctness | AegisAgent is an integrity and oversight layer, not an intent classifier. It enforces whether an approved action runs as approved, not whether the action was a good idea. |
| Prompt injection into the agent LLM itself | AegisAgent detects and blocks the downstream tool actions that a prompt-injected agent would take, but it does not sanitize or filter the agent's inputs. Provenance gating reduces blast radius; it does not prevent the injection. |
| A compromised SDK binary | If the SDK itself is replaced with a malicious version that skips the action_hash check, the fail-closed guarantee breaks at B2–B3. Mitigation: signed releases, SBOM, supply-chain verification. |
| Correctness of Cedar policies | Incorrect policies (e.g., a policy that accidentally allows a high-risk action) will be enforced as written. AegisAgent enforces whatever policy is loaded; policy authoring is the operator's responsibility. |
| Availability under adversarial load | AegisAgent is not a DDoS mitigator. Under extreme load, the gateway may reject requests (429) or become unavailable; the SDK fails closed (denies execution) for mutating/high-risk actions, which is safe but impacts availability. |
| Confidentiality of action parameters | By default, action parameters are hashed and stored; raw payloads are not persisted. However, AegisAgent does not encrypt data at rest beyond what the host platform provides. |
| Novel attack shapes not yet in detection rules | Deterministic detection rules catch known patterns; they do not generalize to novel attack shapes. Behavioral baselining (SOC-007) adds anomaly signal as advisory-only. |
| Integrity of the host operating system | If the OS is compromised, the gateway process or DB files may be altered. AegisAgent assumes a trusted host. |
| Single-agent environments without a gateway | AegisAgent requires the gateway to be reachable. Offline or air-gapped deployments must pre-configure the SDK to fail closed and use a local gateway instance. |
5. OWASP LLM Top 10 mapping¶
| OWASP LLM | Name | AegisAgent coverage |
|---|---|---|
| LLM01 | Prompt Injection | Closes via provenance. Indirect prompt injection drives a tool action → provenance label is untrusted_external → Cedar forbid for mutating actions (T-B1). Direct injection into the agent LLM is out of scope (see Non-guarantees). |
| LLM02 | Sensitive Information Disclosure | Mitigates. Action parameters are hashed, not stored as plaintext. source_trust=untrusted_external cannot trigger cross-repo data movement (T-B2). RCA evidence is redacted before the LLM sees it (T-D7). |
| LLM03 | Supply Chain Vulnerabilities | Partial. MCP manifest pinning + hash verification detects server-side supply chain compromise (T-B5). SDK and gateway release signing + SBOM address the software supply chain. Runtime compromise of the SDK binary is a residual risk. |
| LLM04 | Data and Model Poisoning | Advisory coverage. Memory/RAG poisoning is a roadmap item (T-B6). Current coverage: provenance-gating writes that originate from untrusted sources; SOC correlation for suspicious read-then-write sequences. |
| LLM05 | Improper Output Handling | Closes via approval integrity. Tool output from an LLM that has been hijacked must still pass the action_hash check and Cedar evaluation before executing. Unfiltered output cannot bypass B3. |
| LLM06 | Excessive Agency | Closes via fail-closed + approval. High-risk and mutating actions require explicit human approval bound to a specific action hash. Unknown tools are denied by default. Runaway agents are detected (SOC runaway incident, ≥10 actions in 30s). |
| LLM07 | System Prompt Leakage | Out of scope. AegisAgent does not inspect or protect LLM system prompts. |
| LLM08 | Vector and Embedding Weaknesses | Out of scope (roadmap). AegisAgent does not currently inspect RAG inputs. Provenance gating limits blast radius of a poisoned retrieval result (T-B6). |
| LLM09 | Misinformation | Out of scope. AegisAgent does not evaluate the factual accuracy of agent outputs. |
| LLM10 | Unbounded Consumption | Mitigates. Per-agent rate limiting (token bucket: AEGIS_RATE_LIMIT_CAPACITY + AEGIS_RATE_LIMIT_REFILL_RATE). SOC runaway and deny_storm incidents detect and auto-contain runaway agents. |
6. MITRE ATLAS mapping¶
MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems) tactics mapped to AegisAgent's threat classes and controls.
| ATLAS Tactic | Relevant techniques | AegisAgent control |
|---|---|---|
| Initial Access | Phishing via LLM, supply chain compromise of ML artifacts | MCP manifest pinning (T-B5); SDK signing; provenance gating entry point (B1) |
| Execution | LLM prompt injection (direct/indirect), exploit public-facing ML API | Trust-provenance deterministic forbid (G3); fail-closed B3 (G1) |
| Persistence | Backdoor ML model, poison training data | Out of scope (roadmap); memory/RAG provenance (T-B6) |
| Privilege Escalation | Prompt injection → privilege escalation, abuse agent function calls | Confused-deputy defense (T-B1–T-B4); approval integrity (G2); Cedar forbid for untrusted-triggered mutations |
| Defense Evasion | Evade ML detection, craft adversarial examples against ML classifiers | Deterministic detection (Law 2); provenance labels are not ML outputs (G3); score-gating prohibited (Law 1) |
| Credential Access | Steal ML API keys, hijack model endpoints | Token Broker pattern (agents never hold raw tool credentials); hashed agent tokens; short-lived credentials |
| Discovery | Discover AI model APIs, enumerate ML pipeline | 404 (not 403) for cross-tenant requests; no information leakage about other tenants' resources |
| Collection | Exfiltrate training data via model inversion, LLM data leakage | Provenance gating prevents untrusted-triggered read→public-write (T-B2); receipts provide audit trail |
| Exfiltration | Data exfiltration via LLM outputs | Action-level gating: cross-boundary data movement requires approval + provenance; SOC sequence correlation (AEG-3007) |
| Impact | Manipulate ML model output, denial of ML service | Approval integrity (G2) prevents execute-different-action; SOC response engine contains runaway agents (G5 Law 4) |
7. Security controls summary¶
The following table maps security properties to their implementation location for auditors.
| Property | Implementation |
|---|---|
| Fail-closed on gateway unreachable | sdk-python/aegisagent/decorator.py, sdk-go/aegis/protect.go, sdk-typescript/src/protect.ts |
action_hash canonicalization |
sdk-python/aegisagent/canon.py, sdk-go/canon/canon.go, sdk-typescript/src/canon.ts, gateway/src/routes.rs |
| Byte-equality gate | tests/canonical_action_vectors.json, cross-language CI |
| Single-use approval (replay defense) | gateway/src/db.rs → consume_approval; gateway/src/routes.rs → consume_approval handler |
| Approval expiry | gateway/src/routes.rs → get_approval (EXPIRED state); SDK decorator expiry check |
| Trust-provenance Cedar evaluation | gateway/src/policy.rs; gateway/policies.cedar |
| Tenant-scoped SQL | gateway/src/db.rs — every query binds tenant_id |
| Receipt hash chain | gateway/src/routes.rs → emit_action_receipt; gateway/src/sign.rs (Ed25519) |
| SOC out-of-band isolation | gateway/src/events.rs → tokio::mpsc channel; drain runs in dedicated Tokio task |
| Deterministic detection (no LLM) | gateway/src/detect.rs; gateway/src/rule_dsl.rs |
| RCA LLM sandboxing | gateway/src/narrate.rs — no tools, no authority, display-only output |
| Advisory-only composite risk | gateway/src/risk.rs — score written to decisions row and response; never read by Cedar |
| MCP manifest pinning | gateway/src/routes.rs → discover_mcp_tools; mcp_servers.manifest_hash column |
| Redaction of secrets | gateway/src/main.rs → redact_secrets; hashes stored, not raw payloads |
8. Deployment security notes¶
- Binding: The gateway binds
127.0.0.1:8080by default (AEGIS_BIND_ADDR). Never expose the gateway on0.0.0.0in production without a reverse proxy and mTLS. - Agent tokens: Stored as SHA-256 hashes (
agents.token_hash). The plaintext token is returned once at registration and never again. - Webhook secrets: Callback
secretis stored assha256(secret)(approvals.callback_secret_hash). The plaintext secret is never persisted. - Secrets in logs: The
redact_secretsmiddleware strips bearer tokens, API keys, and passwords from log lines before emission. - Ed25519 receipt signing: Optional. Set
AEGIS_SIGN_RECEIPTS=true+AEGIS_SIGNING_KEY_PATH. Without signing, receipt integrity relies on the hash chain alone (tamper-evident, not tamper-proof against a full DB compromise). - Rate limiting: Default token bucket is 100 burst / 10 refill per second per tenant. Increase via
AEGIS_RATE_LIMIT_CAPACITY/AEGIS_RATE_LIMIT_REFILL_RATEfor production load. - SQLite ceiling: SQLite WAL mode caps
/v1/authorizeat ~130–150 req/s per instance. For higher throughput, migrate to PostgreSQL (seedocs/performance-baseline.md).
For threat IDs (T-A1–T-D7), STRIDE analysis, assurance regression tests, and the governing principle, see AegisAgent_Threat_Model.md.