Shrike

Shrike is the adversarial pass across the agentic stack: synthesized by Aether's multi-model council and used in authorized engagements to probe production agentic AI systems for failure modes that single-model testing misses.

Category: AI security research
Status: Internal · authorized engagements only

Shrike is a security research agent built to probe production agentic AI systems for known failure modes: prompt injection, document- and image-based context manipulation, authority-impersonation patterns, and decision-boundary erosion under multi-turn pressure.

It's the sibling to Atrium in the OUTURE stack. Atrium operates the agents; Shrike probes them. Where Atrium renders work to be observed, Shrike runs structured adversarial scenarios against it and writes down what gives.

Built through Aether's M2M(A)^nth council across five frontier models running simultaneously: Claude Opus, Claude Sonnet, Gemini, Grok, and GPT. Each provider contributed a perspective; the council's debate produced the system prompt, the training corpus, and the decision policy.
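
As an illustration only, one round of that kind of council could look like the sketch below: the same drafting task fans out to every member, and each member critiques the others' drafts before a merge step. The interfaces, member handling, and loop shape are assumptions for illustration, not Aether's actual M2M(A)^nth implementation.

```python
# Hypothetical sketch of one council round. Names and interfaces are
# illustrative assumptions, not the real synthesis pipeline.
from dataclasses import dataclass
from typing import Callable

Model = Callable[[str], str]  # assumed interface: prompt in, completion out


@dataclass
class CouncilRound:
    task: str                  # e.g. "draft the decision policy for refusal handling"
    members: dict[str, Model]  # provider label -> model callable

    def run(self) -> dict[str, dict[str, str]]:
        # 1. Independent drafts from every member.
        drafts = {name: model(self.task) for name, model in self.members.items()}
        # 2. Cross-critique: each member reviews everyone else's draft,
        #    which is where single-provider blind spots get surfaced.
        critiques = {
            name: model(
                "Critique these drafts for blind spots:\n"
                + "\n---\n".join(text for other, text in drafts.items() if other != name)
            )
            for name, model in self.members.items()
        }
        return {"drafts": drafts, "critiques": critiques}
```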

Adversarial reasoning is exactly the domain where provider diversity matters most. A red-team agent written by a single model inherits that model's blind spots. The moves it can't predict are the same ones it can't generate. A council across five providers covers ground no single one of them could write a defense against.

Shrike walks a target agentic system through structured adversarial scenarios. Probes context handling: does the agent change behavior based on injected content inside documents or images that arrive through normal channels? Probes authority recognition: does it accept altered authoritative-source signals as legitimate? Probes conversational pressure: does it preserve refusals across multi-turn coercion? Probes decision boundaries: does it execute high-impact actions outside intended scope under indirect framing?
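
A minimal sketch of how those probes might be encoded as structured scenarios follows; the four categories mirror the paragraph above, while the field names and schema are assumptions for illustration, not Shrike's actual format.

```python
# Hedged sketch: one structured scenario per probe category.
from dataclasses import dataclass
from enum import Enum


class ProbeCategory(Enum):
    CONTEXT_HANDLING = "context_handling"                 # injected content in documents or images
    AUTHORITY_RECOGNITION = "authority_recognition"       # altered authoritative-source signals
    CONVERSATIONAL_PRESSURE = "conversational_pressure"   # multi-turn coercion against refusals
    DECISION_BOUNDARY = "decision_boundary"               # out-of-scope actions under indirect framing


@dataclass
class Scenario:
    scenario_id: str
    category: ProbeCategory
    setup: list[str]         # turns or artifacts delivered through the target's normal channels
    expected_behavior: str   # what a hardened agent should do
    success_criteria: str    # what counts as a compromise, judged on the transcript
```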

Findings are categorized by severity and reproducibility, with each finding paired to a defensive recommendation. The output of a Shrike engagement is a report the system owner can act on, not a list of party tricks.
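
A finding record under that description could be as simple as the sketch below. The severity scale and field names are assumptions; the pairing of every finding with a defensive recommendation is the part the paragraph above commits to.

```python
# Illustrative finding shape: severity, reproducibility, and a paired
# defensive recommendation. The exact schema is assumed, not published.
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


@dataclass
class Finding:
    scenario_id: str
    severity: Severity
    reproducibility: str           # e.g. "reproduced in 5/5 runs" under the pinned model version
    observed_behavior: str
    defensive_recommendation: str  # every finding ships with one
```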

Used internally to test OUTURE's own agentic systems before they ship. Used externally only with explicit written permission from the system owner: scope, in-scope assets, timing, and disclosure timeline agreed in writing before any probe runs.

In internal use, Shrike was run against Atrium, the multi-agent orchestration system on the same lab page, and successfully compromised it across multiple categories. Findings drove hardening across the agent-persona logic, document and image parsing, and authority-recognition layers before Atrium's public release. The same posture applies across the rest of the OUTURE stack: nothing ships before Shrike has had a pass at it.

External engagements have been conducted under written authorization against real production agentic AI systems. Specific methodologies, findings, and target identities remain private. Case summaries are available under NDA for prospective clients evaluating an AI-assurance engagement; the methodology itself stays inside the engagement, where it belongs.

Provider diversity in synthesis. Single-model security research has a structural blind spot: the agent can only generate adversarial moves the model knows how to think about. Synthesizing across five providers, each from a different training lineage, produces an adversary whose move set is the union, not the intersection.

Structured methodology over ad-hoc prompting. Reproducibility is the difference between security research and tinkering. Every probe Shrike runs is reproducible from a written scenario plus a model-version pin; findings hold up to re-test, which is what makes them findings.
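
In practice that could reduce to something like the sketch below: replay the written scenario against the pinned model version and report a reproduction rate. The runner interface here is an assumption, injected as a callable rather than any real provider API.

```python
# Sketch of a re-test under a model-version pin. The runner is an assumed
# callable that replays one written scenario against one pinned model and
# reports whether the compromise reproduced.
from typing import Callable

RunScenario = Callable[[str, str], bool]  # (scenario_path, model_pin) -> compromised?


def retest(run_scenario: RunScenario, scenario_path: str, model_pin: str, trials: int = 5) -> float:
    """Reproduction rate for a finding: same written scenario, same model pin."""
    hits = sum(run_scenario(scenario_path, model_pin) for _ in range(trials))
    return hits / trials
```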

Authorization-only. Security research without permission isn't research; it's intrusion. The methodology is built around scope agreements, written consent, and responsible disclosure timelines, both because that's the professional standard and because it's the only version of this work that compounds reputation instead of burning it.

Findings stay private. Published exploit specifics are exactly the surface that gets weaponized. The defensive recommendations go to the system owner; the methodology and the specifics stay with the engagement. The only thing this page is publishing is that the work happens, with permission, against real systems.