Aether
Aether is a self-correcting multi-model orchestration engine built on RDAV (Reflect → Decide → Act → Validate). Provider diversity × recursive agent depth, not just model size.
Aether is the intelligence layer that sits behind the OUTURE stack. It is not a chatbot wrapper around a single model. It is a deliberation engine that runs across heterogeneous providers (Anthropic, OpenAI, Gemini, Groq, Ollama, Mistral, DeepSeek, plus 15+ others) and combines their outputs through council deliberation, self-correction, and adaptive routing.
The core IP is what we call M2M(A)^nth: multiple heterogeneous providers, each able to recursively spawn its own agent stack, to arbitrary depth. Intelligence scales with provider diversity multiplied by recursive depth, not just model size. Two providers running in council, each able to spawn five sub-agents, produces reasoning paths a single model cannot.
Every request passes through the RDAV loop: Reflect on what's actually being asked, Decide a plan, Act on it, then Validate the output against the original intent. If validation fails, the critique feeds back into the next reflection. The loop iterates until a confidence threshold is met or a max-cycle cap is hit.
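In pseudocode, the loop looks roughly like this. It is a minimal sketch: the function names, the `Validation` dataclass, and the 0.8 threshold are illustrative assumptions, not Aether's actual API.

```python
from dataclasses import dataclass

@dataclass
class Validation:
    confidence: float  # score of the output against the original intent
    critique: str      # feedback carried into the next Reflect phase

def rdav(task, reflect, decide, act, validate,
         threshold=0.8, max_cycles=3):
    critique = None
    for _ in range(max_cycles):
        intent = reflect(task, critique)    # Reflect: what's actually asked
        plan = decide(intent)               # Decide: choose a plan
        output = act(plan)                  # Act: execute it
        result = validate(output, intent)   # Validate: score vs. intent
        if result.confidence >= threshold:  # confident enough: done
            return output
        critique = result.critique          # feed critique back into Reflect
    return output                           # max-cycle cap hit
```

The two exit conditions mirror the text: confidence threshold met, or max-cycle cap reached.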
For non-trivial decisions the loop calls the council. Four deliberation modes are available: Vote (majority), MOA (mixture of agents), Deliberate (multi-round debate), and Perspectives (constitutional lenses). Each mode uses a different mix of provider families to avoid correlated errors: when an Anthropic model and a Google model reach the same answer, they arrive at it for different reasons.
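The simplest mode, Vote, can be sketched in a few lines. The provider callables here are stand-ins for heterogeneous provider clients, not Aether's interface:

```python
from collections import Counter

def council_vote(question, providers):
    """Vote mode: ask every provider, return the majority answer.

    `providers` is a list of callables, each wrapping a different
    provider family so their errors stay uncorrelated (illustrative).
    Returns the winning answer plus the agreement ratio."""
    answers = [ask(question) for ask in providers]
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / len(answers)
```

The agreement ratio is a cheap confidence signal: low agreement is exactly the case where the heavier modes (Deliberate, Perspectives) earn their cost.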
Routing is adaptive. A Thompson-sampling bandit maintains Beta distributions per provider per task type across nine task dimensions (coding, reasoning, creative, research, speed, and others). The router learns over time which provider is best for what. When a locally distilled specialist becomes good enough, the router sends work to it instead of the frontier model, at no API cost.
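A minimal sketch of the Thompson-sampling idea, assuming one Beta posterior per (provider, task type) pair; the class and method names are hypothetical, not Aether's router:

```python
import random
from collections import defaultdict

class ThompsonRouter:
    """Keeps a Beta(successes + 1, failures + 1) posterior per
    (provider, task_type) pair and samples from it to pick a provider."""

    def __init__(self):
        self.stats = defaultdict(lambda: [1, 1])  # [alpha, beta] priors

    def pick(self, providers, task_type):
        # Sample each posterior; highest draw wins. Uncertain providers
        # sometimes draw high, which is the exploration half of the bandit.
        return max(providers, key=lambda p: random.betavariate(
            *self.stats[(p, task_type)]))

    def update(self, provider, task_type, success):
        # Reward sharpens the posterior; the router exploits it next time.
        self.stats[(provider, task_type)][0 if success else 1] += 1
```

A locally distilled model joins the table as just another arm: once its posterior dominates for a task type, it starts winning the draw.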
Once an agent template clears 20+ runs at mean quality ≥ 0.80, it's eligible for export. Three formats: an Aether-native package (system prompt + SFT corpus + DPO pairs + KDRs); a HuggingFace package (ChatML JSONL, Alpaca JSON, TRL SFTTrainer config, Axolotl config, model card); or a Vertex AI Gemini fine-tuning package (gcloud CLI scripts, Python SDK, validation split).
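The export gate above reduces to a simple check; the function name and signature are illustrative, not Aether's API:

```python
def export_eligible(run_scores, min_runs=20, min_mean_quality=0.80):
    """Gate from the text: 20+ runs at mean quality >= 0.80.

    `run_scores` is the list of per-run quality scores in [0, 1]."""
    return (len(run_scores) >= min_runs
            and sum(run_scores) / len(run_scores) >= min_mean_quality)
```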
The export tier is a separate axis: agent definition only, agent + GGUF model + Ollama Modelfile, or agent + Safetensors + tokenizer. The point is that an agent trained inside Aether is portable (to local Ollama, to HuggingFace, or to Vertex AI) without rewriting it.
A concrete example of M2M(A)^nth at work. Aether synthesized a specialized agent for AI security research: testing agentic systems against known failure modes (prompt injection, document- and image-based context manipulation, authority-impersonation patterns).
The notable part is the synthesis. The agent wasn't written by a single model and shipped. It was produced by a council running five frontier models simultaneously: Claude Opus, Claude Sonnet, Gemini, Grok, and GPT. Each provider's perspective shaped the agent's reasoning patterns; the council's debate produced the system prompt, the training corpus, and the decision policy.
That's the kind of agent that's genuinely hard to produce without provider diversity in the loop. A single model has trouble critiquing its own blind spots in a register foreign to its own training.
Specific findings, techniques, and vendor-product specifics stay private. That work informs how OUTURE thinks about AI deployments and isn't published. The point on this page is the synthesis methodology, which is the architecture working as designed.
Provider diversity over model diversity. Heterogeneous training lineages (Google vs Anthropic vs Meta vs Mistral vs xAI) produce genuinely different reasoning paths and uncorrelated errors. A council of two Anthropic models is barely a council; a council across three providers is one; across five is what the security research agent above required.
Recursive agent depth, not models calling models. Each provider brings its full agent stack. Sub-agents can spawn sub-agents with no fixed cap. The system scales by composing intelligence, not by stacking parameters.
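The recursive composition pattern can be sketched as follows. The agent interface here (`plan`, `solve`, `delegate`, `merge`) is an assumption for illustration, not Aether's API:

```python
def spawn(agent, task, depth=0):
    """Recursive agent composition sketch: an agent may decompose a task
    and spawn sub-agents per part, to arbitrary depth (no fixed cap)."""
    subtasks = agent.plan(task)
    if not subtasks:                 # leaf: solve directly
        return agent.solve(task)
    # Each subtask gets its own sub-agent, possibly backed by a
    # different provider; sub-agents recurse the same way.
    return agent.merge([spawn(agent.delegate(s), s, depth + 1)
                        for s in subtasks])

class DemoAgent:
    """Toy agent: split multi-char tasks, uppercase leaves, join results."""
    def plan(self, t): return list(t) if len(t) > 1 else []
    def solve(self, t): return t.upper()
    def delegate(self, s): return DemoAgent()
    def merge(self, parts): return "".join(parts)
```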
Adaptive routing that learns. Static rules ("use Claude for code, GPT for creative") rot fast. Thompson sampling treats provider choice as an exploration-exploitation problem and improves with every run. Local distilled models join the routing table the moment they earn the seat.
Self-correction in the loop. The RDAV validator runs on every output before it leaves the system. The most expensive failure mode of LLM applications is confident incorrectness; closing the loop catches a meaningful share of it before the user sees it.
Aether is the brain; Atrium is the body. Aether alone is a backend with no channels, devices, or automation infrastructure. Atrium alone is execution and I/O with a single LLM making all decisions. Together: tasks come in through an Atrium channel, get routed through Aether's RDAV/council, and the result returns through Atrium's channel layer.
The integration is already wired through `aether_run`, `aether_spawn_agent`, and an MCP bridge. Any system that plugs Aether in gets recursive multi-model intelligence as a service. Atrium is the first full-stack proof of that.
Aether sits on top of a smaller set of primitives: three internal systems that informed how Aether handles knowledge, reasoning, and provider perspective. We use them daily; they're not productized, but they're the conceptual foundation everything else is built on.
Brain is a SQLite-backed knowledge graph with full-text search and typed relations between concepts (supports, contradicts, elaborates, derives_from, related). Not a document store. A graph where each connection has a type and is queryable. Aether's agent template registry and KDR storage use the same data-shape thinking.
Think is a structured reasoning engine that forces typed steps (assumption, argument-for, argument-against, evidence, observation, synthesis, conclusion). The same step types informed how Aether's RDAV loop validates outputs against original intent.
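As a sketch, the step vocabulary maps naturally onto an enum, paired with a hypothetical completeness check of the kind a validator might run; the `is_complete` rule is our illustration, not Think's actual policy:

```python
from enum import Enum

class Step(Enum):
    """Typed reasoning steps from Think (names from the text)."""
    ASSUMPTION = "assumption"
    ARGUMENT_FOR = "argument-for"
    ARGUMENT_AGAINST = "argument-against"
    EVIDENCE = "evidence"
    OBSERVATION = "observation"
    SYNTHESIS = "synthesis"
    CONCLUSION = "conclusion"

def is_complete(trace):
    """Hypothetical check: a trace of (Step, text) pairs only counts as
    complete if it ends in a conclusion and cites at least one evidence
    step along the way."""
    kinds = [kind for kind, _text in trace]
    return bool(kinds) and kinds[-1] is Step.CONCLUSION and Step.EVIDENCE in kinds
```

Forcing every step to declare its type is what makes the trace machine-checkable, which is the property RDAV's validate phase borrows.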
Cognitive Modes are six named operating registers (first-principles, devil's advocate, researcher, architect, reviewer, debugger). Each biases the same model toward a different mode of thinking. Aether's council uses the same idea at the provider level: each provider is its own cognitive mode by virtue of its training lineage.
All three are exposed through an MCP server, callable from any LLM client mid-conversation. The infrastructure is internal; the patterns it taught us are everywhere in Aether.