Augur

A research-and-production multi-agent system trading prediction markets across Kalshi, Polymarket, and PredictIt, with a global Kelly optimizer, agent calibration feedback, and disciplined cost engineering on the LLM stack.

category: multi-agent trading research
status: Research and production

Augur is a multi-agent prediction-market trading system. Thirteen persistent agents (five weather specialists by city, an economics agent, an events agent, an arbitrage scanner, three monitors, a portfolio manager, and a news sentinel) analyze opportunities continuously and publish signals to a global allocator.

It runs as both research and production. The architecture is the lab work; the system trades real markets with conservative caps. The same instrumentation that informs the research informs the trading.

Each agent runs as a long-lived process maintaining its own message history. Cycles publish signals to an event bus. A meta-agent, Captain, listens to every signal and solves a globally Kelly-optimal allocation under hard constraints (≤80% portfolio exposure, ≤20% single position, ≤30% per domain, Herfindahl <0.15).

A calibration store records every signal and its eventual outcome. Agent weights, used by Captain for sizing, are a function of historical Sharpe and calibration error. Good agents get scaled; agents that drift get downweighted within five to ten trades.

Execution is multi-venue. The same outcome traded on Kalshi, Polymarket, and PredictIt may have meaningfully different prices and depth; orders route to whichever venue offers the best fill, with arb capture when spreads exceed costs.

Why weather markets first. Weather contracts on Kalshi settle on a specific NWS station: Chicago is KMDW (Midway), not O'Hare; New York is KNYC (Central Park), not JFK or LaGuardia. The settlement infrastructure is unambiguous, the data is free from NOAA, and the contracts are high-frequency and low-noise compared to politics. Where the settlement source is observable, you can be confident; where it's ambiguous, you're guessing.

Why FRED, the Cleveland Fed Nowcast, and Atlanta Fed GDPNow. FRED is the canonical free source for CPI / unemployment / GDP / Fed funds. The Cleveland Fed Inflation Nowcast updates daily at 10am ET and outperforms Blue Chip consensus across all reporting windows. GDPNow updates 6–7 times per month and aggregates 13 GDP subcomponents via dynamic factor models. These are what professional macro forecasters trust; choosing them is itself a decision.

Why ZORI, Manheim, and EIA gasoline as leading indicators. The Zillow Observed Rent Index leads CPI shelter by 8–14 months. The Manheim Used Vehicle Value Index leads CPI used cars by 1–2 months. EIA weekly gasoline prices feed directly into CPI gasoline. Picking leads, not coincidents, is what gives the agents an information edge over Bloomberg consensus.

Why claude-code CLI subscription, not pure API spend. The decision layer (final trade calls) runs through `claude --print` on a Max subscription, flat-rate, predictable. The analysis layer (multi-turn data gathering and reasoning) uses the API. Cost-engineer the part that's high-volume into a fixed-cost subscription; keep the API for the part that benefits from persistent context. The whole LLM stack runs at roughly $80/month.

Why subprocess → persistent agents. The original architecture spawned a `claude --print` subprocess per cycle (~500ms overhead). The bottleneck was never reasoning quality. It was process spawn. Long-lived agents that maintain message history dropped latency about 10× and unlocked multi-turn reasoning across cycles.

Why a Captain. Each agent independently sizing 5% per trade leaves money on the table and creates correlation blowups. Solving Kelly globally with constraints and weighting signals by historical Sharpe is a different shape of problem and a different shape of answer.

Why a calibration loop. Without feedback, agents never improve. Tracking win rate, Sharpe, and calibration error per agent, and auto-weighting future signals, creates a flywheel where good agents scale and bad ones quietly downweight. The system improves without being retrained by hand.

Research and production. The system runs with conservative caps; methodology and architecture are shareable, specific live-trading parameters are not. This page is the public-facing version; the implementation guide and parameter set live with the operator.

← previousAether next →Shrike

see all lab pieces →