The core insight: consensus as a relay baton
The hardest problem in multi-agent AI isn't prompting. It's state persistence across sessions.
LLMs don't remember. Every invocation starts fresh. So if you want agents to build on previous work — across hours, days, or weeks — you need a coordination mechanism.
Auto-co uses what I call the relay baton pattern: a single markdown file (memories/consensus.md) that every cycle reads at the start and writes at the end.
```markdown
# Auto Company Consensus

## Last Updated
2026-03-06T02:00:00Z

## Current Phase
Distribution -- Phase 3

## What We Did This Cycle
- Fixed server-side analytics tracking
- Added social proof badges to landing page

## Next Action
Write technical architecture deep-dive for content distribution
```

No vector database. No Redis. No embeddings. Just a structured markdown file that fits in the context window. The entire company state — decisions, metrics, active projects, next steps — lives in this one document.
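Because the whole system hinges on this one file, it pays to sanity-check it before injecting it into a prompt. A minimal sketch, assuming the section names from the example above (this guard is illustrative, not part of auto-co itself):

```shell
#!/usr/bin/env bash
# Sketch: refuse to start a cycle if the consensus file is missing the
# sections the prompt template expects. Section names are assumptions
# based on the example consensus file shown above.

validate_consensus() {
  local file="$1"
  local section
  for section in "## Last Updated" "## Current Phase" "## Next Action"; do
    if ! grep -qF "$section" "$file"; then
      echo "consensus invalid: missing '$section'" >&2
      return 1
    fi
  done
  return 0
}
```

A loop could call `validate_consensus memories/consensus.md || exit 1` before each cycle, failing loudly instead of handing the agents a corrupted baton.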
The loop: embarrassingly simple
```bash
while true; do
  # Read previous state
  CONSENSUS=$(cat memories/consensus.md)

  # Build prompt with state injected (printf expands the newlines;
  # a literal "\n" inside double quotes would be passed through as-is)
  FULL_PROMPT=$(printf '%s\n\n%s' "$PROMPT_TEMPLATE" "$CONSENSUS")

  # Run one cycle
  claude -p "$FULL_PROMPT" \
    --model opus \
    --dangerously-skip-permissions \
    --output-format stream-json

  # Sleep, then repeat
  sleep 120
done
```

That's the core. A bash `while true` loop invoking the Claude Code CLI every 2 minutes. Each invocation is one “cycle” — one sprint of autonomous work.
In practice, auto-loop.sh adds production hardening: 30-minute watchdog timer, circuit breaker (3 consecutive errors = 5-minute cooldown), rate limit detection, atomic writes, log rotation, and cost tracking.
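The circuit breaker is the piece most worth copying. A hedged sketch of the logic described above, with `run_cycle` and the thresholds as stand-ins rather than the actual auto-loop.sh code:

```shell
#!/usr/bin/env bash
# Illustrative circuit breaker: after 3 consecutive failed cycles,
# pause for 5 minutes before trying again. run_cycle is a placeholder
# for whatever invokes one Claude Code cycle.

ERROR_LIMIT=3
COOLDOWN_SECS=300
errors=0

run_with_breaker() {
  if run_cycle; then
    errors=0                                  # any success resets the breaker
  else
    errors=$((errors + 1))
    if [ "$errors" -ge "$ERROR_LIMIT" ]; then
      echo "circuit open: cooling down ${COOLDOWN_SECS}s" >&2
      sleep "$COOLDOWN_SECS"
      errors=0                                # cooldown over: allow retries
    fi
  fi
}
```

The point of the cooldown is to stop a misbehaving loop from burning API budget at full speed: transient failures (rate limits, network blips) get absorbed, while persistent ones at least fail slowly.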
The team: 14 agents, 4 layers
Auto-co doesn't use one big prompt. It spawns specialized agents, each modeled on a world-class expert's thinking patterns.
Strategy Layer
| Agent | Expert Model | Job |
|---|---|---|
| CEO | Jeff Bezos | Day 1 mindset, PR/FAQ, customer obsession |
| CTO | Werner Vogels | Architecture decisions, tech debt, reliability |
| Critic | Charlie Munger | Inversion thinking, Pre-Mortem, veto power |
Product Layer
| Agent | Expert Model | Job |
|---|---|---|
| Product | Don Norman | User experience, usability, design principles |
| UI | Matias Duarte | Visual design, design system, motion |
| Interaction | Alan Cooper | User flows, personas, navigation |
Engineering Layer
| Agent | Expert Model | Job |
|---|---|---|
| Full-Stack | DHH | Code, features, technical decisions |
| QA | James Bach | Test strategy, quality gates, bug triage |
| DevOps | Kelsey Hightower | Deploy, CI/CD, infrastructure |
Business Layer
| Agent | Expert Model | Job |
|---|---|---|
| Marketing | Seth Godin | Positioning, content, distribution |
| Operations | Paul Graham | User acquisition, retention, community |
| Sales | Aaron Ross | Pricing, conversion, CAC |
| CFO | Patrick Campbell | Unit economics, financial models |
| Research | Ben Thompson | Market analysis, competitive intel |
Each cycle selects 3-5 relevant agents. Not all 14 — that would be expensive and slow. The CEO reads the consensus, decides what to do, and picks the right team.
The convergence rules: how decisions don't stall
The biggest risk with multi-agent systems isn't bad decisions — it's no decisions. Agents love to discuss, research, and plan. Left unchecked, they'll brainstorm forever.
- Cycle 1: Brainstorm. Each agent proposes one idea. Rank top 3.
- Cycle 2: Validate. Munger runs Pre-Mortem, Research validates the market, CFO runs the numbers. Verdict: GO or NO-GO.
- Cycle 3+: If GO, write code. Discussion is forbidden. If NO-GO, try idea #2. If all fail, force-pick one and build.
- Every cycle after Cycle 2 must produce artifacts — files, repos, deployments. Pure discussion cycles are banned.
- Same “Next Action” appearing twice: You're stalled. Change direction or narrow scope and ship immediately.
The priority hierarchy: Ship > Plan > Discuss.
What 30+ cycles actually produced
| Metric | Value |
|---|---|
| Cycles completed | 32+ |
| Total API cost | ~$45 |
| Average cost/cycle | ~$1.41 |
| Infrastructure cost/month | ~$5 (Railway) |
| Revenue | $0 |
| GitHub stars | 5+ |
| Waitlist signups | 2 |
| Human interventions | 1 (API key for email service) |
Artifacts shipped
- Landing page at runautoco.com (Next.js, Tailwind, Railway)
- Live demo dashboard at /demo (6-panel real-time view)
- Pricing page at /pricing (Free/Pro/Enterprise tiers)
- Admin dashboard at /admin (analytics, waitlist tracking)
- Waitlist API with Supabase backend
- Server-side analytics tracking page views
- DEV.to article (written and published by the agents)
- Show HN post (submitted by the agents)
- This blog post (planned and written by the agents)
Failure modes (the interesting part)
Failure #1: Gold-plating
In early cycles, the agents would spend an entire cycle perfecting a color scheme: 45 minutes of agent time on whether the CTA button should be orange-500 or orange-600. Fix: The convergence rules.
Failure #2: Discussion loops
Without convergence rules, agents would say “Let's do more research before deciding.” Three cycles later: three beautiful research documents, zero code. Fix: The “Cycle 3+ must produce artifacts” rule.
Failure #3: Silent failures
Analytics tracking was implemented client-side. In production, ad blockers silently killed it. Zero page views for weeks. Fix: Moved to server-side API route. Same-origin fetch('/api/track') can't be blocked.
Failure #4: Stale consensus
Twice, the same “Next Action” appeared in consecutive cycles — the agents were reading it, doing something adjacent, and rewriting the same next action. Fix: Auto-detection. If the same Next Action appears twice, the prompt forces a direction change.
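The check itself is mechanical. A sketch of one way to detect the repeat, assuming the consensus file keeps its "## Next Action" heading (the extraction logic here is an assumption, not auto-loop.sh code):

```shell
#!/usr/bin/env bash
# Sketch: compare the "## Next Action" section of the current and
# previous consensus snapshots; a repeat means the loop is stalled.

next_action() {
  # Print the text between "## Next Action" and the next "## " heading.
  awk '/^## Next Action/{flag=1; next} /^## /{flag=0} flag' "$1"
}

stalled() {
  [ -n "$(next_action "$1")" ] &&
  [ "$(next_action "$1")" = "$(next_action "$2")" ]
}
```

A loop could snapshot the consensus before each cycle and, when `stalled` fires, append a force-change instruction to the next prompt instead of letting the agents circle the same task again.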
The self-referential trick
Auto-co is building auto-co. The product is the framework. The framework runs the product. The agents commit code to the same repo that contains their own definitions. They improve their own prompts, fix their own bugs, and ship their own marketing.
The README you read on GitHub? Written by the agents. The landing page? Built by the agents. This blog post? Planned by the marketing agent, structured by the CEO, and reviewed by the critic. It's turtles all the way down.
How to run your own
Auto-co is MIT licensed. You can run it today:
```bash
git clone https://github.com/NikitaDmitrieff/auto-co-meta
cd auto-co-meta

# Set your Anthropic API key
export ANTHROPIC_API_KEY=your_key_here

# Start the loop
./auto-loop.sh
```

You'll need an Anthropic API key (Claude Opus recommended), the Claude Code CLI installed, and Node.js. The agents will read the consensus, form a team, decide what to do, and start building.
What I learned
- State management > prompt engineering. The relay baton pattern is more important than any individual agent's prompt. Get the coordination mechanism right and the agents figure out the rest.
- Constraints produce output. Without convergence rules, agents philosophize. With hard deadlines and artifact requirements, they ship.
- Expert personas are surprisingly effective. The Munger agent consistently catches flaws that other agents miss. The thinking frameworks encoded in each role file make a measurable difference.
- Costs are predictable and low. ~$1-2 per cycle, ~$45 for 32 cycles that built a complete product. The whole company runs for less than a coffee habit.
- The hardest part is knowing when to stop. The agents will iterate forever if you let them. The convergence rules are the most important engineering decision in the system.
Want to run your own AI company?
Auto-co is open source. Self-host free, or join the waitlist for the fully hosted version.
This post was outlined by the marketing-godin agent, structured by the CEO agent, and fact-checked by the critic-munger agent during the auto-co autonomous loop.