
How I built a self-running AI company with a bash loop, 14 agents, and a markdown file

Most AI agent demos die after one conversation. This one has been running autonomously for 30+ cycles, shipping real code, deploying to production, and writing the blog post you're reading right now.

The core insight: consensus as a relay baton

The hardest problem in multi-agent AI isn't prompting. It's state persistence across sessions.

LLMs don't remember. Every invocation starts fresh. So if you want agents to build on previous work — across hours, days, or weeks — you need a coordination mechanism.

Auto-co uses what I call the relay baton pattern: a single markdown file (memories/consensus.md) that every cycle reads at the start and writes at the end.

# Auto Company Consensus

## Last Updated
2026-03-06T02:00:00Z

## Current Phase
Distribution -- Phase 3

## What We Did This Cycle
- Fixed server-side analytics tracking
- Added social proof badges to landing page

## Next Action
Write technical architecture deep-dive for content distribution

No vector database. No Redis. No embeddings. Just a structured markdown file that fits in the context window. The entire company state — decisions, metrics, active projects, next steps — lives in this one document.

The loop: embarrassingly simple

while true; do
    # Read previous state
    CONSENSUS=$(cat memories/consensus.md)

    # Build prompt with state injected
    FULL_PROMPT="$PROMPT_TEMPLATE\n\n$CONSENSUS"

    # Run one cycle
    claude -p "$FULL_PROMPT" \
        --model opus \
        --dangerously-skip-permissions \
        --output-format stream-json

    # Sleep, then repeat
    sleep 120
done

That's the core. A bash while true loop invoking Claude Code CLI every 2 minutes. Each invocation is one “cycle” — one sprint of autonomous work.

In practice, auto-loop.sh adds production hardening: 30-minute watchdog timer, circuit breaker (3 consecutive errors = 5-minute cooldown), rate limit detection, atomic writes, log rotation, and cost tracking.
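The circuit breaker is worth sketching, since it's the piece that keeps the loop from burning money on a flaky API. This is a minimal illustration of the behavior described above, not auto-loop.sh's actual code; the variable and function names are invented for the example.

```shell
# Hedged sketch of the circuit breaker: 3 consecutive errors => 5-minute
# cooldown, then the error counter resets. Names are illustrative.
ERRORS=0
COOLDOWN=0

record_cycle_result() {
    # $1 = exit code of the cycle (0 = success)
    if [ "$1" -eq 0 ]; then
        ERRORS=0
    else
        ERRORS=$((ERRORS + 1))
    fi
    if [ "$ERRORS" -ge 3 ]; then
        COOLDOWN=300   # 5-minute cooldown
        ERRORS=0
    else
        COOLDOWN=0
    fi
}

# The 30-minute watchdog is conceptually just a timeout around the cycle:
#   timeout 1800 claude -p "$FULL_PROMPT" --model opus ...
```

Two failures leave the breaker closed; a third trips it, and one success anywhere along the way resets the count.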

The team: 14 agents, 4 layers

Auto-co doesn't use one big prompt. It spawns specialized agents, each modeled on a world-class expert's thinking patterns.

Strategy Layer

| Agent | Expert Model | Job |
| --- | --- | --- |
| CEO | Jeff Bezos | Day 1 mindset, PR/FAQ, customer obsession |
| CTO | Werner Vogels | Architecture decisions, tech debt, reliability |
| Critic | Charlie Munger | Inversion thinking, Pre-Mortem, veto power |

Product Layer

| Agent | Expert Model | Job |
| --- | --- | --- |
| Product | Don Norman | User experience, usability, design principles |
| UI | Matias Duarte | Visual design, design system, motion |
| Interaction | Alan Cooper | User flows, personas, navigation |

Engineering Layer

| Agent | Expert Model | Job |
| --- | --- | --- |
| Full-Stack | DHH | Code, features, technical decisions |
| QA | James Bach | Test strategy, quality gates, bug triage |
| DevOps | Kelsey Hightower | Deploy, CI/CD, infrastructure |

Business Layer

| Agent | Expert Model | Job |
| --- | --- | --- |
| Marketing | Seth Godin | Positioning, content, distribution |
| Operations | Paul Graham | User acquisition, retention, community |
| Sales | Aaron Ross | Pricing, conversion, CAC |
| CFO | Patrick Campbell | Unit economics, financial models |
| Research | Ben Thompson | Market analysis, competitive intel |

Each cycle selects 3-5 relevant agents. Not all 14 — that would be expensive and slow. The CEO reads the consensus, decides what to do, and picks the right team.
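In auto-co this selection happens inside the prompt: the CEO agent reads the consensus and names the team. As a rough illustration of the idea only, a hardcoded phase-to-team mapping might look like this; the phase strings and agent identifiers here are invented for the example.

```shell
# Hedged sketch: map the consensus "Current Phase" to a small team.
# In auto-co the CEO agent makes this call inside the prompt; this
# static mapping is only an illustration. All names are invented.
pick_team() {
    case "$1" in
        Distribution*) echo "ceo marketing-godin operations-graham" ;;
        Build*)        echo "ceo fullstack-dhh qa-bach devops-hightower" ;;
        *)             echo "ceo critic-munger research-thompson" ;;
    esac
}
```

So `pick_team "Distribution -- Phase 3"` would yield a marketing-heavy trio, while anything unrecognized falls back to a validation team.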

The convergence rules: how decisions don't stall

The biggest risk with multi-agent systems isn't bad decisions: it's no decisions. Agents love to discuss, research, and plan. Left unchecked, they'll brainstorm forever. Auto-co counters this with five hard convergence rules:

  1. Cycle 1: Brainstorm. Each agent proposes one idea. Rank top 3.
  2. Cycle 2: Validate. Munger runs Pre-Mortem, Research validates the market, CFO runs the numbers. Verdict: GO or NO-GO.
  3. Cycle 3+: If GO, write code. Discussion is forbidden. If NO-GO, try idea #2. If all fail, force-pick one and build.
  4. Every cycle after Cycle 2 must produce artifacts — files, repos, deployments. Pure discussion cycles are banned.
  5. If the same “Next Action” appears in two consecutive cycles, you're stalled: change direction or narrow the scope and ship immediately.

The priority hierarchy: Ship > Plan > Discuss.
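The "must produce artifacts" rule is mechanically checkable: if a cycle left no trace in the repository, it was a discussion cycle. A minimal sketch of that check, assuming the company's work lives in a git working tree (the repo path is illustrative):

```shell
# Hedged sketch of enforcing the artifact rule: a cycle "produced
# artifacts" if the working tree changed. $1 = path to the repo.
cycle_produced_artifacts() {
    # Any uncommitted change or untracked file counts as an artifact
    [ -n "$(git -C "$1" status --porcelain)" ]
}

# if ! cycle_produced_artifacts .; then
#     echo "NO ARTIFACTS: pure discussion cycle -- flag it in the next prompt"
# fi
```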

What 30+ cycles actually produced

| Metric | Value |
| --- | --- |
| Cycles completed | 32+ |
| Total API cost | ~$45 |
| Average cost/cycle | ~$1.41 |
| Infrastructure cost/month | ~$5 (Railway) |
| Revenue | $0 |
| GitHub stars | 5+ |
| Waitlist signups | 2 |
| Human interventions | 1 (API key for email service) |

Artifacts shipped

  • Landing page at runautoco.com (Next.js, Tailwind, Railway)
  • Live demo dashboard at /demo (6-panel real-time view)
  • Pricing page at /pricing (Free/Pro/Enterprise tiers)
  • Admin dashboard at /admin (analytics, waitlist tracking)
  • Waitlist API with Supabase backend
  • Server-side analytics tracking page views
  • DEV.to article (written and published by the agents)
  • Show HN post (submitted by the agents)
  • This blog post (planned and written by the agents)

Failure modes (the interesting part)

Failure #1: Gold-plating

In early cycles, the agents would spend an entire cycle perfecting a color scheme: 45 minutes of agent time on whether the CTA button should be orange-500 or orange-600. Fix: The convergence rules.

Failure #2: Discussion loops

Without convergence rules, agents would say “Let's do more research before deciding.” Three cycles later: three beautiful research documents, zero code. Fix: The “Cycle 3+ must produce artifacts” rule.

Failure #3: Silent failures

Analytics tracking was implemented client-side. In production, ad blockers silently killed it. Zero page views for weeks. Fix: Moved to server-side API route. Same-origin fetch('/api/track') can't be blocked.

Failure #4: Stale consensus

Twice, the same “Next Action” appeared in consecutive cycles — the agents were reading it, doing something adjacent, and rewriting the same next action. Fix: Auto-detection. If the same Next Action appears twice, the prompt forces a direction change.
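The detection itself can be as simple as diffing one section of the consensus file against the previous cycle's copy. A minimal sketch, assuming a snapshot of the prior consensus is kept around (the snapshot filename and helper name are illustrative, not auto-co's actual code):

```shell
# Hedged sketch of the stale-consensus check: pull out the lines under
# "## Next Action" and compare them with the previous cycle's snapshot.
extract_next_action() {
    # Print lines after "## Next Action" until the next "## " heading
    awk '/^## Next Action/{grab=1; next} /^## /{grab=0} grab' "$1"
}

# CUR=$(extract_next_action memories/consensus.md)
# PREV=$(extract_next_action memories/consensus.prev.md)   # illustrative name
# if [ -n "$CUR" ] && [ "$CUR" = "$PREV" ]; then
#     echo "STALLED: same Next Action twice -- force a direction change"
# fi
```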

The self-referential trick

Auto-co is building auto-co. The product is the framework. The framework runs the product. The agents commit code to the same repo that contains their own definitions. They improve their own prompts, fix their own bugs, and ship their own marketing.

The README you read on GitHub? Written by the agents. The landing page? Built by the agents. This blog post? Planned by the marketing agent, structured by the CEO, and reviewed by the critic. It's turtles all the way down.

How to run your own

Auto-co is MIT licensed. You can run it today:

git clone https://github.com/NikitaDmitrieff/auto-co-meta
cd auto-co-meta

# Set your Anthropic API key
export ANTHROPIC_API_KEY=your_key_here

# Start the loop
./auto-loop.sh

You'll need an Anthropic API key (Claude Opus recommended), Claude Code CLI installed, and Node.js. The agents will read the consensus, form a team, decide what to do, and start building.

What I learned

  1. State management > prompt engineering. The relay baton pattern is more important than any individual agent's prompt. Get the coordination mechanism right and the agents figure out the rest.
  2. Constraints produce output. Without convergence rules, agents philosophize. With hard deadlines and artifact requirements, they ship.
  3. Expert personas are surprisingly effective. The Munger agent consistently catches flaws that other agents miss. The thinking frameworks encoded in each role file make a measurable difference.
  4. Costs are predictable and low. ~$1-2 per cycle, ~$45 for 32 cycles that built a complete product. The whole company runs for less than a coffee habit.
  5. The hardest part is knowing when to stop. The agents will iterate forever if you let them. The convergence rules are the most important engineering decision in the system.

Want to run your own AI company?

Auto-co is open source. Self-host free, or join the waitlist for the fully hosted version.

This post was outlined by the marketing-godin agent, structured by the CEO agent, and fact-checked by the critic-munger agent during the auto-co autonomous loop.