Want to learn about agents? Talk to someone who ran an agency.

I spent 20 years running consulting engagements at Fortune 500 companies. Turns out that's the best preparation for running a fleet of AI agents, because the problems are identical.

Yesterday, autonomous AI agents closed 63 issues across six of my projects. They found a billing integration that silently ignored database failures. A tier-gating bypass that let free users access paid features. A podcast feed that had been serving broken URLs for every episode — all 55 of them — for weeks. They wrote 229 tests. They fixed security vulnerabilities. They generated a podcast and published it to my blog.

I didn’t touch a keyboard for most of it.

This isn’t a demo. I’ve been running this system in production for months. Fifteen tmux sessions. Thirteen active projects. A fleet of AI agents that don’t just execute tasks but cooperate, specialize, delegate to each other, and hold each other’s work to account.

The architecture that makes it possible didn’t come from an AI research paper. It came from twenty years of running consulting engagements.

How agencies actually work

I spent two decades running organizational change projects at Fortune 500 companies — Disney, Delta, Home Depot, Wells Fargo. Not writing code. Managing teams of specialists who had to produce coordinated output under constraints.

Here’s what you learn running an agency:

Agencies organize around competencies, not clients. This is the thing everyone outside the industry gets wrong. They assume agencies assign dedicated teams to each client. Some do, for their biggest accounts. But that’s the expensive, unscalable model. Most agencies organize by what people know how to do: strategy, research, design, content, QA. Each competency is a team. Each team serves every client.

The client engagement gets a project manager. The PM is the only person bound to the client. Everyone else rotates. The strategy team works on Delta this week and Disney next week. The competency is persistent. The assignment is temporary.

The brief is the interface. When a strategist hands work to a writer, she doesn’t say “write something good.” She hands over a document: here’s the audience, here’s the objective, here’s the constraint, here’s what done looks like. The quality of the output is almost entirely determined by the quality of the brief.

Work moves through explicit stages. Every deliverable goes through research, draft, review, revision, approval. These stages aren’t suggestions — they’re gates. Work doesn’t advance until someone with the right authority says it’s ready.

Review is adversarial, not ceremonial. In a good agency, QA tries to break the work. The reviewer asks “does this actually solve the problem?” not “does this look professional?” The whole point of separate roles is that the person checking the work has different incentives than the person who produced it.

Every AI agent framework I’ve seen ignores all of this. They build a single agent with a big prompt and a pile of tools — the equivalent of one person who is simultaneously the strategist, researcher, writer, designer, QA, and project manager. That’s not an agency. That’s a freelancer having a breakdown.

The architecture: an operating system for AI work

The part that surprises people: the coordination layer isn’t a framework. It isn’t a message queue or an orchestration service. It’s tmux.

Each project runs in its own tmux session. Each session has a Claude Code instance with full access to the project’s codebase, issue tracker, and tools. An orchestration loop — a plain Python process — cycles through the sessions every ten minutes, checking what needs to be done and injecting work.

The orchestrator doesn’t do the work. It dispatches. The agent in the session does the work. When it’s done, it goes idle. The orchestrator picks up the next task.
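To make the dispatch loop concrete, here is a minimal sketch of one pass over the sessions. It is not the actual orchestrator. The tmux subcommands (`list-sessions`, `capture-pane`, `send-keys`) are real, but `IDLE_MARKER`, the idle check, the `next_task` callback, and the choice to hand at most one task to each idle session per pass are all assumptions made for illustration.

```python
import subprocess
from typing import Callable, Optional

# A runner takes tmux arguments and returns stdout. It is injectable so the
# loop can be exercised without a live tmux server.
Runner = Callable[[list[str]], str]

IDLE_MARKER = ">"  # hypothetical: what an idle agent's prompt line starts with


def run_tmux(args: list[str]) -> str:
    """Run a real tmux command and return its stdout."""
    done = subprocess.run(["tmux", *args], capture_output=True, text=True, check=True)
    return done.stdout


def list_sessions(run: Runner = run_tmux) -> list[str]:
    """Return the names of all tmux sessions."""
    out = run(["list-sessions", "-F", "#{session_name}"])
    return [name for name in out.splitlines() if name]


def is_idle(session: str, run: Runner = run_tmux) -> bool:
    """Guess idleness from the last non-empty pane line (an assumption)."""
    pane = run(["capture-pane", "-p", "-t", session])
    lines = [line.strip() for line in pane.splitlines() if line.strip()]
    return bool(lines) and lines[-1].startswith(IDLE_MARKER)


def dispatch(session: str, prompt: str, run: Runner = run_tmux) -> None:
    """Type the prompt into the session, then press Enter."""
    run(["send-keys", "-t", session, "-l", prompt])  # -l sends the text literally
    run(["send-keys", "-t", session, "Enter"])


def cycle(
    next_task: Callable[[str], Optional[str]], run: Runner = run_tmux
) -> list[tuple[str, str]]:
    """One pass: give each idle session at most one task, then return."""
    dispatched = []
    for session in list_sessions(run):
        if not is_idle(session, run):
            continue  # the agent is still working, so leave it alone
        task = next_task(session)
        if task is not None:
            dispatch(session, task, run)
            dispatched.append((session, task))
    return dispatched
```

A ten-minute cadence would then be a plain `while True:` loop that calls `cycle(...)` and sleeps for 600 seconds. The orchestrator only types into a session. Everything that happens after the prompt lands is the agent's work.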

This is the operating system pattern. The orchestrator is the scheduler. Tmux sessions are processes. The issue tracker is the job queue. Skills are the system calls. The filesystem and git are shared memory.

Every agent framework reinvents process management, state tracking, tool routing, and coordination. They’re building applications without an operating system. What I’ve built is closer to the OS layer — which means any agent can run on top of it.

Skills are briefs, not instructions

The agents don’t receive detailed instructions. They receive skills — concise descriptions of intent and constraints that let the AI choose its own tools and approach.

A skill looks like this:

---
name: scout
description: Explore the codebase and create issues for what you find
---

Systematically explore the project's codebase, find improvements at
every scale, and create GitHub issues for them — from quick fixes
to missing features to architectural concerns.

The skill doesn’t specify which files to read, which patterns to use, or how to format the issues. The agent decides. Same principle as the brief: define what and why, let the specialist figure out how.
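For readers who want to see how little structure a skill needs, here is a hedged sketch of loading one. It assumes a skill is a text file with a `---` delimited header of `key: value` lines followed by free prose, as in the scout example. The `Skill` class and `parse_skill` function are hypothetical names, and the sample body is abridged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    description: str
    body: str  # the intent, in prose, with no step-by-step procedure


def parse_skill(text: str) -> Skill:
    """Split a skill file into its header fields and its prose body."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("skill must start with a '---' header line")
    try:
        # Find the line that closes the header.
        end = next(i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---")
    except StopIteration:
        raise ValueError("skill header is never closed") from None
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return Skill(name=meta["name"], description=meta["description"], body=body)


SCOUT = """\
---
name: scout
description: Explore the codebase and create issues for what you find
---

Systematically explore the project's codebase, find improvements at
every scale, and create GitHub issues for them.
"""

skill = parse_skill(SCOUT)
```

Note what the data structure lacks: no file list, no tool list, no output template. Whatever the agent needs beyond name, description, and intent, it works out from the project it is running in.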

Different agents running the same skill make different choices based on context. The scout running in a Next.js platform looks for different things than the scout running in a Stellaris mod generator. Same intent, different execution.

Work moves through explicit states

Every issue in the system has exactly one routing label at any time:

State	Who acts	What happens
`ready-for-prep`	Planner	Read the issue, explore the code, write a concrete spec
`ready-for-dev`	Builder	Implement on a feature branch, write tests, self-review
QA	Adversary	Try to break what was built
Review	Reviewer	Check against the spec and product requirements

The orchestrator doesn’t decide what to do. The state tells it. Same principle that makes CI/CD pipelines reliable — the system encodes the workflow, not the operator.
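The routing table can be written down as data. The sketch below assumes the four states advance in the order the table lists them and uses the label names exactly as they appear there. The function names are hypothetical.

```python
from typing import Optional

# Routing label -> role that acts on it, in the order the table lists them.
ROUTES = {
    "ready-for-prep": "Planner",
    "ready-for-dev": "Builder",
    "QA": "Adversary",
    "Review": "Reviewer",
}


def routing_label(labels: list[str]) -> str:
    """Return the issue's single routing label, or fail loudly."""
    found = [label for label in labels if label in ROUTES]
    if len(found) != 1:
        raise ValueError(f"expected exactly one routing label, found {found}")
    return found[0]


def who_acts(labels: list[str]) -> str:
    """The state, not the operator, decides which role picks the issue up."""
    return ROUTES[routing_label(labels)]


def next_label(label: str) -> Optional[str]:
    """Assumed progression: each state hands off to the next one listed."""
    order = list(ROUTES)
    position = order.index(label)
    return order[position + 1] if position + 1 < len(order) else None
```

Non-routing labels such as `bug` pass through untouched. An issue carrying zero or two routing labels is an error rather than a judgment call.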

The orchestrator performs one action per cycle, then stops. Prep one issue. Execute one task. Run one scout. Record the result. Wait for the next cycle. This creates a complete audit trail and means the system degrades gracefully — no single failure cascades.
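One way to get "one action, then stop" with an audit trail is sketched below. It is an illustration under assumptions, not the real loop: each candidate action is a hypothetical callable that returns a result string, or `None` when it has nothing to do, and the log is a plain list standing in for whatever store the real system records to.

```python
import time
from typing import Callable, Optional

# An action is a (name, attempt) pair. attempt() returns a result string,
# or None when there is nothing for that kind of action to do.
Action = tuple[str, Callable[[], Optional[str]]]


def run_cycle(
    actions: list[Action],
    log: list[dict],
    now: Callable[[], float] = time.time,
) -> Optional[dict]:
    """Perform the first applicable action, record it, and stop."""
    for name, attempt in actions:
        try:
            outcome = attempt()
        except Exception as exc:
            # A failure is recorded and ends the cycle. It does not cascade.
            entry = {"at": now(), "action": name, "status": "failed", "detail": str(exc)}
        else:
            if outcome is None:
                continue  # nothing to do here, try the next kind of action
            entry = {"at": now(), "action": name, "status": "ok", "detail": outcome}
        log.append(entry)
        return entry  # exactly one action per cycle
    return None  # idle cycle, nothing recorded
```

Because a failed attempt counts as the cycle's one action, the worst case is a single wasted cycle with a log entry explaining why.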

The agents cooperate — laterally

The interesting part isn’t parallel execution. Lots of systems run agents in parallel. The interesting part is that the agents talk to each other directly — not through a central controller.

Most multi-agent frameworks are hub-and-spoke. The orchestrator knows everything. Agents know nothing except their current task. When an agent needs something, it asks the orchestrator. The orchestrator decides. Everything routes up and back down. It’s the org chart lines — the formal reporting structure — with nothing else.

What’s missing is what actually makes an agency work: lateral communication. The copywriter calls the strategist directly when she needs the rationale behind a positioning decision. The designer pings the researcher when she needs data. The PM doesn’t route every conversation. The specialists talk to each other all day, and that informal communication is where the real coordination happens — faster, richer, and better than anything that goes up and down through a central controller.

My system works the same way. Agents initiate contact with other agents based on need — not because the orchestrator told them to.

One session is a knowledge proxy — a Claude Code instance loaded with my full corpus: 644 blog posts, a published book, project docs, voice guidelines, strategy documents. When another agent hits a question about positioning, voice, or prior decisions, it sends the question directly to the knowledge proxy. The proxy researches it, composes an answer with a confidence level, and sends it back. The orchestrator never touches that exchange.
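The post does not say how a question physically travels between sessions, so the transport below is purely an assumption: a per-agent inbox directory on the shared filesystem. What the sketch is meant to show is the shape of the exchange: the asker writes to the proxy directly, the proxy replies directly with an answer and a confidence level, and no orchestrator function appears anywhere. All names and the confidence scale are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Callable


@dataclass
class Answer:
    question: str
    text: str
    confidence: str  # hypothetical scale: "high", "medium", "low"


def send_question(root: Path, sender: str, proxy: str, question: str) -> Path:
    """The asking agent drops a question straight into the proxy's inbox."""
    inbox = root / proxy / "inbox"
    inbox.mkdir(parents=True, exist_ok=True)
    path = inbox / f"{sender}.json"
    path.write_text(json.dumps({"from": sender, "question": question}))
    return path


def serve_inbox(root: Path, proxy: str, research: Callable[[str], tuple[str, str]]) -> int:
    """The proxy answers every pending question and replies to each sender."""
    inbox = root / proxy / "inbox"
    if not inbox.is_dir():
        return 0
    served = 0
    for path in sorted(inbox.glob("*.json")):
        message = json.loads(path.read_text())
        text, confidence = research(message["question"])
        reply_dir = root / message["from"] / "inbox"
        reply_dir.mkdir(parents=True, exist_ok=True)
        answer = Answer(message["question"], text, confidence)
        (reply_dir / f"{proxy}.json").write_text(json.dumps(asdict(answer)))
        path.unlink()  # the question has been handled
        served += 1
    return served


def read_answer(root: Path, asker: str, proxy: str) -> Answer:
    """The asking agent picks up the proxy's reply from its own inbox."""
    return Answer(**json.loads((root / asker / "inbox" / f"{proxy}.json").read_text()))
```

Here `research` stands in for whatever the proxy does with its corpus. The confidence field matters because the asking agent has to decide whether to act on the answer or escalate.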

A sweep agent walks the entire fleet every five minutes, capturing each session’s state. If it finds an agent stuck on a question it can answer, it answers. No ticket. No escalation. No orchestrator in the loop.

Four agents, three sessions, zero human intervention: the close skill in one session writes a work log, pushes it to git. A synthesis agent reads all the logs, writes a cross-project summary. A reflect skill reads the summary and generates a podcast script. That script goes through a humanizer pass before being sent to ElevenLabs for audio, uploaded to Cloudinary, and published to the blog. Each agent initiates the next handoff. The orchestrator’s only job is to make sure nobody’s idle.

This is what the agency model looks like when you take it seriously. Not just specialized roles — but roles that communicate with each other the way specialists actually do. Directly. On demand. Without asking permission.

What the agents actually find

In the past 48 hours, agents running autonomously found:

A BYOK gating bypass letting free-tier users access paid features in a production SaaS app
A variable name mismatch in a security-relevant text filter causing script tag content to leak through
A Hugo template scoping bug silently breaking every podcast URL for weeks
A billing webhook accepting Stripe’s response without checking whether the database actually updated

These aren’t formatting fixes. These are the kinds of bugs that surface as customer complaints in most organizations. The agents found them because they were systematic and patient in a way humans can’t sustain. They checked every file, validated every assumption, tested every path. Across 13 projects, every day.
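The billing webhook bug is a common pattern and worth a small illustration. This is not the code the agents fixed. It is a sketch of the general remedy, using Python's built-in `sqlite3` with a hypothetical `accounts` table and event shape: confirm that the write actually changed a row, and return a non-2xx status when it did not, so the provider retries delivery instead of treating the event as handled.

```python
import sqlite3


class BillingUpdateError(RuntimeError):
    """Raised when a billing event did not change exactly one account row."""


def apply_tier_change(conn: sqlite3.Connection, customer_id: str, tier: str) -> None:
    # "with conn" commits on success and rolls back if an exception is raised.
    with conn:
        cursor = conn.execute(
            "UPDATE accounts SET tier = ? WHERE customer_id = ?",
            (tier, customer_id),
        )
        if cursor.rowcount != 1:
            raise BillingUpdateError(
                f"expected 1 row for {customer_id!r}, updated {cursor.rowcount}"
            )


def handle_webhook(conn: sqlite3.Connection, event: dict) -> int:
    """Return an HTTP status: 200 only if the database really changed."""
    try:
        apply_tier_change(conn, event["customer_id"], event["tier"])
    except (sqlite3.Error, BillingUpdateError):
        return 500  # a non-2xx response signals failure, so delivery is retried
    return 200
```

The buggy version of this handler is the same code without the `rowcount` check and without the `except` branch: it returns 200 no matter what the database did.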

What’s left for the human

In practice: taste decisions, architecture choices, strategic direction, and accountability.

The humanizer in my pipeline strips AI writing patterns from generated content. An agent wrote it, but the decision about what sounds human was mine. The agents can triage and prioritize within constraints, but setting the constraints is human work. When the billing webhook bug was found, someone had to decide whether it was urgent enough to fix immediately or could wait.

The territory of “things that require taste” turns out to be smaller than most managers think. And it’s shrinking.

The question I didn’t expect

I organized my system by project. One session per codebase. The Authexis agent knows everything about Authexis. The Eclectis agent knows everything about Eclectis. Each one is deeply expert in its domain.

But half my projects aren’t code. I’m running a course launch, a weekly newsletter, a prospect outreach pipeline, a content publishing operation. These don’t have codebases. They have documents, strategies, and domain knowledge. And the project-per-session model doesn’t fit them well.

Here’s what I noticed. For software projects, the codebase is the context. You need a session that knows the repository intimately — the test patterns, the architecture decisions, the naming conventions. Losing that between runs is expensive.

For non-developer work, the domain knowledge is portable. A writer who just drafted a newsletter can draft a course email five minutes later. A researcher doesn’t care whether she’s analyzing competitors for a SaaS product or pricing for a workshop. The function is the persistent context, not the project.

Which means the right model for non-developer work isn’t one agent per project. It’s one agent per role — and a manager that knows what to assign them.

I already have proof this works. The knowledge proxy has been running for weeks, answering questions about positioning, voice, and strategy from completely different projects. It doesn’t know any one project deeply. It knows its job deeply. And that turns out to be enough.

Paradigm	Session lifetime	Context	Who does it
Ephemeral swarm	Minutes	None	Everyone (CrewAI, Swarm, LangGraph)
Long-lived project	Days/weeks	One codebase	My current dev system
Long-lived role	Days/weeks	One function, all projects	What I’m building toward

Most people treat the project as context and the role as a skill — a prompt label pasted in at the top. “You are a QA engineer.” The project is what persists. The role is what’s cheap.

Flip it. Make the role the context — accumulated expertise, judgment, identity that persists across assignments. Make the project the skill — a brief, handed in, executed, handed back. The role is the expensive thing to build. It should persist. The assignment is the cheap thing to hand over. It should be ephemeral.

Cheaper, because you’re not reloading an entire domain into every agent on every run. Better, because a role-context agent brings accumulated judgment to each new assignment. Faster, because a brief is a fraction of the cost of rebuilding project context from scratch.
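The flip can be sketched as two data structures. This is a thought experiment in code, not a description of a system that exists: `RoleAgent` and `Brief` are hypothetical names, the brief's fields are borrowed from the agency brief described earlier (audience, objective, constraint, what done looks like), and "accumulated judgment" is reduced to a list of notes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Brief:
    """The cheap, ephemeral part: handed in, executed, handed back."""
    project: str
    audience: str
    objective: str
    constraint: str
    done_when: str


@dataclass
class RoleAgent:
    """The expensive, persistent part: one function across all projects."""
    role: str
    context: list[str] = field(default_factory=list)  # survives every assignment

    def prompt_for(self, brief: Brief) -> str:
        """Combine what the role has learned with the current assignment."""
        lines = [f"Role: {self.role}"]
        lines += [f"Learned: {note}" for note in self.context]
        lines += [
            f"Project: {brief.project}",
            f"Audience: {brief.audience}",
            f"Objective: {brief.objective}",
            f"Constraint: {brief.constraint}",
            f"Done when: {brief.done_when}",
        ]
        return "\n".join(lines)

    def finish(self, brief: Brief, lesson: str) -> None:
        """Keep the lesson. The brief itself is not retained."""
        self.context.append(lesson)
```

The asymmetry is the point. A brief is five short fields and is discarded after use. The role's context only grows, and every later project inherits it.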

The project model builds deep vertical knowledge. The role model builds deep horizontal knowledge. For software, you want vertical. For everything else — content, strategy, operations, consulting — you want horizontal.

This is the thing the agencies figured out decades ago. You don’t hire a dedicated strategist for each client. You hire one great strategist and give her ten clients. The competency persists. The assignment rotates.

I’m building the AI version. Not because it’s theoretically elegant. Because it’s the only model that scales.
