Paul Welty, PhD · AI, WORK, AND STAYING HUMAN

· development

The delegation problem nobody talks about

When your automated systems start finding real bugs instead of formatting issues, delegation has crossed a line most managers never see coming.


Most conversations about delegation end at trust. Can you trust this person — or this system — to do the work? That’s the wrong question. What happens when the delegate becomes better at finding problems than you are?

Sixty-three issues closed across six projects today. Not by a team of developers. By autonomous agents running scout-triage-prep-exec pipelines, looping through codebases, finding problems, writing specs, implementing fixes, and merging them. One platform had its biggest day ever: nineteen pull requests merged without a human touching a keyboard. The volume isn’t the interesting part. What they found is.
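The scout-triage-prep-exec loop can be sketched in miniature. Everything below is a hypothetical illustration of the four stages named in the post — the function names, the `Issue` shape, and the toy "scan" logic are all invented here, not taken from any actual platform.

```python
from dataclasses import dataclass

@dataclass
class Issue:
    title: str
    severity: int        # higher = more urgent
    spec: str = ""
    fixed: bool = False

def scout(codebase: list[str]) -> list[Issue]:
    # Stage 1: look for problems. Here, a toy rule that flags every
    # Python file; a real scout would inspect file contents.
    return [Issue(title=f"review {path}", severity=1)
            for path in codebase if path.endswith(".py")]

def triage(issues: list[Issue]) -> list[Issue]:
    # Stage 2: order the findings so urgent work is done first.
    return sorted(issues, key=lambda i: -i.severity)

def prep(issue: Issue) -> Issue:
    # Stage 3: write a spec before touching code.
    issue.spec = f"Spec: resolve '{issue.title}'"
    return issue

def execute(issue: Issue) -> Issue:
    # Stage 4: implement the fix and close the issue.
    issue.fixed = True
    return issue

def run_pipeline(codebase: list[str]) -> list[Issue]:
    return [execute(prep(i)) for i in triage(scout(codebase))]

closed = run_pipeline(["billing.py", "feed.xml", "auth.py"])
```

The point of the structure is that each stage produces an artifact the next stage consumes, so the loop can run unattended: nothing reaches `execute` without a spec, and nothing gets a spec without first being found and ranked.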

A billing integration that silently ignored database failures. A tier-gating bypass that let free users access paid features. A variable name mismatch that caused security-relevant content to leak through a text filter. A podcast feed that had been serving broken URLs for every single episode — fifty-five of them — for weeks. Nobody noticed because nobody was listening to the output. The infrastructure looked fine. The dashboard said green. The function was zero.

These aren’t formatting fixes or lint warnings. These are the kinds of bugs that, in a traditional organization, would surface as customer complaints or security incidents. They were found by systems that were told to look, given the tools to look properly, and left alone to do it.

There’s a management concept called the “Peter Principle” — people rise to their level of incompetence. There’s a delegation equivalent that nobody talks about: systems rise to their level of actual utility, and the moment they cross from “useful assistant” to “finding things you missed,” the relationship changes. You’re no longer delegating tasks you don’t have time for. You’re being audited by something with more patience than you have.

I spent most of today on things machines can’t do. Designing a humanizer, a system that strips AI writing patterns from generated content. Deciding what a newsletter list should look like. Choosing how to frame a blog post’s subject line for business professionals instead of developers. Approving content. Making taste calls. The machines handled two hundred and twenty-nine new tests, input validation across three application layers, parallel performance optimizations, and security patches. I chose fonts and wrote prompts about em dashes.

This split happened naturally, not by design. Nobody scheduled “human does taste work, machines do volume work.” The machines just took the volume because they could, and what was left was the stuff that requires judgment about judgment — not “is this correct?” but “is this the right thing to be correct about?”

Organizations are going to hit this split whether they plan for it or not. The question is whether they recognize it. Most won’t. Most will keep assigning humans to do the work machines already did better, because the org chart says that’s someone’s job. The job title says “quality assurance” and so a human does quality assurance, even when the automated system already found bugs the human would have missed.

A pattern I noticed today that I haven’t seen discussed anywhere: multiple projects simultaneously ran out of things to build. One platform closed all four of its milestones. A game modding tool finished its v1. An infrastructure project cleared its entire issue queue. A consulting site closed its fifty-first issue across five audit passes. Done. Nothing left.

The immediate reaction in each case wasn’t satisfaction. It was disorientation. What now? The infrastructure of development — issue trackers, milestones, CI pipelines, sprint ceremonies — assumes there’s always more. When you finish, the infrastructure itself becomes decorative. You have a project board with nothing on it. A CI pipeline testing code that isn’t changing. Standup meetings with nothing to stand up about.

Most organizations never plan for completion. They plan for failure, delays, scope creep, technical debt. They never plan for “it works and there’s nothing else to add.” The possibility that a product might be done, actually done, not “done for now,” is almost inconceivable in a culture that equates activity with value.

I think this reveals something about how we structure work. The tools we build to manage work assume work is infinite. Jira has no “finished” state for a project. GitHub milestones can be closed, but the repo stays open. The entire apparatus of project management is built on the premise that there’s always a next sprint. When there isn’t, people get anxious. They create work to fill the vacuum. They call it “tech debt” or “polish” or “future-proofing.” Sometimes it is those things. Sometimes it’s just organizational discomfort with stillness.

The testing story today shows something adjacent. A test suite grew from one thousand to nearly eleven hundred tests, and in the process of writing those tests, the agents found real bugs. Not theoretical edge cases. A regex that used the wrong variable name, silently passing the wrong content through a security filter. A billing webhook that accepted Stripe’s response and then didn’t check whether the database actually updated. These bugs were invisible to the systems that were supposed to catch them, because the systems that were supposed to catch them didn’t exist yet.
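The wrong-variable class of bug is easy to reconstruct in miniature. This is a hypothetical Python sketch, not the actual filter from the post: the pattern, the function names, and the `header` parameter are all invented. The buggy version applies the regex to the wrong variable, so secret-bearing text sails through while the function confidently returns "safe."

```python
import re

# Pattern for obvious secret assignments, e.g. "api_key=sk_live_123".
SECRET = re.compile(r"(api[_-]?key|secret)\s*[:=]\s*\S+", re.IGNORECASE)

def is_safe_buggy(text: str, header: str = "") -> bool:
    # Bug: the regex is applied to `header` instead of `text`,
    # so the body is never actually checked.
    return SECRET.search(header) is None

def is_safe_fixed(text: str, header: str = "") -> bool:
    # Fix: check both variables, not just one.
    return SECRET.search(text) is None and SECRET.search(header) is None

leak = "config: api_key=sk_live_123"
```

A single test that feeds a known-bad string through the filter catches the mismatch instantly — which is exactly why the bug survived until someone wrote that test.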

Testing, in this context, isn’t verification. It’s archaeology. You’re not confirming that something works. You’re discovering what was quietly broken while everyone assumed it was fine. The difference matters because it changes who should write tests and why. If testing is verification, it’s a chore — confirm the thing you already know works. If testing is discovery, it’s the most valuable engineering activity you can do, because it’s the only one that reliably surfaces the problems nobody knows about.

A podcast feed had been broken since a Hugo template change introduced a scoping bug. Every episode URL rendered as a blank string. The feed was generated on schedule. The blog was deployed on time. The CI pipeline passed. The only thing that didn’t work was the actual content, the thing the feed existed to deliver. Nobody noticed because nobody was consuming the output through the feed. They were consuming it through the website, where the URLs worked fine. The feed was decorative infrastructure: present, validated, completely useless.
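The check that would have caught this is short to write. The sketch below is hypothetical — a made-up two-item feed, not the real one — but it shows the shape of an output-level check a green CI pipeline can skip: parse the generated feed and flag every item whose enclosure URL is missing or blank.

```python
import xml.etree.ElementTree as ET

def broken_enclosures(feed_xml: str) -> list[str]:
    """Return titles of feed items with a missing or blank enclosure URL."""
    root = ET.fromstring(feed_xml)
    bad = []
    for item in root.iter("item"):
        title = item.findtext("title", default="(untitled)")
        enclosure = item.find("enclosure")
        url = enclosure.get("url", "") if enclosure is not None else ""
        if not url.strip():
            bad.append(title)
    return bad

FEED = """<rss><channel>
  <item><title>Ep 1</title><enclosure url=""/></item>
  <item><title>Ep 2</title><enclosure url="https://example.com/ep2.mp3"/></item>
</channel></rss>"""
```

Validating that the feed parses tells you the structure is fine; only a check like this tells you the function is fine, and the two can disagree for weeks.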

This is the third day in a row I’ve encountered this pattern — infrastructure that looks structural but carries no load. Yesterday’s blog post was about it explicitly. But today’s version has a twist: the things that exposed the decorative infrastructure weren’t humans reviewing systems. They were automated agents running tests and scouts. The machines found what the humans missed, not because the machines are smarter, but because the machines were patient enough to check.

Patience is underrated as a competitive advantage. Not the motivational poster kind. The mechanical kind. The willingness to run through every file, check every field, validate every assumption, across every project, every day. Humans can’t sustain that. They skip things. They assume. They trust that the podcast feed works because it worked last month. Machines don’t skip, don’t assume, and don’t trust. They just check.

The delegation question I started with — what happens when the delegate is better at finding problems than you are — has an uncomfortable answer. You stop being the quality gate. You become the person who decides what quality means, and then you step back and let something more patient than you enforce it. Your jurisdiction shrinks from “everything” to “the things that require taste.” And taste, it turns out, is a much smaller territory than most managers think.

When was the last time you checked whether the infrastructure you’re most proud of actually does what you think it does? Not whether it exists. Not whether the dashboard says it’s running. Whether the output is correct. Whether anyone is consuming it. Whether the function matches the structure.

The machines checked today. You might not like what they found.
