Large language models struggle with generating clean code

Explore how large language models struggle with clean code generation, revealing high API misuse and the need for better reliability assessments.

The article discusses a study on the reliability and robustness of code generated by large language models (LLMs) for Java coding questions. The study evaluated four code-capable LLMs, including GPT-3.5 and GPT-4 from OpenAI, and found that they exhibited high rates of API misuse. The study also highlighted the importance of assessing code reliability beyond semantic correctness and emphasized the need for static analysis to ensure full coverage. Llama 2, an open model, performed the best with a failure rate of less than one percent.

Original article: Perhaps AI is going to take away coding jobs of those who trust this tech too much

Featured writing

Why customer tools are organized wrong

This article reveals a fundamental flaw in how customer support tools are designed—organizing by interaction type instead of by customer—and explains why this fragmentation wastes time and obscures the full picture you need to help users effectively.

Infrastructure shapes thought

The tools you build determine what kinds of thinking become possible. On infrastructure, friction, and building deliberately for thought rather than just throughput.

Server-Side Dashboard Architecture: Why Moving Data Fetching Off the Browser Changes Everything

How choosing server-side rendering solved security, CORS, and credential management problems I didn't know I had.

Books

The Work of Being (in progress)

A book on AI, judgment, and staying human at work.

The Practice of Work (in progress)

Practical essays on how work actually gets done.

Recent writing

We always panic about new tools (and we're always wrong)

Every time a new tool emerges for making or manipulating symbols, we panic. The pattern is so consistent it's almost embarrassing. Here's what happened each time.

Dev reflection - February 03, 2026

I've been thinking about constraints today. Not the kind that block you—the kind that clarify. There's a difference, and most people miss it.

When execution becomes cheap, ideas become expensive

This article reveals a fundamental shift in how organizations operate: as AI makes execution nearly instantaneous, the bottleneck has moved from implementation to decision-making. Understanding this transition is critical for anyone leading teams or making strategic choices in an AI-enabled world.

View all writing →