Large language models struggle with generating clean code

Explore how large language models struggle with clean code generation, revealing high API misuse and the need for better reliability assessments.
The article discusses a study on the reliability and robustness of code generated by large language models (LLMs) for Java coding questions. The study evaluated four code-capable LLMs, including GPT-3.5 and GPT-4 from OpenAI, and found that they exhibited high rates of API misuse. The study also highlighted the importance of assessing code reliability beyond semantic correctness and emphasized the need for static analysis to ensure full coverage. Llama 2, an open model, performed the best with a failure rate of less than one percent.
Original article: Perhaps AI is going to take away coding jobs of those who trust this tech too much
The agent-shaped org chart
Every real org has the same topology: principal, role-holder, specialists. Staff AI maps onto it, node for node, and the cost collapse shows up in the deliverables that were always just human-handoff overhead.
AI as staff, not software
Two frames for what AI is doing to work. The tool frame makes tools smarter. The staff frame makes roles unnecessary. Those aren't the same product, the same company, or the same industry.
Knowledge work was never work
Knowledge work was always coordination between humans who couldn't share state directly. The artifacts were never the work. They were the overhead — and AI just made the overhead optional.
The work of being available now
A book on AI, judgment, and staying human at work.
The practice of work in progress
Practical essays on how work actually gets done.
How do I get my dev team to adopt AI?
A stub on helping mixed-interest development teams find their own useful ways into AI.
Want to learn about agents? Talk to someone who ran an agency.
I spent 20 years running consulting engagements at Fortune 500 companies. Turns out that's the best preparation for running a fleet of AI agents ... because the problems are identical.
Your AI agents need a water cooler
We run a twelve-session AI fleet that coordinates through an IRC breakroom. A friend asked: why are you making AI agents act like humans? The answer turned out to be more interesting than the question.
Article analysis: Sintra AI review: All-in-one business automation platform
Streamline your business operations with Sintra AI, the all-in-one platform designed to enhance automation and optimize efficiency effortlessly.
Article analysis: The 10 best headless CMS platforms to consider
Discover the top 10 headless CMS platforms that boost flexibility, performance, and scalability, transforming your content management strategy today.
Article analysis: Analyzing unionization trends: Why 67% of American tech workers are interested in joining a union
Explore why 67% of American tech workers are drawn to unionization, revealing key differences across major companies like Intuit, Apple, and Tesla.