Paul Welty, PhD AI, WORK, AND STAYING HUMAN

· found · innovation · ruby-on-rails · technology

Llama 2 avoids errors by staying quiet, GPT-4 gives long, if useless, samples

Discover how Llama 2 outperforms GPT-4 in generating reliable code, revealing crucial insights on the effectiveness of large language models.

The linked article reports on a study by computer scientists at the University of California San Diego on how reliable and robust large language models (LLMs) are when generating code. The researchers evaluated four code-capable LLMs with an API checker called RobustAPI. They gathered 1,208 coding questions from StackOverflow involving 24 common Java APIs and tested each model with three types of questions. The models generally showed high rates of API misuse, with OpenAI's GPT-3.5 and GPT-4 exhibiting the highest failure rates. Meta's Llama 2 was the exception, with a failure rate of less than one percent, although, as the headline notes, it avoided errors largely by staying quiet. The study highlights the importance of assessing the reliability of generated code, not just whether it runs, and the need for LLMs to get better at producing clean code.
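To make "API misuse" concrete, here is a minimal Java sketch of the kind of mistake the term usually covers: code that compiles and works on the happy path but skips a check the API expects. The class and method names are hypothetical, and the choice of `Iterator` is an assumption made for illustration. The summary above does not say which 24 APIs or which rules RobustAPI checks.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; not taken from the study or from RobustAPI.
final class ApiMisuseSketch {

    // Misuse: next() is called without first checking hasNext().
    // This compiles and works for non-empty lists, but throws
    // NoSuchElementException when the list is empty.
    public static String firstUnchecked(List<String> items) {
        Iterator<String> it = items.iterator();
        return it.next();
    }

    // Guarded use: check hasNext() first and fall back to a default value.
    public static String firstOrDefault(List<String> items, String fallback) {
        Iterator<String> it = items.iterator();
        return it.hasNext() ? it.next() : fallback;
    }

    public static void main(String[] args) {
        System.out.println(firstOrDefault(List.of("a", "b"), "none")); // prints "a"
        System.out.println(firstOrDefault(List.of(), "none"));         // prints "none"
    }
}
```

Both methods look equally plausible in a short answer, which is why a checker that looks for the missing guard can separate code that merely runs from code that is robust.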

https://www.theregister.com/2023/08/29/ai_models_coding/
