This post is based on How AI Impacts Skill Formation by Judy Hanwen Shen and Alex Tamkin at Anthropic, and the accompanying blog post.

The Headline That Writes Itself

“AI makes coders dumber.” You can practically hear the hot takes forming. And on the surface, the numbers seem to support it: in a randomized controlled trial, developers who used AI assistance scored 17 percentage points lower on a mastery quiz than those who coded by hand. That’s the equivalent of nearly two letter grades.

But here’s the thing about research: the headline is rarely the full story. Anthropic’s study is more interesting, more nuanced, and more useful than a simple “AI bad for learning” conclusion. Let me break it down.

The Study: What They Actually Did

The researchers recruited 52 software engineers, mostly junior, each with at least a year of experience writing Python on a weekly basis. Crucially, none of them were familiar with Trio, a relatively obscure Python library for asynchronous concurrency. This was intentional: they wanted to measure how well people learn something new with and without AI, not how fast they bang out code they already know.

The study had three phases:

  1. Warm-up: to get everyone comfortable with the environment
  2. Main task: two coding exercises using Trio (building a concurrent timer and implementing error-handling with memory channels)
  3. Quiz: 14 questions covering debugging, code reading, and conceptual understanding of the Trio concepts they’d just used

Participants were randomly assigned to either an AI group (with access to a GPT-4o-powered coding assistant) or a control group (no AI, just documentation and web search). Everyone was told a quiz was coming, but encouraged to work quickly.
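If you’ve never touched Trio, here’s roughly the flavor of code the main task involved. This is a minimal illustrative sketch of my own, not the study’s actual exercise; the function names and timer behavior are assumptions.

```python
import trio

async def tick(name, interval, count):
    # A simple "timer" task: wake up every `interval` seconds and report.
    for i in range(count):
        await trio.sleep(interval)
        print(f"{name}: tick {i + 1}")

async def main():
    # A nursery runs several tasks concurrently and waits for all of them to
    # finish; this is the core Trio concept participants had to pick up.
    async with trio.open_nursery() as nursery:
        nursery.start_soon(tick, "fast", 0.5, 4)
        nursery.start_soon(tick, "slow", 1.0, 2)

trio.run(main)
```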

The Numbers

| Metric | AI Group | Control Group | Difference |
|---|---|---|---|
| Average quiz score | 50% | 67% | 17 percentage points lower (p = 0.01) |
| Average completion time | ~2 min faster | (baseline) | Not statistically significant |
| Task completion rate | 100% | 85% | AI group all finished |

The quiz score difference is statistically significant with a medium-to-large effect size (Cohen’s d = 0.738). The completion time difference? Not significant. This is an important point: AI didn’t even make them meaningfully faster in this context, likely because participants spent substantial time composing queries, thinking about what to ask, and reading AI responses.

Some participants spent up to 11 minutes, 30% of the total allotted time, just interacting with the AI assistant. So the time “saved” by not writing code was partially eaten up by the time spent managing the AI conversation.
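A quick aside on that effect size, for readers who don’t think in Cohen’s d: it’s just the mean difference divided by the pooled standard deviation. The per-group standard deviations aren’t quoted here, so the figure below is back-of-the-envelope, inferred from the reported numbers rather than taken from the paper.

```python
# Cohen's d = (mean difference) / (pooled standard deviation)
mean_control = 67.0   # control group average quiz score (%)
mean_ai = 50.0        # AI group average quiz score (%)
cohens_d = 0.738      # effect size reported for the quiz gap

implied_pooled_sd = (mean_control - mean_ai) / cohens_d
print(f"implied pooled SD ≈ {implied_pooled_sd:.0f} percentage points")  # ≈ 23
```

In other words, the score distributions of the two groups overlap quite a bit, but the gap is large relative to how spread out the scores are, which is what “medium-to-large effect” means in practice.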

The Biggest Gap: Debugging

When the researchers broke down quiz scores by question type, the largest gap was in debugging questions. This makes intuitive sense: the control group encountered more errors during the task and had to figure them out independently. The AI group had fewer errors because the AI’s code was mostly correct, meaning they never developed the mental models for what goes wrong and why.
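To make that concrete, here’s the kind of Trio stumbling block the control group would have had to puzzle through on their own. It’s an illustrative bug of my own, not one documented in the study; the producer/consumer setup loosely mirrors the memory-channel exercise.

```python
import trio

async def producer(send_channel):
    # Closing the send channel (via `async with`) is what tells the consumer
    # there's nothing more coming. Forgetting this is a classic beginner bug:
    # the code looks right, runs, and then hangs forever waiting for more items.
    async with send_channel:
        for i in range(3):
            await send_channel.send(i)

async def consumer(receive_channel):
    async with receive_channel:
        async for item in receive_channel:  # exits only when the sender closes
            print("received", item)

async def main():
    send_channel, receive_channel = trio.open_memory_channel(0)
    async with trio.open_nursery() as nursery:
        nursery.start_soon(producer, send_channel)
        nursery.start_soon(consumer, receive_channel)

trio.run(main)
```

Hitting that hang, forming a hypothesis, and digging through the docs to fix it is exactly the kind of friction the AI group largely skipped.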

This is probably the most concerning finding for professional software engineering. If you’re building the future workforce to oversee AI-generated code, they need to be great at spotting bugs. And the skill of debugging is apparently best developed by… actually debugging. (I dig into why measuring debugging this way might be incomplete in my critical analysis of the study.)

Here’s Where It Gets Interesting: The Six Interaction Patterns

The study’s qualitative analysis is where the real gold is. The researchers watched screen recordings of every participant and identified six distinct patterns of AI interaction, which split cleanly into two groups:

Low-Scoring Patterns (avg quiz score < 40%)

| Pattern | Behavior | Speed | Learning |
|---|---|---|---|
| AI Delegation (n=4) | Asked AI to write everything. Zero independent coding. | Fastest | Worst |
| Progressive AI Reliance (n=4) | Started coding independently but gradually surrendered to AI. | Moderate | Poor |
| Iterative AI Debugging (n=4) | Wrote some code, but pasted every error back to AI instead of thinking about it. | Slowest | Poor |

High-Scoring Patterns (avg quiz score ≥ 65%)

| Pattern | Behavior | Speed | Learning |
|---|---|---|---|
| Generation-Then-Comprehension (n=2) | Let AI generate code, then asked follow-up questions to understand it. | Moderate | Good |
| Hybrid Code-Explanation (n=3) | Asked AI for code and explanations in the same query. Read the explanations. | Moderate | Good |
| Conceptual Inquiry (n=7) | Only asked conceptual questions. Wrote all code independently. | 2nd fastest | Best |

This is the key insight: not all AI usage is equally harmful to learning. The developers who used AI as a thinking partner, asking “why?” and “how does this work?” rather than “write this for me”, retained nearly as much knowledge as the no-AI group.

The Conceptual Inquiry group is particularly fascinating. They were the second-fastest group overall (behind only the full delegators) and scored the highest. They essentially used AI as a more responsive version of documentation, asking questions to build understanding, then coding it themselves. They encountered plenty of errors, but that was the point.
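What does that difference look like in practice? The prompts below are invented for illustration (the paper doesn’t publish participants’ transcripts), but they capture the split between the delegation and conceptual-inquiry styles.

```python
# Two ways of asking the same assistant for help with the same task.

# Delegation style: hands over both the code and the thinking.
delegation_prompt = "Write the full Trio timer task for me, with error handling."

# Conceptual-inquiry style: builds the mental model, then you write the code.
conceptual_prompts = [
    "What does a Trio nursery do when one of its child tasks raises an exception?",
    "When would I use a memory channel instead of just awaiting tasks directly?",
]
```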

What This Means (And Doesn’t Mean)

It doesn’t mean “stop using AI”

The study measured skill acquisition: learning something new. It explicitly does not claim that AI hurts productivity when you already have the relevant skills. Anthropic’s own earlier research showed AI can speed up well-understood tasks by up to 80%. The two findings aren’t contradictory; they address different scenarios.

It does mean “how you use AI matters enormously”

The spread between the worst AI interaction pattern (Delegation, ~24% quiz score) and the best (Conceptual Inquiry, ~86% quiz score) is enormous. Both groups had access to the exact same AI tool. The difference was entirely in intent: whether the developer approached AI as a replacement for thinking or as a supplement to it.

The real-world tension is organizational

This is where it gets uncomfortable. Junior developers face real pressure to ship fast. The study found that full AI delegation was the fastest approach. If your manager is measuring velocity and you’re under deadline pressure, the rational individual choice is to delegate everything to AI, which happens to be the worst approach for skill development.

This creates a subtle organizational problem: the practices that maximize short-term output may be systematically undermining the long-term capability of your engineering team. If you’re a manager grappling with this, I’ve written up concrete strategies for balancing AI productivity with skill growth.

One More Thing: Participant Self-Awareness

One of the most human details in the paper is the participant feedback. After the quiz, several developers in the AI group independently reported feeling “lazy” and acknowledged “there are still a lot of gaps in my understanding.” They knew they hadn’t really learned the material. The control group, by contrast, reported finding the task more enjoyable and felt they’d learned more.

There’s something almost poignant about that: the AI group got through the work faster and with less friction, but walked away feeling less accomplished. The control group struggled more, but emerged feeling like they’d actually grown.

The Bottom Line

Anthropic’s study is a carefully designed piece of evidence pointing to a real tension in AI-assisted work: the things that make you productive today can undermine the skills you’ll need tomorrow. But it also shows a path forward: AI can be a powerful learning tool if you approach it with the right mindset.

The question isn’t whether to use AI. It’s whether you’re using it as a crutch or as a sparring partner. The data suggests the difference between those two approaches is about two letter grades. If you want to go deeper on the study’s methodology and limitations, I wrote a critical analysis examining what it gets right and where it falls short. And if you want practical advice you can apply immediately, see How to Use AI Without Losing Your Edge.

For more details, read the full paper or Anthropic’s blog post.