The weight of AI on engineers

Accountability has a weight limit

You can't fire the AI. If the AI deletes all your business-critical data, you can't fire the AI. If an AI-controlled missile launcher accidentally shoots an ally, you can't send the AI to prison. AI can act, but only humans can be accountable.

As a rhetorical device for this post, let's assume that in the old days a single engineer could complete one story a day, but now they can, and are expected to, do ten, because AI makes writing code and building systems 10x faster. Right?

Of course, measuring productivity by lines of code per day is stupid and flawed, and user stories have a similar problem. A user story should be valuable, but it might not be: our knowledge is imperfect, a story might not be quite what the user wants, or it could be catastrophically wrong. (See: the $327m Mars Climate Orbiter, lost because one team used metric units and another used imperial.)

Is it realistic, though, for an engineer to be accountable for 5-10x the amount of change they were dealing with before? How can they possibly understand ten user stories a day, and all the code written to fulfil them? Does that sound realistic?

The AI maximalists argue you no longer need to understand the code: just focus on the outcomes and let the AI deal with that weird code stuff. But whoever is accountable still has to understand how the system will behave, and that's not abstract. Would you really be comfortable taking accountability for something like the Therac-25 (a radiation therapy machine whose software bugs delivered fatal overdoses to patients) if it was generated entirely from a prompt, without understanding medicine?

We've been solving the wrong problem

The code was never the bottleneck. Not a new take, I appreciate. Pedro Tavares put it well recently:

The marginal cost of adding new software is approaching zero, especially with LLMs. But what is the price of understanding, testing, and trusting that code? Higher than ever.

Now people believe reviewing is the bottleneck. Every day there are stories of teams with tens, even hundreds, of pull requests stuck in review. What can we do?

In truth, reviewing was a bottleneck for many teams before the AI boom; they just didn't realise it. Many companies that adopted a pull-request model with individual contributors created a bottleneck, but it was one of their own making.

To deal with this, techniques from extreme programming (XP) are having something of a renaissance, even if people don't want to admit it. XP Explained was published in 1999.

Hey everyone! I've invented spec-driven development, totally a new thing, buy my book!

— AI influencer, 2026

These techniques will help: breaking problems down into smaller pieces and encoding small behaviour changes in automated tests reduces risk and cognitive load. But it still feels like it's attacking the wrong problem; it's treating a symptom, not the cause. Again, can a human realistically, deeply understand ten problems every day, however small?

Presence in the problem

My teams do pair programming, where review is an ongoing part of the process. Two of you are in the context together. You might do an informal review together before committing, but it is nothing like the pull-request model, where you must read someone else's git diffs, out of context, whenever you have a spare 30 minutes. So, for my teams, reviewing wasn't a bottleneck.

Two humans pairing with an AI and watching what it does is not pair programming; I think we might look back on this as a daft moment. Unthinkingly using the same tool or technique in a different context is often a bad decision. Here, you've kept the worst part of pair programming (the demanding, present, high-energy way of working) and combined it with the boredom and difficulty of reading git diffs written by someone else.

When you're pairing, or writing code by yourself, your presence is in the problem, not just the solution. The code is a manifestation of your business domain: it represents your business and how it should behave. When you write code, you're trying to add capability to the business, or change how it works, and doing so helps you understand it. With AI writing the code, the human is taken out of this process.

There's a pattern every engineer recognises. A stakeholder comes in confident: "it's simple, I know exactly how it should work." And on the surface, it is simple. But the moment you start building, the gaps appear. What happens when the user has two accounts? What does "current" mean when the time zone changes? These aren't edge cases, they're the domain, and they only reveal themselves under the pressure of implementation. This is the engineering. Not the code, the thinking. The code is just what's left behind when the thinking is done. Coding agents don't help here; if anything, they let you paper over the gaps faster.
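To make the "current" gap concrete, here's a minimal sketch (illustrative names, not from any real codebase) showing that the same instant falls on different calendar days depending on the time zone you ask in. "Today's orders" is ambiguous until someone decides whose today it is:

```javascript
// One instant in time: 3am UTC on New Year's Day.
const instant = new Date('2026-01-01T03:00:00Z');

// Format that instant as a calendar date in a given IANA time zone.
// en-CA's short date style gives YYYY-MM-DD.
const dayIn = (tz) =>
  new Intl.DateTimeFormat('en-CA', { timeZone: tz, dateStyle: 'short' })
    .format(instant);

console.log(dayIn('UTC'));              // 2026-01-01
console.log(dayIn('America/New_York')); // 2025-12-31 -- still last year for this user
```

A stakeholder who says "show the current day's orders" hasn't specified which of those two answers they want; that decision is the domain, and it only surfaces when someone has to implement it.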

Review is still necessary. The question is what good review looks like when the code wasn't written by a human. Reading git diffs was always a blunt instrument, but we tolerated it because the human who wrote the code had already done the hard cognitive work; the diff was just a record of their thinking. When an AI writes the code, that thinking never happened on the human side. The diff is all there is, and it tells you surprisingly little about whether the system now behaves correctly.

```diff
+ const r = await db.query(`SELECT * FROM orders o
+   LEFT JOIN users u ON u.id = o.user_id
+   LEFT JOIN products p ON p.id = o.product_id
+   WHERE o.status NOT IN ('cancelled','refunded')
+     AND (u.flags & 0x4) = 0
+     AND p.region = $1
+   GROUP BY o.id, u.id
+   HAVING sum(o.qty * p.price) > $2
+   ORDER BY o.created_at DESC`, [ctx.region, threshold])
+ if (r.rows.length) await notify(r.rows.map(x => transform(x, opts)))
```

LGTM

Engineers don't think in text files. We build mental models of flows, of state, of how components talk to each other. Maybe the right review primitive isn't a diff at all. Sequence diagrams, visual system maps, behaviour traces -- tools that show you what the system does rather than what the code says. This feels like an open problem the industry hasn't seriously tackled yet, and I suspect whoever cracks it will matter more to the future of AI-assisted engineering than whatever coding agent is flavour of the month.

Left of dev

So, if the AI has freed up time, what has it freed it for? Given the cognitive burden, churning out more stories per day is probably not the right approach. What if engineers instead focused on making stories valuable in the first place? The stuff that happens "left of dev": sitting with users and watching them use your product, taking part in research, spending more time with stakeholders, and so on.

Perhaps the real unlock of AI isn't this "10x" thing; it's that engineers finally have time to deeply understand what they're building, and why. That may produce real productivity gains: more valuable stories, gained from deeper insights. And it helps the people working on these systems feel more comfortable with the accountability on their shoulders.