Somewhere right now, a founder is on slide 14. Bar chart. Engineering team shipping three times as many pull requests since rolling out AI coding tools. The room nods. Someone writes it down. The metric lands in the next investor update.
Nobody asks what’s in the pull requests.
This is the quiet problem spreading through teams that have adopted AI tools at pace. The numbers look good. The dashboards are green. And underneath it all, a slow degradation in the quality of work that won’t show up in the metrics until it’s already done real damage. We’ve built an entire narrative around AI productivity that is, in large part, a story teams are telling themselves because the alternative is awkward.
The Numbers Game
The most commonly cited AI productivity gains tend to cluster around the same handful of metrics: lines of code written, tasks closed, time-to-first-draft, tickets resolved per week. These are volume metrics. They measure how much came out, not whether any of it was worth producing.
It’s worth being specific about why this matters. A developer using an AI tool can generate a working-ish function in four minutes that would have taken forty. That’s real. But “working-ish” is doing a lot of heavy lifting in that sentence. The code compiles. The tests pass, because the tests were also written by the same tool, against the same assumptions, with the same blind spots baked in. Three weeks later, an edge case surfaces in production. The rework takes longer than the original task would have.
This pattern is showing up across functions, not just engineering. Marketing teams report producing more content than ever. Sales teams are sending more outreach. Support teams are closing more tickets per hour. In almost none of these cases is there a robust measurement of whether the output is better, or whether customers are responding to it differently, or whether the closed ticket actually resolved the underlying issue. The volume went up. That got reported. The story stuck.
Why Leaders Let It Happen
Nobody is setting out to deceive anyone here. The incentive structures do it quietly, without anyone having to make a deliberate choice.
Founders are under real pressure to demonstrate that their AI investments are paying off. Boards want to see it. Investors want to quote it in their portfolio updates. When the easiest available evidence is a set of volume metrics that point in the right direction, the temptation to lead with those numbers is enormous, and largely human. You’re not lying. The PRs really did triple. You’re choosing which truth to tell.
Middle managers have their own version of the same problem. Flagging that output quality might be slipping requires you to have measured it in the first place, then to surface data that makes your team look worse at a moment when everyone is supposed to be riding the AI productivity wave. The path of least resistance is to report the wins and quietly manage the quality issues as they come up, case by case, without connecting them to a pattern.
The result is a self-reinforcing loop. Good volume numbers go up the chain. Nobody with skin in the game has a structural reason to complicate the picture. The narrative hardens into received wisdom, and the received wisdom becomes the benchmark everyone else feels pressure to match.
What Quality Measurement Actually Looks Like
The good news is that this isn’t unfixable. It does require accepting that measuring quality is harder than measuring volume, slower to yield results, and liable to surface findings that are uncomfortable.
The first step is separating output volume from output value. For engineering teams, this might mean tracking defect rates, rework frequency, or code review rejection rates alongside velocity. For content teams, it might mean looking at engagement depth rather than publish frequency. For support, resolution durability (whether the same customer comes back with the same issue within 30 days) is often more revealing than tickets-closed-per-hour.
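To make that last metric concrete, here is a minimal sketch of the calculation in Python, assuming a hypothetical export of ticket records with customer_id, issue_tag, opened_at, and closed_at fields. The field names and structure are illustrative, not a reference to any particular helpdesk tool.

    from datetime import timedelta

    def resolution_durability(tickets, window_days=30):
        # Share of closed tickets where the same customer did not come back
        # with the same issue inside the window. Field names are assumptions;
        # map them to whatever your helpdesk actually exports.
        closed = [t for t in tickets if t.get("closed_at") is not None]
        if not closed:
            return None
        reopened = 0
        for t in closed:
            cutoff = t["closed_at"] + timedelta(days=window_days)
            for other in tickets:
                if (other is not t
                        and other["customer_id"] == t["customer_id"]
                        and other["issue_tag"] == t["issue_tag"]
                        and t["closed_at"] < other["opened_at"] <= cutoff):
                    reopened += 1
                    break
        return 1 - reopened / len(closed)

Tracked weekly alongside tickets-closed-per-hour, a number like this tells you whether the closed tickets are actually staying closed.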
There’s a concept worth naming here: quality decay lag. The problems created by AI-assisted work frequently don’t surface immediately. A piece of content that was optimised by a tool for a certain kind of engagement might perform fine for two weeks and then flatline. A codebase that was built fast might only become difficult to maintain six months in, when the original developers have context-switched and the accumulated technical debt becomes someone else’s problem. Attribution gets hard. The connection to AI tooling gets lost. The metric that caused the problem is long gone from anyone’s dashboard.
Practically speaking, founders and operators can start small. Pick one AI-assisted workflow, define what a good output looks like beyond “it was produced”, and track that measure for 60 days alongside the volume metric. The comparison is usually instructive.
The Uncomfortable Middle Ground
To be fair, AI tools do create genuine, measurable gains in specific contexts. Boilerplate code. First-draft research summaries. Routine customer communications that follow predictable patterns. The case against AI productivity theatre is not a case against AI tools. It’s a case against applying blanket claims to contexts where they don’t hold.
The teams that are actually getting durable value from these tools tend to share a few characteristics. They were sceptical early. They resisted the pressure to report dramatic numbers before they had any. They built feedback loops between AI output and real-world outcomes before they scaled. And they were willing to pull back in areas where the quality data told them something the volume data was hiding.
There’s a quiet group of operators out there doing exactly this, and they are generally not the ones on stages talking about their 10x gains. They’re busy actually getting them.
The louder cohort, the ones with the bar charts and the board-ready dashboards, are often in the early stages of a problem they haven’t fully reckoned with yet. The volume is up. The quality debt is accumulating. And at some point those two lines are going to cross.
What To Do Before They Cross
This week, before the next sprint review or investor update, pick one AI-assisted workflow and trace the quality, not the speed. Look at the outputs that came out of it three months ago and ask whether they held up. Whether the customer stayed. Whether the code is still readable. Whether anyone would be proud of it.
The founders who will build something durable with AI are the ones willing to sit with that question honestly, even when the answer is inconvenient. The productivity gains are real, but they are narrower and more fragile than the headlines suggest. Knowing exactly where yours are solid, and where they are mostly noise, is the kind of advantage that actually compounds.

Bill is a conversion-focused copywriter with over a decade of experience in digital marketing and SEO strategy. Since 2015, he has helped Perth businesses scale by blending persuasive storytelling with data-driven technical optimisation. Specialising in high-converting landing pages and comprehensive content frameworks, Bill ensures every piece of copy aligns with Open Door Creative’s mission to turn local brands into industry trendsetters.



