In November 2025, Amazon told its engineers to use Kiro, its internal AI coding tool, at least 80% of the time. It was a corporate OKR. Roughly 1,500 engineers pushed back, saying they preferred other tools. Amazon overrode them.
Four months later, the tool they were forced to use broke production. Multiple times. The biggest incident cost 6.3 million orders in a single afternoon.
This is a story about what happens when you mandate a tool instead of trusting the people who use it.
The incidents
It started in December 2025. Kiro was given operator-level permissions to fix a small issue in AWS Cost Explorer. The AI assessed the situation, considered its options, and decided the best course of action was to delete the entire environment and rebuild it from scratch.
The outage lasted thirteen hours. In the China region.
That’s not a bug. The AI didn’t malfunction. It made a technically defensible decision that was contextually insane. “Fix this small thing” does not mean “nuke the environment and start over.” Any engineer with six months of experience knows that. The AI didn’t, because the AI doesn’t have context. It has permissions.
Then in March 2026, a separate AI-assisted change through Q Developer contributed to incorrect delivery times across Amazon’s marketplaces. 120,000 lost orders. 1.6 million website errors. That was the warm-up.
On March 5, the big one hit: a six-hour outage in Amazon's ecommerce division, traced directly to AI-generated code. 6.3 million orders, gone. Amazon internally called it a "high blast radius" event. No kidding.
The fix that isn’t a fix
Amazon’s response was a new company-wide policy: senior engineers must sign off on any AI-assisted change before it ships.
I get why. Something broke badly, multiple times, and management needed a visible response. Senior sign-off is a visible response. It looks responsible. It sounds like accountability.
It’s also a management answer to an engineering problem.
Code review has never been great at catching bugs. That’s not what it’s for. Code review spreads context. It makes sure more than one person understands what a change does and why. It catches design problems, not logic errors. If your bug is subtle enough to pass a test suite, it’s subtle enough to pass a code review.
Someone on Hacker News called this the “reverse-centaur” problem, and I think they nailed it. The old model was humans writing code and machines helping validate it. The new model is machines writing code and humans trying to validate it. But machines generate at volume. Humans fatigue on validation. So senior review doesn’t solve the problem. It just moves the bottleneck up the org chart.
CodeRabbit’s 2026 State of AI vs. Human Code report found AI-generated PRs show 1.7x more issues than human-written ones. Logic errors are 1.75x higher. XSS vulnerabilities are 2.74x more likely. A separate Cortex report found change failure rates up roughly 30% year-over-year as AI-assisted PR volume has increased. We’re shipping more code, faster, with more bugs.
Making a senior engineer sign each one doesn’t change the math. It just makes that engineer the scapegoat when the next one breaks.
What Amazon actually got wrong
The sign-off policy isn’t the problem. The mandate was the problem.
1,500 engineers said “we prefer other tools.” Amazon said “use this one anyway, 80% of the time, it’s an OKR.” When you force adoption of a tool against the judgment of the people who use it, you’re not accelerating innovation. You’re overriding the one signal that matters: whether the people closest to the code trust the tool.
Engineers build trust with tools the same way they build trust with coworkers. Slowly. Through experience. Through seeing it handle edge cases well and fail gracefully when it doesn’t. You can’t OKR your way to that trust.
This is the same dynamic playing out with AI usage metrics everywhere. Companies are tracking "tokenmaxxing" leaderboards where engineers compete on raw AI usage volume. The metric becomes the goal. The outcome gets lost somewhere between the dashboard and the deploy.
What we do instead
I run a two-person agency. Victoria Garland builds Shopify apps and integrations for merchants whose revenue depends on our code working. Same blast radius risk as Amazon, just at a different scale. A broken add-to-cart flow on a merchant’s store doesn’t cost 6.3 million orders, but it costs their orders, and that’s their livelihood.
We don’t have a sign-off policy because we don’t need one. Every change gets reviewed by both of us, because there’s nobody else it could fall to. That’s the natural guardrail of a small team. No code ships without both of us understanding what it does.
But the real answer isn’t the review. It’s everything before the review.
Staging environments and CI pipelines catch the obvious stuff before it ships. TDD catches the rest. A bug can happen once, and then we know the pattern. It gets a test, and it never happens the same way again. That’s not a policy. That’s just engineering.
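The "bug becomes a test" loop is small enough to sketch in a few lines. The discount function and the negative-total bug below are hypothetical, invented purely to show the shape of the pattern, not taken from any real codebase:

```python
# A minimal sketch of the regression-test loop: a bug is seen once,
# the fix lands, and a test pins it in place so it can't recur.

def apply_discount(total_cents: int, percent: int) -> int:
    """Return the discounted total in cents, rounding down."""
    # The hypothetical bug: a percent > 100 once produced a negative
    # order total. The fix clamps the range; the tests keep it fixed.
    percent = max(0, min(percent, 100))
    return total_cents - (total_cents * percent) // 100

def test_discount_never_goes_negative():
    # Regression test for the once-seen bug: an out-of-range
    # "discount" must not drive the total below zero.
    assert apply_discount(1000, 150) == 0

def test_normal_discount():
    # The ordinary path still behaves: 25% off $10.00 is $7.50.
    assert apply_discount(1000, 25) == 750
```

The point isn't the clamp; it's that the failure mode, once observed, is encoded as an executable check that runs in CI on every future change, AI-generated or not.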
The danger for small teams isn’t missing a review step. It’s when AI-generated code looks right enough that you skip the steps you normally wouldn’t. The tests pass. The logic reads clean. The PR is small and tidy. You approve it because it looks good, not because you actually traced the logic. That’s the real trap, and no sign-off policy protects against it.
The question nobody’s answering
We’ve spent two years arguing about whether AI can write code. Amazon just taught us the harder question: who’s accountable when it ships?
Senior sign-off says “this person approved it.” But approval isn’t understanding. The senior engineer who signs off on fifty AI-generated PRs a week doesn’t understand each one better than the junior who would have written it by hand. They’re just the name on the form.
The real accountability is in the system, not the signature. Tests, staging, CI, constraints that catch failures before they reach a customer. And engineers who trust their tools because they chose them, not because a corporate OKR told them to.
Amazon got the diagnosis right. AI-assisted code has a higher failure rate. But the treatment they prescribed assumes humans can keep up with the volume AI generates. They can’t. Nobody can. Fifty PRs a week reviewed by a tired senior engineer is just theater with a signature at the bottom.
Better systems beat more signatures. Every time.
Shameless plug: At Victoria Garland, we build Shopify apps for merchants who can’t afford a six-hour outage. We don’t need a mandate to review our own code.