I had a working scheduling app in three days.
Recurring events rendered on screen, synced to Google Calendar, the whole thing. I called it Claro. It looked like a product. It felt like a product. Then the app started putting fourteen events on a Tuesday morning, some of them in the past. It took months to fix.
The prototype was the easy part. It’s always the easy part.
The fun part
Claro was a personal project, a rebuild of a scheduling app I’d been thinking about for a while. I wanted to see how much AI could actually do, so I gave it a long leash. I didn’t prompt it on the data model. I didn’t spec out the recurrence algorithm. I just described what I wanted and watched what it planned.
And for the first few days, it felt like flying. I had screens, a calendar view, events rendering, Google Calendar sync. The momentum was intoxicating. You don’t notice you’re accumulating technical debt when the code is appearing that fast. Everything looks right because everything looks right: the UI is there, the data flows, the buttons work. You mistake the presence of a working demo for the presence of a working product.
The METR study I’ve talked about before found that AI made experienced developers 19% slower while they believed they were 20% faster. That 39-point gap between perception and reality? I was living inside it. The prototype was real. My understanding of how it worked was not.
Fourteen events on a Tuesday
The recurrence logic was the first thing to break. The algorithm had no concept of a calculation horizon. It would generate events into the future with no boundary, or sometimes not far enough. One morning I opened Google Calendar and there were fourteen instances of the same event stacked on a single day. Some of them were in the past, as if the app had decided retroactively that I’d been busy yesterday morning.
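The fix the engine eventually converged on can be sketched in a few lines. This is not Claro's actual code, just a minimal Python illustration of what a calculation horizon buys you: the expansion loop gets a hard boundary, and occurrences before today are never emitted.

```python
# Minimal sketch of bounded recurrence expansion (not Claro's real engine).
# The bug described above is the absence of both guards: no horizon to stop
# the loop, and no check that keeps occurrences out of the past.
from datetime import date, timedelta

def expand_bounded(start, interval_days, horizon_days=90, today=None):
    """Expand a recurring event, but only within [today, today + horizon]."""
    today = today or date.today()
    horizon = today + timedelta(days=horizon_days)
    occurrences = []
    current = start
    while current <= horizon:          # hard stop: never generate past the horizon
        if current >= today:           # never emit occurrences in the past
            occurrences.append(current)
        current += timedelta(days=interval_days)
    return occurrences
```

A weekly event that started before today only produces future occurrences, and the series stops at the horizon instead of running forever. Time zones and DST, the edge cases mentioned below, are deliberately left out of this sketch.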
The thing is, I didn’t design the recurrence engine. The AI did. I approved it without fully understanding the horizon calculation, the series generation, the edge cases around time zones and daylight saving. It’s like inheriting a codebase from a developer who quit (except that developer was me ten minutes ago).
I’ve always considered writing software to be like telling a story. It has to make sense. There should be a narrative you can follow from data model to UI. When AI writes the first draft of the story, the code compiles and the tests pass, but the narrative is missing. I couldn’t explain why Claro calculated horizons the way it did. And if you can’t explain your own code, you can’t debug it and you definitely can’t hand it to someone else.
The AI fixed its own mess (eventually)
Here’s the part that complicates the narrative: the AI did eventually solve the recurrence problem. Not in a flash of brilliance, but in a slow, iterative feedback loop over months.
I set up Playwright tests that could see the actual output in Google Calendar. The app syncs events to your calendar, so when the recurrence logic was wrong, the evidence was right there: fourteen events on a Tuesday, events in the past, missing series. I’d describe the problem, point the AI at the Playwright results, and let it write tests and code itself out of the hole.
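The Playwright setup itself was specific to Claro, but the assertions at the heart of that feedback loop are simple. A hedged sketch, assuming the synced events come back as (title, date) pairs scraped from the calendar:

```python
# Illustrative checks for the two failure modes described above:
# duplicate events stacked on one day, and events synced into the past.
# The (title, date) tuple shape is an assumption, not Claro's real schema.
from collections import Counter
from datetime import date

def find_duplicates(events):
    """Return any (title, date) pair that appears more than once."""
    counts = Counter(events)
    return [item for item, n in counts.items() if n > 1]

def find_past(events, today):
    """Return any event whose date is before today."""
    return [e for e in events if e[1] < today]
```

Feeding the AI the output of checks like these, run against the real calendar, is what made the loop converge: it could see its own failures as data rather than as my description of them.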
It worked. Months later, the recurrence engine was solid. But “months later it was resolved” is doing a lot of heavy lifting in that sentence. The prototype took days. The fix took months of coming back to it, giving it new test cases, letting it see its own failures. The 100 hours aren’t gone. They’re just spent differently. Instead of writing the fix yourself, you’re supervising an intern who’s very fast and very confident and occasionally puts fourteen events on a Tuesday.
The agency version of this
At Victoria Garland, we see the same gap from the other side of the table.
Coding is maybe 40% of a Shopify build. The rest is QA, client feedback rounds, content migration, launch checklists, training the client's team on how to actually use the thing. AI speeds up that 40%, and then the client looks at the timeline and asks, "Why does this still take six weeks?"
Because the six weeks was never about writing code. It was about everything around the code.
We ship the code, then allow about a month for the feature to bake in. This is when the edge cases hit. This is when someone says, "I wish the button was over there, it would be ten times faster for my workflow." You can't compress that. It literally takes time for humans to use software, perceive it, and tell you everything you got wrong. Even if the code can be built in a flash, we can't comprehend what we've built until we've lived with it for a while. The perception takes time.
A Hacker News thread with 332 comments put it plainly: “Testing workloads that take hours to run still take hours to run with either a human or LLM.” AI writes the code. It doesn’t watch the pipeline. It doesn’t sit with the client while they try to figure out why the button isn’t where they expected it.
The gap between prototype and product
The 100-hour gap is a description of what software actually is.
Software is code plus the slow, human process of discovering whether it actually works for the people using it. That process hasn’t gotten faster. The first day got faster. The next thirty didn’t.
Developers are the gatekeepers here. We're the ones who need to guide the client toward the right process: slowly and intentionally building something that runs solidly for a long time. You can pitch a hasty tent in an afternoon. It'll blow over with the next weekend's wind gust. Or you can take the time to anchor it properly and end up with something that actually shelters people.
AI made the prototype trivial. The product is still the hard part. The 100 hours are still the 100 hours. They just look different now.
Shameless plug: At Victoria Garland, we build Shopify stores that survive past the prototype phase. We’ve done our time in the 100-hour gap so our clients don’t have to. If you’re building something on Shopify and want it done right, let’s talk.