For this, which summarises vibe coding and hence the rest of the article, the models aren't good enough yet for novel applications.
With current models and assuming your engineers are of a reasonable level of experience, for now it seems to result in either greatly reduced velocity and higher costs, or worse outcomes.
One course correction to the planned process, made because the model missed an obvious implication or requirement, can save days of churning.
The math only really has a chance to work if you reduce your spend on in-house talent to compensate, and your product sits on a well-trodden path.
In terms of capability we're still at "could you easily outsource this particular project, low touch, to your typical software farm?"
You can expand it beyond novel applications. The models aren't good enough for autonomous coding without a human in the loop, period.
They can one-shot basic changes and refactors, or even many full prototypes, but for pretty much everything else they're going to start making mistakes at some point. Usually very quickly. It's just where the technology is right now.
The thing that frustrates me is that this is really easy to demonstrate. Articles like this are essentially hallucinations that many people mystifyingly take seriously.
I assume the reason they get any traction is that a lot of people don't have enough experience with LLM agents yet to be confident that their personal experience generalizes. So they think maybe there are magical context tricks to get the current generation of agents to not make the kinds of mistakes they're seeing.
There aren't. It doesn't matter if it's Opus 4.6 in Claude Code or Codex 5.3 xhigh, they still hallucinate, fail to comprehend context and otherwise drift.
Anyone who can read code can fire up an instance and see this for themselves. Or you can prove it for free by looking at the code of any app that the author says was vibecoded without human review. You won't have to look very hard.
Agents can accomplish impressive things but also, often enough, they make incomprehensibly bad decisions or make things up. It's baked into the technology. We might figure out how to solve that problem eventually, but we haven't yet.
You can iterate, add more context to AGENTS.md or CLAUDE.md, add skills, set up hooks, and no matter how many times you do it the agents will still make mistakes. You can make specialized code review agents and run them in parallel, you can have competing models do audits, you can do dozens of passes and spend all the tokens you want; if it's a non-trivial amount of code, doing non-trivial things, and there's no human in the loop, there will still be critical mistakes.
No one has demonstrated different behavior; articles and posts claiming otherwise never attempt to prove that what they claim is actually possible. Because it isn't.
Just to be clear, I think coding agents are incredibly useful tools and I use them extensively. But you can't currently use them to write production code without a human in the loop. If you're not reading and understanding the code, you're going to be shipping vulnerabilities and tech debt.
Articles like this are just hype. But as long as they keep making front pages they'll keep distorting the conversation. And it's an otherwise interesting conversation! We're living through an unprecedented paradigm shift, the field of possibilities is vast and there's a lot to figure out. The idea of autonomous coding agents is just a distraction from that, at least for now.
I agree with you, I think. In the non-digital world people are regularly held at least partly responsible for the things they let happen through negligence.
I could leave my car unlocked and running in my drive with nobody in it and if someone gets injured I'll have some explaining to do. Likewise for unsecured firearms, even unfenced swimming pools in some parts of the world, and many other things.
But we tend to ignore it in the digital world. Likewise for compromised devices. Your compromised toaster can just keep joining those DDoS campaigns; as long as it doesn't torrent anything, it's never going to reflect on you.
The inertia (or actively maintained status quo) in Europe towards the US platforms is massive.
Anecdotally, I recently found myself in the local government building of a small European town. They run several free digitalisation classes for small businesses.
It might be worth considering that if those are intro classes, the tools could be easily swapped out: the audience isn't wedded to any particular platform at an introductory level.
I am not much of a devops person, but if you run your own DB on a VPS with Docker containers, don't you also need to handle all of this manually?
1) Creating and restoring backups
2) Optimizing disk access for DB usage (can this even be tuned from inside Docker?)
3) Disk failure due to non-standard use-case
4) Sharding is quite difficult to set up
5) Monitoring is quite different from normal server monitoring
But surely, for a small app, one big server running the DB is probably still much cheaper. I just wonder how hard it really is and how often you actually run into problems.
My guess is that people either have never worked under real time and reliability constraints and think setting up a database is just running a few commands from a tutorial, or they're very experienced and understand the pitfalls well; most people don't fall into the latter category.
But to answer your question: running your own DB is hard if you don't want to lose or corrupt your data. AWS is reliable and relatively cheap, at least during the bootstrapping and scaling stages.
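For point 1 in the list above, "handling it manually" mostly comes down to scripting and, crucially, testing the restore path. A minimal sketch, assuming a hypothetical Postgres container named `db` with database `appdb` and user `app` (all placeholder names), might look like this:

```shell
#!/bin/sh
# Minimal nightly backup sketch for a self-hosted Postgres in Docker.
# Container name (db), database (appdb), and user (app) are hypothetical.
set -eu

# Dump the database from inside the container and compress it, dated.
docker exec db pg_dump -U app appdb | gzip > /backups/appdb-$(date +%F).sql.gz

# Prune backups older than 7 days.
find /backups -name 'appdb-*.sql.gz' -mtime +7 -delete

# Restore (into a fresh, empty database) would look like:
#   gunzip -c /backups/appdb-2024-01-01.sql.gz | docker exec -i db psql -U app appdb
```

You'd run this from cron and periodically verify a restore actually works, which is exactly the part managed services like RDS turn into a checkbox; that automation and the monitoring around it is a large part of what you're paying for.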
Maybe it's unfair, unhelpful or overdone to call out llmisms, but if OP is reading this I stopped reading pretty quickly as a result of things like:
> [CUE] does not just hold the text; it validates that the pieces actually fit. It ensures that the code in your explanation is the exact same code in your final build. It is like having a Lego set where the bricks refuse to click if you are building something structurally unsound.
And that's despite having a passing interest in both cue and LP
> Maybe it's unfair, unhelpful or overdone to call out llmisms
Not anywhere near as overdone as posting AI generated/revised articles to HN that are an absolute slog to read.
A real shame, honestly, because the other article[1] from this blog that made it to the front page recently was good. The difference in writing style between them is striking, and I think it serves as a really good example of why I just can't stand reading AI articles.
Ah, the negative positive construction. Another casualty of the anti-AI movement. The semicolon was almost certainly inserted manually in place of an em-dash, models almost never use them.
Accusing people of using generative AI is definitely one of those things you have to be careful with, but on the other hand, I still think it's OK to critique writing styles that are now cliche because of AI. I mean come on, it's not just the negative-positive construction. This part is just as cliche:
> It is like having a Lego set where the bricks refuse to click if you are building something structurally unsound.
And the headings follow that AI-stank rhythmic pattern with most of them starting with "The":
> The “Frankenstein” Problem
> The Basic Engine
> The Ignition Key
> The Polyglot Pipeline
I could go on, but I really don't think you have to.
I mean look, I'm no Pulitzer prize winner myself, but let's face it, it would be hard to make an article feel more like it was adapted from LLM output if you actually tried.