I have little doubt where things are going, but the irony of the way they communicate versus the quality of their actual product is palpable.
Claude Code (the product, not the underlying model) has been one of the buggiest, least polished products I have ever used. And it's not exactly rocket science to begin with. Maybe they should try writing slightly less than 100% of their code with AI?
More generally, Anthropic's reliability track record for a company which claims to have solved coding is astonishingly poor. Just look at their status page - https://status.claude.com/ - multiple severe incidents, every day. And that's to say nothing of the constant stream of bugs for simple behavior in the desktop app, Claude Code, their various IDE integrations, the tools they offer in the API, and so on.
Their models are so good that they make dealing with the rest all worth it. But if I were a non-research engineer at Anthropic, I wouldn't strut around gloating. I'd hide my head in a paper bag.
I am constantly amazed how developers went hard for claude-code when there were and are so many better implementations of the same idea.
It's also a tool that has a ton of telemetry, doesn't take advantage of the OS sandbox, and has so many tiny little patch updates that my company has become overworked trying to manage them.
Its worst feature (to me at least) is the CLAUDE.md files sprinkled all over, everywhere in our repository. It's impossible to know when or if one of them gets read, or what random stale effect has been triggered when it does decide to read one. Yes, I know, I'm responsible for keeping them up to date and they should be part of any PR, but Claude itself doesn't always know it needs to update them, because it decided to ignore the parent CLAUDE.md file.
Sometimes the agent (any agent, not just Claude — cursor, codex) would miss a rule or skill that is listed in AGENTS.md or Claude.md and I'm like "why did you miss this skill, it's in this file" and it's like "oh! I didn't see it there. Next time, reference the skill or AGENTS.md and I'll pick it up!"
Like, isn't the whole point of those files to not have to constantly reference them??
"Coding" is solved in the same way that "writing English language" is solved by LLMs. Given ideas, AI can generate acceptable output. It's not writing the next "Ulysses," though, and it's definitely not coming up with authentically creative ideas.
But the days of needing to learn esoteric syntax in order to write code are probably numbered.
OK, but seriously... if Anthropic is on the "best" path, aside from somehow nuking all AI research labs, an IPO would be the most socially responsible thing that they could do. Right?
Exactly my experience. I know they vibe code features and that's fine, but it looks like they don't do proper testing, which is surprising to me because all you need is a bunch of cheap interns to do some decent enough testing.
No, there is a wide gap between good and bad testers. Great testers are worth their weight in gold and delight in ruining programmers' days all day long.
IMO not a good place to skimp and a GREAT place to spend for talent.
> Great testers are worth their weight in gold and delight in ruining programmers' days all day long.
Side note: all the great testers I've known, when my employers had separate QA departments, ended up becoming programmers, either by studying on the side or through in-house mentorship. By all secondhand accounts they've become great programmers too.
They brought down production because the version string was changed incorrectly to add an extra date. That would have been caught by even the most basic testing, since the app couldn't even start.
That's a bummer. I was looking forward to testing this, but that seems pretty limiting.
My current solution uses Tailscale with Termius on iOS. It's a pretty robust solution so far, except for the actual difficulty of reading/working on a mobile screen. But for the most part, input controls work.
My one gripe with Termius is that I can't put text directly into stdin using the default iOS voice-to-text feature baked into the keyboard.
I’ve been doing this for a while [1], but ultimately settled on building a thin transport layer for Telegram to accept and return media, with persistent channels, vastly improved messaging UX, etc., and ended up turning this into a ‘claw’ with a heartbeat and SOUL [2].
I really enjoyed reading both posts. Thanks for sharing!
I, like many others, have written my own "claw" implementation, but it's stagnated a bit. I use it through Slack, but the idea of journaling with it is compelling. Especially when combined with the recent "two sentence" journaling article[1] that floated through HN not too long ago.
Great posts! So far [2] is the only "claw" that has caught my interest, mostly because it isn't trying to do everything itself in some bespoke, NIH way.
I've been using email and Cloudflare's email routing. You don't get the direct feedback of a terminal, but it's much easier to read what's happening in an HTML-formatted email.
It also feels kind of nice to just fire off an email and let it do its thing.
Oooh, now this is a very interesting idea. I live in my inbox and keep it quite tidy. Email is the perfect place to fire-and-forget ideas and then come back to a full response.
Do you have a blog outlining how you set it up? I'm curious to learn more.
First of all, /remote-control in the terminal just printed a long URL, even though they advertise you can control it from the mobile app (apparently it should show a QR code, but it doesn't). I fire up the mobile app, but the session is nowhere to be seen. I try typing the long random URL into the mobile browser, but it simply throws me to the app, not the session. I read random Reddit threads and they say the session will be under "Code", not "Chats", but for that you have to connect GitHub to the Claude app (?? I just want to connect to the terminal Claude on my PC, not GitHub). OK, I do it.
Now, even though the session is idle on the PC, the app shows it as working... I try tapping the stop button; nothing happens. I also can't type anything into it. OK, I try starting a prompt on the PC. It starts the work on the PC, but on the mobile app I get a permission dialog... where I can deny or allow the thing that already started on the PC, because I already gave permission for it there. And many more. Super buggy.
I wonder if they let Claude write the tests for their new features... That's a huge pitfall. You can think it works, and Claude assures you all is fine, but when you start it everything falls apart, because there are lots of tests but none of them test the actual things.
You jest, but while building some AI-backed feature I was flabbergasted that the fix was adding "The result you send back MUST be accurate." to an already pretty clear prompt.
I'm willing to bet most of their libraries are definitely vibe coded. I'm using the claude-agent-sdk and there are quite a few bugs and some weird design decisions. And looking through the actual python code it's definitely not what I would classify 'best practice'. Bunch of imports in functions, switching on strings instead of enums, etc.
I had to downgrade to an earlier release because an update introduced a regression where they weren't handling all of their own event types.
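To illustrate the "switching on strings instead of enums" complaint: here is a contrived sketch (not actual claude-agent-sdk code; the event names are invented) of why the enum style is usually preferred. String dispatch silently falls through on a typo, while an Enum lookup fails loudly at the boundary.

```python
from enum import Enum

# Hypothetical event types for illustration only.
class EventType(Enum):
    MESSAGE = "message"
    RESULT = "result"

def handle(raw: str) -> str:
    # Converting at the boundary raises ValueError on unknown or
    # misspelled event types instead of silently doing nothing.
    event = EventType(raw)
    if event is EventType.MESSAGE:
        return "handled message"
    return "handled result"
```

With bare string comparisons, an unhandled event type like the one in the regression above would just fall through unnoticed.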
I think they are betting that all of this code is transient and not worth too much effort, because once Opus 5 is trained, they can just ask it to refactor and fix everything and improve code quality enough that things don't fall apart while adding more features; and when Opus 5.5 comes out, it will be able to clean up after Opus 5. And so on. They don't expect these codebases to be long-lived and worth the time investment.
A few weeks ago the github integration was completely broken on the claude website for multiple days. It's very clear they vibe code everything and while it's laudable that they eat their own dogfood, it really projects a very amateurish image about their infrastructure and implementation quality.
In theory, comments on Hacker News should advance the discussion and meet a certain quality bar, lest they be downvoted to make room for ones that do. I'm not sure this was ever true in practice, and it certainly seems to have waned in the years I've been reading this forum (see the many pelican-on-a-bike comments on any AI model release thread), but I'd expect some people still try to vote with this in mind.
Being sarcastic doesn't lower the bar a comment has to meet to avoid being downvoted, so before assuming people missed the sarcasm, consider whether the comment actually adds to the discussion.
I only understood it after reading some of co_king_5’s other comments. This is Poe’s law in action. I know several people who converted into AI coding cultists and they say the same things, but seriously. Curiously, none of them were coders before AI.
I'm willing to bet you don't full-on YOLO vibecode like the lead Claude Code developer, running 10 Claude Code sessions in parallel to push 259 pull requests that modify >40k lines of code in a month [0]? There is zero chance any of that code was rigorously reviewed.
I use Claude Code almost every day [1], and when used properly (i.e. with manual oversight), it's an amazing productivity booster. The issue is when it's used to produce far more code than can be rigorously reviewed.
This is my general experience with the claude app, I don't know what they're smoking over at anthropic but their ability to touch mobile arch inappropriately with AI is reaching critical levels.
On top of that, it's something they should have had long ago. My biggest pain point is not being able to continue from my phone. I just use a service to pipe Telegram to any cc session on the dev machine. This is the number 1 reason I got excited about openclaw in the first place, but it's overkill to have it just to control cc.
> - You can't interrupt Claude (you press stop and he keeps going!)
This is normal behavior on desktop too; sometimes it's in the middle of something. I also assume there's some latency.
> - At best it stops but just keeps spinning
Latency issues then?
> - It can get stuck in plan mode
I've had this happen on desktop, and when using Claude Code from mobile before remote control, so I assume this has nothing to do with remote control but is sometimes a partial outage of Claude Code itself.
I don't work for Anthropic, just basing off my anecdotal experience.
We’ve been building in this space for a while, and the issues listed here are exactly the hard parts: session connectivity, reconnection logic, multi-session UX, and keeping state in sync across devices. Especially when it comes to long-running tasks and the edge cases that show up in real use.
Isn't it a simpler solution to create some protocol for a browser or device to announce that an age-restricted user is present, and then have parents lock down devices as they see fit?
Aside from the privacy concerns, all this age verification tech seems incredibly complicated and expensive.
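As a toy sketch of the protocol idea above (everything here is hypothetical: the header name, the values, and the server-side check are invented for illustration), a parent-configured device could attach a signal to every request, and sites would only have to honor it:

```python
# Hypothetical header a locked-down browser/device would send.
AGE_SIGNAL_HEADER = "X-Age-Restricted-User"

def should_serve_adult_content(headers: dict) -> bool:
    # Deny when the device declares a restricted user is present;
    # serve normally when no signal (or an explicit "0") is sent.
    return headers.get(AGE_SIGNAL_HEADER, "0") != "1"
```

The verification burden then shifts from every site collecting IDs to the parent configuring one device setting.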
I think this solution exists (e.g. Android parental locks, but also ISP routers). But parents and the industry have failed to adopt it at scale. So legislation is going for a more affirmative approach that doesn't require parental consent or collaboration.
A service provider of adult content now cannot serve a child, regardless of the involvement or lack thereof of a parent.
In the EU/UK, some are sadly app-only. I avoid those. Many others are pushing apps as 2FA, even if you use their website. You have to insist to get another authentication system, like TANs. Some governments are also pushing mobile IDs.
The best Linux for phones, SailfishOS, has a fairly good Android compatibility layer that runs many bank apps well. But despite that, it's an uphill battle. The network effect of the duopoly is gigantic.
peter's claw is a lot more than just a wrapper around my slop.
i too had plenty of offers, but so far chose not to follow through with any of them, as i like my life as is.
also, peter is a good friend and gives plenty of credit. in fact, less credit would be nice, so i don't have to endure more vibeslopped issues and PRs going forward :)
For me it is the simplicity of it (transparent, minimal system prompts and harness); you can extend it the way you like; I don't have to install a (buggy) Electron app (the CC or Codex app); it integrates where I work, because it's simple (it runs in a standard terminal in VS Code). I'm not locked in with any vendor and can switch models whenever I want, and most importantly, I can effectively use it within apps that are themselves using it as a coding agent (the meta part, like a chat UI for very specific business cases). Being in TypeScript, it integrates very well with the browser, and one can leverage the browser sandbox around it.
I cannot directly answer your question, because I am looking into this topic myself currently, but I found this HN discussion from two weeks ago, which should give you more insights about pi: https://news.ycombinator.com/item?id=46844822
i'm not a member of openclaw.
i build some oss in parallel, and added 3 or so commits to the openclaw repo. and peter is taking some of the openclaw contributors with him.
What nonsense is this. You seem to be implying that contributing to an open source project creates some kind of entitlement to whatever another contributor attains. That’s not how it works.
I think it would be a very interesting discussion in how open source projects get compensated. Acting like it's shameful to discuss things, in a thread literally about someone making a massive payday by getting hired from an OS project, is odd.
It's not like it would be an impossible ask to include a stipulation to also compensate other developers, but what do I know? In fact I'm curious why this doesn't happen more, but it feels like crab bucket mentality which is the mindset VC culture has exported across the world.
Other than the response from Mario himself, pi is very frequently showcased at meetups organised by Peter and the OpenClaw community, so there is definitely crediting involved.
> There is no code, there are no tools, there is no configuration, and there are no projects.
To add to this, OpenClaw is incapable of doing anything meaningful. The context management is horrible, the bot constantly forgets basic instructions, and often misconfigures itself to the point of crashing.
(5) Is there a reason why we don't investigate using a cable to pull down energy to earth? That seems to be a far more valuable and tractable problem to solve.
"Buy a mac mini, copy a couple of lines to install" is marketing fluff. It's incredibly easy to trip moltbot into a config error, and its context management is also a total mess. The agent will outright forget the last 3 messages after compaction occurs even though the logs are available on disk. Finally, it never remembers instructions properly.
Overall, it's a good idea but incredibly rough due to what I assume is heavy vibe coding.
It's been a few days, but when I tried it, it just completely bricked itself because it tried to install a plugin (matrix) even though that was already installed. That wasn't some esoteric config or anything. It bricked itself right in the onboarding process.
When I investigated the issue, I found a bunch of hardcoded developer paths and a handful of other issues and decided I'm good, actually.
Software engineers don't understand how user hostile all these AI gizmos are.
Terminals are scary. AI running local code is scary. Random Github software is scary. And in my experience, normies are far more security paranoid than developers when it comes to AI.
Normies have a much more realistic take on AI than technical people or semi-technical "power users":
* They LOVE image-generating AI and AI that messes with their own photos/videos.
* They will ask ChatGPT, Gemini, etc and just believe the result.
* They will ask Copilot to help them make a formula in Excel and be happy to be done.
The common theme here is they don't care. To them, AI is just a neat thing. It's not a huge difference in their lives. They don't think about the environmental impact much unless someone tells them it's bad, via a high-quality video stream that itself was vastly worse for the environment than any AI conversation or image generation ever could be.
They will play a game 100% made by AI because their friend said it was fun. They don't care that some AAA publisher lost a sale on their "human made for sure, just trust us :nod:" identical game, because a bored person was able to pull off something good enough with little effort (and better design decisions).
They also don't care if some article or book or whatever was written partially or entirely by AI as long as it's good. The AI part just isn't important to them. Not even a little bit!
It’s kind of funny how it’s the exact same discussion we used to have about privacy at the advent of social media. "I’m not worried, I’ve got nothing to hide!" The convenience benefits of Facebook (in the beginning, likely less nowadays) massively outweighed the privacy concerns of the layman or woman.
This is not unusual. Spotify is included because it is a relevant source of evidence as the custodian of the data. It improves the narrative that the data wasn't just indexed but obtained illegally.
It's because the higher up the stack you go, the more declarative and literate tools become. Calling sort is far easier than understanding the algorithm, for example.
> Calling sort is far easier than understanding the algorithm for example.
This was one of my gripes in college, why am I implementing something if I just need to understand what it does? I'm going to use the built-in version anyway.
Because that's the entire point of college. It's supposed to teach you the fundamentals - how to think, how to problem solve, how to form mental models and adapt them, how things you use actually work. Knowing how different sorting functions work and what the tradeoffs are allows you to pick the best sorting function for your data and hardware. If the tools you have aren't doing the job, you can mend them or build new tools.
So you know which sort to call because there isn't a right answer for all cases.
And so you can write your own because you're probably going to want to sort data in a specific way. Sort doesn't mean in numerical increasing or decreasing order, it means whatever order you want. You're sorting far more often than you're calling the sort function.
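A minimal illustration of "sort means whatever order you want" (the data here is made up): Python's sorted() takes a key function, so the same call can order strings by length, with alphabetical tie-breaking, rather than plain ascending order.

```python
words = ["pear", "fig", "banana", "kiwi", "apple"]

# Key returns a tuple: primary order by length, ties broken alphabetically.
by_length = sorted(words, key=lambda w: (len(w), w))
# fig (3), kiwi (4), pear (4), apple (5), banana (6)
```

Knowing that a comparator/key defines the ordering is exactly the kind of thing implementing a sort once teaches you.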
My degree was not specifically CS; it was a related degree focused on landing jobs, but they still covered some CS concepts because some students were in fact doing a CS degree. I was more focused on "show me what I need to build things." I have never had to hand-craft any algorithm in my 15 years of coding; it just makes no sense to me. Someone else figured it out; I'm content understanding the algorithms.
In my twenty years, I've rerolled famous algorithms "every now and then".
It's almost wild to me that you never have.
Sometimes you need a better sort for just one task. Sometimes you need a parser because the data was never 100% standards compliant. Sometimes you need to reread Knuth for his line-breaking algorithm.
My high school computer science teacher (best one I ever had) once told us this anecdote when we were learning sorting algorithms:
He was brought in by the state to do some coaching for existing software devs back in the 90s. When he was going over the various different basic algorithms (insertion sort, selection sort, etc.) one of the devs in the back of the class piped up with, "why are you wasting our time? C++ has qsort built in."
When you're processing millions of records, many of which are probably already sorted, using an insertion sort to put a few new records into a sorted list, or using selection sort to grab the few records you need to the front of the queue, is going to be an order of magnitude faster than just calling qsort every time.
Turned out he worked for department of revenue. So my teacher roasted him with "oh, so you're the reason it takes us so long to get our tax returns back."
Thinking that you can just scoot by using the built-in version is how we get to the horrible state of optimization that we're in. Software has gotten slow because devs have gotten lazy and don't bother to understand the basics of programming anymore. We should be running a machine shop, not trying to build a jet engine out of Lego.
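The teacher's point about already-sorted data can be sketched concretely. This is a generic illustration (not the actual tax system, obviously), using Python's bisect module to place a handful of new records into a large sorted list instead of re-sorting everything:

```python
import bisect
import random

# A large, already-sorted dataset plus a few new records, as in the anecdote.
records = sorted(random.sample(range(1_000_000), 100_000))
new_records = [random.randrange(1_000_000) for _ in range(5)]

# Insertion-sort style: binary-search each new record's position and
# insert it, leaving the sorted bulk untouched, rather than calling
# sort() over all 100,000 records again.
for r in new_records:
    bisect.insort(records, r)
```

For k new records in a list of n, this does k binary searches plus shifting, versus O(n log n) work to re-sort from scratch every time.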
I mean, the lesson I got from my 10X class was pretty much that: "never write your own math library, unless you're working on maintaining one yourself".
funnily enough, this wasn't limited to contributing to some popular OS initiative. You can call YAGNI, but many companies do in fact have their own libraries to maintain internally. So it comes up more than you expect.
On a higher level, the time I took to implement a bunch of sorts helped me be able to read the docs for sort(), realize it's a quicksort implementation, and make judgements like
1. yeah, that works
2. this is overkill for my small dataset, I'll just whip up basic bubblesort
3. oh, there's multiple sort API's and some sorts are in-place. I'll use this one
4. This is an important operation and I need a more robust sorting library. I'll explain it to the team with XYZ
The reasoning was the important lesson, not the ability to know what sorting is.
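Points 2 and 3 above can be sketched in a few lines (a toy illustration, not any particular codebase): Python exposes both a copying sort and an in-place one, and a "whip it up yourself" bubble sort is a handful of lines for tiny datasets.

```python
data = [3, 1, 2]

copy = sorted(data)   # point 3: returns a new list, data untouched
data.sort()           # in-place variant, mutates data and returns None

# Point 2: a quick hand-rolled bubble sort for a small dataset.
def bubble_sort(xs):
    xs = list(xs)  # work on a copy
    for i in range(len(xs)):
        for j in range(len(xs) - 1 - i):
            if xs[j] > xs[j + 1]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs
```

Choosing between these, and knowing when neither is good enough (point 4), is the reasoning the exercise was meant to build.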
Right now:
- You can't interrupt Claude (you press stop and he keeps going!)
- At best it stops but just keeps spinning
- The UI disconnects intermittently
- It disconnects if you switch to other parts of Claude
- It can get stuck in plan mode
- Introspection is poor
- You see XML in the output instead of things like buttons
- One session at a time
- Sessions at times don't load
- Every time you navigate away from Code you need to wait for your session to reappear
I'm sure I'm missing a few things.