I have heard it said that tokens will become commodities. I like being able to switch between OpenAI's and Anthropic's models, but I feel I'd manage if one of them disappeared. I'd probably even get by with Gemini. I don't want to lock in to any one model provider any more than I want to lock in to my energy provider. I might pay 2x for a better model, but no more, and I can see that not being the case for much longer.
My current take is that AI is helping me experiment much faster. I can get less involved with the parts of an application that matter less and focus more (manually) on the parts that do. I agree with a lot of the sentiment here - even with the best intentions of reviewing every line of AI code, when it works well and I'm working fast on low-stakes functionality, that sometimes doesn't happen. This can be offset, however, by using AI efficiencies to maintain better test coverage than I would by hand (unit and e2e), having documentation updated with assistance, and having diagrams maintained to help me review. There are still some annoyances when the AI struggles with seemingly simple issues, but I think we all have to admit that programming was difficult, and quality issues existed, before AI.
To add to the mix, I built static site publishing into AS Notes (https://www.asnotes.io, an extension for VS Code). It's markdown and wikilink based, and can publish either the whole workspace or one or more specific folders. I designed it so that I am not dependent on any platform for my static sites. Publishing is a pro feature, but it's a one-time lifetime licence purchase.
I've used OpenClaw (just for learning; I agree with the author that it's not reliable enough to do anything useful) but I also have a similar daily summary routine, which is a basic Gemini API call to a personal MCP server that has access to my email, calendar etc. The latter is so much more reliable. OpenClaw flows sometimes nail it, and then the next day fail miserably. It seems like we need a way to 'bank' the correct behaviours - like 'do it like you did it on Monday'. I feel that for any high percentage reliability, we will end up moving towards using LLMs as glue, with as much of the actual work as possible being handed off to MCP or persisted routine code. The best use case for LLMs currently is writing code, because once it's written, tested and committed, it's useful for the long term. If we had to generate the same code on the fly for every run, there's no way it would ever work reliably. If we extrapolate that idea, I think it helps to see what we can and can't expect from AI.
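A minimal sketch of what 'banking' a behaviour could look like, assuming the LLM is only asked for a plan the first time a task is seen and the saved plan is replayed on later runs. The names `banked_plan`, `generate` and `store` are hypothetical, and `generate` stands in for whatever LLM call produces the steps; this is not how OpenClaw or any particular tool works.

```python
import json
from pathlib import Path
from typing import Callable


def banked_plan(task: str, generate: Callable[[str], list[str]], store: Path) -> list[str]:
    """Return the saved plan for `task`, generating and saving it only on first use."""
    # Load previously banked plans, if the store file exists yet.
    bank = json.loads(store.read_text()) if store.exists() else {}
    if task not in bank:
        # The only non-deterministic step: ask the (hypothetical) LLM once.
        bank[task] = generate(task)
        store.write_text(json.dumps(bank, indent=2))
    # Every later run replays exactly what worked before.
    return bank[task]
```

The point of the design is that the model's output is treated like committed code: once a plan has been reviewed and saved, later runs no longer depend on the model producing the same thing again.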
This is interesting. I haven't used OpenClaw but I set up my own autonomous agent using Codex + ChatGPT Plus + systemd + normal UNIX email and user account infrastructure. And it's been working great! I'm very happy with it. It's been doing all kinds of tasks for me, effectively as an employee of my company.
I haven't seen any issues with memory so far. Using one long rolling context window, a diary and a markdown wiki folder seems sufficient to have it do stuff well. It's early days still and I might still encounter issues as I demand more, but I might just create a second or third bot and treat them as 'specialists' as I would with employees.
I did (using Claude Code) something that sounds very similar to this. It’s a bunch of bootstrapped Unix tools, systemd units, and some markdown files. Two comments:
- I suspect that in this moment, cobbling together your own simple version of a “claw-alike” is far more likely to be productive than a “real” claw. These are still pretty complex systems! And if you don’t have good mental models of what they’re doing under the hood and why, they’re very likely to fail in surprising, infuriating, or downright dangerous ways.
For example, I have implemented my own “sleep” context compaction process and while I’m certain there are objectively better implementations of it than mine… My one is legible to me and therefore I can predict with some accuracy how my productivity tamagotchi will behave day-to-day in a way that I could not if I wasn’t involved in creating it.
(NB: I expect this is a temporary state of affairs while the quality gap between homemade and “professional” just isn’t that big.)
- I do use mine as a personal assistant, and I think there is a lot of potential value in this category for people like me with ADD-style brains. For whatever reason, explaining in some detail how a task should be done is often much easier for me than just doing the task (even if, objectively, there’s equal or higher effort required for the former). It therefore doesn’t do anything I _couldn’t_ do myself. But it does do stuff I _wouldn’t_ do on my own.
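As a rough illustration of the "sleep" context compaction idea mentioned above (not the commenter's actual implementation, which isn't shown), one simple approach is to keep the most recent entries verbatim and fold everything older into a single summary entry. The function name `compact` and the `summarize` callback are hypothetical; `summarize` stands in for an LLM call.

```python
from typing import Callable


def compact(history: list[str], keep_recent: int,
            summarize: Callable[[list[str]], str]) -> list[str]:
    """Fold all but the last `keep_recent` entries into one summary entry."""
    if keep_recent < 0:
        raise ValueError("keep_recent must be non-negative")
    if len(history) <= keep_recent:
        # Nothing old enough to compact; return a copy unchanged.
        return list(history)
    split = len(history) - keep_recent
    older, recent = history[:split], history[split:]
    # `summarize` is a placeholder for whatever produces the summary text.
    return [summarize(older)] + recent
```

A version this small is easy to reason about, which is the legibility argument made above: it is always clear which entries survive verbatim and which have been summarised.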
Right - I think email is a much better UI than Slack or WhatsApp or Discord for that reason. It forces you to write properly and explain what you want, instead of firing off a quick chat. Writing things down helps you think. And because coding harnesses like Codex are very good at interacting with their UNIX environments but are also kinda slow, email's higher latency expectations are a better fit for the underlying technology.
Two categories: actual useful work for the company, and improving the bot's own infrastructure.
Useful work includes: bug triage, matching up external user bug reports on GitHub to the internal YouTrack, fixing easy-looking bugs, and working on a redesign of the website. I also want to extend it to handling the quarterly accounting (which is already largely automated with AI, but I still need to run the scripts myself), preparing answers to support queries, and more work on bug fixing and features. It has access to the bug tracker, internal git and CI system as if it were an employee and uses all of those quite successfully.
Meta-work has so far included: making a console so I can watch what it's doing when it wakes up, regularly organizing its own notes and home directory, improving the wakeup rhythm, and packaging up its infrastructure to a repeatable install script so I can create more of them. I work with a charity in the UK whose owner has expressed interest in an OpenClaw but I warned him off because of all the horror stories. If this experiment continues to work out I might create some more agents for people like him.
I'm not sure it's super useful for individuals. I haven't felt any great need to treat it as a personal assistant yet. ChatGPT web UI works fine for most day to day stuff in my personal life. It's very much acting like an extra employee would at a software company, not a personal secretary or anything like that.
It sounds like our experience differs because you wanted something more controlled with access to your own personal information like email, etc, whereas I gave "Axiom" (it chose its own name) its own accounts and keep it strictly separated from mine. Also, so far I haven't given it many regular repeating tasks beyond a nightly wakeup to maintain its own home directory. I can imagine that for e.g. the accounting work we'd need to do some meta-work first on a calendar integration so it doesn't forget.
I’m doing this exact same thing in my solo saas company, except with Cursor’s Cloud Agents. I can kick them off from web, slack, linear, or on a scheduled basis, so I’m doing a lot of the same things as you. It’s just prompts on a cron, with access to some tools and skills, but super useful.
That unreliability was why I gave up on OpenClaw. I tried hard to give it very simple tasks but it had a high rate of failure. Heartbeats and RAG are light years away from where they need to be. I'm not sure if this can be overcome using an application layer right now, but I trust that many people are trying, and I'm eager to see what emerges in the next year. In the meantime I know that they're working very hard on continuous learning - real-time updates to weights and parametric knowledge. It could be that in a year or so, we can all have customised models.
It would be great if that comes to fruition. Investing in a model with weight updates would be like investing in employee training, rather than just giving the same unreliable employee more and more specific instructions.
I've had a crack at this problem in Agent Kanban for VS Code (https://github.com/appsoftwareltd/vscode-agent-kanban). The core idea is that you converse with the agent in a markdown task file in a plan, todo, implement flow, which I have found works really well for long-running complex tasks, and I use this tool every day. But after a while, the agent just forgets to converse in the task file. The only way to get it to (mostly) reliably converse in the task file is to reference the task file and instructions in AGENTS.md. There is support for git worktrees and for skipping commits of the agents file so as not to pollute the file with the specific task info. There is also an option for working without worktrees, but in this flow I had to add chat participant "refresh" commands to help the agent keep its instructions fresh in context. It's a problem that I believe will slowly get better as better agents appear and get cheaper to use, because general LLM capability is the key differentiator at the moment.
I built the AS Notes extension for VS Code (https://asnotes.io) partly because I wanted to be able to write my notes with the support of other VS Code extensions, and because of the agent harness options in VS Code (Copilot etc). The key thing for easy Zettelkasten management is really good wikilink support in markdown. AS Notes supports nested wikilinking and automatic updating in the index on rename etc.
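To illustrate why rename handling matters for wikilinks, here is a minimal sketch of rewriting links when a note is renamed. It is not the AS Notes implementation: it only covers the plain `[[Name]]` and `[[Name|alias]]` forms (the alias syntax is an assumption borrowed from other wikilink tools) and does not attempt nested wikilinks. The function name `rename_wikilinks` is hypothetical.

```python
import re


def rename_wikilinks(text: str, old: str, new: str) -> str:
    """Point every [[old]] or [[old|alias]] link in `text` at `new` instead."""
    # Match the exact old name, optionally followed by "|alias", inside [[...]].
    pattern = re.compile(r"\[\[" + re.escape(old) + r"(\|[^\]]*)?\]\]")
    # A function replacement avoids backslash-escaping issues in `new`.
    return pattern.sub(lambda m: "[[" + new + (m.group(1) or "") + "]]", text)
```

For example, renaming `Old` to `New` would rewrite `[[Old]]` and `[[Old|alias]]` but leave `[[Older]]` untouched, since the match requires the full note name.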
This looks great. Building right into the editor looks like a solid way to go. I built "Agent Kanban" (an extension) for VS Code to enforce a similar "plan, tasks, implement" flow to the one you describe. That flow is really powerful for getting solid agentic coding results. My tool went the route of encouraging the model via augmenting AGENTS.md and having the Kanban task file be markdown that the user and agent converse in (with some support for git worktrees, which helps when running multiple sessions in parallel): https://www.appsoftware.com/blog/introducing-vs-code-agent-k...
It's always surprised me that YouTube, being owned by the world's leading search company, has such awful on-site search. I've always left YouTube and searched for YouTube videos via Google search, which brings up better results!
I guess YouTube doesn't really have any competition, i.e. it's not like you're going to switch to a competitor video platform and search there. Your only option is to watch through multiple other videos before finding the one you want, which is great for them.
I think a good analogy is people not being able to work on modern cars because they are too complex or require specialised tools. True I can still go places with my car, but when it goes wrong I'm less likely to be able to resolve the problem without (paid for) specialised help.
And just like modern vehicles rob the user of autonomy, so too for coding agents. Modern tech moves further and further away from empowering normal people and increasingly serves to grow the influence of corporations and governments over our day to day lives.
It's not inherent, but it is reality unless folks stop giving up agency for convenience. I'm not holding my breath.
Cars are actually a good metaphor, it works on so many levels. Modern cars have "democratized" access to long-distance travel in a sense, and most people don't need to do any heavy maintenance themselves. But the flipside is that places that have adopted it have become "car dependent" and build cities assuming access to cars.
Are we net better off than if we didn't have cars and simply built public transport with walkable cities?
That is a very possible reality. Pay extra for no ads or a reduced cost trip if you consent to having your eyeballs held open while separate ads are played to each eyeball.
I built AS Notes for VS Code (https://www.asnotes.io) with the option for this usage pattern in mind. By augmenting VS Code so it has the tooling we use in personal knowledge management systems, it makes it easy to write, link and update markdown / wikilinked notes manually (with mermaid / LaTeX rendering capability also) - but by using VS Code we have easy access to an agent harness that we can direct to work on our notes, or to use them as context. Others have pointed out that context bloat is an issue, but no more so than when you use the Copilot harness (or any other) inside a large codebase. I find I get more value from my AI conversations when I persist the outputs in markdown like this.