My experience so far tells me that the default path with AI tooling is that it lets us create without learning. So the author is right that you can pay for a seat in this revolution whenever you want.
A practitioner with more experience may be a few percentage points more productive, but the median path (grab a subscription, get the tool, prompt) will be mostly good enough.
I expect tools to start embedding an SLM in the ~1B-parameter range locally for something like this. It will become a feature in a rapidly changing landscape, and the need for it may disappear in the future. How would you turn this into a sticky product?
It seems like a real problem to me. Probably because I'm not overly inspired to pay for a Claude x5 subscription and really hate the session restrictions on the standard Pro plan (especially when quota left at the end of the week can't be used because of the session limits). Most of my tasks basically use superpowers, and I find I get about 30-90 minutes of usage per session before I run out of tokens. Sessions reset about every 4 hours, and I generally don't get back to them until the next day; my weekly usage is about 50%, so there's a lot of wastage due to bad scheduling. A tool like this could add better AFK-style agent interoperability through batching etc., in a one-tool-fits-all scenario.
If this gets its foot in the door and gains market share, there is plenty of runway here for adding more optimized agent utilization and value for users.
Agreed on the need, and this space needs more exploration that is not going to come from the big cos, since they are incentivised to boost spend. I've been exploring the same problem statement, but with a different approach: https://github.com/hsaliak/std_slop/blob/main/docs/CONTEXT_M....
The comment was more about how to make their approach sticky. I feel that local SLMs can replicate what this product does.
https://github.com/hsaliak/std_slop is a SQLite-centric coding agent. It does a few things differently:
1 - context is completely managed in SQLite.
2 - it has a "mail mode": basically, it uses the git email workflow as the agentic plan => code => review loop. You become "Linus" in this mode, and the patches are guaranteed bisect-safe.
3 - everything is done in a JavaScript control plane; there are no free-form tools like read/write/patch. Those are available, but only within a JavaScript REPL, and the agent works through that. You get other benefits, such as being able to persist JS functions in the database for future use that are specific to your codebase.
I explored this in std::slop (my clanker): https://github.com/hsaliak/std_slop. One of its differentiating features is that it has only a single tool call, run_js.
The LLM produces JS scripts to do its work. Naturally, I tried to teach it to add comments to these scripts and incorporate literate-programming elements. This was interesting because every tool call now 'hydrated' some free-form thinking, but it comes at an output-token cost.
Output tokens are expensive! On GPT-5.4 they're ~180 dollars per million tokens!
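To put rough numbers on that trade-off: only the ~$180/M rate comes from above; the per-call and per-session counts below are invented for illustration:

```javascript
// Back-of-envelope overhead of literate tool calls, using the ~$180 per
// million output tokens figure above. The per-call and per-session counts
// are assumptions for illustration only.
const pricePerToken = 180 / 1_000_000;  // dollars per output token
const commentTokensPerCall = 300;       // assumed extra commentary per tool call
const callsPerSession = 200;            // assumed tool calls in a session

const extraCost = pricePerToken * commentTokensPerCall * callsPerSession;
console.log(extraCost.toFixed(2)); // dollars of commentary overhead per session
```

Even with modest assumptions, the commentary alone can add dollars per session, which is why trimming it matters.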
I've settled on brief descriptions that communicate the 'why' as a result. The code is the documentation, after all.
This is a very nice and clean implementation. Related to this: I've been exploring injecting Landlock and seccomp profiles directly into the ELF binary, so that applications that are backed by some LLM but want to 'do the right thing' can lock themselves down. This ships a custom process loader that reads the .sandbox section and applies the policies (not unlike bubblewrap, which uses namespaces). The loading can be pushed to a kernel module in the future.
https://github.com/hsaliak/sacre_bleu is very rough around the edges, but it works.
In the past, apps either behaved well or had malicious intent, but with these LLM-backed apps you are going to see apps that want to behave well but cannot guarantee it.
We are going to see a lot of experimentation in this space until the UX settles!
This brings the Linux kernel style patch => discuss => merge-by-maintainer workflow to agents. You get bisect-safe patches that you review, provide feedback on, and approve.
While a SKILL could mimic this, being built in allows me to place access control and 'gate' destructive actions, so the LLM is forced to follow this workflow. Overall, this works really well for me: I get bisect-safe patches, review and re-roll them until I get exactly what I want, then merge them.
Sure, this may not be the path to software factories, but it scales 'enough' for medium-sized projects, and I've been able to build while maintaining a strong understanding of the code that goes in.
No, it does not. None of these models have the "depth" that the frontier models have across a variety of conversations, tasks, and situations. Working with them is like playing snakes and ladders: you never know when one is going to do something crazy and set you back.
The Gemini-CLI situation is poor. They did not communicate broadly earlier that AI Pro or AI Ultra accounts cannot be used with this API. I specifically remember searching for this info, so seeing this made me wonder if I had missed it. It turns out it was added to the TOS two days ago; diff:
https://github.com/google-gemini/gemini-cli/pull/20488/chang.... I'd be happy to stand corrected here.
Antigravity I understand: they are subsidizing it to promote a general IDE. But I don't understand constraining the generative AI backend that Gemini CLI hits.
It takes your query, computes the complexity of the request, and tries to route it to the appropriate model. There is a /manual command, I think, to pick the right model.
They mask the 429s well in Gemini-CLI: if an endpoint is rate limited, they try another, or route to another model, to keep service availability up.
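Roughly, that masking behaviour can be sketched as a fallback chain. This is a guess at the shape of the logic, not Gemini-CLI's real code, and the stub backends are invented:

```javascript
// Hypothetical sketch of 429 masking: try each backend in order and return
// the first success, so a rate-limited endpoint is invisible to the user.
async function routeWithFallback(request, backends) {
  let lastErr;
  for (const backend of backends) {
    try {
      return await backend.call(request);
    } catch (err) {
      if (err.status !== 429) throw err; // only mask rate limits
      lastErr = err;                     // else fall through to the next backend
    }
  }
  throw lastErr; // every backend was rate limited
}

// Demo with stub backends: the first is rate limited, the second answers.
const backends = [
  { call: async () => { const e = new Error("rate limited"); e.status = 429; throw e; } },
  { call: async (req) => `answered by fallback: ${req}` },
];

const resultPromise = routeWithFallback("hello", backends);
resultPromise.then((out) => console.log(out)); // "answered by fallback: hello"
```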
Your experience with the 429s is consistent with mine; the 429s are the first thing they need to fix. Fix that and they have a solid model at a good price point.
I use my own coding agent (https://github.com/hsaliak/std_slop) and not being able to bring my (now cancelled) AI account with Google to it is a bummer.
I'd still use it with the Code Assist Standard license if the Google Cloud API subscription allows for it, but I have no clarification on that.
> It takes your query, computes the complexity of the request, and tries to route it to the appropriate model. There is a /manual command, I think, to pick the right model.
That is what it should do, but there is no >2.5 model shown in /model, and it always picks a 2.5 model. I've enabled preview models in the Google Cloud project as well.
If I pass the 3 model in the start param, it shows 3 in the lower-right corner, but it is still using 2.5.
I know Google has issues dealing with paying customers, but the current state is a shit show. If you go to the gemini-cli repo, it's a deluge of issues and AI slop. It seems there is a cadre of people jumping to be the first to pump an issue into Claude and get some sort of PR clout.
It might be good, but it needs more time to cook, or they need to take a step back and evaluate what they should consider a paid product.
This is the way. This exact workflow is my sweet spot.
In my coding agent, std::slop, I've optimized for this workflow:
https://github.com/hsaliak/std_slop/blob/main/docs/mail_mode... Basically, the idea is that you are the 'maintainer', and you get bisect-safe git patches that you review (or ask a code-reviewer skill or another agent to review). Any change re-rolls the whole stack. Git already supports such a flow, and I added it to the agent. A simple markdown skill does not work because it 'forgets', and a GitHub-based PR flow felt too externally dependent. This workflow is enforced by a 'patcher' skill, and once that's active, tools do not work unless they follow the enforced flow.
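A rough sketch of what that kind of built-in gating could look like. All names and the stage table here are invented for illustration (this is not std::slop's real API): once the patcher skill is active, destructive tools refuse to run outside the enforced flow:

```javascript
// Hypothetical sketch of workflow gating: with the 'patcher' skill active,
// destructive tools throw unless the agent is at the right stage of the
// plan => code => review loop.
const state = { skill: "patcher", stage: "plan" };

const tools = {
  plan:       { destructive: false, run: () => { state.stage = "code"; return "planned"; } },
  writePatch: { destructive: true,  run: () => { state.stage = "review"; return "patch written"; } },
  merge:      { destructive: true,  run: () => "merged by maintainer" },
};

// Which destructive tools each stage permits while the patcher skill is active.
const allowed = { plan: [], code: ["writePatch"], review: ["merge"] };

function callTool(name) {
  const tool = tools[name];
  if (state.skill === "patcher" && tool.destructive &&
      !allowed[state.stage].includes(name)) {
    throw new Error(`'${name}' is gated: follow the patch workflow (stage=${state.stage})`);
  }
  return tool.run();
}

// Skipping straight to merge is blocked; following the flow works.
let gated;
try { callTool("merge"); } catch (e) { gated = e.message; }
callTool("plan");                     // advances plan => code
const patch = callTool("writePatch"); // allowed; advances code => review
const merged = callTool("merge");     // maintainer approves
console.log(gated, patch, merged);
```

Because the gate lives in the tool dispatcher rather than in a prompt, the model can't "forget" its way around it.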
I think a lot of people are going to feel comfortable using agents this way rather than going full blast. I do all my development this way.
This is broadly how I worked when I was still using chat instead of CLI agents for LLM support. The downside, I feel, is that unless it's a codebase, language, or architecture I don't know, it feels faster to just code by hand with the AI as a reviewer rather than a writer.
On the Lua integration: https://x.com/hsaliak/status/2022911468262350976 (I've since disabled the recursion; not every code file is long, and it seems simpler not to do it), but the rest of it is still there.
Also /review and /feedback. /feedback (the non-code version) opens the LLM's last response in an editor so you can give line-by-line comments. Inspired by "no top-posting" from mailing lists.
I quit X, so I can't read beyond top-level links. I subscribed to your tool on GitHub and would appreciate blog-posts-in-release-notes to keep up with future developments. Will try the tool. It's rare to find something new among the AI hype, thank you.
Fair enough. I'll find a way to publish some of this. I try to cover most of the information in the docs/ folder, and keep it up to date.
Blog posts in release notes is a good idea!