counterargument: I always hated writing docs, and as a result most of what I built at my day job didn't have any, which made it harder for others to use.
I was also burnt many times when some software's docs said one thing and, after many hours of debugging, I found out that the code did something different.
LLMs are so good at creating decent descriptions and keeping them up to date that I believe docs are the number one thing to use them for.
yes, you can tell human didn't write them, so what?
if they are correct I see no issue at all.
Indeed. Are you verifying that they are correct, or are you glancing at the output and seeing something that seems plausible enough and then not really scrutinizing? Because the latter is how LLMs often propagate errors: through humans choosing to trust the fancy predictive text engine, abdicating their own responsibility in the process.
As a consumer of an API, I would much rather have static types and nothing else than incorrect LLM-generated prosaic documentation.
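To illustrate what static types already buy you, here is a hypothetical Python sketch (the `Page`/`list_users` names are invented for this example, not from any real API):

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical API. The signatures alone tell a caller what is required,
# what is optional, and what comes back -- and unlike generated prose,
# they cannot silently drift away from the implementation.
@dataclass(frozen=True)
class Page:
    items: list[str]
    next_cursor: Optional[str]  # None means there are no further pages

def list_users(limit: int = 50, cursor: Optional[str] = None) -> Page:
    """Return one page of user names; pass next_cursor back to continue."""
    # Stub body for illustration only.
    names = [f"user{i}" for i in range(min(limit, 3))]
    return Page(items=names, next_cursor=None)
```

A type checker will reject `list_users(limit="50")` outright, which is the kind of guarantee an incorrect prose paragraph can never give.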
> Can you provide examples in the wild of LLMs creating bad descriptions of code? Has it ever happened to you?
Yes. The docs it produces are generally very generic (they could be the docs for anything), with project specifics sprinkled in, and with pieces that are definitely incorrect about how the code works.
> for some stuff we have to trust LLMs to be correct 99% of the time
The above post is an example of the LLM providing a bad description of the code. "Local first" with its default support being for OpenAI and Anthropic models... that makes it local... third?
Can you provide examples in the wild of LLMs creating good descriptions of code?
>Somehow I doubt at this point in time they can even fail at something so simple.
I think it depends on your expectations. Writing good documentation is not simple.
First, good API documentation should explain how to combine the functions of the API to achieve specific goals. It should warn of incorrect assumptions and potential mistakes that might easily happen. It should explain how potentially problematic edge cases are handled.
Second, good API documentation should avoid committing to implementation details. Simply verbalising the code is the opposite of that. Where the function signatures do not formally and exhaustively define everything the API promises, documentation should fill in the gaps.
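As a small Python sketch of that difference (the `retry` helper is invented for illustration), the docstring states the contract and the edge cases a caller needs, rather than verbalising the loop below it:

```python
def retry(fn, attempts: int = 3):
    """Call ``fn`` until it returns without raising.

    Contract, not implementation:
    - At most ``attempts`` calls are made; if all fail, the last
      exception is re-raised.
    - There is no delay between attempts, so this is a poor fit for
      rate-limited APIs (an easy mistake worth warning about).
    - ``attempts`` must be at least 1, otherwise ValueError is raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_exc = None
    for _ in range(attempts):
        try:
            return fn()
        except Exception as exc:  # deliberate catch-all: any failure triggers a retry
            last_exc = exc
    raise last_exc
```

A "verbalised" docstring ("loops attempts times and calls fn in a try/except") would commit to the loop and tell a caller nothing they could not already read from the code.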
This happens to me all the time. I always ask Claude to re-check the generated docs and to test each example/snippet, sometimes more than once; more often than not, there are issues.
I guess the term "correct" means something different to me. I shouldn't be able to nitpick comments out like that. Putting LLMs aside, you basically did not proof-read your own docs. Things like "No python required" are an obvious sign that you
1. Started talking about a project (you {found || built} in python), want to do it in Rust (because it's fast!) and then the LLM put that detail in the docs.
If you did not skim that out, then you did not read your own documentation. There was no love put into it.
Nonetheless, I totally get your point, and the docs are at least descriptive.
> LLMs are so good at creating decent descriptions and keeping them up to date
I totally agree! And now that CC auto-updates memories, it's much easier to keep track of changes. I'm also confident that you're the type of person to at least proof-read what it wrote, so I do not doubt the validity of your argument. It just sounds a lot different when you look at this project.
but isn't this what we wanted?
we complained so much that LLMs used deprecated or outdated APIs instead of the current versions because they relied so heavily on what they remembered.
To be clear, what I mean is that Grok will query 30 pages, answer your question vaguely or wrongly, ask for clarification of what you meant, and then go and re-query everything again. I can imagine why it might need to revisit pages, and it might be a UI thing, but it still feels like it doesn't activate its "think with what you got" mode until you yell at it to stop searching and summarise.
I guess we could call this "gather, then do your best conditional on what you found right now".
I used MiniMax M2 (context: it's very unreliable) for the installation; it didn't work, and now my Documents folder is missing. Help.
how do you even debug this? imagine some path or behaviour changes in a new OS release and the model thinks it knows better?
if anything goes wrong, who is responsible?
Previously they didn't officially state how much usage was included in the Pro subscription, but you could estimate it by upgrading from a Plus account that had reached its weekly limits: after the upgrade you ended up with 8% of the limit used, so we can assume they cut the limits roughly in half just for Pro users.
inference costs next to nothing in comparison to training (at their scale you serve many requests in parallel); on inference alone they should be profitable even when users drain their whole weekly quota every week.
but of course they have to pay for training too.
this looks like a short-sighted money grab (do they even need it?) that trades short-term profit for trust and customer base (again), as people will cancel their now-unusable subscriptions.
changing model family when you have instructions tuned for one of them is tricky and takes a long time, so people will stick with one for a while; but with API pricing you quickly start looking for alternatives, and the OpenAI gpt-5 family is also fine for coding once you spend some time tuning it.
another pain point is switching your agent software: moving from CC to Codex is more painful than just picking a different model in something like OC, which is a plausible argument for why they are doing this.
And even when AMD does move their mainstream desktop processors to a new socket, there's very little reason to expect them to be trying to accommodate multi-GPU setups. SLI and Crossfire are dead, multi-GPU gaming isn't coming back for the foreseeable future, so multi-GPU is more or less a purely workstation/server feature at this point. They're not going to increase the cost of their mainstream platform for the sole purpose of cannibalizing Threadripper sales.
>why do the results need to be decrypted by trustees after the election?
they probably designed this system to be used for government elections; how can they convince anyone to use it if they do not use it for their own elections?
I gave it a spin with instructions that worked great with gpt-5-codex (5.1 regressed a lot so I do not even compare to it).
Code quality was fine for my very limited tests but I was disappointed with instruction following.
I tried a few tricks, but I wasn't able to convince it to present a plan before starting implementation.
I have instructions saying it should first do exploration (where it tries to discover what I want), then plan the implementation, and then code, but it always jumps straight to code.
this is a big issue for me, especially because gemini-cli lacks a plan mode like Claude Code's.
for codex, those instructions make plan mode redundant.
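for context, this is roughly the shape of instructions I mean, in whatever memory file your agent reads (a hypothetical sketch; the wording is illustrative, not a known-working prompt):

```markdown
## Workflow
1. Explore: read the relevant files and restate what you think I'm asking for.
2. Plan: present a numbered implementation plan and stop.
3. Implement: write code only after I explicitly approve the plan.

Never edit a file before step 3.
```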
it would be great if that had been in the article in the first place. (I'm assuming you are the author.)