Hacker News | jerf's comments

I see two basic cases for the people who are claiming it is useless at this point.

One is that they tried AI-based coding a year or two ago, came to the (in my opinion, completely correct at the time) conclusion that it was nearly useless, and have not tried it since to see that the situation has changed. To which the solution is: try it again. It has changed a lot.

The other is those who have incorporated hating AI into their personal identity and will never use it. I have seen people do things like fire an AI at a task they have good reason to believe it will fail at, and, when it does, project that out to all tasks without letting themselves consciously realize that picking a bad task on purpose stacks the deck.

To those people my solution is to encourage them to hold on to their skepticism. I try to hold on to it as well despite the incredible cognitive temptation not to. It is very useful. But at the same time... yeah, there was a step change in the past year or so. It has gotten a lot more useful...

... but a lot of that utility is in ways that don't obviate skilled senior coding skills. It likes to write scripting code without strong types. Since the last time I wrote that, I have in fact used it in a situation where there were enough strong types around that it spontaneously originated some, but out of that context it still tends to write scripting code no matter what language it is working in. It is good at very straight-line solutions, but I rarely see it suggest using databases, or event sourcing, or a message bus, or a lot of other things... it has a lot of Not Invented Here syndrome, where it instead bashes out some minimal solution that passes the unit tests with flying colors but can't be deployed at scale. No matter how much documentation a project has, it often ends up duplicating code, just because the context window is only so large and it doesn't necessarily know where the duplicated code might be. There are all sorts of ways it still needs help to produce good output.

I also wonder how many people are failing to prompt it enough. Some of my prompts are basically "take this and do that and write a function to log the error", but a lot of my prompts are a screen or two of relevant context about the project: what it is we are trying to do, why the obvious solution doesn't work, here's some other code to look at, here are the relevant bugs and some wiki documentation on the planning of the project, we should use {event sourcing/immutable trees/stored procedures/whatever}, interact with me for questions before starting anything. A lot of what an LLM can really do is style transfer; that is no longer a complete explanation of what they are doing, but it is still substantially true: it takes "take this and do that and write a function to log the error" and style-transforms that into source code. If you want it to do something interesting, it really helps to give it enough information in the first place for the "style transfer" to get hold of and do something with. Don't feel silly "explaining it to a computer"; you're giving the function enough data to operate on.
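A minimal sketch of what such a context-rich prompt might look like as a reusable template; every field name and value here is invented for illustration, not taken from any real project:

```python
# Hypothetical prompt template; all fields and example values are made up.
prompt_template = """\
Context: {project_context}
Goal: {goal}
Why the obvious solution doesn't work: {constraint}
Relevant code to look at: {code_refs}
Relevant bugs and planning docs: {links}
Approach we should use: {approach}
Interact with me for questions before starting anything.
"""

filled = prompt_template.format(
    project_context="billing service in a Go monorepo",
    goal="replay-safe invoice generation",
    constraint="the cron-based batch job misses late-arriving events",
    code_refs="invoice/worker.go",
    links="BUG-1234, wiki page on the invoicing plan",
    approach="event sourcing",
)
print(filled)
```

The point is not the template itself but that each slot carries real information for the "style transfer" to operate on.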


"Also, it seems like all the Copilot 'connected experiences' are really just a chat window without any real integration with the applications they are embedded in."

I was triple-booked today. Two of the meetings in question should have had significant overlap between attendees. I figured, hey, there's this Copilot thing here, I'll ask it what the overlap is, that's the sort of thing an AI should be able to do. It comes back and reports that there is one person in both meetings, and that "one person" isn't even me. That doesn't seem right. One of the autocompleted suggestions for the next thing to ask is "show me the entire list of attendees" so I'm like, sure, do that.

It turns out that the API Copilot has access to can only access the first ten attendees of the meetings. Both meetings were much larger than that.
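The computation being asked for is a trivial set intersection, which only gives the right answer if the calendar API returns the complete attendee lists; a toy sketch with made-up addresses:

```python
# Made-up attendee lists for two meetings; truncating either list to its
# first ten entries would silently corrupt the intersection below.
meeting_a = {"alice@example.com", "bob@example.com", "carol@example.com"}
meeting_b = {"bob@example.com", "carol@example.com", "dave@example.com"}

overlap = meeting_a & meeting_b
print(sorted(overlap))  # ['bob@example.com', 'carol@example.com']
```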

Insert rant here about hobbling 2026 servers with random "plucked out of my bum" processing limits based on the capabilities of roughly 2000-era servers. A default 10-attendee limit imposed on any API into Outlook is sheer silliness.

But also in general what a complete waste of hooking up an amazingly sophisticated AI model to such an impoverished view of the world.


There are innumerable companies built around the Outlook calendar; you’d think Microsoft could get something right here with AI, but they seem unable to.

"plucked out of my bum" sounds so much more sophisticated than “pulled out of my ass”

Plucked betwixt mine cheeks

I would personally pay money not to have this thing.

It's wonderful and I love that someone else loves it. The care put into it is fantastic. Vive la différence.

(https://en.wiktionary.org/wiki/vive_la_diff%C3%A9rence for those who may not recognize that phrase.)


"say that AI developers should incorporate more real-world diversity into large language model (LLM) training sets,"

Are you kidding me?

How much more "real-world diversity" could they possibly incorporate into the models than the entire freaking Internet and also every scrap of text written on paper the AI companies could get a hold of?

How on Earth could someone think that AIs speak like this because their training set is full of LLM-speak? This is transparently obviously false.

This is the sort of massive, blinding error that calls everything else written in the article into question. Whatever their mental model of AI is it has no resemblance to reality.


The problem isn't the diversity in the training set - the problem is that the method by design picks the average.

LLM speak isn't even quite the average either. It's something more like the average, pushed through further training to turn it into the agents we think of today (a fresh-off-the-training-set LLM really is, in some sense, the "fancy autocomplete" people called it for a while), then trained by the AI companies to be generally inoffensive and to do the other things they want. All of that further training pushes the agents away from the original LLM average. The similarity of the "LLM tone" across multiple models from multiple companies, plus the fact that I don't think this tone was ever directly trained for, strongly suggests that the process of converting a raw LLM into the desirable agents we all use acts as some sort of strange attractor for the models pushed through it.

Maybe they are training for that tone now, either deliberately or accidentally. But my belief that they weren't initially comes from the fact that it's a new tone that I doubt anyone designed deliberately. It bears a strong resemblance to "corporate bland", but it is also clearly distinct from it, in that we can all tell the two apart very easily.


Like foxes coming up with floppy ears.

There is a study that shows that what the model is doing behind the scenes in those cases is a lot more than just outputting those tokens.

For an LLM, tokens are thought. They have no ability to think, by whatever definition of that word you like, without outputting something. The token itself represents only a tiny fraction of the internal state changes made when it is output.

Clearly there is an optimum for each task (not necessarily a global one), and a concrete model on a given task can be arbitrarily far from it. But you'd need to test it for each case, not just assume that "less tokens = more better". You can be forcing your model to be dumber without realizing it if you're not testing.


High dimensional vectors are thought (insofar as you can define what that even means). Tokens are one dimensional input that navigates the thought, and output that renders the thought. The "thinking" takes place in the high dimension space, not the one dimensional stream of tokens.
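A toy numerical sketch of that distinction, with illustrative dimensions (not any real model's): the per-step "thought" is a high-dimensional vector, and emitting a token collapses it to a single integer.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 4096, 50_000        # illustrative sizes, not a real model's

hidden = rng.standard_normal(d_model)          # the high-dimensional "thought"
W_out = rng.standard_normal((vocab, d_model))  # stand-in output projection

logits = W_out @ hidden              # score every vocabulary entry
token_id = int(np.argmax(logits))    # greedy decode: one integer survives

# 4096 floats of internal state are collapsed into a single token id
# at the output boundary.
print(token_id)
```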

But isn't the one-dimensional token stream a reflection of the high-dimensional space? What you see is "sure, let's take a look at that", but behind the curtain it's actually an indication that the model is searching a very specific part of the latent space, which might be radically different if those tokens didn't exist. Or not. In any case, you can't just make that claim and treat the two processes as isolated. They might be totally unrelated, but they also might be tightly interconnected.

I assume that in practice filler words do nothing of value. When words add or mean nothing (their weights are basically 0 in relation to the subject), I don't see why they'd affect what the model outputs (except to cause more filler words)?

Politeness has an impact (https://arxiv.org/abs/2402.14531), so I wouldn't be too quick to make any kind of claim about a technology whose workings we don't exactly understand.

> For an LLM, tokens are thought. They have no ability to think

This is so funny


The existence of science does not obligate us to either receive a double-blind study of massive statistical significance on the exact question we're thinking about or to throw our hands up in total ignorance and sit in a corner crying about the lack of a scientific study.

It is perfectly rational to rely on experience for what screens do to children when that's all we have. You operate on that standard all the time. I know that, because you have no choice. There are plenty of choices you must make without "data" to back you up.

Moreover, there is plenty of data on this topic and if there is any study out there that even remotely supports the idea that it's all just hunky-dory for kids to be exposed to arbitrary amounts of "screen time" and parents are just silly for being worried about what it may be doing to their children, I sure haven't seen it go by. (I don't love the vagueness of the term "screen time" but for this discussion it'll do... anyone who wants to complain about it in a reply be my guest but be aware I don't really like it either.)

"Politicians" didn't even begin to enter into my decisions and I doubt it did for very many people either. This is one of the cases where the politicians are just jumping in front of an existing parade and claiming to be the leaders. But they aren't, and the parade isn't following them.


I've been waiting for the article talking about how AI is affecting COBOL. Preferably with quotes from actual COBOL programmers since I can already theorize as well as the next guy but I'm interested in the reports from the field.

While LLMs have become pretty good at generating code, I think some of their other capabilities are still undersold and poorly understood, and one of them is that they are very good at porting. AI may finally offer a way out for porting COBOL.

You definitely can't just blindly point it at one code base and tell it to convert to another. The LLMs do "blur" the code, I find, just sort of deciding that maybe this little clause wasn't important and dropping it. (Though in some of the cases I've encountered, I understand where that comes from: when the old code is twisty and full of indirection, I, as a human, often have a hard time being sure what is and isn't used just by reading the code too...) But the process is still way, way faster than the old days of typing the new code in one line at a time while staring at the old code. It's definitely way cheaper to port a code base into a new language in 2026 than it was in 2020. In 2020 it was so expensive it was almost always not even an option. I think a lot of people have not caught up with the cost reductions in such porting efforts, and are not correctly factoring them into their cost calculations.
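One cheap guard against that "blur" is differential testing of the port against the legacy behavior; a toy sketch, where the hypothetical ported function has silently dropped a mid-tier clause:

```python
def legacy_discount(qty: int) -> float:
    # stand-in for the trusted legacy logic being ported
    if qty >= 100:
        return 0.15
    if qty >= 10:
        return 0.05
    return 0.0

def ported_discount(qty: int) -> float:
    # hypothetical LLM port that "blurred" the mid-tier clause away
    if qty >= 100:
        return 0.15
    return 0.0

# sweep a range of inputs and record every behavioral divergence
mismatches = [q for q in range(200) if legacy_discount(q) != ported_discount(q)]
print(len(mismatches))  # 90 inputs diverge: exactly the dropped 10..99 tier
```

Even a crude input sweep like this surfaces dropped clauses that a line-by-line read of the generated code can easily miss.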

It is easier than ever to get out of a language that has some fundamental issue that is hard to overcome (performance, general lack of capability like COBOL) and into something more modern that doesn't have that flaw.


Nominally, Common Law, the system of law that to a first approximation is used in countries descended from the UK, has a lot of protections of that sort. You can't put "unconscionable" terms in a contract, e.g., it is simply illegal to sell yourself into total slavery in common-law-derived systems. All signatories to a contract must consent, must not be under duress, the contract can not be one-sided (this doesn't mean "the contract is 'fair' from a third-party point of view" but "the contract can't result in one side giving things while the other gives nothing"), and a variety of other common-sense rules.

In practice, availing yourself of any of these protections is a massively uphill battle. Judges tend to presume that these common law matters are already embedded into the de facto legal system because the people writing the laws already operated under those assumptions while framing the law. Personally, I disagree and think a lot of these protections have eroded away into either nothing, or so little that it might as well be nothing, but you have a 0% chance of drawing me as a judge in your case so that won't help you much if you try.


I think this is a fundamental LLM issue. I recall a paper a ways back about what happens when you push LLMs to be too succinct, and the problem is that, with the way they are implemented, the only way they can "think" is to emit a token. IIRC it demonstrated that even when the model is just babbling something like "Yeah, let's take a look at the issue you just raised", under the hood that superficially useless output was also changing the model's state in ways related to solving the problem, not just producing filler text.

It helps to understand that, because then you can also not be annoyed by things like "Let's do X. No, wait, X has this problem, let's do Y instead." You might think to yourself, "if X was a bad idea, couldn't it have considered X and rejected it without outputting a token?", and the answer is: that sentence was it considering X and rejecting it, and no, there is no way for it to do that without emitting tokens. Thinking is inextricably tied to output for LLMs.

There is even some fairly substantial evidence from a couple of different angles that the thinking output is only somewhat loosely correlated to what the model is "actually" doing.

Token efficiency is an interesting question to ponder and it is something to worry about that the providers have incentives to be flabby with their tokens when you're paying per token, but the question is certainly not as easy as just trying to get the models to be "more succinct" in general.

I often discuss a "next gen" AI architecture after LLMs and I anticipate one of the differences it will have is the ability to think without also having to output anything. LLMs are really nifty but they store too much of their "state" in their own output. As a human being, while I find like many other people that if I'm doing deep thinking on a topic it helps to write stuff down, it certainly isn't necessary for me to continuously output things in order to think about things, and if anything I'm on the "absent minded"/"scatterbrained" side... if I'm storing a lot of my state in my output for the past couple of hours then it sure isn't terribly accessible to my conscious mind when I do things like open the pantry door only to totally forget the reason I had for opening it between having that reason and walking to the pantry.
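A toy loop makes the state-in-output point concrete: with a stand-in for the model call, the only thing carried from step to step is the growing transcript itself.

```python
def llm_step(transcript: list[str]) -> str:
    # stand-in for a real model call: the next token is a pure function of
    # the transcript so far; nothing else persists between calls
    return f"token{len(transcript)}"

transcript = ["prompt"]
for _ in range(3):
    transcript.append(llm_step(transcript))  # thinking happens by emitting

print(transcript)  # ['prompt', 'token1', 'token2', 'token3']
```

Anything the model "knows" mid-task but never emitted is simply gone on the next step, which is the limitation the hypothetical next-gen architecture would remove.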


The people spamming curl did step one, "write me a vulnerability report on X" but skipped step two, "verify for me that it's actually exploitable". Tack on a step three where a reasonably educated user in the field of security research does a sanity check on the vulnerability implementation as well and you'll have a pipeline that doesn't generate a ton of false positives. The question then will rather be how cost-effective it is for the tokens and the still-non-zero human time involved.
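The three steps sketch naturally as a filter pipeline; every function here is a hypothetical stand-in for illustration, not a real tool:

```python
def draft_report(target: str) -> dict:
    # step 1: an LLM drafts a report; the curl spammers stopped here
    return {"target": target, "claim": "possible overflow", "poc": None}

def verified_exploitable(report: dict) -> bool:
    # step 2: refuse anything without a working proof of concept attached
    return report["poc"] is not None

def human_sanity_check(report: dict) -> bool:
    # step 3: an educated reviewer signs off on the PoC
    return verified_exploitable(report)

reports = [draft_report("curl")]
to_submit = [r for r in reports if verified_exploitable(r) and human_sanity_check(r)]
print(len(to_submit))  # 0: the unverified draft never reaches the maintainers
```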
