More

ComplexSystems · 2026-04-07T22:38:43 1775601523

This line of reasoning makes no sense when the AI can just be given access to a fuzzer. I would guess that it probably did have access to a fuzzer to put together some of these vulnerabilities.

ComplexSystems · 2026-03-28T16:28:34 1774715314

> So the data cannot possibly tell you anything about how likely is the observed outcome, because the observed outcome is the only outcome that you observe.

This could also be viewed as supporting the Bayesian perspective, where the observed data are not viewed as random variables - they are fixed. This is because, as you say, the observed outcome is the only outcome that you observe. It is the classical setting, in comparison, where we instead do our analysis by treating the sample as a random variable, placing the counterfactual on other non-observed values ("what if I had drawn a different sample?"), even though we didn't. Bayesian methods treat the data as gospel truth, and place the counterfactual on the different parameters ("what if the population were different?"), even though it isn't.

The other criticism you have is

> The problem with this approach is that we can only observe ONE level of treatment effectiveness, i.e., the level of treatment effectiveness that the treatment actually possesses. All other possible levels of effectiveness are entirely hypothetical.

This is true of both Bayesian and classical methods. We build models that would explain how different hypothetical levels of effectiveness would affect what data we should expect to see - that is the whole point. Classical methods also involve exploring scenarios in which purely hypothetical values of the parameter may be potentially true, and characterizing counterfactual samples that could have been drawn from them, even though in real life they couldn't have been.

lottin · 2026-03-31T16:41:55 1774975315

Statistical inference is based on random sampling. The data has to be random, otherwise it doesn't work.

I wrote another comment here clarifying my point, if you're interested: https://news.ycombinator.com/item?id=47566033

ComplexSystems · 2026-03-28T16:06:28 1774713988

I found it surprising that this article persistently did not capitalize the word "Bayesian." Is this a new trend or something?

ComplexSystems · 2026-02-28T16:56:44 1772297804

I certainly don't. If a software developer has found a way to use these tools that works well for them and produces good results, that's a good thing.

hobs · 2026-03-01T11:50:42 1772365842

I have no dog in this fight, but simply claiming a count of tests get you anything is like saying your code coverage is 100% - it sounds really good until you think about what 5000 unreviewed tests actually... do.

ComplexSystems · 2026-02-27T01:25:16 1772155516

They are mathematical models of what human beings would say. That's it.

astrange · 2026-02-27T02:39:04 1772159944

Yeah, and you don't want them to be models of what neurotic people say. That's why you want Opus 4.6 and not Bing Sydney.

For instance, your comment's existence makes it harder to align them.

https://alignmentpretraining.ai

ComplexSystems · 2026-02-20T20:19:08 1771618748

> the code is not public, so we can't know.

I feel like you're making this statement in bad faith, rather than honestly believing the developers of the forum software here have built in a clause to pin simonw's comments to the top.

ComplexSystems · 2026-02-05T21:22:47 1770326567

Do you ever replace ChatGPT models with cheaper, distilled, quantized, etc ones to save cost?

tedsanders · 2026-02-06T06:52:18 1770360738

We do care about cost, of course. If money didn't matter, everyone would get infinite rate limits, 10M context windows, and free subscriptions. So if we make new models more efficient without nerfing them, that's great. And that's generally what's happened over the past few years. If you look at GPT-4 (from 2023), it was far less efficient than today's models, which meant it had slower latency, lower rate limits, and tiny context windows (I think it might have been like 4K originally, which sounds insanely low now). Today, GPT-5 Thinking is way more efficient than GPT-4 was, but it's also way more useful and way more reliable. So we're big fans of efficiency as long as it doesn't nerf the utility of the models. The more efficient the models are, the more we can crank up speeds and rate limits and context windows.

That said, there are definitely cases where we intentionally trade off intelligence for greater efficiency. For example, we never made GPT-4.5 the default model in ChatGPT, even though it was an awesome model at writing and other tasks, because it was quite costly to serve and the juice wasn't worth the squeeze for the average person (no one wants to get rate limited after 10 messages). A second example: in our API, we intentionally serve dumber mini and nano models for developers who prioritize speed and cost. A third example: we recently reduced the default thinking times in ChatGPT to speed up the times that people were having to wait for answers, which in a sense is a bit of a nerf, though this decision was purely about listening to feedback to make ChatGPT better and had nothing to do with cost (and for the people who want longer thinking times, they can still manually select Extended/Heavy).

I'm not going to comment on the specific techniques used to make GPT-5 so much more efficient than GPT-4, but I will say that we don't do any gimmicks like nerfing by time of day or nerfing after launch. And when we do make newer models more efficient than older models, it mostly gets returned to people in the form of better speeds, rate limits, context windows, and new features.

acuozzo · 2026-02-08T00:12:35 1770509555

> we never made GPT-4.5 the default model in ChatGPT

Just wondering: Why was it never made available via API? You can just charge whatever per token to make sure it's profitable like o1-pro.

I use it via my ChatGPT-Pro subscription, but I still find the API omission weird.

tedsanders · 2026-02-08T17:16:35 1770570995

It was available in the API from Feb 2025 to July 2025, I believe. There's probably another world where we could have kept it around longer, but there's a surprising amount of fixed cost in maintaining / optimizing / serving models, so we made the call to focus our resources on accelerating the next gen instead. A bit of a bummer, as it had some unique qualities.

jghn · 2026-02-05T21:28:59 1770326939

He literally said no to this in his GP post

ComplexSystems · 2026-01-25T09:53:55 1769334835

How much does this setup cost? I don't think a regular Claude Max subscription makes this possible.

amelius · 2026-01-25T12:42:32 1769344952

Can't you just use time-sharing and let the entire task run over night?

ComplexSystems · 2026-01-18T16:37:57 1768754277

The model doesn't know what its training data is, nor does it know what sequences of tokens appeared verbatim in there, so this kind of thing doesn't work.

ComplexSystems · 2026-01-14T04:40:54 1768365654

It's really only about the flooding the marketplace part, not about the extracting volume without their consent part. The current set of GenAI music models may involve training a black box model on a huge data set of scraped music, but would the net effect on artists' economic situations be any different if an alternate method led to the same result? Suppose some huge AI corporation hired a bunch of musicians, music theory Ph. D's, Grammy winning engineers, signal processing gurus, whatever, and hand-built a totally explainable model, from first principles, that required no external training data. So now they can crowd artists out of the marketplace that way instead. I don't think it would be much better.