IMO is not a breakthrough; if you craft proper prompts you can excel at the IMO with 2.5 ...

impossiblefork · 2025-08-07T18:19:30 1754590770

It wasn't long ago that test-time scaling wasn't possible. Test-time scaling is a core part of what makes this a breakthrough.

I don't believe your assessment, though. The IMO is hard, and Google have said that they use search and some way of combining different reasoning traces. I haven't read that paper yet, and of course it may support your view, but I just don't believe it.

We are not close to solving IMO with publicly known methods.

demirbey05 · 2025-08-07T18:35:13 1754591713

Test-time scaling is based on methods from before 2020. If you look at the details of modern LLMs, you're pretty unlikely to encounter a method from 2020 or later (RoPE, GRPO). I am not saying the IMO result is not impressive, but it is not a breakthrough. If they had said they used a different paradigm than test-time scaling, I would call it a breakthrough.

> We are not close to solving IMO with publicly known methods.

The point here is not the method but rather computation power. You can solve any verifiable task with enough computation. Of course there must be tweaks to the methods, but I don't think it is anything very big or different. It's just that OAI asserted they solved it with a breakthrough.

Wait for self-adapting LLMs. We will see them within 2 years at most; I think all of big tech is focusing on that now.

impossiblefork · 2025-08-07T19:19:41 1754594381

What kind of test-time scaling did we have pre-2020?

Non-output tokens were basically introduced by Quiet-STaR, which is rather new. What method from five years ago does anything like that?