That Claude Opus 4.5 result of 4,973 is what you get if you just vectorize the reference kernel. In fact you should be under 4,900 doing that with very little effort (I tried doing this by hand yesterday).

The performance killer is the "random" access reads of the tree node data, which the scalar implementation hides, together with the limited load bandwidth; to tackle that you'd have to rewrite the kernel to optimize the tree-data loading and processing.
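To illustrate the shape of the problem, here's a hypothetical C++ sketch (not the actual challenge code - depth, constants and hash are made up): each node load's address depends on the previously loaded data, so a single traversal's loads serialize on memory latency, and vectorizing across inputs turns them into gather loads that then run into the load-bandwidth limit.

  #include <cstdint>
  #include <vector>
  // Hypothetical traversal shape (constants and hash are made up):
  uint32_t traverse(const std::vector<uint32_t>& nodes, uint32_t input) {
      uint32_t idx = 0;
      for (int level = 0; level < 16; ++level) {      // depth is arbitrary
          uint32_t node = nodes[idx];                  // "random" access read
          uint32_t h = (node ^ input) * 0x9E3779B9u;   // stand-in hash step
          idx = (2 * idx + (h & 1)) % static_cast<uint32_t>(nodes.size()); // wrap-around
      }
      return idx;
  }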


Interesting statement coming from Nadella - almost an admission that AI is a solution looking for a problem, or at least looking for a problem that justifies the cost of the resources (energy, memory chips, fab capacity) it is sucking up, not to mention the looming societal disruption.

There obviously are some compelling use cases for "AI", but it's certainly questionable whether any of those are really making people's lives any better, especially if you take "AI" to mean LLMs and fake videos rather than more bespoke uses like AlphaFold, which is not only beneficial but also not a resource hog.


Maybe it's the model you are using.

Even a year ago I had success giving Claude a photo of my credit card bill and asking it for per-category subtotals: it flawlessly OCR'd the bill, wrote a Python program to do as asked, and gave me the output.

I'd imagine if you asked it to do a comparison with something else it'd also write code to do it, and so get it right (and it certainly would if you explicitly asked).


Maybe. But it’s always Claude. I even tried copying the text in directly to take OCR out of consideration. It still didn’t work very well.

> For example, using ChatGPT to get a response to a random question like "How do I do XYZ" is much more convenient than googling it

More convenient than traditional search? Maybe. Quicker than traditional search? Maybe not.

Asking random questions is exactly where you run into time-wasting hallucinations since the models don't seem to be very good at deciding when to use a search tool and when just to rely on their training data.

For example, just now I was asking Gemini how to fix a bunch of Ubuntu/Xfce annoyances after a major upgrade, and it was a very mixed bag. One example: the default date and time display is in an unreadably small "date stacked over time" format (using a font a few pixels high so it fits into the menu bar), and Gemini's advice was to enable the "Display date and time on single line" option ... but there is no such option (it just hallucinated it), and it also hallucinated a bunch of other suggestions until I finally figured out that what you need to do is configure it to display "Time only" rather than "Date and Time", then change the "Time" format to display both date and time! Just to experiment, I then told Gemini about this fix, and amusingly the response was basically "Good to know - this'll be useful for anyone reading this later"!
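(If it helps anyone: if I remember right, the Xfce clock's custom format field takes strftime-style codes, so a single-line date-plus-time display can be something like the following - the exact codes here are just an example.)

  %a %d %b %H:%M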

More examples, from yesterday (these are not rare exceptions):

1) I asked Gemini (generally considered one of the smartest models - better than ChatGPT, and rapidly taking market share from it - a 20% shift in the last month or so) to look at the GitHub codebase for an Anthropic optimization challenge, to summarize and discuss it, etc. It appeared to have looked at the codebase, until I got more into the weeds and questioned where it got certain details from (which file), and it became apparent that it had some (search-based?) knowledge of the problem but seemingly hadn't actually looked at the code (wasn't able to?).

2) I was asking Gemini about chemically fingerprinting (via impurities, isotopes) Roman silver coins to the mines that produced the silver, and it confidently (as always) came up with a bunch of academic references that it claimed made the connection, but none of the references (which did at least exist) actually contained what it claimed (just partial information), and when I pointed this out it just kept throwing out different references.

So, it's convenient to be able to chat with your "search engine" to drill down and clarify, etc, but it's a big time waste if a lot of what you get is hallucination.

Search vs chat has in any case become a difference without a difference, since Google now gives you the "AI Overview" (a diving-off point into "AI Mode"), or you can just click on "AI Mode" in the first place - which is Gemini.


> I asked Gemini (generally considered one of the smartest models

Everyone is entitled to their own opinion, but I asked ChatGPT and Claude your XFCE question, and they both gave better answers than Gemini did (imo). Why would you blindly believe what someone else tells you over what you observe with your own eyes?


I'm curious - what was your Claude prompt? I used to use Claude a lot more, but the free tier usage limits are very low if you use it for coding.

Another reason search vs chat has become a difference without a difference is that search results are full of highly-ranked AI slop. I was searching yesterday for a way to get a Gnome-style hot corner in Windows 11, and the top result falsely asserted that hot corners were a built-in feature, and pointed to non-existing settings to enable them.

x86-64 SSE and AVX are also SIMD

SIMD and VLIW are somewhat similar but very different in the end.

True.

The ISA in this Anthropic machine is actually both VLIW and SIMD, and both are relevant to the problem.


It does seem a bit of a strange challenge - a bit reminiscent of high school math problems where understanding the question was as much a part of the exercise as actually solving the problem once you understood it.

Since the focus of the challenge appears(?) to be optimization, not reverse engineering, it's a bit odd that they don't give a clear statement of what the kernel is meant to be computing. Perhaps the challenge is intended to be a combination of the two, but then the reverse engineering part becomes a gate for the optimization part - get it wrong and you'll be optimizing the wrong problem.

Given the focus on results achieved by Opus 4.5, maybe that's the main point - to show how well Opus can reverse engineer something like this. If they gave the actual clear problem statement, then maybe you could brute force an optimal solution using tree search.


I just threw this prompt at Gemini, and it seems (I haven't analyzed the problem to check whether it is correct) to be able to extract a clear understanding of the problem and a specification for the kernel.

"Can you "reverse engineer" what the kernel in this optimization exercise is actually doing - write a specification for it?

https://github.com/anthropics/original_performance_takehome"

Gemini says it's doing inference on a random forest - taking a batch of inputs, running each one through each decision tree, and for each input outputting the sum of these decision tree outputs - the accumulated evidence.


So looking at the actual code (reference_kernel() in problem.py), this "random forest inference" is completely wrong!

It's doing some sort of binary tree traversal, but the hashing and wrap-around look weird - maybe it's just a made-up task rather than any useful algorithm?


Yes, it’s made up.

This isn't "reverse engineering" it's merely "being able to read fairly simple code you didn't write". A much simpler version of the kernel is provided at the end of problem.py as reference_kernel2.

If you can't make sense of such a small codebase or don't immediately recognize the algorithm that's being used (I'm guilty of the latter) then you presumably aren't someone that they want to hire.


Fair enough, and there are clues in the comments too, but why not just provide the specification of the kernel (inputs and outputs) as part of the problem?

They do. They provide reference_kernel which shows the algorithm itself, build_mem_image which shows the data format you will be working with, and finally reference_kernel2 which implements said algorithm on said data format.

They then provide you with a very naive implementation that runs on their (very simple) VLIW architecture that you are to optimize.

If at the end of that someone is still lost I think it is safe to say it was their goal that person should fail.


Well, yes, they have a reference implementation as documentation, just as they have the simulator as documentation for the ISA ...

The problem is about pipelining memory loads and ALU operations, so why not just give clear documentation and state the task, rather than "here's a kernel - optimize it"? ¯\_(ツ)_/¯


Presumably that is only one of two purposes, with the other being to test your ability to efficiently read, understand, and edit low level code that you didn't write. I imagine you'd regularly run into raw PTX if you worked for them in the relevant capacity.

And perhaps a third purpose is to use the simulator to test your ability to reason about hardware that you are only just getting familiar with.


I would assume that anyone optimizing kernels at Anthropic has full documentation and specs for what they are working on, as well as a personal butler attending to their every need. This is big-money work - every 1% performance improvement must translate to millions in cost savings.

Maybe they specified the challenge in this half-assed way to deliberately test those sorts of skills (even if irrelevant to the job), or maybe it was just lazily put together.

The other thing to note is that if you look at what reference_kernel() is actually doing, it really looks like a somewhat arbitrary synthetic task (hashes, wraparound), so any accurate task specification would need to be a "line by line" description of the steps, at which point you may as well just say "here's some code - do this".


In a fast-paced domain such as this one, especially given the (global) competitiveness, the development/leadership process is most likely chaotic, and the "best" practices that we would normally find in slower-paced companies cannot be followed here. I think that by underspecifying the assignment they wanted to test a candidate's ability to fit into such an environment, apart from the obvious reason, which is to filter out insufficiently motivated candidates.

They do, but documentation is not always complete or correct.

> as well as a personal butler attending to their every need

I think they do and his name is Claude ;)


They are certainly going to get them! Apparently there will be zero tariffs on Chinese-made EVs coming into Canada, so they will have EVs of at least the same quality as Tesla's, but at half the price (e.g. see Marques Brownlee's recent Xiaomi review).

Far more important, I'd say, are the European countries having already got together to discuss joint sanctions/tariffs against the USA (their nominal NATO partner) if Trump moves to seize Greenland.

Macron has also said that he wants no part of Trump's (billion dollar entrance fee) "peace board" that he's going to be pushing at Davos.

A divorce from the USA would certainly hurt Europe, but it will also hurt the USA and its ability to defend itself if it loses access to European intelligence and the ability to maintain forward-located military bases and refueling locations.

The Republicans are really shooting themselves, and the US, in the foot here by not standing up to Trump, and thereby failing to signal that all this craziness is Trump rather than an enduring US policy that they support. Even if they flip-flop when Trump is out of office, the rest of the world is never going to trust the US again.


Precisely. If the USA wants to be trusted at all it needs to act now or it will be too late. This is the precipice, and without action from Congress the rest of the world will make up its own mind about the country as a whole. This is probably one of the most expensive moments in history.

One of the most expensive moments in history so far

Fair, but since we can't look into the future that's an obvious thing. Every maximum is always a maximum 'so far'.

It's more of a Simpsons reference haha, sorry.

https://www.youtube.com/watch?v=bfpPArfDTGw


The Simpsons have passed me by. The same with the Teletubbies. I know they exist but I don't feel compelled to go and watch any of it. Life's too short.

Life's too short to punch down on what others like, too; but here we are.

Take care.


I'm not sure whether I would like it or not - I just haven't been exposed to it beyond mentions such as yours and the occasional still, and I have a lot on my plate (not enough to forgo HN though ;) ).

I feel a bit conflicted on this.

On the one hand I do want someone (or a group of someones) to stick it to the US and "teach it a lesson". I see the US as a bully, and I want to see the bully get punched in the nose.

On the other, I don't wish harm to the US (mostly the people). Also because the US backed into a corner could have potentially devastating consequences for the rest of the world.


The US and France are both nuclear armed, including nuclear armed submarines, which are the ultimate defence deterrent.

It's hard to see any country wanting to get into a conventional war with China, regardless of the size of its army and air force - even the US is not going to do it unless actually attacked. At the end of the day, if China seizes Taiwan (something Trump has made more likely by his seizing of Venezuela, now talking about Greenland, Cuba ...), then the US will just complain, create trade sanctions and/or tariffs, etc.


> The US and France are both nuclear armed, including nuclear armed submarines, which are the ultimate defence deterrent.

Except for France's ASMP-A nuclear warning shot (which is not a tactical option by the way), all other independent European nuclear options (French or British SSBNs) are not only strategic, but they imply wiping entire countries off the map. All options also depend on whether the French President or the British Prime Minister decides that this course of action is warranted.

If Trump indeed invades Greenland, shoving a French-made nuclear fireball in front of an American carrier battle group off the coast of Greenland would probably not be our first option. Besides the massive political cost of breaking the nuclear taboo, if kinetic actions are deemed necessary, deterrence also comes in conventional varieties.

Also, while the British nuclear deterrent might be operationally independent, it relies on American-supplied Trident missiles.


I'm pretty sure the ASMPA-R is precise enough to be considered tactical, and the fallout small enough to easily consider firing it over blue water. Yes, the number of French tactical (sorry, "pre-strategic") nukes is limited (around 50), but I think that's enough of a deterrent.

It's not about the precision, it's the yield. We divested our tactical options back in the 1990s because our nuclear policy is one of strict sufficiency, and all our nukes are above a hundred kilotons of TNT. That doesn't leave a lot of room to thread any needle.

Also, our nuclear doctrine says that it's a nuclear warning shot, meaning that the next step on the ladder is French-delivered Armageddon by SSBNs. We're not supposed to keep lobbing them in case of a persistent problem.


I was really thinking of a standalone Europe's deterrence against attack by Russia or China, not using them against the US!

If you asked De Gaulle, it'd be deterrence against anyone.

That "nest of objects point to each other" makes no sense ... RAII is just a technique where you choose to tie resource management to the lifetime of an object (i.e. acquire in constructor, release in destructor).

If an exception gets thrown, causing your RAII object scope to be exited, then no problem - the object destructor gets called and the resource gets released (this is the entire point of RAII - to make resource allocation and deallocation automatic and bullet-proof).

If you are creating spaghetti-like cross-referencing data structures, then that is either poor design or something you are doing deliberately because you need it. In either case, it has nothing to do with RAII, and RAII will work as normal and release resources when the managing object is destroyed (e.g. by going out of scope).

RAII could obviously be used to allocate/free a resource (e.g. temp buffer) from a free list, but is not really relevant to arena allocation unless you are talking about managing the allocation/release of the entire arena. The whole point of an arena allocator is the exact opposite of RAII - you are deliberately disconnecting individual item allocation from release so that you can instead do a more efficient bulk (entire arena) release.
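To make the contrast concrete, here's a minimal hypothetical sketch (a toy bump-pointer arena, ignoring alignment - not any particular library's API): RAII can still own the arena's lifetime as a whole, while the individual allocations inside it deliberately have no per-item release.

  #include <cstddef>
  #include <cstdlib>
  #include <new>
  // Toy bump-pointer arena: items are never freed individually;
  // the whole block goes away at once in the destructor.
  class Arena {
      char* buf_;
      std::size_t size_;
      std::size_t used_ = 0;
  public:
      explicit Arena(std::size_t size)
          : buf_(static_cast<char*>(std::malloc(size))), size_(size) {
          if (!buf_) throw std::bad_alloc{};
      }
      ~Arena() { std::free(buf_); }       // RAII manages the arena itself...
      void* alloc(std::size_t n) {        // ...but not the items within it
          if (used_ + n > size_) throw std::bad_alloc{};
          void* p = buf_ + used_;
          used_ += n;
          return p;                       // note: no matching per-item free
      }
      Arena(const Arena&) = delete;
      Arena& operator=(const Arena&) = delete;
  };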


Another commenter succinctly pointed out that one argument against RAII and friends is that it encourages thinking about single objects, as opposed to bulk processing.

In many contexts, the common case is in fact bulk processing, and programming things with the assumption that everything is a single, discrete element creates several problems, mostly wrt. performance, but also maintainability. [1][2]

> The whole point of an arena allocator is the exact opposite of RAII

Yes, agreed. And the internet is rife with people yelling about just how great RAII is, but comparatively few people have written specifically about its failings, and alternatives, which is what I'm asking about today.

[1] https://www.youtube.com/watch?v=tD5NrevFtbU

[2] https://www.youtube.com/watch?v=rX0ItVEVjHc&t=2252


I don't think the "complex web of objects to be deallocated" scenario is usually a problem, but I generally agree with your points. As always, careful design and control of the software is important. Abstractions are limited; spend them carefully.

> one argument against RAII+friends is that it encourages thinking about single objects, as opposed to bulk processing.

RAII is just a way to guarantee correctness by tying a resource's lifetime to that of an object, with resource release guaranteed to happen. It is literally just saying that you will manage your resource (a completely abstract concept - it doesn't have to be memory) by initializing it in an object's constructor and releasing it in the destructor.

Use of RAII as a technique is completely orthogonal to what your program is doing. Maybe you have a use for it in a few places, maybe you don't. It's got nothing to do with whether you are doing "bulk processing" or not, and everything to do with whether you have resources whose usage you want to align to the lifetime of an object (e.g. this function will use this file/mutex/buffer, then must release it before it exits).


The alternative to RAII is simply do-it-yourself! For example, if you are writing multi-threaded code and need a mutex to protect some data structure, then you'd need an explicit mutex_lock() before the access, and an explicit mutex_unlock() afterwards... if your code might throw exceptions or branch due to errors, then make sure that you have mutex_unlock() calls everywhere necessary!

Automating this paired lock and unlock, to avoid any programmer error in missing an unlock in one of the error paths, just makes more sense, and this is all that RAII is doing - it's not some mysterious religion or design philosophy that needs to pervade your program (other than to the extent that it would make sense to remove all such potential programmer errors, not just some!).

In this mutex example, RAII would just mean having a class "MutexLocker" (C++ provides std::lock_guard) that does the mutex_lock() in its constructor and mutex_unlock() in its destructor.
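A minimal sketch of such a class might look like this, where Mutex, mutex_lock() and mutex_unlock() just stand in for whatever mutex API you're using (in real C++ you'd reach for std::lock_guard instead):

  struct Mutex { /* your mutex type */ };
  void mutex_lock(Mutex&);     // provided elsewhere
  void mutex_unlock(Mutex&);   // provided elsewhere
  // Lock in the constructor, unlock in the destructor, so every
  // exit path (normal return or exception) releases the mutex.
  class MutexLocker {
      Mutex& mut_;
  public:
      explicit MutexLocker(Mutex& mut) : mut_(mut) { mutex_lock(mut_); }
      ~MutexLocker() { mutex_unlock(mut_); }
      MutexLocker(const MutexLocker&) = delete;   // non-copyable
      MutexLocker& operator=(const MutexLocker&) = delete;
  };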

Without RAII, your code might have looked something like this:

  try {
      mutex_lock(mut);
      // access data structure
      mutex_unlock(mut);
  } catch (...) {
      // make sure we unlock the mutex in the error case too!
      mutex_unlock(mut);
  }

With the RAII approach the mutex unlocks are automatic since they'll happen as soon as the mutex locker goes out of scope and its destructor is called, so the equivalent code would now look like this:

  try {
      MutexLocker locker(mut);
      // access data structure
  } catch (...) {
      // nothing to do here - locker's destructor has already
      // unlocked the mutex on the way out
  }

That's it - no big deal, but note that now there was no need to add multiple mutex unlock calls in every (normal + error) exit path, and no chance of forgetting to do it.

You can do the same thing anywhere where you want to guarantee that some paired "resource" cleanup activity take place, whether that is unlocking a mutex, or closing a file, or releasing a memory buffer, or whatever.

You may not think of it as RAII, but this approach of automatic cleanup is being used anywhere you have classes that own something and then release it when they are done with it; for example, std::string and all the C++ container classes are doing this, as are C++ fstream file objects, C++ smart pointers, C++ lock_guard for mutexes, etc, etc.

The name "RAII" (resource acquisition is initialization) is IMO a poor name for the technique, since the value is more in the guaranteed, paired, resource release than the acquisition, which was never a problem. Scope-based resource management, or Lifetime-based resource management, would better describe it!

Of course RAII isn't always directly applicable, because maybe you need to acquire a resource in one place and release it someplace else entirely, not in the same scope, although you could choose to use a smart pointer together with a RAII resource manager to achieve that. For example, create a resource with std::make_shared<ResourceManager>() in one place, and resource release will still happen automatically whenever reference counting indicates that all uses of the resource are done.
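For example (ResourceManager here is a hypothetical RAII wrapper, not a standard type), the release is then driven by the reference count rather than by any single scope:

  #include <memory>
  // Hypothetical RAII wrapper around some resource handle.
  class ResourceManager {
  public:
      ResourceManager()  { /* acquire the resource */ }
      ~ResourceManager() { /* release the resource */ }
  };
  std::shared_ptr<ResourceManager> producer() {
      // Acquired here, but not released when this scope exits...
      return std::make_shared<ResourceManager>();
  }
  void consumer(std::shared_ptr<ResourceManager> res) {
      // ...instead the destructor (and so the release) runs when the
      // last shared_ptr copy, wherever it lives, is destroyed.
  }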

