> compile-time speed ups and potentially the use of the rorx instruction
Indeed, but the 'compile-time speedups' are a legitimate point in favour of the compiler. If they didn't occur to the assembly programmer, or struck them as too complex to pull off, then the compiler deserves the point.
Have to say I don't follow why the hand-tuned assembly doesn't use the rorx instruction. It's not mentioned in the assembly file, at least, but I thought that was the point? https://www.nayuki.io/page/fast-sha2-hashes-in-x86-assembly
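For anyone wondering where rorx would come in: SHA-256 leans heavily on 32-bit rotates, and rorx (part of BMI2) is a rotate that writes to a separate destination register and leaves the flags alone. Below is a minimal sketch in C of the rotate-heavy 'big sigma' functions from the SHA-256 spec. The function names are my own, not taken from either program in the benchmark, and whether a given compiler actually lowers these to rorx is an assumption that depends on the compiler and target flags (e.g. enabling BMI2).

```c
#include <stdint.h>

/* Rotate right by n bits. n must be in 1..31 here, since a shift by 32
   would be undefined behaviour. Compilers commonly recognise this idiom
   as a single rotate instruction. */
static inline uint32_t rotr32(uint32_t x, unsigned n) {
    return (x >> n) | (x << (32u - n));
}

/* SHA-256 "big sigma 0": three rotates of the same input, XORed together. */
static inline uint32_t big_sigma0(uint32_t x) {
    return rotr32(x, 2) ^ rotr32(x, 13) ^ rotr32(x, 22);
}

/* SHA-256 "big sigma 1": same shape, different rotate amounts. */
static inline uint32_t big_sigma1(uint32_t x) {
    return rotr32(x, 6) ^ rotr32(x, 11) ^ rotr32(x, 25);
}
```

Each sigma rotates the same source three times, which is where a non-destructive rotate can save register copies compared with plain ror.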
Also, it's neat to see instruction selection being so significant. Generally you'd expect cache/branch behaviour to be the kicker.
> It cannot be faster than the fastest assembly
Well, 'fastest assembly' is the domain of so-called 'superoptimisers', and has us pushing at the stubborn bounds of computability.
We were talking about hand-written assembly code compared with compiler-generated code. Odds are that none of the binaries tested were the optimal assembly.
The only interesting question is whether the hand-tuned assembly code they tested was the fastest available at the time. If not, the whole demonstration is a straw man, of course.
Also, I don't like that the winning Zig program runs for so much longer. A good benchmark should provide ironclad assurances that no candidate is getting an unfair advantage from loading or 'warm-up' effects.
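To make the warm-up point concrete, here is a rough sketch in C of the kind of harness I have in mind: a few untimed calls first, then the best of several timed calls. Everything in it (the names, the sample workload, the use of clock() for CPU time) is my own illustration and an assumption, not how the benchmark in question was actually run.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef void (*bench_fn)(void);

/* Hypothetical workload; the volatile sink keeps the loop from being
   optimised away. */
static volatile uint32_t sink;
static void sample_work(void) {
    uint32_t acc = 0;
    for (uint32_t i = 0; i < 100000u; i++) {
        acc = acc * 31u + i;
    }
    sink = acc;
}

/* Calls fn `warmup` times untimed, then `runs` times timed, and returns
   the smallest CPU time observed in seconds. Returns -1.0 if runs == 0. */
static double best_time_after_warmup(bench_fn fn, size_t warmup, size_t runs) {
    for (size_t i = 0; i < warmup; i++) {
        fn();
    }
    double best = -1.0;
    for (size_t i = 0; i < runs; i++) {
        clock_t start = clock();
        fn();
        clock_t end = clock();
        double secs = (double)(end - start) / CLOCKS_PER_SEC;
        if (best < 0.0 || secs < best) {
            best = secs;
        }
    }
    return best;
}
```

Something like best_time_after_warmup(sample_work, 3, 10) would then be applied identically to every candidate, so that each one is measured for the same number of runs after the same warm-up.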