
And the problem is that such a thing doesn't really work on mainstream CPUs, because memory access instructions take a highly variable number of cycles depending on which cache level the data is in, which the compiler cannot know, and generating code for all possibilities leads to an exponential blowup in generated code size.

It's not clear how Intel thought it could possibly work.
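To put rough numbers on the blowup: a toy sketch, assuming four cache levels with made-up latencies and a deliberately naive stall model in which every load that is slower than the schedule assumed stalls an in-order pipeline for the full difference. The names `LATENCY`, `static_schedule_variants` and `stall_cycles` are mine, purely for illustration.

```python
# Assumed load latencies in cycles per cache level (illustrative numbers only).
LATENCY = {"L1": 4, "L2": 14, "L3": 40, "DRAM": 200}


def static_schedule_variants(num_loads, levels=LATENCY):
    """Count the schedules needed to cover every latency combination."""
    return len(levels) ** num_loads


def stall_cycles(assumed, actual, levels=LATENCY):
    """Naive model: each load slower than assumed stalls for the difference."""
    return sum(
        max(0, levels[real] - levels[guess])
        for guess, real in zip(assumed, actual)
    )


# Ten independent loads, four possible levels each.
print(static_schedule_variants(10))                  # 1048576
# Schedule assumed two L1 hits; the second load actually went to DRAM.
print(stall_cycles(["L1", "L1"], ["L1", "DRAM"]))    # 196
```

So either the compiler picks one assumption and eats the stalls, or it emits a variant per combination and the count grows as levels to the power of loads.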



predication and queues can go a long way


Out of the three VLIW architectures I have looked into, two (Elbrus and Itanium) rely heavily on predication. The i860 does not have instruction predicates.

Predication places the burden of creating optimal instruction bundles AND of correct hinting via predicates on the compiler. If the stars aligned, the code could perform blazingly fast. It turned out that aligning the stars in an optimal space-time sequence was an arduous task, because the information the hints depend on is only available at run time.

Which is where JIT has delivered well (and cheaper!) without requiring a radically different VLIW design.
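For anyone unfamiliar with predication, a minimal sketch of the idea (if-conversion), written in Python only for readability. Real predicated ISAs guard individual instructions with a predicate register; here the predicate is just an integer 0 or 1, and `branchy` / `predicated` are hypothetical names.

```python
def branchy(cond, a, b):
    """Ordinary control flow: only one arm is executed."""
    if cond:
        return a + 1
    return b * 2


def predicated(cond, a, b):
    """If-converted form: both arms execute, the predicate selects the result."""
    p = int(bool(cond))      # stand-in for a predicate register (1 or 0)
    then_val = a + 1         # computed unconditionally
    else_val = b * 2         # computed unconditionally
    return p * then_val + (1 - p) * else_val


for cond in (True, False):
    assert branchy(cond, 10, 20) == predicated(cond, 10, 20)
```

The branch is gone, so there is nothing to mispredict, but the work of both arms is always paid for. Whether that trade is a win depends on how the condition behaves at run time, which is the information the compiler lacks.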


Fundamentally, though, it seems that more information is available at run time? You may get partway there in the compiler, but assuming you have sufficient transistor budget, it seems better to do the reordering in the CPU.


The runtime doesn't know all that much, though. All it has is a single instruction flow, that it can extract fine-grained parallelism from and try to speed up further via speculation. Nothing whatsoever about other work that may be scheduled in when the processor is stalled by memory, other than via SMT. Nothing about priorities or coarse-grained dependencies among work units. So there's a lot of parallelism that's left on the table, and a lot of speculated work that might just be wasted.


> The runtime doesn't know all that much […]

If we are talking about JIT, yes, it does: it instruments the running code, gathers information about hot code paths and performs the optimisation in place. Think of profile-guided compile-time optimisation carried over into the runtime.
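A very rough sketch of that mechanism, assuming a simple call counter as the only profile and a caller-supplied `specialise` function standing in for the optimising compiler. `TieredFunction` and its parameters are invented for illustration and do not correspond to any particular JIT.

```python
class TieredFunction:
    """Run a generic version until it proves hot, then swap in a fast one."""

    def __init__(self, generic, specialise, threshold=3):
        self.generic = generic        # slow, always-correct version
        self.specialise = specialise  # hypothetical "compiler": generic -> fast callable
        self.threshold = threshold    # calls before the path counts as hot
        self.calls = 0                # the profile gathered at run time
        self.fast = None              # filled in once the path is hot

    def __call__(self, x):
        if self.fast is not None:
            return self.fast(x)       # optimised path, no more counting
        self.calls += 1
        if self.calls >= self.threshold:
            self.fast = self.specialise(self.generic)
        return self.generic(x)


double = TieredFunction(lambda x: x * 2, lambda g: (lambda x: x + x))
results = [double(i) for i in range(5)]
```

A real JIT collects far richer profiles (branch bias, observed types, call targets) and can deoptimise when its assumptions break, but the shape is the same: measure first, then rewrite the hot path.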



