> This sounds like an appeal to a "sufficiently smart compiler"
Because it is. It's a long road from "gee, I changed something, better copy everything" to something as good as FFTW or ATLAS. GHC is pretty smart, but not smart enough to be trusted.
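To make the "better copy everything" worry concrete: in practice, pure Haskell code usually avoids whole-structure copies by using persistent data structures, which share unchanged parts between the old and new versions. A minimal sketch with `Data.Map` from the standard `containers` package (the values and keys here are made up for illustration):

```haskell
-- In a pure language, "changing" a value yields a new value. The question
-- is whether that means copying everything. With a persistent structure
-- like Data.Map, an insert rebuilds only the O(log n) path to the changed
-- node; the rest of the tree is shared between the old and new versions.
import qualified Data.Map.Strict as M

main :: IO ()
main = do
  let old = M.fromList [(k, k * k) | k <- [1 .. 1000 :: Int]]
      new = M.insert 42 0 old        -- O(log n), not an O(n) copy
  print (M.lookup 42 old)            -- the old version is untouched
  print (M.lookup 42 new)            -- the new version sees the update
```

The compiler doesn't have to be "sufficiently smart" for this to be cheap; the sharing is guaranteed by the data structure itself. The open question the thread is arguing about is whether that style can ever close the gap with hand-tuned imperative kernels like FFTW or ATLAS.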
> It's a long road from "gee, I changed something, better copy everything" to something as good as FFTW or ATLAS.
CUDA implementations of FFTs and matrix operations (cuFFT, cuBLAS) are faster than both FFTW and ATLAS, and they are neither sequential nor functional.
CUDA, C, and Haskell each have domains where they typically outperform the others. The math vs. simulation divide sketched in this blog post is more an expression of the author's own psychology than anything else.
To be fair, the FFTW/CUDA comparison comes down to fundamentally different hardware architectures, which drove the design constraints for these libraries. FFTW was never meant to run on a dedicated, massively parallel processor with highly optimized floating-point units (a GPU), but it is incredibly fast considering it runs on general-purpose hardware. I am sure the FFTW authors could have squeezed out more performance if they controlled both the hardware and the software, as NVIDIA does. And the transfer time to/from the GPU does matter, especially for smaller or more frequent operations...
All that aside, the psychology of pure functional vs. pure OOP vs. some hybrid methodology is really interesting, and even one's view of what counts as a "clean solution" becomes tainted by past experience with other code written in that style.