To be fair, the FFTW/CUDA thing is due to fundamentally different hardware architectures which drove design constraints for these types of libraries. FFTW was never meant to run on a dedicated, ultra-parallel processor with highly optimized floating point instructions (GPU), but it is incredibly fast considering it runs on general purpose hardware. I am sure the FFTW authors could have done something to squeeze out more performance if they controlled both the hardware and software as NVidia does. And the transfer time to/from the GPU does matter, especially for smaller/more frequent operations...
All that aside, the psychology of pure functional vs. pure OOP vs. some hybrid methodology is really interesting, and even the view of what a "clean solution" is becomes tainted based on past experiences with other code written in that style.
All that aside, the psychology of pure functional vs. pure OOP vs. some hybrid methodology is really interesting, and even the view of what a "clean solution" is becomes tainted based on past experiences with other code written in that style.