Thanks; I'll take a look. But OpenMP is CPU-only right? Apple's got their (curre...

foxhill · on Nov 15, 2012

OpenMP 4.0 is likely to have support for accelerator devices (i.e., move the necessary data onto the device, run the computation, and move the results back to the host). in fact, that's one of the ways you can use the Phi right now (Intel have extensions to OpenMP).

or, if you can't be bothered to wait for such a standard, have a look at OpenACC[1], which does exactly this and exists now. you end up adding a directive like

    #pragma acc kernels loop

on top of your for loops, and the compiler does the low-level work for you.
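to make that concrete, here's a rough sketch of what an annotated loop might look like in C. the function name `saxpy` and the explicit `copyin`/`copy` data clauses are just my choices for illustration; an OpenACC compiler can often work out the data movement itself, and a compiler without OpenACC support ignores the pragma and runs the loop on the CPU as usual.

```c
/* Sketch only: saxpy is a hypothetical example function, not taken from
 * any library. It computes y[i] = a * x[i] + y[i] for i in [0, n). */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    /* Ask the compiler to offload the loop to the accelerator.
     * copyin: x is copied to the device before the loop runs.
     * copy:   y is copied to the device and copied back afterwards.
     * The data clauses are an illustrative assumption; they spell out
     * the "move data over, compute, move back" steps explicitly. */
    #pragma acc kernels loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

you call it like any other function, e.g. `saxpy(n, 2.0f, x, y);`, and the result lands in `y` on the host either way.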

[1] http://www.openacc-standard.org/