
What about using OpenMP?

My recollection is that I wrote a similar program (statistical bootstrapping, essentially a loop around random()), and using OpenMP on a 4-core machine definitely produced a speedup (not 4x, but close).

Does OpenMP somehow sidestep this issue of shared access, or is my memory wrong?
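For reference, here is a minimal sketch of a bootstrap loop like the one described above, parallelized with OpenMP. It assumes the slowdown comes from the single shared state behind random(), so it swaps in a small xorshift64* generator whose state each resample owns privately. That generator, the seeding scheme, and the function names are illustrative choices, not what the original program used.

```c
#include <stdint.h>

/* Hypothetical stand-in for random(): xorshift64* with caller-owned
   state, so no generator state is shared between threads. */
static uint64_t xorshift64s(uint64_t *state) {
    uint64_t x = *state;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    *state = x;
    return x * 0x2545F4914F6CDD1DULL;
}

/* Mean of one bootstrap resample: n draws with replacement from data. */
static double resample_mean(const double *data, int n, uint64_t seed) {
    uint64_t state = seed; /* private to this call, hence to this thread */
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        /* Use the high bits of the output to pick an index. */
        sum += data[(xorshift64s(&state) >> 32) % (uint64_t)n];
    }
    return sum / n;
}

/* Average of `resamples` bootstrap means. Each resample is seeded from its
   index, so its value does not depend on which thread runs it; only the
   floating-point order of the reduction can vary between runs. */
double bootstrap_mean_of_means(const double *data, int n, int resamples) {
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (int r = 0; r < resamples; r++) {
        /* An odd multiplier maps every nonzero r + 1 to a nonzero seed,
           which xorshift requires. */
        total += resample_mean(data, n, 0x9E3779B97F4A7C15ULL * (uint64_t)(r + 1));
    }
    return total / resamples;
}
```

Compile with something like `cc -O2 -fopenmp`; without `-fopenmp` the pragma is ignored and the same loop runs serially. Called on the data {1, ..., 8} with 10000 resamples, the result should land close to the sample mean of 4.5.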



My first thought was that this was a silly question: there is nothing magic about OpenMP, and the throughput would be the same. But on further thought, this would be a good thing to benchmark.

The difference would come from using multiple processes instead of multiple threads. OpenMP itself runs a team of threads that share one address space, so on its own it shouldn't avoid this contention. But since the bottleneck in this case is access to memory shared between the threads, and separate processes don't share that memory, I think a process-based version would indeed get close to a 4x speedup on this problem on a 4 (physical) core machine, provided the runtime is long enough to offset the cost of launching the processes.
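The point that separate processes don't share memory can be illustrated with a minimal POSIX sketch (the function name is hypothetical). After fork(), the child gets its own copy of the parent's address space (typically implemented copy-on-write), so its writes to a global never reach the parent. Threads created with pthread_create() would all be writing to the same variable instead.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Global counter: after fork(), parent and child each have their own copy. */
static long counter = 0;

/* Returns the parent's value of counter after a child process has
   incremented its own copy 1,000,000 times, or -1 on error. */
long parent_counter_after_child_writes(void) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1; /* fork failed */
    }
    if (pid == 0) {
        /* Child: these writes touch only the child's copy of counter. */
        for (int i = 0; i < 1000000; i++) {
            counter++;
        }
        _exit(counter == 1000000 ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        return -1;
    }
    return counter; /* still 0 in the parent: nothing was shared */
}
```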

More generally, it would be worthwhile to benchmark the performance difference between multiple processes created with fork() and multiple threads created with pthread_create(). If there is no need to write to the same memory (as with a bootstrap), the shared-nothing, process-based approach is more likely to get a linear speedup and is usually easier to reason about.
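One way to run that benchmark is sketched below, under stated assumptions. Each worker just calls random() in a loop. The sketch assumes glibc, where random() guards its single global state with an internal lock (POSIX does not require random() to be thread-safe at all), so threads should contend on it, while each forked child works on its own copy of that state. The names time_with_threads, time_with_processes, and busy_work are hypothetical.

```c
#define _XOPEN_SOURCE 700 /* for random(), fork(), clock_gettime() */
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define MAX_WORKERS 64

/* CPU-bound job built on random(). Assumption: glibc, where threads
   serialize on random()'s lock; forked children each have private state. */
static unsigned long busy_work(long iters) {
    unsigned long acc = 0;
    for (long i = 0; i < iters; i++) {
        acc += (unsigned long)random();
    }
    return acc;
}

struct job {
    long iters;
    unsigned long result;
};

static void *thread_main(void *arg) {
    struct job *j = arg;
    j->result = busy_work(j->iters);
    return NULL;
}

static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Wall-clock seconds for nworkers threads made with pthread_create(); -1 on error. */
double time_with_threads(int nworkers, long iters) {
    pthread_t tids[MAX_WORKERS];
    struct job jobs[MAX_WORKERS];
    if (nworkers < 1 || nworkers > MAX_WORKERS) {
        return -1;
    }
    double start = now_seconds();
    int created = 0;
    for (; created < nworkers; created++) {
        jobs[created].iters = iters;
        jobs[created].result = 0;
        if (pthread_create(&tids[created], NULL, thread_main, &jobs[created]) != 0) {
            break;
        }
    }
    for (int i = 0; i < created; i++) {
        pthread_join(tids[i], NULL); /* join whatever started, even on error */
    }
    return created == nworkers ? now_seconds() - start : -1;
}

/* Wall-clock seconds for nworkers processes made with fork(); -1 on error. */
double time_with_processes(int nworkers, long iters) {
    pid_t pids[MAX_WORKERS];
    if (nworkers < 1 || nworkers > MAX_WORKERS) {
        return -1;
    }
    double start = now_seconds();
    int started = 0;
    for (; started < nworkers; started++) {
        pid_t pid = fork();
        if (pid < 0) {
            break;
        }
        if (pid == 0) {
            /* Child: use the result so the loop can't be optimized away. */
            _exit((int)(busy_work(iters) & 1u));
        }
        pids[started] = pid;
    }
    int ok = (started == nworkers);
    for (int i = 0; i < started; i++) {
        int status = 0;
        if (waitpid(pids[i], &status, 0) < 0 || !WIFEXITED(status)) {
            ok = 0;
        }
    }
    return ok ? now_seconds() - start : -1;
}
```

On a 4-core machine one might compare time_with_threads(4, 10000000L) and time_with_processes(4, 10000000L) against the same calls with one worker; no timings are claimed here. The forked children all inherit the parent's random() state and so produce identical sequences, which doesn't matter for timing but would for a real bootstrap.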
