
This is not specific to a dGPU; it could apply to any PCIe device. Emphasis on "theoretically" too.

On the device (dGPU here), it is possible to route memory accesses to part of the internal address space to the PCIe controller. In turn, the PCIe controller can translate such a memory access into a PCIe request (read or write) in the separate PCIe address space, applying some address translation.
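
As a rough illustration of what that outbound translation amounts to (all names and the window layout are made up; real controllers do this in hardware via window/ATU-style registers):

    #include <stdint.h>
    #include <stdbool.h>

    /* Hypothetical outbound window: device-internal addresses in
     * [local_base, local_base + size) are forwarded to the PCIe
     * controller and re-based into the PCIe address space. */
    struct outbound_window {
        uint64_t local_base;   /* device-internal address of the window */
        uint64_t pcie_base;    /* PCIe address the window maps to       */
        uint64_t size;
    };

    /* Returns true and fills *pcie_addr if the access hits the window. */
    static bool translate_outbound(const struct outbound_window *w,
                                   uint64_t local_addr, uint64_t *pcie_addr)
    {
        if (local_addr < w->local_base ||
            local_addr >= w->local_base + w->size)
            return false;                   /* not routed to PCIe */
        *pcie_addr = w->pcie_base + (local_addr - w->local_base);
        return true;
    }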

This PCIe request goes to the PCIe host (the CPU in a dGPU scenario). Here too, the host PCIe controller can map the PCIe request, addressed in the PCIe address space, into the host address space. From there it can reach host memory (usually after IOMMU filtering and address translation). For a read, all of this happens again in reverse for the return trip to the device.
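
On the host side, a Linux driver would typically set up that mapping (including the IOMMU entry) through the DMA API; a minimal sketch, assuming a kernel driver context:

    #include <linux/dma-mapping.h>

    /* Sketch: hand the device an address it can target in its PCIe
     * requests. With an IOMMU, the returned dma_addr is an IOVA that
     * the IOMMU translates and filters; without one it is typically
     * just the physical address of the buffer. */
    static int expose_buffer_to_device(struct device *dev, void *host_buf,
                                       size_t len, dma_addr_t *out)
    {
        dma_addr_t dma_addr = dma_map_single(dev, host_buf, len,
                                             DMA_BIDIRECTIONAL);
        if (dma_mapping_error(dev, dma_addr))
            return -ENOMEM;    /* device must not touch host_buf */

        *out = dma_addr;       /* program this address into the device */
        return 0;
    }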

So latency would be rather high, but it is technically possible. In most applications such transfers are offloaded to a DMA engine in the PCIe controller, which copies between the PCIe and local address spaces, but a processing core can certainly do a direct access without DMA if all the address mappings are suitably configured.
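
From the core's point of view, such a direct access is then just a load or store into the mapped window; a toy device-firmware-style sketch with a made-up window address:

    #include <stdint.h>

    /* Made-up device-internal address where the outbound PCIe window
     * is mapped; a load/store here is forwarded to the PCIe controller,
     * translated, and ends up in host memory. */
    #define PCIE_WINDOW_BASE  0x80000000UL

    static inline uint32_t host_read32(uint64_t offset)
    {
        volatile uint32_t *p =
            (volatile uint32_t *)(uintptr_t)(PCIE_WINDOW_BASE + offset);
        return *p;   /* each read stalls for a full PCIe round trip */
    }

    static inline void host_write32(uint64_t offset, uint32_t val)
    {
        volatile uint32_t *p =
            (volatile uint32_t *)(uintptr_t)(PCIE_WINDOW_BASE + offset);
        *p = val;    /* posted write: no wait for host completion */
    }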



Uuuuuh, ok, but... what's the point of doing so? If I do zero-copy on a shared memory area between CPU and GPU, the advantage is clear: no copy and fast transfer.

If I map some host memory to the GPU… I get worse latency and worse bandwidth. Most likely not a win.


That's why the author says "theoretically", I guess ;) Yes, in practice you probably wouldn't want your GPU compute engines to do such direct accesses and stall for a long time on each one, even for one-shot streaming processing. Even to avoid using the GPU's main memory, one would more likely use DMA copies to a local working memory and process the data there in chunks. But the direct mapping can still be convenient: a local DMA engine (or any HW coprocessor) can access host or GPU memory in the same way.
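
A rough sketch of that chunked approach, with placeholder dma_copy_from_host/process_chunk hooks standing in for the real DMA engine programming and compute kernel:

    #include <stddef.h>
    #include <stdint.h>

    /* Placeholder hooks: a real implementation would program the PCIe
     * controller's DMA engine and launch the compute work. */
    void dma_copy_from_host(void *local_dst, uint64_t host_pcie_addr, size_t len);
    void process_chunk(const void *local_buf, size_t len);

    #define CHUNK_SIZE 4096

    /* Stream a host buffer through a small local working buffer: bulk
     * transfers are done by the DMA engine, and the core only touches
     * fast local memory. (A real pipeline would double-buffer so the
     * DMA and the processing overlap.) */
    void stream_from_host(uint64_t host_pcie_addr, size_t total,
                          void *local_buf)
    {
        for (size_t off = 0; off < total; off += CHUNK_SIZE) {
            size_t len = (total - off < CHUNK_SIZE) ? total - off : CHUNK_SIZE;
            dma_copy_from_host(local_buf, host_pcie_addr + off, len);
            process_chunk(local_buf, len);
        }
    }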



