This device has a fully switched fabric allowing comms between any of the 256 "superchip" clusters at 900GB/s. That is dramatically faster than a direct host to GPU 32-lane PCI-E connection (which is crazy), and obviously dwarfs any existing machine to machine connectivity. The actual usability of shared memory across the array is improved significantly.
I mean...nvidia has obviously been using DMA for decades. This isn't just DMA.
No i mean the fact that Nvidia is now claiming that the memory the CPU has access to can be counted as memory for the GPU. the fabric is neat. the "We have 500 GB of ram per gpu" claim is questionable.