> Many Virtualised workloads have many VMs with light CPU utilisation on each
Which is why every virtualization platform out there lets you oversubscribe CPUs. That's a solved problem, what's the benefit of having 100 VMs run on 32 slow cores vs 16 fast ones?
As mentioned above, context switching is expensive and extra L1 cache is valuable. Time-sharing can also have a huge effect on latency (because requests must wait until their server is scheduled), even when the throughput is still good.
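
To make that latency point concrete, here's a toy sketch (not a benchmark; the quantum length and the 20% "other tenant is on-core" probability are assumed numbers) of how waiting to be scheduled inflates tail latency even though each request still only needs 1 ms of CPU:

```python
import random
import statistics

SERVICE = 1.0        # ms of CPU each request actually needs
QUANTUM = 10.0       # assumed hypervisor scheduling quantum, ms
N_REQUESTS = 100_000

random.seed(1)

# Dedicated core: a request runs as soon as it arrives.
dedicated = [SERVICE] * N_REQUESTS

# Time-shared core: most of the time the core is free, but an unlucky
# request has to wait out the remainder of another VM's quantum first.
timeshared = []
for _ in range(N_REQUESTS):
    wait = random.uniform(0, QUANTUM) if random.random() < 0.2 else 0.0
    timeshared.append(wait + SERVICE)

def p99(xs):
    return statistics.quantiles(xs, n=100)[98]

print(f"dedicated  mean={statistics.mean(dedicated):.2f}ms p99={p99(dedicated):.2f}ms")
print(f"timeshared mean={statistics.mean(timeshared):.2f}ms p99={p99(timeshared):.2f}ms")
```

The CPU work per request is identical in both cases (so aggregate throughput looks fine), but the p99 on the time-shared core is dominated by scheduling delay rather than service time.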
Even if time-sharing performs well most of the time, when it goes wrong the performance problems can be opaque and hard to debug. In general, a solution that "really" does something will save engineer-days compared to one that only does it virtually, even at the same price/performance trade-off.