You're assuming that network latency is the only latency involved here, but a huge latency source is the audio codec. Opus adds ~20ms of latency, and that's the lowest-latency codec that's widely supported at the moment. You can see a comparison here: https://www.opus-codec.org/comparison/
There are all sorts of other latency sources that need to be taken into account too, and unfortunately in practice they add up to live music being unplayable on pretty much any network.
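To make "they add up" concrete, here's a back-of-the-envelope latency budget. Every figure below is a rough assumption for illustration, not a measurement:

```python
# Hypothetical one-way latency budget for an online jam session.
# All numbers are illustrative assumptions, not measurements.
budget_ms = {
    "mic ADC + USB buffering": 10.0,
    "capture buffer (256 samples @ 48 kHz)": 256 / 48000 * 1000,  # ~5.3 ms
    "Opus algorithmic delay (20 ms frame + 6.5 ms lookahead)": 26.5,
    "one-way network, same city": 10.0,
    "jitter buffer": 20.0,
    "decode + playback buffer": 10.0,
}

total = sum(budget_ms.values())
print(f"total one-way latency: {total:.1f} ms")
```

Musicians generally start struggling somewhere around 25-30 ms one-way (roughly the delay of standing 10 m from a bandmate), and this budget blows past that before the network path even gets long.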
There's a really interesting project called NINJAM https://www.cockos.com/ninjam/ which is designed for live music jam sessions. It flips this fundamental constraint on its head - instead of being real-time, it streams everyone else's output delayed by one bar (theoretically any interval >RTT I guess?). I haven't tried it, but it's a really cool idea.
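The one-bar trick is easy to sanity-check with arithmetic. A sketch (the ">RTT" condition is the comment's own speculation, not NINJAM's documented behavior):

```python
# NINJAM-style idea: delay everyone else's audio by one whole bar,
# so the delay only has to exceed the round-trip time rather than
# be imperceptibly small.
def bar_delay_ms(bpm: float, beats_per_bar: int = 4) -> float:
    """Length of one bar in milliseconds at the given tempo."""
    return beats_per_bar * 60_000 / bpm

rtt_ms = 150  # assume a fairly bad intercontinental round trip
for bpm in (60, 120, 180):
    delay = bar_delay_ms(bpm)
    print(f"{bpm} bpm: one bar = {delay:.0f} ms, exceeds RTT: {delay > rtt_ms}")
```

Even at 180 bpm a 4/4 bar is over 1.3 seconds, so the one-bar delay dwarfs any realistic RTT; the trade-off is that you're always playing along with what everyone else did a bar ago.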
Of course there's a ton of other potential sources of delay that make my fantasy hard to achieve, probably already starting at the typical USB microphones (in headsets/cameras).
20ms RTT through e.g. Opus on a loopback network interface is already decidedly non-trivial to achieve with "normal" hardware. When you do have low-latency devices, it becomes easy, but not everyone has those.
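The hardware gap is mostly buffer size. A sketch of the arithmetic, with buffer sizes assumed as typical examples:

```python
# Latency contributed by an audio I/O buffer: frames / sample_rate.
# You pay it at least twice (once on capture, once on playback).
def buffer_latency_ms(frames: int, sample_rate: int = 48000) -> float:
    return frames / sample_rate * 1000

# A consumer USB headset driver might run 512-frame buffers:
typical = 2 * buffer_latency_ms(512)   # ~21.3 ms before any codec or network
# A low-latency audio interface at 64 frames:
low_lat = 2 * buffer_latency_ms(64)    # ~2.7 ms

print(f"typical hardware: {typical:.1f} ms, low-latency hardware: {low_lat:.1f} ms")
```

With typical buffers, I/O alone already exceeds a 20 ms budget before Opus's ~26.5 ms of algorithmic delay is even added, which is why loopback tests fail on ordinary gear.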