
> it may be just okay to use the traditional way consisting of testing, log analysis, and visualizing that everyone is familiar with.

Not at all. I spent a lot of effort debugging distributed deadlocks in a highly available system that was the best-selling product in its category at one of the most famous (and loved) software companies in SV, and the number of things that can (and will) go wrong is effectively infinite, since every piece of infrastructure has its own difficult-to-find, difficult-to-reproduce bugs. Things like: sockets that stop responding after a few weeks due to an OS bug and corrupt your distributed log; unexpected wake-up sequences during a node crash and fail-over, because some part of the network had its own issues, leading to a momentary split brain that has to be resolved in real time; a brief out-of-memory condition that drops a single packet and breaks your protocol; etc. We used sophisticated distributed tests, and they were absolutely inadequate. You are still in a naive and optimistic mode, though perhaps you as the designer will be shielded from the real-life issues your poor developers will face, since original architects usually move on to newer, cooler things and leave the fallout to somebody else.
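To make the split-brain point concrete: one standard way such a momentary split brain gets resolved is fencing with a monotonically increasing leadership epoch, so writes from a deposed leader are rejected once a new leader takes over. A minimal sketch (the `Store` class and `epoch` parameter are hypothetical, not from the system described above):

```python
class Store:
    """A replica that rejects writes carrying a stale leadership epoch."""

    def __init__(self):
        self.max_epoch = 0  # highest leadership epoch seen so far
        self.data = {}

    def write(self, epoch, key, value):
        # A write from a deposed leader carries an old epoch and is refused,
        # so the brief split brain cannot corrupt the replicated state.
        if epoch < self.max_epoch:
            return False
        self.max_epoch = epoch
        self.data[key] = value
        return True


store = Store()
assert store.write(1, "k", "old-leader")     # accepted under epoch 1
assert store.write(2, "k", "new-leader")     # new leader fences with epoch 2
assert not store.write(1, "k", "zombie")     # stale leader's write rejected
assert store.data["k"] == "new-leader"
```

This only guards the storage side, of course; the hard part the comment is describing is everything around it (detecting the failure, electing the new leader, and doing all of that while the network is misbehaving).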


