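I'm not an expert in this field, but as far as I know, ZRTP trust here is bootstrapped from the signalling channel (in this case your XMPP connection), unless both parties compare the short authentication string (SAS) themselves. So an attacker who can MITM your signalling could also MITM your voice/video.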
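To make that concrete, here is a rough sketch of the check as I understand it. It assumes the client pins the peer's ZRTP Hello message to a SHA-256 hash advertised over signalling (the `zrtp-hash` mechanism from RFC 6189, which XEP-0262 carries in Jingle). The function name and the placeholder byte strings are made up for illustration; this is not any real client's code.

```python
import hashlib
import hmac


def hello_matches_signalled_hash(hello_message: bytes, signalled_hash_hex: str) -> bool:
    """Return True if the received Hello hashes to the value advertised in signalling."""
    computed = hashlib.sha256(hello_message).hexdigest()
    # Constant-time comparison of the two hex strings.
    return hmac.compare_digest(computed, signalled_hash_hex.lower())


# Placeholder bytes standing in for ZRTP Hello messages (not real packets).
alice_hello = b"hello-from-alice"
mallory_hello = b"hello-from-mallory"

# Honest signalling: the advertised hash belongs to Alice's Hello,
# so a Hello injected by Mallory on the media path is rejected.
honest_hash = hashlib.sha256(alice_hello).hexdigest()
accepted_alice = hello_matches_signalled_hash(alice_hello, honest_hash)
rejected_mallory = not hello_matches_signalled_hash(mallory_hello, honest_hash)

# Compromised signalling: Mallory rewrites the advertised hash to match
# her own Hello, and the same check now accepts her.
forged_hash = hashlib.sha256(mallory_hello).hexdigest()
accepted_mallory = hello_matches_signalled_hash(mallory_hello, forged_hash)
```

The check is only as trustworthy as the channel that delivered the hash, which is why an unprotected XMPP connection undermines it.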
And again, your metadata and signalling would still be exposed without encryption - along with your IP addresses (ICE candidates), etc.