Over the past three weeks I've tried a few different conferencing solutions, including jitsi. I'll give it another try with this update.
My use case: I take weekly music lessons, and now they're virtual. The problem is that the DSP applied to the audio was designed for speech. If my teacher explains something and then plays an example on his bass, it usually sounds terrible, maybe even inaudible.
I send him pre-recorded MP3s of cover songs; ideally he could listen to one and I could comment in real time on the places where things could be improved. Instead, if he plays any music on his system, I hear nothing -- no music, no talk. It seems like the software thinks, "Hey, this participant is listening to non-conference audio, so I'll just mute him" (at least on Skype). I'd love a half-duplex audio button, so that none of the DSP shenanigans would be needed and a high-quality audio stream could be sent.
Firstly, thanks for your work on what is really a great project. Can we set stereo=1 in the SDP and also the bandwidth constraint? That would make it ideal for this use case.
For music-quality WebRTC you need three things: disabled audio processing, stereo=1 in the SDP, and a way to cap bandwidth usage so it doesn't saturate the available bandwidth and cause errors.
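For the second and third points, a minimal sketch of the usual SDP-munging approach, using the standard Opus fmtp parameters from RFC 7587 (`stereo`, `sprop-stereo`, `maxaveragebitrate`); the helper name and surrounding setup are mine, not any particular project's code:

```javascript
// Rewrite an SDP string so the negotiated Opus stream is stereo and
// capped at a target average bitrate. Apply this to the offer/answer
// SDP before calling setLocalDescription / setRemoteDescription.
function mungeOpusSdp(sdp, maxAvgBitrate = 128000) {
  // Find the payload type Opus was assigned, e.g. "a=rtpmap:111 opus/48000/2"
  const match = sdp.match(/a=rtpmap:(\d+) opus\/48000/i);
  if (!match) return sdp; // no Opus in this SDP, leave it untouched
  const pt = match[1];
  // Append the stereo and bitrate parameters to the matching fmtp line.
  const fmtpRe = new RegExp(`a=fmtp:${pt} (.*)`);
  return sdp.replace(fmtpRe, (_line, params) =>
    `a=fmtp:${pt} ${params};stereo=1;sprop-stereo=1;maxaveragebitrate=${maxAvgBitrate}`);
}
```

Note that `stereo=1` only tells the remote encoder the receiver would prefer stereo; the capture side still has to deliver a stereo signal for it to matter.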
Disabling video is also really the best thing to do when recording, for the same reason (bandwidth saturation), and Chromium will give you a much better experience. Safari and Firefox aren't quite there yet: Safari won't let you choose your output device and lacks some other useful features, and Firefox doesn't yet seem to allow stereo Opus -- though maybe that's changed since I tested. Microsoft Edge is now Chromium-based, so you're good to go.
Firefox has supported stereo Opus for a very long time (four years at least?). We know it works: it's used by medical professionals in their work, and they wrote us a message a few months ago thanking us for this feature, which apparently doesn't work in other browsers (according to them, though I do see open Chromium tickets).
Of course the whole chain has to be stereo, that goes without saying: the input signal is stereo, the negotiation was done in stereo, there's enough bandwidth (otherwise Opus falls back to mono), and playback happens on stereo hardware (but that's the easy part).
We hold regular flute meetings and play together. In this quarantine time we wanted to meet online, but if we all play at once, it seems I cannot hear everyone else at the same time. I guess it's as if everyone were shouting over everyone else -- which isn't an issue in a normal meeting, where usually only one person speaks at a time.
Will this also fix this issue? So everyone will be able to hear everyone?
You won't be able to play together because of latency.
You will think you are in time with someone, but you will react when you hear/see them on your screen, which is maybe .15 seconds after they actually made the sound/movement. And then they will hear/see your reaction .15 seconds later again.
If all participants have good internet and are geographically close it should theoretically be possible to have delay not much greater than rtt/2 for everybody.
With rtt < 20ms that should make musical performances possible. After all, sound only travels less than four meters in 10ms. So this is just like singing in a choir (with more visual delay - but that can be solved by having a conductor).
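That four-meters figure checks out; a quick back-of-the-envelope helper (constant and function names are mine) makes the choir analogy concrete:

```javascript
// How far apart could two musicians stand, acoustically, for a given
// one-way network delay? Sound in air covers roughly 343 m/s at ~20 °C.
const SPEED_OF_SOUND_M_PER_S = 343;

function equivalentDistanceMeters(oneWayDelayMs) {
  return SPEED_OF_SOUND_M_PER_S * (oneWayDelayMs / 1000);
}

// 10 ms one way is about 3.4 m -- roughly two rows apart in a choir.
console.log(equivalentDistanceMeters(10));
```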
Unfortunately I'm not aware of any software making that a practical reality, even with FTTH.
You're assuming that network latency is the only latency involved here, but a huge latency source is the audio codec. Opus adds ~20ms of latency, and it's the lowest-latency codec that's widely supported at the moment. You can see a comparison here: https://www.opus-codec.org/comparison/
There are all sorts of other latency that need to be taken into consideration too, and unfortunately in practice those do add up to live music being unplayable on pretty much any network.
There's a really interesting project called NINJAM https://www.cockos.com/ninjam/ which is designed for live music jam sessions. It flips this fundamental constraint on its head - instead of being real-time, it streams everyone else's output delayed by one bar (theoretically any interval >RTT I guess?). I haven't tried it, but it's a really cool idea.
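The arithmetic behind the one-bar delay is simple; this hypothetical sketch (my own naming, not NINJAM's actual code) shows the constraint on the delay interval:

```javascript
// NINJAM-style idea: don't play remote audio as it arrives; buffer it
// and start playback exactly one bar later, so everyone stays
// rhythmically aligned even over a high-latency link.
function barDelayMs(bpm, beatsPerBar = 4) {
  // One beat lasts 60000/bpm milliseconds; a bar is beatsPerBar beats.
  return beatsPerBar * (60000 / bpm);
}

// At 120 BPM in 4/4, remote streams are delayed by 2000 ms.
// This works as long as one bar comfortably exceeds RTT plus jitter.
console.log(barDelayMs(120));
```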
Of course there's a ton of other potential sources of delay that make my fantasy hard to achieve, probably already starting at the typical USB microphones (in headsets/cameras).
20ms rtt through e.g. opus on a loopback network interface is already decidedly non-trivial to achieve with "normal" hardware. When you do have low-latency devices, it becomes easy, but not everyone has those.
Musicians building digital audio workstations commonly have to replace the whole software stack to get audio latency down to an acceptable (<10ms) level: JACK instead of PulseAudio, a Linux kernel recompiled with custom options for low latency, other software reconfigured to use the JACK APIs, and so on. Sometimes they can't use standard audio hardware at all. (And remember that USB polling frequency is normally only 100 Hz: 10 ms worst-case by itself.)
Minimizing latency is certainly technically feasible, it's just hard for stupid reasons.
I haven't tried it yet, but sofasession.com seems optimized for this. Using wired Ethernet instead of WiFi can go a long way, from what I've heard. Has anyone here tried it?
Depends on the type of music: something slow and choral can easily tolerate high latencies, while something quick, rhythmic, and precise is much harder to deal with.
Has anyone tried Mumble for this? It's very low latency, but I can't find exactly how low. It of course also depends on your internet connection and other settings, but there's a base latency that comes from buffering the sound before sending. Mumble also has lots of settings for sound quality and different sound formats, so it might work for music if you try all the settings.
Mumble has a setting for the audio buffer size, and in fact it makes you set it during initial configuration. It works great, has low latency, and doesn't use much bandwidth (I hosted a server on a 1Mbps DSL connection for several people back in the day).
Latency is not that big a problem, I'd say. We play music where it doesn't matter that much, sometimes just holding one long tone for the length of everyone's breath.
I just would like to hear everybody at the same time, but what I hear is always one person's sound getting preference over others. Or sounds just alternate randomly based on the volume, I'd guess.
Musicians already deal with that kind of issue when doing particular kinds of performance (e.g. famously at Wagner's festival opera house, where the orchestra is in a deep pit below the singers).
That's not how it works. In fact there are multiple algorithms depending on the browser, it's not defined in the spec. The most used one currently would be AEC3 from Google, which is quite a bit more advanced than what you describe.
If the website doesn't want to offer a control to switch this on/off, I'm confident this can be done by a browser extension in no time (which would have the benefit to work for all websites).
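For reference, the standard MediaTrackConstraints such an extension (or the site itself) would force on `getUserMedia` look like this -- a sketch of the relevant flags, not any particular site's code:

```javascript
// Constraints for an unprocessed, music-friendly microphone track.
// echoCancellation, noiseSuppression, autoGainControl, and channelCount
// are all standard MediaTrackConstraints recognized by browsers.
const musicConstraints = {
  audio: {
    echoCancellation: false, // the main culprit for ducked/garbled music
    noiseSuppression: false,
    autoGainControl: false,
    channelCount: 2,         // ask for stereo capture if the device supports it
  },
  video: false,
};

// In a page:
// const stream = await navigator.mediaDevices.getUserMedia(musicConstraints);
```

An extension would wrap `navigator.mediaDevices.getUserMedia` and merge these flags into whatever constraints the site passes; the site's own UI never needs to know.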
padenot, although I am a programmer of sorts, I don't do web development, so I'm at a loss. Say I go to the Jitsi website (https://meet.jit.si/), type in my four-word passcode, and get a conference connection with my teacher. What you say with "instead of doing..." doesn't apply to me, because I don't do anything. It sounds like what you're describing is something the developer of that web page needs to do, and I, as a user, don't see any of that.
Audio processing is a risky move -- so hard to get right. We've been using https://team.video at work, and one thing I absolutely love about it is how they handle audio / muting.
When you're speaking, you don't have to wonder if others can hear you because your microphone pulses in green visually as you speak. If your audio isn't working it shows in yellow with no pulsing, and you and everyone else can see your audio is not flowing.
Also, if someone else forgot to mute and their kid is making a ruckus, you can just mute them. You don't have to wait for a moment to interject and ask them verbally, you can just go ahead and do it.
Or, when you see someone else in their video feed trying to speak up, but they forgot to unmute, you just unmute them. No everyone saying, "you're muted" over each other.
It takes a second to get used to the idea that everyone has all the power, but in practice it just makes everything go way smoother.
It's only scary in the same way that it's scary that anyone could kick you in the pants when you're walking down the street.
They could but they won't because we live in a society. Which is great because that means we don't have to walk around in steel suits to avoid getting kicked in the pants.
I choose to trust the people I work with every day. And then as a bonus, I don't have to hear people yelling "you're muted!" at one another. We just get on with it.
No. If you mute yourself, you are distancing yourself from a conversation. Unmuting someone is like secretly following them down the street into their house and listening in on them from behind their curtain. It is creepy and wrong and shouldn't be possible. Maybe they are having a fight with their spouse? You shouldn't be able to listen in on someone who muted themselves without them acknowledging it.
The only legitimate use case I see for this is if you are working with e.g. elderly people who have a hard time understanding the whole thing and even then it shouldn't be possible without them clicking on "Yes" explicitly.
The difference there is that kicking people on the street lands you in jail (likely not the first time, but if you do it repeatedly...), whereas remote unmuting would likely require stretching wiretap laws in unconventional ways just to get a judgment on whether it's even illegal (never mind everything else it brings with it).
Also, you're implying that one would only use this technology to communicate with people you work with every day. What about a meeting with outsiders/contractors/customers? You might not actually have those yourself, but someone usually has to do those.
I know we're getting far from the original point here, but I'm going to seize this opportunity anyway: the so-called "thin blue line" is not the reason society is able to function. We work as a collective /despite/ the presence of police and the prison industrial complex, not /because/ of them.
Exactly. In the case of a video call with your colleagues, everyone collectively manages mute states so that the group can be more productive.
Then, if one asshole starts unmuting maliciously, they get shunned real quick and then fired if they keep it up. We don't need to limit ourselves with draconian measures when social norms and expectations will already suffice.
This is why I was so opposed to Zoom when my company started adopting it a few years ago: the room dictator or whatever it's called can unmute you. (Maybe only if they had muted you in the first place.)
Here's something that a colleague passed along to a group of CS profs.
It's written by a music professor and geared toward using Zoom for music, but several of us Zoom newbies found it to be helpful more generally. He mentions the issue of disabling the speech-centric audio postprocessing.
Disclaimer: The author apparently makes his money selling eBooks, so you may want to skip through the several pages of promotional material at the beginning of his PDF to get to the good stuff.
Use TeamTalk[1]. If you need high audio quality, TT beats everything else you can find, except maybe very expensive software for radio stations. I've successfully used it to stream music and it works.
It's Teamspeak and Discord like, so you need to connect to a server, either public or self-hosted, join or create a channel, and then you will be able to talk to everyone on that channel. This is perfect for permanent communities where people just hang out, but works for one-offs too. It works on Windows, Linux, Mac, iOS and Android, no web access. The server is also available for Raspberry Pi. Half of it is open-source, but the core SDK needs a license if you're developing with it. The program itself is free, even for commercial use.
It uses Opus and lets you adjust the quality and processing, so you can get a lot out of it. We've been using it in our community for about 10 years now, including for radio broadcasting, and we haven't managed to find anything better since. I know of one local radio station and recording studio who successfully use it for remote work now.
To get the best experience, disable all audio processing in the preferences, on the sound system pane, so duplex mode, automatic gain control and noise reduction should be off. If you're on Windows, use Windows Audio Session as the backend for lowest latency.
Then, connect to a server, I use the German one for public stuff, as I'm close to it geographically and you don't need to register for it, but use whichever you want. After connecting, create a channel with application set to music, bitrate set to 150000 and channels set to stereo. Those are, at least, the parameters I use, and they work great. You can adjust the rest as you see fit.
There are some video and screensharing capabilities as far as I know, but I haven't used them. Audio is definitely the primary focus. If you need any assistance, my username here at gmail dot com is the way to go.
[1] bearware.dk for desktop, App Store and Google Play for iOS and Android.
ps. I'm not affiliated with the company in any way; it's just a tool I use daily and would recommend to anyone who knows their way around a computer. It definitely doesn't pass the grandma test, though.
"High"-bitrate lossy CBR compression is probably acceptable enough -- at least compared with a voice codec! MP3 at its maximum bitrate (320 kbit/s, i.e. 40 kB/s) is cheap to stream, doesn't have the security issues that variable-bitrate compression does, and preserves the audio "ok" (it does cut the highest frequencies, above roughly 20 kHz). No patent issues anymore, either.
Ogg Vorbis may be an even better option for all kinds of reasons, but MP3 is more universally recognized.
Awesome! I've been wondering how to do this, since I normally take calls in a quiet room with headphones, so there's no need for noise canceling. It would be nice if you could enable this on a per-call basis, though.
I work for a company that builds virtual classrooms based on WebRTC. Our customers are mostly business schools, but we have some music schools. For them we activate a different profile that disables all audio processing and selects the music profile of Opus (Opus is in fact two codecs in one, one aimed at speech, one at music). It would likely be very easy to do something like this in Jitsi Meet as well, since WebRTC has everything onboard. The tricky part is that you also need to disable echo cancellation, so everyone must be wearing headsets and so on.
This sounds really cool! What’s the company? (I also work on WebRTC-based classrooms, at Minerva - we haven’t looked outside of voice in the audio sphere though.)
For this use case, you need selective fidelity and shared control over a sampler, with each sample having low resolution video, high quality audio and an arbitrary number of tags or notations (with a time range).
Yes, but it is entirely unsuitable for real-time conversations. The only thing it works for is modal jams or something like 12 bar blues that loop the same fixed chord structure over and over.