The hard part is separating background voices (e.g. TV, chatter) from the primary speaker's voice. Basically, voice isolation.
Voice fingerprinting would help only in this context.
On average, an hour of speech contains about 9,000 to 15,000 words. This range accounts for different speaking speeds, which typically vary from 150 to 250 words per minute.
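For anyone who wants to check the arithmetic: the range is just the speaking rate multiplied by 60 minutes. A quick Python sketch (the function and constant names are mine, purely illustrative; the 150 and 250 wpm figures are the ones quoted above):

```python
MINUTES_PER_HOUR = 60

def words_per_hour(words_per_minute: int) -> int:
    """Convert a speaking rate in words per minute to words per hour."""
    return words_per_minute * MINUTES_PER_HOUR

# Speaking rates quoted in the comment above
SLOW_WPM = 150
FAST_WPM = 250

low = words_per_hour(SLOW_WPM)
high = words_per_hour(FAST_WPM)
print(f"{low:,} to {high:,} words per hour")  # prints "9,000 to 15,000 words per hour"
```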
You have a trial until April 1st. Then it converts to 120 minutes/week.
If you are a student, teacher, or hospital worker, or work at a non-profit, reach out to support@krisp.ai and you will receive 6 months free.
It definitely removes the background noise and increases intelligibility.
However, the voices sound "pinched" to me. It is a lot like one of those head-related transfer functions that are supposed to make you think the sound comes from above, but it sounds like multiple band-reject filters were applied, and it makes me feel some kind of pressure in my head.
Apparently, getting access to the microphone stream during calls, even from your own app, is really tough. The only provider we know of that implements the concept of "advanced audio filters" is Twilio's Voice SDK for iOS.