I’m really skeptical about TTS. There’s a lot of theoretical, academic and research-lab work that is supposedly a quantum leap forward in TTS. However, all the commercial offerings sound like TTS and never match the breathless hype. Try them out and you can tell within seconds it’s a robot talking.
The SBB (Swiss Rail) is now using a quite sophisticated TTS system [1] that sounds very clean and has a Swiss German accent in the High German it's speaking (Zürich accent based which is what most of the complaints are about). The old system was built using over 10k recordings, which was an enormous task.
The system is now able to announce pretty much anything, including reasons for delays, and it does not sound like a bunch of recordings stitched together.
I am hoping it will eventually be expanded to switch accents depending on the region, just as it already switches to French or Italian depending on where you are.
> Zürich accent based which is what most of the complaints are about
I gave up on this a long time ago. Ads are almost always in a Zurich accent, except if the ad is for extremely regional stuff like secret cheese recipes or tourist ads for Graubünden.
Zurich is the tech hub of Switzerland and the accent seems to be the default. Then again, who in their right mind would want an AI to talk with an accent from Thurgau? (/s)
Except that... while the German-speaking one sounds relatively good, I am definitely unhappy with the new system in French. It sounds more unnatural (and harder to understand in certain cases) than the old system. Maybe it's just a lack of tuning of the French version vs. the German one, but still...
Except when you're talking over a phone call that sounds like two tin cans connected with a string even on a good day.
I'm regularly astonished at how bad international calls in particular have become, and you're regularly subjected to these even domestically, since so many call centers are in the Philippines or India these days. And this despite bandwidth being cheaper than ever.
Unfortunately, all the bandwidth in the world can't compensate for latency, or for the loss of quality in the ADC, DAC and compression steps as currently implemented.
I can't think of a single VoIP system that sounded good. The ones I use regularly, like Messenger, Teams and Skype, rarely if ever sound better than just making a regular phone call with the same device.