
Yes, in-browser and on-device ASR has pretty much reached the quality threshold with Distil-Whisper and Whisper v3.

High-quality TTS with low latency is still not there, but it's getting better. You can use Tortoise, but as far as I know it's slow and compute-expensive. Bark is another option, but its results are mixed.

I played around with a recent transformers.js demo (using SpeechT5) that you can run in your browser, and I'm optimistic about where it can go with some improvements:

https://tinyllms.vercel.app/dashboard/tts



I was recently pointed to xtts:

https://coqui.ai/blog/tts/open_xtts

I dunno about performance, but the model is 1.86GB.


Yes, that one is also a good option. It's easy to use through services like Replicate: https://replicate.com/lucataco/xtts-v2



