Speech recognition is a lot harder than you'd think, and far-field speech recognition especially. All of Google's in-house ASR (and wakeword) work has been dedicated to near-field devices.
Speech is not a "solve it once and it works everywhere" kind of problem.
I would ballpark that Google needed to collect 10k hours of _far-field_ speech to train this to a level that was acceptable for the enormous variety of noise conditions and accents that a far-field system needs to work on. That scale of data collection takes time and a lot of trench-warfare effort.
Sure, I understand, but my point is that they should be ahead of the game on basically everything else. The Echo was released over a year ago, so surely Google has been working on far-field tech since _at least_ then.
To still be aiming for the end of the year suggests some serious missteps and catch-up going on.