Would you really need a complicated LLM just to bark some instructions to a game?
There are only a finite number of ways to express instructions to the game; you don't want to get into a discussion about what 'my tanks' means in the middle of the action.
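To make the "finite set of instructions" point concrete, here is a minimal sketch, assuming a hypothetical RTS-style game whose commands follow a fixed "unit action target" pattern. The unit, action and target names are made up for illustration. Every allowed phrase is enumerated up front and mapped to a structured command. Anything outside that set is rejected instead of being interpreted.

```python
from itertools import product

# Hypothetical vocabulary for an RTS-style game (not from any real game).
UNITS = ["tanks", "infantry", "artillery"]
ACTIONS = ["attack", "move", "defend"]
TARGETS = ["north", "south", "base"]


def build_grammar():
    """Map every allowed phrase to a structured (unit, action, target) tuple."""
    grammar = {}
    for unit, action, target in product(UNITS, ACTIONS, TARGETS):
        grammar[f"{unit} {action} {target}"] = (unit, action, target)
    return grammar


GRAMMAR = build_grammar()


def parse(utterance):
    """Return the structured command, or None if the phrase is outside the grammar."""
    # Normalise case and collapse repeated whitespace before the lookup.
    normalized = " ".join(utterance.lower().split())
    return GRAMMAR.get(normalized)
```

With three options in each slot the whole command space is only 27 phrases. `parse("Tanks attack North")` yields `("tanks", "attack", "north")`, while an out-of-grammar phrase such as "what do you mean by my tanks" yields `None`, so there is nothing to argue about mid-game.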
Siri could do this in 2011, and later versions work fine on-device without connectivity.
I would love to see the code of an on-device implementation that could accomplish a similar task. Before LLMs, I never saw an app that could take a variety of voice commands and act on them with a good user experience. Are there any good open-source implementations that can run without depending on proprietary device APIs?
In fact, I still haven't seen speech-to-command implemented well in an end-user app, regardless of technology. Maybe I've looked at the wrong apps. I haven't used Siri much. Google Home seems very gimmicky.
I recently tried doing a small bit of STT and was underwhelmed by the quality I got on device. I used Whisper. It was bad enough that I concluded I'd probably need to call a hosted service instead.
And even if the STT works really well, for most use cases you'll need an LLM to reinterpret what you're saying into commands.
IMO, voice control is a bad fit for things that can be expressed with a few mouse clicks.
If you just want to interpret a few commands, you don't need an LLM: just hook up the speech recognizer (STT) to the list of possible commands, and it will limit itself to the words that are possible in that context. That's what Siri does, and it works fine for things like turning the TV off (provided every action is exposed to the voice assistant code).
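A rough sketch of "limit the recognizer to the possible commands" follows. Engines that support a grammar or phrase list constrain the decoding itself. This example swaps in a simpler technique: it snaps an already-transcribed string to the closest allowed phrase using Python's standard-library `difflib`. The command phrases, the action names such as `tv.power_off`, and the 0.6 cutoff are all hypothetical.

```python
import difflib

# Hypothetical phrase -> action table for a TV-style assistant.
COMMANDS = {
    "turn off the tv": "tv.power_off",
    "turn on the tv": "tv.power_on",
    "volume up": "tv.volume_up",
    "volume down": "tv.volume_down",
}


def match_command(transcript, cutoff=0.6):
    """Snap a transcript to the closest allowed phrase and return its action, or None."""
    # Normalise case and whitespace so trivial differences don't lower the score.
    text = " ".join(transcript.lower().split())
    # get_close_matches ranks candidates by SequenceMatcher similarity ratio
    # and drops any that score below `cutoff`.
    matches = difflib.get_close_matches(text, list(COMMANDS), n=1, cutoff=cutoff)
    return COMMANDS[matches[0]] if matches else None
```

A slightly misheard "turn of the TV" still resolves to `tv.power_off`, and an unrelated utterance such as "make me a sandwich" returns `None` instead of triggering anything. Because the output can only ever be one of the listed actions, no LLM is needed to interpret it.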
My 2015 car, which has no connectivity, also uses that for hands-free navigation, and while it's a bit slow, it works quite reliably. I really think you overestimate how much you actually need an AI service for.