Ah, yes, I'd forgotten about Gato. Thank you for reminding me. There's so much research activity that the Gato paper feels as if it were published eons ago. There's only so much I can retain in my puny little human mind at once!
In any case, I'm not sure Gato qualifies as a "large" model at 1.2B parameters -- it's right below the threshold at which it could or would start exhibiting emergent behaviors. Maybe a new Gato with tens or hundreds of billions of parameters operating in the physical world?
Yes. Gato was a good proof of concept that the Decision Transformer approach of 'just model literally everything as a sequence' scales well, doesn't exhibit some sort of catastrophic interference, can successfully imitation-learn from all the expert datasets, and shows a bit of transfer. But they need to push it at least another OOM or two to show major transfer and some emergent capabilities, and ideally to demonstrate both from-scratch learning and additional learning on many new tasks. We continue to wait. :(
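To make the 'everything as a sequence' idea concrete, here's a minimal sketch (my own toy illustration, not DeepMind's actual tokenization code -- all names, offsets, and bin counts are assumptions) of how you might flatten mixed-modality episodes into one shared token stream that a single autoregressive model could then imitation-learn from:

```python
# Toy sketch of the Gato/Decision-Transformer idea: serialize every modality
# (here, continuous observations and discrete actions) into one flat token
# sequence, with disjoint integer ranges per modality so tokens never collide.
# Offsets and bin counts below are illustrative assumptions, not Gato's actual values.

def discretize(x, low=-1.0, high=1.0, bins=1024):
    """Map a continuous value into one of `bins` integer tokens via uniform binning."""
    x = min(max(x, low), high)  # clamp to the representable range
    return int((x - low) / (high - low) * (bins - 1))

# Reserve disjoint token-ID ranges per modality.
TEXT_OFFSET = 0          # e.g. a 32k text vocabulary would occupy [0, 32000)
CONT_OFFSET = 32_000     # 1024 bins for discretized continuous values
ACTION_OFFSET = 33_024   # small discrete action space

def episode_to_tokens(episode):
    """Serialize a list of (observation_vector, action_id) steps into one flat stream."""
    tokens = []
    for obs, action in episode:
        tokens += [CONT_OFFSET + discretize(v) for v in obs]  # observation tokens
        tokens.append(ACTION_OFFSET + action)                  # then the action token
    return tokens

# Two timesteps of a toy control episode, flattened into one sequence:
episode = [([0.5, -0.25], 2), ([0.1, 0.9], 0)]
print(episode_to_tokens(episode))
```

Once every dataset is serialized this way, 'training on everything' reduces to ordinary next-token prediction over the concatenated streams, which is why the approach scales so cleanly with the rest of the LM stack.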
I hope it didn't all get rolled up into Gemini and become a state secret they'll never publish on again, or lost in the shuffle in the chaos of the DeepMind/Brain merger/liquidation.