Ah, yes, I'd forgotten about Gato. Thank you for reminding me. There's so much research activity that the Gato paper feels as if it were published eons ago. There's only so much I can retain in my puny little human mind at once!
In any case, I'm not sure Gato qualifies as a "large" model at 1.2B parameters -- it's right below the threshold at which it could or would start exhibiting emergent behaviors. Maybe a new Gato with tens or hundreds of billions of parameters operating in the physical world?
Yes. Gato was a good proof of concept that the Decision Transformer approach of 'just model literally everything as a sequence' scales well, doesn't exhibit some sort of catastrophic interference, can successfully imitation-learn from all the expert datasets, and shows a bit of transfer. But they need to push it at least another OOM or two to show major transfer and some emergent capabilities, and ideally to demonstrate both from-scratch learning and additional learning on many new tasks. We continue to wait. :(
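To make the 'everything as a sequence' idea concrete, here's a minimal sketch (my own toy illustration, not DeepMind's actual tokenization code -- all names, offsets, and bin counts are assumptions) of how you might flatten mixed-modality episodes into one shared token stream that a single autoregressive model could then imitation-learn from:

```python
# Toy sketch of the Gato/Decision-Transformer idea: serialize every modality
# (here, continuous observations and discrete actions) into one flat token
# sequence, with disjoint integer ranges per modality so tokens never collide.
# Offsets and bin counts below are illustrative assumptions, not Gato's actual values.

def discretize(x, low=-1.0, high=1.0, bins=1024):
    """Map a continuous value into one of `bins` integer tokens via uniform binning."""
    x = min(max(x, low), high)  # clamp to the representable range
    return int((x - low) / (high - low) * (bins - 1))

# Reserve disjoint token-ID ranges per modality.
TEXT_OFFSET = 0          # e.g. a 32k text vocabulary would occupy [0, 32000)
CONT_OFFSET = 32_000     # 1024 bins for discretized continuous values
ACTION_OFFSET = 33_024   # small discrete action space

def episode_to_tokens(episode):
    """Serialize a list of (observation_vector, action_id) steps into one flat stream."""
    tokens = []
    for obs, action in episode:
        tokens += [CONT_OFFSET + discretize(v) for v in obs]  # observation tokens
        tokens.append(ACTION_OFFSET + action)                  # then the action token
    return tokens

# Two timesteps of a toy control episode, flattened into one sequence:
episode = [([0.5, -0.25], 2), ([0.1, 0.9], 0)]
print(episode_to_tokens(episode))
```

Once every dataset is serialized this way, 'training on everything' reduces to ordinary next-token prediction over the concatenated streams, which is why the approach scales so cleanly with the rest of the LM stack.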
I hope it didn't all get rolled up into Gemini and become a state secret they'll never publish on again, or lost in the shuffle in the chaos of the DeepMind/Brain merger/liquidation.