
> While I understand the core concept of 'just' picking the next word based on statistics

That's just the mechanism it uses to generate output - which is not the same as the way it internally chooses what to say.

I think it's unfortunate that the name LLM (large language model) has stuck for these predictive models, since IMO it's very misleading. The name dates from when this line of research was born out of much simpler systems that really were just language models, and it has sadly stuck. The "predict next word" framing is also misleading, especially when combined with the false notion that these are just language models. What is true is that:

1) These models are trained by being given feedback on their "predict next word" performance

2) These models generate output a word at a time, and those words are a selection from a variety of predictions about how their input might be continued, in light of the material they saw during training and what they have learnt from it
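Point (2) can be sketched as a sampling loop. This is purely a toy - the lookup table stands in for a deep network, and real models condition on the whole context rather than just the last word - but the generate-one-word-at-a-time structure is the same:

```python
import random

# Toy "model": maps the last token to a distribution over next tokens.
# In a real LLM these probabilities come from a network conditioned on
# the ENTIRE context, not a table keyed on one word.
NEXT = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"ran": 0.8, "sat": 0.2},
    "sat": {"down": 1.0},
    "ran": {"away": 1.0},
}

def next_token_probs(context):
    # Fall back to end-of-sequence when the toy table has no continuation.
    return NEXT.get(context[-1], {"<eos>": 1.0})

def generate(prompt, max_tokens=5, seed=0):
    rng = random.Random(seed)
    tokens = prompt.split()
    for _ in range(max_tokens):
        probs = next_token_probs(tokens)
        # Sample one word from the predicted distribution and append it;
        # the extended sequence becomes the input for the next step.
        token = rng.choices(list(probs), weights=list(probs.values()))[0]
        if token == "<eos>":
            break
        tokens.append(token)
    return " ".join(tokens)

print(generate("the"))
```

The key thing the toy does share with the real system: each output word is sampled from a distribution of predicted continuations, then fed back in as input.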

What is NOT true is that these models are operating just at the level of language and are generating output purely based on language level statistics. As Ilya Sutskever (one of the OpenAI founders) has said, these models have used their training data and predict-next-word feedback (a horrible way to have to learn!!!) to build an internal "world model" of the processes generating the data they are operating on. "world model" is jargon, but what it essentially means is that these models have gained some level of understanding of how the world (seen through the lens of language) operates.

So, what really appears to be happening (although I don't think anyone knows in any level of detail) when these models are fed a prompt and tasked with providing a continuation (i.e. a "reply" in the context of ChatGPT) is that the input is consumed and, per the internal "world model", a high-level internal representation of the input is built. This starts at the level of language, presumably, but includes a model of the entities being discussed, the relations between them, related knowledge that is recalled, etc, etc. This internal model of what is being discussed persists (and is updated) throughout the conversation and as output is being generated. The output is generated word by word, but not as a statistical continuation of the prompt; rather, it is a statistically likely continuation of texts seen during training when the model had similar internal states (i.e. a similar model of what was being discussed).

You may have heard of "think step by step" or "chain of thought" prompting, which are ways to enable these models to perform better on complex tasks where the distance from problem statement (question) to solution (answer) is too great for the model to cover in a "single step". What is going on here is that these models, unlike us, are not (yet) designed to iteratively work on a problem and explore it; instead they are limited to a fixed number of processing steps (corresponding to the number of internal levels - repeated transformer blocks - between input and output). For simple problems where a good response can be conceived and generated within that limited number of steps, the models work well; otherwise you can tell the model to "think step by step", which allows it to overcome this limitation by taking multiple baby steps and evolving its internal model of the dialogue.
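Here's a toy way to see why emitting intermediate steps buys extra compute (this is an analogy, not how a transformer actually works). Suppose each "forward pass" can only do a fixed small amount of work - here, reducing a single (a+b) sub-expression. One pass can't finish a nested problem, but feeding each partial result back in as context can:

```python
import re

def one_pass(expr):
    # One "forward pass" with fixed compute: reduce exactly ONE
    # innermost (digit+digit) pair, leaving the rest untouched.
    return re.sub(r"\((\d+)\+(\d+)\)",
                  lambda m: str(int(m.group(1)) + int(m.group(2))),
                  expr, count=1)

def answer_in_one_step(expr):
    # "Direct answer" regime: the model gets exactly one pass.
    return one_pass(expr)

def answer_step_by_step(expr, max_steps=20):
    # "Chain of thought" regime: each emitted intermediate result
    # re-enters the context, so total compute grows with the number
    # of steps instead of being capped at one pass.
    steps = [expr]
    while not expr.isdigit() and len(steps) <= max_steps:
        expr = one_pass(expr)
        steps.append(expr)
    return steps

print(answer_in_one_step("((1+2)+(3+4))"))   # one pass: still unfinished
print(answer_step_by_step("((1+2)+(3+4))"))  # reaches the final answer
```

The single pass gets stuck partway through the nested expression, while the step-by-step loop converges - which is roughly the intuition behind chain-of-thought prompting.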

Most of what I see written about ChatGPT, or these predictive models in general, seems to be garbage. Everyone has an opinion and wants to express it regardless of whether they have any knowledge of, or even experience with, the models themselves. I was a bit shocked to see an interview the other day with Karl Friston (a highly intelligent theoretical neuroscientist) in which he happily pontificated about ChatGPT and offered opinions about it while admitting that he had never even used it!

The unfortunate "language model" name, and the associated assumption that "predict next word" is all these models could be doing IF (falsely) they lacked the capacity to learn anything more than language, seem largely to blame.



> ...the input is consumed and per the internal "world model" a high level internal representation of the input is built...

This is the aspect of ChatGPT I'm trying to understand. Can you point to any resources on this?


No - I'm not sure anyone outside of OpenAI knows, and maybe they only have a rough understanding themselves.

We don't even know the exact architecture of GPT-4 - is it just a Transformer, or does it have more to it? The head of OpenAI, Sam Altman, was interviewed by Lex Fridman yesterday (you can find it on YouTube) and he mentioned that, paraphrasing, "OpenAI is all about performance of the model, even if that involves hacks ...".

While Sutskever describes GPT-4 as having learnt this "world model", Sam Altman instead describes it as having learnt a non-specific "something" from the training data. It seems they may still be trying to figure out much of how it is working themselves, although Altman also said that "it took a lot of understanding to build GPT-4", so apparently it's more than just a scaling up of earlier models.

Note too that my description of its internal state being maintained/updated through the conversation is likely (without knowing the exact architecture) to be more functional than literal. If it is just a plain Transformer, then its internal state is calculated from scratch for each word it is asked to generate, but evidently there is a great deal of continuity between the internal state when the input is, say, prompt words 1-100 and when it is words 2-101. So (assuming they haven't added any architectural modification to remember anything of prior state) the internal state isn't really "updated" as such, but rather regenerated into an updated form.
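The "regenerated, not updated" point can be sketched with a stateless forward function (a toy, assuming a plain decoder-only Transformer with no extra memory; the embedding and averaging below are invented stand-ins for the real learned computation). The "state" is a pure function of the visible context, recomputed from scratch each call, yet states for overlapping contexts come out highly similar:

```python
def embed(token):
    # Deterministic toy embedding; a real model learns these vectors.
    return [((sum(map(ord, token)) * (i + 3)) % 97) / 97.0 for i in range(4)]

def forward(context):
    # Pure function of the context - nothing survives between calls,
    # which is the "calculated from scratch for each word" property.
    # (A real transformer does vastly more than average embeddings.)
    vecs = [embed(t) for t in context]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

tokens = ["the", "cat", "sat", "on", "the", "mat"]

# States for words 1-5 and 2-6 are computed completely independently...
state_a = forward(tokens[0:5])
state_b = forward(tokens[1:6])

# ...yet stay close, because the two contexts share 4 of their 5 tokens.
difference = sum(abs(a - b) for a, b in zip(state_a, state_b))
print("state for words 1-5:", state_a)
print("state for words 2-6:", state_b)
print("total difference:", difference)
```

Running forward() twice on the same context gives bit-identical results - there is no hidden memory to update - yet the overlap between successive contexts supplies the continuity, which is the "regenerated into an updated form" idea.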

Lots of questions, not so many answers, unfortunately!



