I don't follow. Isn't this the flow for practically every neural network, i.e. you index the sampled inputs from the embedding matrix, forward them through every hidden layer, and then finally transform to the dimensions of your tokens so the output can be interpreted as log-counts?
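Roughly, I'm picturing something like this (a minimal PyTorch sketch of my reading, with made-up sizes and plain linear layers standing in for whatever the hidden layers actually are):

    import torch
    import torch.nn as nn

    vocab_size, d_model, n_layers = 1000, 64, 4

    embedding = nn.Embedding(vocab_size, d_model)      # embedding matrix
    hidden = nn.ModuleList([
        nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU())
        for _ in range(n_layers)
    ])
    unembed = nn.Linear(d_model, vocab_size)           # back to token dimensions

    tokens = torch.randint(0, vocab_size, (8,))        # sampled input token ids
    x = embedding(tokens)                              # index into the embedding matrix
    for layer in hidden:                               # forward through every hidden layer
        x = layer(x)
    logits = unembed(x)                                # interpreted as log-counts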
Not really. An LSTM, for example, would require a recurrent element where you update the hidden state and then pass it through the same layer again as you generate the output sequence. In fact, the pseudocode shows very nicely how much simpler transformers are. And an MLP is already a component of the transformer architecture.
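To make the contrast concrete, here is the kind of loop I mean for an LSTM decoder (a rough sketch I'm improvising: a single LSTM cell, with greedy argmax just so there is something to feed back in):

    import torch
    import torch.nn as nn

    vocab_size, d_model, steps = 1000, 64, 10
    embedding = nn.Embedding(vocab_size, d_model)
    cell = nn.LSTMCell(d_model, d_model)
    unembed = nn.Linear(d_model, vocab_size)

    token = torch.zeros(1, dtype=torch.long)           # some start token
    h = torch.zeros(1, d_model)                        # hidden state
    c = torch.zeros(1, d_model)                        # cell state
    for _ in range(steps):
        h, c = cell(embedding(token), (h, c))          # update the hidden state
        logits = unembed(h)
        token = logits.argmax(dim=-1)                  # feed the output back through the same cell

A transformer's forward pass has no such loop carrying a hidden state across positions; the whole sequence goes through each layer in one shot.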
No? You could perfectly well plug in an RNN or a bidirectional RNN as a layer. This is the pseudocode for applying multiple layers; it does not really matter what those layers are: transformer, RNN, convolution, dilated convolution, etc. The recurrence happens within a layer, not between layers.
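Something like this is all the layer loop is, as far as I can tell (a toy sketch with two interchangeable layer types, sizes made up):

    import torch
    import torch.nn as nn

    d_model = 64

    class RNNBlock(nn.Module):
        # the recurrence lives inside the layer: the GRU scans along the sequence
        def __init__(self, d):
            super().__init__()
            self.rnn = nn.GRU(d, d, batch_first=True)
        def forward(self, x):
            out, _ = self.rnn(x)
            return out

    layers = nn.ModuleList([
        RNNBlock(d_model),                                              # recurrent layer
        nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), # attention layer
    ])

    x = torch.randn(2, 16, d_model)     # (batch, sequence, features)
    for layer in layers:                # the outer loop never cares what the layer is
        x = layer(x)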