The loop itself is claimed to be the problem. It doesn't matter whether you use ...

psb217 · on Sept 20, 2024

The per-token error of the non-AR model wrapped with MPC is no higher than the per-token error of the non-AR model without MPC. The likelihood of the entire sequence being off the true data manifold is just one minus the product of the per-token probabilities of being correct (i.e., one minus each per-token error), whether or not you're running with the MPC loop. I.e., wrapping the non-AR model in an MPC loop and thereby converting it to an AR model (with a built-in planning mechanism) doesn't increase its probability of going off track.
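To make the arithmetic concrete, here is a minimal sketch of that sequence-level calculation. It assumes per-token errors are independent, which is what taking a plain product implies; the function name `prob_off_track` and the 1% error rate are made up for illustration and don't come from any particular model.

```python
import math


def prob_off_track(per_token_errors):
    """Probability that at least one token in the sequence is wrong.

    Assumption (illustrative only): per-token errors are independent,
    so the probability that every token is right is the product of
    (1 - error) over the sequence.
    """
    p_all_correct = math.prod(1.0 - e for e in per_token_errors)
    return 1.0 - p_all_correct


if __name__ == "__main__":
    # Hypothetical constant 1% per-token error at a few sequence lengths.
    for n in (10, 100, 1000):
        print(n, prob_off_track([0.01] * n))
```

Nothing in the calculation depends on how the tokens were produced. If the MPC-wrapped model's per-token errors are no higher than the unwrapped model's, this number can't come out higher for it either.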

Per-token error compounding over sequence length happens whether or not the model is autoregressive. The way per-token errors correlate across a sequence might be more favorable, in terms of the probability of producing bad sequences, if you incorporate some explicit planning mechanism (like the non-AR model wrapped in an MPC loop), but that's a more subtle argument than the one LeCun makes.

YeGoblynQueenne · on Sept 21, 2024

Yes. Also "other kinds of predictive models" in my comment refers to models other than generative language models, e.g. image classifiers or regression models etc. Those don't generate tokens, they output labels and the error of the labeling is constant (well, within error bounds). This was in response to OP's comment about "all prediction machines that make errors."