
The loop itself is claimed to be the problem. It doesn't matter whether you use an AR or non-AR model. They both have a certain error probability that gets amplified in each iteration.


The per-token error of the non-AR model wrapped with MPC is no higher than the per-token error of the non-AR model without MPC. The likelihood of the entire sequence being off the true data manifold is just one minus the product of the per-token probabilities of being correct (each of which is one minus that token's error), assuming the errors are independent, whether or not you're running with the MPC loop. I.e., wrapping the non-AR model in an MPC loop, and thereby converting it to an AR model (with a built-in planning mechanism), doesn't increase its probability of going off track.
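To make the arithmetic concrete, here is a minimal sketch of that compounding under the independence assumption. The function names and the example numbers are mine, purely for illustration; nothing here models MPC or any particular architecture.

```python
import math

def sequence_failure_probability(per_token_errors):
    """P(at least one token is wrong), assuming independent per-token errors."""
    # Probability that every token is right: product of (1 - e_i).
    p_all_correct = math.prod(1.0 - e for e in per_token_errors)
    return 1.0 - p_all_correct

def uniform_failure_probability(e, n):
    """Same quantity when every one of the n tokens has the same error e."""
    return 1.0 - (1.0 - e) ** n

if __name__ == "__main__":
    # Hypothetical numbers: a 1% per-token error over growing sequence lengths.
    for n in (10, 100, 1000):
        print(n, uniform_failure_probability(0.01, n))
```

The formula only takes the per-token errors as input, so it gives the same answer whether those errors come from an AR model or from a non-AR model stepped inside a loop.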

Per-token error compounding over sequence length happens whether or not the model is autoregressive. The way per-token errors correlate across a sequence might be more favorable, with respect to the probability of producing bad sequences, if you incorporate some explicit planning mechanism (like the non-AR model wrapped in an MPC loop), but that's a more subtle argument than the one LeCun makes.


Yes. Also, "other kinds of predictive models" in my comment refers to models other than generative language models, e.g. image classifiers or regression models. Those don't generate tokens; they output labels, and the error of the labeling is constant (well, within error bounds). This was in response to OP's comment about "all prediction machines that make errors."



