
I know all of these words, but I do not know what they mean together.

What is curriculum learning?

What is the "RL" approach?

"~7 ago": days? Weeks? Years?

What is an "open ai gym days"?

LLMs and robotics?



I fed the two comments verbatim into Grok:

> Curriculum learning: Training that begins with easy examples, gradually increasing difficulty.

> RL (Reinforcement Learning): Learning via trial-and-error with rewards, like training a robot or model to optimize actions.

> ~7y ago: ~7 years ago (circa 2018).

> OpenAI Gym days: Refers to using OpenAI Gym, a toolkit for RL, popular in robotics/AI research ~2016-2018.

> LLMs and robotics: Large Language Models (LLMs) now leverage RL techniques from robotics for better performance.

I think the last one is a semi-hallucinatory stretch. LLMs are large language models, i.e. ChatGPT, Sonnet, Grok, R1. Robotics is ... robotics. Building robots.

The actual answer to what the comment is saying is that until maybe a year back, we trained language models - still with RL, but with RL on token error, which isn't "real" RL because it executes tasks "by coincidence". That is, it so happens that when you train a model to predict text, it also gains the ability to do tasks in the bargain, because the text contains agents that do tasks. A year or so ago, we started training models by having them do a task, judging whether the task succeeded or failed, and then performing RL on the task outcome rather than on token prediction. This is a return to "classic RL", but we had to pass through the "token RL regime" first so that the model could make progress on realistic tasks at all. It also means that LLMs can now increasingly be employed in robotics, where task RL rules, as there is no massive preexisting dataset of robot movements like there is for text.
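To make the distinction concrete, here's a rough sketch of the two training signals described above. All names are illustrative, not any lab's actual implementation:

```python
# "Token RL" vs "task RL", sketched as two toy scoring functions.

def token_level_signal(predicted_tokens, reference_tokens):
    # Token-prediction regime: the model is graded token by token
    # against reference text; task success is incidental.
    matches = sum(p == r for p, r in zip(predicted_tokens, reference_tokens))
    return matches / len(reference_tokens)

def outcome_level_signal(answer, verifier):
    # Task-outcome regime: generate a full attempt, then grade only the
    # outcome. `verifier` is any programmatic check, e.g. a unit test
    # or an exact-match comparison.
    return 1.0 if verifier(answer) else 0.0

# Toy arithmetic task ("what is 2 + 2?") with an exact-match verifier:
reward = outcome_level_signal("4", lambda ans: ans.strip() == "4")
```

The key difference is what gets rewarded: matching the reference text token-by-token, versus a single pass/fail judgment on the finished attempt.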

(Also, NLP is Natural Language Processing, i.e. what LLMs do.)


Curriculum learning posits that you get better results if you gradually increase the training "difficulty". That is, learn to walk before you run. So you'd do "additions and multiplications first" and then "now draw the rest of the integral" :)

RL - Reinforcement Learning. You have a carrot and a stick. You run a model through iterations (in LLMs you generate n completions), you score each of them based on some reward function, and if the result is correct you give it a carrot (positive reward); if the result is incorrect you give it a stick (negative or zero reward). (simplified ofc)
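The generate-then-score loop above can be sketched in a few lines (heavily simplified; the reward function here is a made-up toy verifier, not any real training stack):

```python
# Carrot-and-stick scoring over n completions.

def reward_fn(completion):
    # Toy verifier for the prompt "what is 2 + 2?":
    # carrot (1.0) if correct, stick (0.0) otherwise.
    return 1.0 if completion.strip() == "4" else 0.0

def score_completions(completions, reward_fn):
    # Score each of the n generated completions.
    return [(c, reward_fn(c)) for c in completions]

scored = score_completions(["4", "five", " 4 "], reward_fn)
```

In a real pipeline the scores would then drive a policy update (pushing the model toward the high-reward completions); here they're just computed.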

OpenAI Gym is (was?) a toolkit that let "AI agents" be trained in simulated environments. You could for example play games, or solve puzzles, or things like that. oAI Gym was a "wrapper" over those environments, with a standardised API (observe, step (provide action), reward; rinse and repeat). You could for example have an agent that learned to land a lunar lander in a simple game. Or play chess. Or control a 3d stick figure in a maze.
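The observe / step / reward loop looks like this. To keep it self-contained, this uses a toy environment that mimics the classic Gym interface shape (reset() returning an observation, step(action) returning (observation, reward, done, info)) rather than the real gym package:

```python
# Gym-style interaction loop against a toy "lander" environment.

class ToyLanderEnv:
    """Agent must bring altitude down to 0. Action 1 = thrust (hold
    altitude), action 0 = descend one unit."""

    def reset(self):
        self.altitude = 5
        return self.altitude                 # initial observation

    def step(self, action):
        self.altitude -= (1 - action)        # descend unless thrusting
        done = self.altitude <= 0
        reward = 1.0 if done else 0.0        # reward on safe touchdown
        return self.altitude, reward, done, {}

env = ToyLanderEnv()
obs = env.reset()
total_reward = 0.0
done = False
while not done:                              # observe, act, reward; repeat
    action = 0                               # trivial policy: always descend
    obs, reward, done, info = env.step(action)
    total_reward += reward
```

An RL algorithm would replace the trivial policy with one that learns from the rewards; the loop itself is the standardised part Gym provided.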


I can only help with RL; that's probably reinforcement learning. As far as I remember, that means you let the model perform a task that can be "graded", and depending on how well it did, it gets a reward it wants to maximize. I believe (this is where I'm very insecure, I could be wrong) the weights and biases of the neurons that were involved in reaching the highest reward get adjusted to have a bigger influence.
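That intuition is roughly right. A very rough sketch of the idea (this is a simplified REINFORCE-style policy-gradient update; real training involves much more, and all names here are illustrative):

```python
# Reward-scaled weight update: weights that contributed to a
# high-reward outcome get nudged to make that outcome more likely.

def reinforce_update(weights, gradient, reward, lr=0.1):
    # Scale the gradient step by the reward: a good outcome reinforces
    # the behaviour that produced it; zero reward changes nothing.
    return [w + lr * reward * g for w, g in zip(weights, gradient)]

new_w = reinforce_update([0.5, -0.2], gradient=[1.0, 0.5], reward=1.0)
```

So it's less "the winning neurons get adjusted" and more "every weight's contribution (its gradient) gets scaled by how rewarding the outcome was", but the spirit is the same.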


I think you'll get a lot of use from this video from Andrej Karpathy: https://www.youtube.com/watch?v=7xTGNNLPyMI

It is long, but don't get scared off. He goes over a ton of different stuff related to model training, but makes it very easy to understand.



