Understanding Reinforcement Learning for LLMs — Part I: From Supervised Learning to Sequential Decision Making
Reinforcement learning has become one of the central ideas behind modern LLM post-training. Yet it is often discussed in a confusing way: sometimes as a mathematical framework, sometimes as an alignment recipe, sometimes as a set of algorithms such as PPO, DPO, GRPO, or RLHF.
