Reinforcement Learning

From foundations through deep RL, policy gradients, model-based methods, and RL for language models, to landmark applications.

Curriculum

A structured path through the course content.

MDPs, reward signals, policies, and the RL framework.

Q-learning, SARSA, dynamic programming, and Monte Carlo methods.

DQN, experience replay, and deep reinforcement learning.

REINFORCE, plus actor-critic methods such as A2C and PPO.

World models, planning, and model-based approaches.

Hierarchical RL, multi-agent RL, and inverse RL.

RLHF, reward modeling, and RL in the LLM training pipeline.

AlphaGo, Atari, robotics, and other milestone RL achievements.