Foundations
MDPs, reward signals, policies, and the RL framework.
Bellman Equations
The recursive decomposition of value into immediate reward plus discounted future value: the fundamental identity of RL.
Exploration vs Exploitation
The core dilemma: exploit what you already know for reliable reward, or explore the unknown for potentially better outcomes.
Markov Decision Processes
The mathematical framework formalizing sequential decision-making with states, actions, transition probabilities, and rewards.
Policies
The agent's decision rule mapping states to actions: the central object that RL algorithms learn.
Return and Discount Factor
Cumulative future reward geometrically discounted by gamma: the objective every RL agent optimizes.
States, Actions, and Rewards
The three primitives of every RL problem: where you are, what you can do, and what you get for doing it.
Value Functions
Expected future return from a state (V) or state-action pair (Q): the backbone of most RL algorithms.
What Is Reinforcement Learning?
An agent learns to make sequential decisions by interacting with an environment and maximizing cumulative reward: the third paradigm of machine learning, alongside supervised and unsupervised learning.
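
As a rough illustration of how the return, the discount factor, and the Bellman identity listed above fit together, here is a minimal Python sketch. The function names (discounted_return, evaluate_policy), the two-state toy problem, and the (probability, next_state, reward) transition format are all assumptions made for this example; they do not come from any particular library.

```python
def discounted_return(rewards, gamma):
    """Return sum over t of gamma**t * rewards[t], accumulated backwards."""
    g = 0.0
    for r in reversed(rewards):
        # Recursive form of the return: G_t = r_t + gamma * G_{t+1}
        g = r + gamma * g
    return g


def evaluate_policy(transitions, gamma, sweeps=200):
    """Iterative policy evaluation for one fixed policy.

    transitions[s] is a list of (probability, next_state, reward) tuples
    describing what happens from state s when following the policy.
    (Hypothetical format chosen for this sketch.)
    """
    values = {s: 0.0 for s in transitions}
    for _ in range(sweeps):
        # Bellman expectation backup: V(s) = sum_p p * (r + gamma * V(s'))
        # The comprehension reads the old values and then replaces them.
        values = {
            s: sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
            for s, outcomes in transitions.items()
        }
    return values


# Toy problem (assumed): state A moves to B with reward 0,
# and state B loops on itself with reward 1 forever.
toy_transitions = {
    "A": [(1.0, "B", 0.0)],
    "B": [(1.0, "B", 1.0)],
}

if __name__ == "__main__":
    print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
    print(evaluate_policy(toy_transitions, gamma=0.5))
```

With gamma = 0.5, the self-looping state B has value 1 / (1 - 0.5) = 2, and A, which earns nothing before reaching B, has value 0.5 * 2 = 1. The repeated sweeps approach these numbers because each backup halves the remaining error.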