Foundations

MDPs, reward signals, policies, and the RL framework.

Bellman Equations

The recursive decomposition of value into expected immediate reward plus the discounted value of the next state – the fundamental identity of RL.
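A minimal sketch of the recursion in action, assuming a hypothetical two-state problem in which the policy is already fixed, so each state has one list of (probability, next state, reward) outcomes. Repeatedly applying the Bellman expectation backup V(s) ← Σ p·(r + γ·V(s')) converges to the value function because the backup is a contraction for γ < 1. The states, rewards, and the names `P`, `GAMMA`, `bellman_backup`, and `policy_evaluation` are illustrative only.

```python
# Hypothetical 2-state problem with the policy already folded in.
# P[s] is a list of (probability, next_state, reward) triples.
P = {
    0: [(0.5, 0, 1.0), (0.5, 1, 0.0)],
    1: [(1.0, 0, 2.0)],
}
GAMMA = 0.9

def bellman_backup(V, s):
    """One Bellman expectation backup: E[r + GAMMA * V(s')] from state s."""
    return sum(p * (r + GAMMA * V[s_next]) for p, s_next, r in P[s])

def policy_evaluation(tol=1e-10):
    """Apply the backup to every state until the values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        V_new = {s: bellman_backup(V, s) for s in P}
        if max(abs(V_new[s] - V[s]) for s in P) < tol:
            return V_new
        V = V_new

V = policy_evaluation()
for s in P:
    print(s, round(V[s], 4))
```

The fixed point can be checked by hand: solving V(0) = 0.5 + 0.45·V(0) + 0.45·V(1) and V(1) = 2 + 0.9·V(0) gives V(0) ≈ 9.655 and V(1) ≈ 10.690.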

Exploration vs Exploitation

The core dilemma: exploit what you know to collect the best reward found so far, or explore the unknown for potentially better outcomes.
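One common way to balance the two is epsilon-greedy action selection, sketched here on a toy multi-armed bandit. With probability `epsilon` the agent explores by picking a random arm; otherwise it exploits the arm with the highest estimated reward. The arm means, the unit-variance Gaussian rewards, and the function name `epsilon_greedy_bandit` are assumptions for illustration, and epsilon-greedy is only one of several exploration strategies.

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=5000, seed=0):
    """Run epsilon-greedy on a toy Gaussian multi-armed bandit."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each arm has been pulled
    estimates = [0.0] * n_arms   # sample-mean reward of each arm so far
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: any arm, uniformly
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = rng.gauss(true_means[arm], 1.0)  # noisy reward, unit variance
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
    return estimates, counts

estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 1.0])
print(counts)
```

Setting `epsilon=0` removes exploration entirely, so the agent can lock onto whichever arm happened to look best early on; a small positive `epsilon` keeps sampling every arm so the estimates continue to improve.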

Markov Decision Processes

The mathematical framework formalizing sequential decision-making with states, actions, transition probabilities, and rewards.
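A finite MDP can be written down directly as data. This sketch assumes a hypothetical `MDP` dataclass in which `transitions[(s, a)]` lists (probability, next state, reward) outcomes; `step` samples from that distribution using only the current state and action, which is exactly the Markov property. The two-state example and all names are illustrative.

```python
import random
from dataclasses import dataclass

@dataclass
class MDP:
    """A finite MDP: states, actions, transition model, and discount factor."""
    states: list
    actions: list
    # transitions[(s, a)] -> list of (probability, next_state, reward) triples
    transitions: dict
    gamma: float = 0.9

    def check(self):
        """Verify each (s, a) defines a probability distribution over outcomes."""
        for (s, a), outcomes in self.transitions.items():
            total = sum(p for p, _, _ in outcomes)
            assert abs(total - 1.0) < 1e-9, f"probabilities for {(s, a)} sum to {total}"

    def step(self, state, action, rng):
        """Sample (next_state, reward) according to P(s', r | s, a)."""
        outcomes = self.transitions[(state, action)]
        weights = [p for p, _, _ in outcomes]
        _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
        return next_state, reward

# Hypothetical example: "move" from A reaches B 80% of the time.
toy = MDP(
    states=["A", "B"],
    actions=["stay", "move"],
    transitions={
        ("A", "stay"): [(1.0, "A", 0.0)],
        ("A", "move"): [(0.8, "B", 1.0), (0.2, "A", 0.0)],
        ("B", "stay"): [(1.0, "B", 2.0)],
        ("B", "move"): [(1.0, "A", 0.0)],
    },
)
toy.check()
print(toy.step("A", "move", random.Random(0)))
```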

Policies

The agent’s decision rule mapping states to actions (either deterministically or as a probability distribution over actions) – the central object that RL algorithms learn.
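Policies come in two forms: deterministic (one action per state) and stochastic (a probability distribution over actions per state). This sketch represents both as plain dictionaries over hypothetical states "A" and "B"; the dictionaries and the helper `act` are illustrative, not a standard API.

```python
import random

# Deterministic policy: each state maps to exactly one action.
deterministic_policy = {"A": "move", "B": "stay"}

# Stochastic policy: each state maps to a distribution pi(a | s).
stochastic_policy = {
    "A": {"stay": 0.1, "move": 0.9},
    "B": {"stay": 0.7, "move": 0.3},
}

def act(policy, state, rng):
    """Return an action for `state`, sampling if the policy is stochastic."""
    choice = policy[state]
    if isinstance(choice, str):   # deterministic: the action itself
        return choice
    actions = list(choice)
    weights = [choice[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

rng = random.Random(0)
print(act(deterministic_policy, "A", rng), act(stochastic_policy, "A", rng))
```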

Return and Discount Factor

Cumulative future reward geometrically discounted by gamma – the objective every RL agent optimizes.
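The return from step t is G_t = r_{t+1} + γ·r_{t+2} + γ²·r_{t+3} + …, which also satisfies G_t = r_{t+1} + γ·G_{t+1}. The sketch below uses that recursion to compute every G_t of a finite episode in one backward pass. It assumes `rewards[t]` holds the reward received after the action at step t; the function name is illustrative.

```python
def discounted_returns(rewards, gamma):
    """Return [G_0, ..., G_{T-1}] using G_t = r_{t+1} + gamma * G_{t+1}."""
    returns = [0.0] * len(rewards)
    g = 0.0  # return after the final step is zero
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns

print(discounted_returns([1.0, 0.0, 2.0], gamma=0.9))
```

Here G_2 = 2, G_1 = 0.9·2 = 1.8, and G_0 = 1 + 0.9·1.8 = 2.62 (up to floating-point rounding).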

States, Actions, and Rewards

The three primitives of every RL problem: where you are, what you can do, and what you get for doing it.

Value Functions

Expected return from a state (V) or state-action pair (Q) under a given policy – the backbone of most RL algorithms.
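The two value functions are linked: under a policy π, V^π(s) = Σ_a π(a|s)·Q^π(s, a), and acting greedily with respect to Q picks the action with the largest Q-value. This sketch assumes hypothetical Q-values and policy probabilities for a single state; for the relation to hold, the Q-values must be those of the same policy π.

```python
# Hypothetical action values Q^pi(s, a) for one state s.
Q_s = {"stay": 1.0, "move": 3.0}
# Hypothetical stochastic policy pi(a | s) for the same state.
pi_s = {"stay": 0.25, "move": 0.75}

def state_value(q_s, pi_s):
    """V^pi(s) = sum over a of pi(a | s) * Q^pi(s, a)."""
    return sum(pi_s[a] * q_s[a] for a in q_s)

def greedy_action(q_s):
    """Action with the highest estimated value in this state."""
    return max(q_s, key=q_s.get)

print(state_value(Q_s, pi_s), greedy_action(Q_s))
```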

What Is Reinforcement Learning?

An agent learns to make sequential decisions by interacting with an environment and maximizing cumulative reward – the third paradigm of machine learning, alongside supervised and unsupervised learning.
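The interaction described above is a loop: the agent observes a state, chooses an action, and the environment returns a reward and the next state. This sketch assumes a hypothetical corridor environment (`CorridorEnv`) with a reward only at the goal and a `reset`/`step` interface chosen for illustration; it is not any particular library's API.

```python
import random

class CorridorEnv:
    """Hypothetical toy environment: positions 0..length-1, goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        """Start a new episode at the left end and return the initial state."""
        self.pos = 0
        return self.pos

    def step(self, action):
        """Apply action -1 (left) or +1 (right); return (state, reward, done)."""
        self.pos = min(max(self.pos + action, 0), self.length - 1)
        done = self.pos == self.length - 1
        reward = 1.0 if done else 0.0  # reward only for reaching the goal
        return self.pos, reward, done

def run_episode(env, policy, max_steps=100):
    """Agent-environment loop: observe state, act, receive reward, repeat."""
    state = env.reset()
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        action = policy(state)                  # agent chooses an action
        state, reward, done = env.step(action)  # environment responds
        total_reward += reward
        steps += 1
        if done:
            break
    return total_reward, steps

rng = random.Random(0)
print(run_episode(CorridorEnv(), lambda s: 1))                    # always move right
print(run_episode(CorridorEnv(), lambda s: rng.choice([-1, 1])))  # random agent
```

This loop only shows the interaction itself; a learning algorithm would additionally update `policy` from the rewards it observes.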