Tabular Methods
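Methods that keep one value estimate per state or state-action pair in a lookup table: dynamic programming, Monte Carlo, temporal-difference learning (including Q-learning and SARSA), n-step methods, and eligibility traces.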
Dynamic Programming
Computing optimal policies via iterative Bellman updates when a complete model of the environment (transition probabilities and rewards) is known; the theoretical foundation that model-free methods build on.
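A minimal sketch of value iteration, assuming a small finite MDP whose model is given as plain Python lists. The `P`/`R` layout and the name `value_iteration` are illustrative assumptions, not a library API. The loop applies the Bellman optimality update until the values stop changing, then reads off a greedy policy.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """In-place value iteration for a finite MDP with a known model.

    Hypothetical model layout: P[s][a] is a list of (probability, next_state)
    pairs and R[s][a] is the expected immediate reward for action a in state s.
    """
    n_states = len(P)
    V = [0.0] * n_states

    def q_value(s, a):
        # One-step Bellman lookahead: r(s, a) + gamma * E[V(s')]
        return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])

    while True:
        delta = 0.0
        for s in range(n_states):
            best = max(q_value(s, a) for a in range(len(P[s])))
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place (Gauss-Seidel style) update
        if delta < tol:
            break

    # Extract a greedy policy from the converged values.
    policy = [max(range(len(P[s])), key=lambda a: q_value(s, a)) for s in range(n_states)]
    return V, policy


# Hypothetical two-state MDP: action 0 = stay, action 1 = switch state.
P = [[[(1.0, 0)], [(1.0, 1)]],
     [[(1.0, 1)], [(1.0, 0)]]]
R = [[0.0, 1.0],
     [2.0, 0.0]]
V, policy = value_iteration(P, R)
```

In this toy MDP, staying in state 1 earns 2 per step, so its value is 2 / (1 - 0.9) = 20, and the best move from state 0 is to switch: 1 + 0.9 * 20 = 19.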
Eligibility Traces
Credit-assignment mechanism that blends TD and Monte Carlo through an exponentially decaying memory (trace) of visited states.
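A minimal sketch of online tabular TD(λ) prediction with accumulating traces, assuming an episode is supplied as a list of `(state, reward, next_state)` tuples with `None` marking the terminal state (a hypothetical format chosen for brevity). Each TD error is applied to every state in proportion to its trace, so λ = 0 recovers one-step TD and λ = 1 spreads a late reward back over the whole episode.

```python
def td_lambda_episode(V, episode, alpha=0.1, gamma=1.0, lam=0.9):
    """Online tabular TD(lambda) with accumulating traces over one episode.

    Hypothetical episode format: list of (state, reward, next_state) tuples,
    where next_state is None for the terminal transition.
    """
    e = [0.0] * len(V)  # eligibility trace per state
    for s, r, s_next in episode:
        v_next = 0.0 if s_next is None else V[s_next]
        delta = r + gamma * v_next - V[s]  # one-step TD error
        e[s] += 1.0  # accumulating trace for the state just visited
        for i in range(len(V)):
            V[i] += alpha * delta * e[i]  # credit every state by its trace
            e[i] *= gamma * lam           # exponential decay of the memory
    return V


# Three-state chain 0 -> 1 -> 2 -> terminal, reward 1 only at the end.
episode = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, None)]
V = td_lambda_episode([0.0] * 3, episode, alpha=0.5, lam=0.5)
```

With λ = 0.5 the final reward reaches earlier states with geometrically smaller weight, giving values 0.125, 0.25 and 0.5 after one episode.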
Monte Carlo Methods
Learning value estimates by averaging the returns of complete episodes: model-free RL that waits for each sampled outcome instead of bootstrapping.
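A minimal sketch of first-visit Monte Carlo prediction, assuming each episode is a list of `(state, reward)` pairs where the reward is the one received after leaving that state (an illustrative format, not a standard one). Returns are accumulated backward from the end of each episode, and only the return from a state's first visit is averaged.

```python
from collections import defaultdict


def first_visit_mc(episodes, gamma=1.0):
    """First-visit Monte Carlo estimate of V(s) from complete episodes.

    Hypothetical episode format: [(state, reward), ...], where reward is
    R_{t+1}, the reward received after leaving state S_t.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for episode in episodes:
        G = 0.0
        first_return = {}
        # Walk backward so G is the discounted return from each time step.
        for state, reward in reversed(episode):
            G = reward + gamma * G
            first_return[state] = G  # earlier visits overwrite later ones
        for state, g in first_return.items():
            totals[state] += g
            counts[state] += 1
    # The estimate is the plain average of sampled first-visit returns.
    return {s: totals[s] / counts[s] for s in totals}


episodes = [[("A", 0.0), ("B", 1.0)], [("A", 2.0)]]
V = first_visit_mc(episodes)
```

Here state A sees returns 1 and 2, so its estimate is 1.5; B sees a single return of 1.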
N-Step Methods
Bridging Monte Carlo and TD by bootstrapping after n steps, which gives a tunable bias-variance trade-off.
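A minimal sketch of n-step TD prediction on one recorded episode, assuming `states[t]` holds S_t for the non-terminal steps and `rewards[t]` holds R_{t+1} (a hypothetical layout). The target sums n discounted rewards and then bootstraps from V(S_{t+n}) if the episode has not ended by then; for illustration the updates are applied after the episode, in time order.

```python
def n_step_td(V, states, rewards, n=2, alpha=0.5, gamma=1.0):
    """Tabular n-step TD prediction over one recorded episode.

    Hypothetical layout: states[t] is S_t for t = 0..T-1 (terminal excluded)
    and rewards[t] is R_{t+1}.
    """
    T = len(rewards)
    for tau in range(T):
        end = min(tau + n, T)
        # Discounted sum of up to n real rewards.
        G = sum(gamma ** (k - tau) * rewards[k] for k in range(tau, end))
        if tau + n < T:
            # Bootstrap from the current estimate n steps ahead.
            G += gamma ** n * V[states[tau + n]]
        V[states[tau]] += alpha * (G - V[states[tau]])
    return V


states, rewards = [0, 1, 2], [0.0, 0.0, 1.0]
for n in (1, 2, 3):
    print(n, n_step_td([0.0] * 3, states, rewards, n=n))
```

With n = 1 this reduces to one-step TD (only the last state moves), and with n at least the episode length every target is the full Monte Carlo return, so the reward reaches all three states.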
Q-Learning
Off-policy TD control that learns the optimal action-value function independently of the behavior policy, provided every state-action pair keeps being explored; arguably the most influential tabular RL algorithm.
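A minimal sketch of tabular Q-learning, assuming a hypothetical five-state corridor (`corridor_step`) where action 1 moves right, action 0 moves left, and reaching the right end gives reward 1. To illustrate the off-policy property, the behavior policy is uniformly random, while the update target uses the greedy `max` over next-state action values.

```python
import random


def corridor_step(state, action, n_states=5):
    """Hypothetical corridor: 1 = right, 0 = left (wall at state 0).
    Reaching the last state ends the episode with reward 1."""
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    done = next_state == n_states - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, n_states=5, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(200):  # cap on episode length
            a = rng.randrange(2)  # uniformly random behavior policy
            s2, r, done = corridor_step(s, a, n_states)
            # Off-policy target: bootstrap from the best next action.
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
            if done:
                break
    return Q


Q = q_learning()
greedy = [max(range(2), key=lambda a: Q[s][a]) for s in range(4)]
print(greedy)  # expected: [1, 1, 1, 1] (always move right)
```

Even though the agent never acts greedily during training, Q converges toward the optimal values (for example 0.9^3 = 0.729 for moving right from state 0).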
SARSA
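On-policy TD control that updates Q-values using the action actually taken next; because its targets include exploratory actions, it tends to learn more conservative (safer) policies than Q-learning while exploration is still ongoing.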
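A minimal sketch of tabular SARSA with an ε-greedy policy, reusing the same hypothetical corridor as an assumption for illustration. The key difference from Q-learning is the target: it bootstraps from `Q[s2][a2]`, where `a2` is the action the ε-greedy policy actually selects next, rather than from the maximum over next actions.

```python
import random


def corridor_step(state, action, n_states=5):
    """Hypothetical corridor: 1 = right, 0 = left (wall at state 0).
    Reaching the last state ends the episode with reward 1."""
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    done = next_state == n_states - 1
    return next_state, (1.0 if done else 0.0), done


def epsilon_greedy(q_row, epsilon, rng):
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))  # explore
    best = max(q_row)
    # Exploit, breaking ties at random.
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def sarsa(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, n_states=5, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(Q[s], epsilon, rng)
        for _ in range(200):  # cap on episode length
            s2, r, done = corridor_step(s, a, n_states)
            if done:
                Q[s][a] += alpha * (r - Q[s][a])
                break
            a2 = epsilon_greedy(Q[s2], epsilon, rng)  # action actually taken next
            # On-policy target: bootstrap from the chosen next action.
            Q[s][a] += alpha * (r + gamma * Q[s2][a2] - Q[s][a])
            s, a = s2, a2
    return Q


Q = sarsa()
greedy = [max(range(2), key=lambda a: Q[s][a]) for s in range(4)]
```

Because the targets include occasional exploratory moves, the learned values estimate the return of the ε-greedy policy itself rather than of the greedy one, which is what makes SARSA more cautious near costly states.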
Temporal Difference Learning
Bootstrapping value estimates from incomplete episodes by updating toward one-step lookahead targets.
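A minimal sketch of tabular TD(0) prediction, assuming a hypothetical five-state random walk (states 1 to 5, terminals 0 and 6, reward 1 only on reaching state 6) of the kind often used in RL textbooks. Each step nudges V(s) toward the one-step target r + γV(s') without waiting for the episode to finish.

```python
import random


def td0_random_walk(episodes=5000, alpha=0.01, gamma=1.0, seed=0):
    """Tabular TD(0) on a hypothetical 5-state random walk.

    Non-terminal states are 1..5; states 0 and 6 are terminal, and the only
    nonzero reward (1) is given on stepping into state 6.
    """
    rng = random.Random(seed)
    V = [0.0] + [0.5] * 5 + [0.0]  # terminal values stay at 0
    for _ in range(episodes):
        s = 3  # start in the middle
        while s not in (0, 6):
            s2 = s + rng.choice((-1, 1))
            r = 1.0 if s2 == 6 else 0.0
            # Move V(s) toward the bootstrapped one-step target.
            V[s] += alpha * (r + gamma * V[s2] - V[s])
            s = s2
    return V[1:6]


V = td0_random_walk()
```

For this walk the true values are 1/6, 2/6, ..., 5/6, and with a small constant step size the estimates settle close to them.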