Tabular Methods

Dynamic programming, Monte Carlo, temporal-difference learning, n-step methods, eligibility traces, Q-learning, and SARSA.

Dynamic Programming

Computing optimal policies via iterative Bellman updates when a full model of the environment (transition probabilities and rewards) is known – the theoretical foundation of reinforcement learning.
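Value iteration is one such dynamic-programming method (policy iteration is another). A minimal sketch, assuming a hypothetical model format where `P[s][a]` is a list of `(prob, next_state, reward, done)` tuples and an empty `P[s]` marks a terminal state; `make_chain` is a toy four-state MDP invented for illustration.

```python
def value_iteration(P, n_states, n_actions, gamma=0.9, theta=1e-10):
    """In-place value iteration. P[s][a] is a list of (prob, next_state, reward, done)
    tuples; an empty dict P[s] marks a terminal state (hypothetical model format)."""
    V = [0.0] * n_states

    def q_value(s, a):
        # Expected one-step return of taking action a in state s, then following V.
        return sum(p * (r + (0.0 if done else gamma * V[s2]))
                   for p, s2, r, done in P[s][a])

    while True:
        delta = 0.0
        for s in range(n_states):
            if not P[s]:
                continue  # terminal state: its value stays 0
            best = max(q_value(s, a) for a in range(n_actions))
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # Bellman optimality backup
        if delta < theta:
            break

    # Greedy policy with respect to the converged values (None for terminal states).
    policy = [None if not P[s] else max(range(n_actions), key=lambda a: q_value(s, a))
              for s in range(n_states)]
    return V, policy


def make_chain(n=4):
    """Hypothetical toy MDP: action 1 moves right, action 0 moves left (wall at 0);
    reaching the last state ends the episode with reward 1."""
    P = {}
    for s in range(n):
        if s == n - 1:
            P[s] = {}
            continue
        right = s + 1
        P[s] = {
            0: [(1.0, max(s - 1, 0), 0.0, False)],
            1: [(1.0, right, 1.0 if right == n - 1 else 0.0, right == n - 1)],
        }
    return P


V, policy = value_iteration(make_chain(), n_states=4, n_actions=2)
```

With gamma = 0.9 the optimal values on this chain are 0.81, 0.9 and 1.0 for the three non-terminal states, and the greedy policy moves right everywhere.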

Eligibility Traces

Credit assignment mechanism that blends TD and Monte Carlo through exponentially decaying memory of visited states.
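A minimal sketch of tabular TD(λ) prediction with accumulating traces, assuming episodes are given as lists of `(state, reward, next_state, done)` transitions; the deterministic four-state chain and the parameter values are illustrative assumptions.

```python
def td_lambda(episodes, n_states, alpha=0.1, gamma=0.9, lam=0.8):
    """Online tabular TD(lambda) prediction with accumulating eligibility traces.
    Each episode is a list of (state, reward, next_state, done) transitions."""
    V = [0.0] * n_states
    for episode in episodes:
        e = [0.0] * n_states  # traces are reset at the start of every episode
        for s, r, s_next, done in episode:
            target = r if done else r + gamma * V[s_next]
            delta = target - V[s]  # one-step TD error
            e[s] += 1.0            # accumulating trace for the current state
            for i in range(n_states):
                V[i] += alpha * delta * e[i]  # every traced state shares the credit
                e[i] *= gamma * lam           # exponentially decaying memory
    return V


# Hypothetical deterministic chain 0 -> 1 -> 2 -> 3 -> terminal, reward 1 on the last step.
chain = [(0, 0.0, 1, False), (1, 0.0, 2, False), (2, 0.0, 3, False), (3, 1.0, None, True)]
V = td_lambda([chain] * 500, n_states=4)
```

Setting `lam=0` recovers one-step TD(0); as `lam` approaches 1 each TD error is propagated back along the whole trajectory, approaching a Monte Carlo update.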

Monte Carlo Methods

Learning value estimates from complete episode returns – model-free RL through averaging sampled outcomes.
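A minimal first-visit Monte Carlo prediction sketch, assuming each recorded episode is a list of `(state, reward)` pairs where the reward is the one received after leaving that state; the three hand-made episodes are illustrative only.

```python
from collections import defaultdict


def first_visit_mc(episodes, gamma=1.0):
    """Estimate V(s) as the average return following the first visit to s in each episode."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        # Compute the return G_t for every time step, working backwards.
        G = 0.0
        returns = [0.0] * len(episode)
        for t in range(len(episode) - 1, -1, -1):
            _, r = episode[t]
            G = r + gamma * G
            returns[t] = G
        # Average only the return from the first visit to each state.
        seen = set()
        for t, (s, _) in enumerate(episode):
            if s in seen:
                continue
            seen.add(s)
            returns_sum[s] += returns[t]
            returns_count[s] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


episodes = [
    [(0, 0.0), (1, 1.0)],
    [(0, 0.0), (1, 0.0)],
    [(0, 1.0), (0, 3.0)],  # state 0 is revisited; only its first visit (return 4) counts
]
V = first_visit_mc(episodes)
```

Here state 0 averages the returns 1, 0 and 4, giving 5/3; an every-visit variant would also count the second visit's return of 3.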

N-Step Methods

Bridging Monte Carlo and TD by bootstrapping after n steps – tunable bias-variance trade-off.
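A minimal n-step TD prediction sketch that replays recorded episodes, assuming each episode is a pair `(states, rewards)` with `states[t]` = S_t and `rewards[t]` = R_{t+1}; the chain episode and parameters are hypothetical.

```python
def n_step_td(episodes, n_states, n=2, alpha=0.1, gamma=0.9):
    """Tabular n-step TD prediction on recorded episodes.
    states[t] is S_t and rewards[t] is R_{t+1} for t = 0..T-1; the episode
    terminates after the last reward."""
    V = [0.0] * n_states
    for states, rewards in episodes:
        T = len(rewards)
        for tau in range(T):
            end = min(tau + n, T)
            # Up to n discounted rewards...
            G = sum(gamma ** (k - tau) * rewards[k] for k in range(tau, end))
            # ...then bootstrap from V(S_{tau+n}) if the episode has not ended by then.
            if tau + n < T:
                G += gamma ** n * V[states[tau + n]]
            V[states[tau]] += alpha * (G - V[states[tau]])
    return V


# Hypothetical deterministic chain 0 -> 1 -> 2 -> 3 -> terminal, reward 1 on the last step.
episode = ([0, 1, 2, 3], [0.0, 0.0, 0.0, 1.0])
V_two_step = n_step_td([episode] * 300, n_states=4, n=2)
V_mc_like = n_step_td([episode] * 300, n_states=4, n=10)  # n >= T: pure Monte Carlo target
```

Small n bootstraps early (more bias from the current estimates, less variance); once n reaches the episode length the target is the full return, as in Monte Carlo.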

Q-Learning

Off-policy TD control that learns the optimal action-value function independently of the behavior policy, provided that policy keeps trying every action in every state – arguably the most influential tabular RL algorithm.
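A minimal tabular Q-learning sketch with an ε-greedy behavior policy. The five-state chain (`chain_step`), the helper `epsilon_greedy`, and all parameter values are hypothetical choices for illustration.

```python
import random


def chain_step(s, a, n=5):
    """Hypothetical chain: action 1 moves right, action 0 moves left (wall at 0);
    reaching state n-1 ends the episode with reward 1."""
    s2 = s + 1 if a == 1 else max(s - 1, 0)
    done = s2 == n - 1
    return s2, (1.0 if done else 0.0), done


def epsilon_greedy(q_row, eps, rng):
    """Random action with probability eps, otherwise a greedy action (ties broken randomly)."""
    if rng.random() < eps:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def q_learning(n_states=5, n_actions=2, episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = epsilon_greedy(Q[s], eps, rng)  # behavior policy: epsilon-greedy
            s2, r, done = chain_step(s, a, n_states)
            # Off-policy target: the greedy (max) value of the next state,
            # regardless of which action the behavior policy will take there.
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q


Q = q_learning()
greedy_policy = [max(range(2), key=lambda a: Q[s][a]) for s in range(4)]
```

Because the target uses the max over next actions, the learned values approach those of the optimal (always-right) policy, e.g. Q[0][1] approaches 0.9^3 = 0.729, even though the agent keeps exploring.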

SARSA

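On-policy TD control that updates Q-values using the action actually taken next – because it learns the value of the exploratory policy it follows, it tends to choose safer paths than Q-learning while exploration is still active (e.g. the cliff-walking example).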
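A minimal tabular SARSA sketch on the same kind of hypothetical five-state chain; `chain_step`, `epsilon_greedy` and the parameter values are illustrative assumptions. The only change from Q-learning is the target, which uses the next action the ε-greedy policy actually selects.

```python
import random


def chain_step(s, a, n=5):
    """Hypothetical chain: action 1 moves right, action 0 moves left (wall at 0);
    reaching state n-1 ends the episode with reward 1."""
    s2 = s + 1 if a == 1 else max(s - 1, 0)
    done = s2 == n - 1
    return s2, (1.0 if done else 0.0), done


def epsilon_greedy(q_row, eps, rng):
    """Random action with probability eps, otherwise a greedy action (ties broken randomly)."""
    if rng.random() < eps:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def sarsa(n_states=5, n_actions=2, episodes=2000, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        a = epsilon_greedy(Q[s], eps, rng)
        while not done:
            s2, r, done = chain_step(s, a, n_states)
            if done:
                target = r
            else:
                a2 = epsilon_greedy(Q[s2], eps, rng)  # the action SARSA will actually take
                target = r + gamma * Q[s2][a2]        # on-policy target (S, A, R, S', A')
            Q[s][a] += alpha * (target - Q[s][a])
            if not done:
                s, a = s2, a2
    return Q


Q = sarsa()
greedy_policy = [max(range(2), key=lambda a: Q[s][a]) for s in range(4)]
```

Since exploratory moves feed into the target, the learned values are those of the ε-greedy policy itself and sit somewhat below the optimal ones; on a task with a costly hazard next to the optimal path, that is what pushes SARSA toward the safer route.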

Temporal Difference Learning

Bootstrapping value estimates from incomplete episodes by updating toward one-step lookahead targets.
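A minimal TD(0) prediction sketch on a five-state random walk, whose true values 1/6, ..., 5/6 are known. The random-walk generator, the initial value 0.5 and the step size are assumptions for illustration.

```python
import random


def random_walk_episode(rng, n=5):
    """Hypothetical random walk: non-terminal states 1..n, terminals 0 and n+1,
    start in the middle, reward 1 only for exiting on the right."""
    s = (n + 1) // 2
    episode = []
    while True:
        s_next = s + rng.choice((-1, 1))
        done = s_next in (0, n + 1)
        r = 1.0 if s_next == n + 1 else 0.0
        episode.append((s, r, s_next, done))
        if done:
            return episode
        s = s_next


def td0(episodes, n_states, alpha=0.01, gamma=1.0, v_init=0.5):
    """Tabular TD(0) prediction: move V(s) toward r + gamma * V(s') after every step."""
    V = [v_init] * n_states  # terminal entries are never read or updated
    for episode in episodes:
        for s, r, s_next, done in episode:
            target = r if done else r + gamma * V[s_next]  # one-step bootstrapped target
            V[s] += alpha * (target - V[s])
    return V


rng = random.Random(0)
V = td0((random_walk_episode(rng) for _ in range(5000)), n_states=7)
# True values of states 1..5 are 1/6, 2/6, ..., 5/6.
```

Each update uses only the next reward and the current estimate of the next state, so learning can proceed before an episode ends; with a constant step size the estimates keep fluctuating slightly around the true values.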