Function Approximation & Deep RL

DQN, experience replay, and deep reinforcement learning.

Neural network Q-function with experience replay and target networks – the breakthrough that launched deep RL.

Decoupling action selection from evaluation to correct DQN’s systematic overestimation of Q-values.
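
As a rough illustration of that decoupling, the sketch below contrasts the standard DQN target with the Double DQN target for a single transition. The function names, the plain-list Q-value inputs, and the default `gamma` are assumptions made for this example, not part of any particular library.

```python
def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """Standard DQN: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN: the online network selects the action, the target network evaluates it."""
    if done:
        return reward
    # Greedy action according to the online network's estimates (hypothetical input list).
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]
```

When the two networks disagree about the best next action, the Double DQN target is never larger than the standard one, which is how the upward bias is reduced.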

Separate network streams for state value and action advantage – learning “how good is this state” independently from “how good is this action.”
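
One way to picture how the two streams are recombined is the mean-subtracted aggregation commonly used with dueling architectures. The sketch below assumes the value and advantage streams have already produced a scalar and a list of numbers; the function name `dueling_q_values` is hypothetical.

```python
def dueling_q_values(state_value, advantages):
    """Combine a scalar state value with per-action advantages into Q-values.

    Subtracting the mean advantage makes the split identifiable: adding a
    constant to every advantage leaves the resulting Q-values unchanged.
    """
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + (a - mean_advantage) for a in advantages]
```

With this aggregation the average of the Q-values equals the state value, so the value stream carries "how good is this state" and the advantage stream carries only the relative ranking of actions.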

Storing and randomly sampling past transitions to break temporal correlations and improve sample efficiency.
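
A minimal sketch of such a buffer, assuming uniform random sampling and a fixed capacity that evicts the oldest transitions first. The class name `ReplayBuffer` and the transition tuple layout are illustrative choices, not a reference implementation.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of past transitions with uniform random sampling."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest item once full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Sampling without replacement from the whole buffer breaks the
        # temporal ordering of consecutive transitions.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

Because a stored transition can be drawn in many different minibatches, each environment step can contribute to more than one gradient update.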

Replacing lookup tables with parameterized functions to generalize across the vast state spaces of real-world problems.

Combining six complementary DQN improvements into one agent – a widely used benchmark for value-based deep RL.

A frozen copy of the Q-network providing stable regression targets – preventing the “moving target” instability.
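
The periodic "hard" synchronization this describes might look like the sketch below, where parameters are represented as a plain Python list for simplicity. The class name `TargetNetworkSync` and the `sync_every` parameter are assumptions for illustration; a real agent would copy framework-specific weight tensors instead.

```python
import copy


class TargetNetworkSync:
    """Keeps a frozen copy of the online parameters, refreshed every `sync_every` steps."""

    def __init__(self, online_params, sync_every):
        self.sync_every = sync_every
        self.target_params = copy.deepcopy(online_params)
        self.steps = 0

    def step(self, online_params):
        """Call once per training step; returns the parameters to use for targets."""
        self.steps += 1
        if self.steps % self.sync_every == 0:
            # Hard update: overwrite the frozen copy with the current online weights.
            self.target_params = copy.deepcopy(online_params)
        return self.target_params
```

Between synchronizations the regression targets stay fixed even though the online parameters keep changing, which is what removes the moving-target effect.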