Model-Based RL

World models, planning, and model-based approaches.

The Dyna Architecture

A foundational framework that interleaves real environment experience with simulated experience generated by a learned model, unifying learning, planning, and acting in a single loop.
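One way to picture this loop is a tabular Dyna-Q agent. The sketch below is illustrative only: the `dyna_q` function, its hyperparameters, and the five-state `chain_step` corridor are hypothetical names and values chosen for the example, and the learned model simply memorizes the last observed transition, which assumes a deterministic environment.

```python
import random
from collections import defaultdict


def dyna_q(step, start_state, actions, episodes=50, planning_steps=10,
           alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q sketch. `step(state, action)` returns (next_state, reward, done)."""
    rng = random.Random(seed)
    q = defaultdict(float)   # (state, action) -> value estimate
    model = {}               # (state, action) -> last observed (next_state, reward, done)

    def greedy(state):
        best = max(q[(state, a)] for a in actions)
        return rng.choice([a for a in actions if q[(state, a)] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning backup, used for both real and simulated transitions.
        target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        state, done = start_state, False
        while not done:
            explore = rng.random() < epsilon
            action = rng.choice(actions) if explore else greedy(state)
            next_state, reward, done = step(state, action)        # acting
            update(state, action, reward, next_state, done)       # learning from real experience
            model[(state, action)] = (next_state, reward, done)   # model learning
            for _ in range(planning_steps):                       # planning with simulated experience
                (s, a), (s2, r, d) = rng.choice(list(model.items()))
                update(s, a, r, s2, d)
            state = next_state
    return q


def chain_step(state, action):
    """Hypothetical corridor with states 0..4: action 1 moves right, 0 moves left; reward 1 on reaching 4."""
    next_state = max(0, min(4, state + (1 if action == 1 else -1)))
    done = next_state == 4
    return next_state, (1.0 if done else 0.0), done


q = dyna_q(chain_step, start_state=0, actions=[0, 1])
```

Setting `planning_steps=0` reduces the sketch to plain Q-learning, which makes the contribution of simulated experience easy to compare.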

Model-Based vs. Model-Free RL

The fundamental architectural choice in reinforcement learning – learn a model of how the world works and plan with it, or learn what to do directly from raw experience.

Monte Carlo Tree Search

A tree-based planning algorithm that combines random rollouts with upper confidence bounds to search large decision spaces efficiently – the search procedure at the core of AlphaGo, which defeated world champion Lee Sedol in 2016.
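The upper-confidence-bound idea can be made concrete with the UCB1 score used in the UCT variant of tree search. This is a minimal sketch of the selection step only: the function names and the `(total_value, visits)` tuple layout are assumptions made for the example, and systems such as AlphaGo use a prior-weighted variant of this rule rather than plain UCB1.

```python
import math


def ucb1(total_value, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score: mean value plus an exploration bonus that shrinks as visits grow."""
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    mean = total_value / visits
    bonus = c * math.sqrt(math.log(parent_visits) / visits)
    return mean + bonus


def select_child(children, c=math.sqrt(2)):
    """children: list of (total_value, visits) tuples; returns the index with the highest score."""
    parent_visits = sum(visits for _, visits in children)
    scores = [ucb1(value, visits, parent_visits, c) for value, visits in children]
    return max(range(len(children)), key=scores.__getitem__)
```

A rarely visited child can outrank a well-explored one with a higher mean, which is how the rule balances exploitation against exploration.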

MuZero

A planning algorithm that learns its own model of the environment – predicting rewards, values, and policies in a latent space – and reaches superhuman performance in Go, chess, shogi, and Atari games without ever being told the rules.

Planning with Learned Models

Using neural network dynamics models for lookahead search, trajectory optimization, and data augmentation.
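Trajectory optimization with a model can be as simple as random shooting: sample candidate action sequences, roll each one out through the model, and execute the first action of the best sequence. In the sketch below, `random_shooting`, `toy_dynamics`, and `toy_reward` are hypothetical names, and the hand-written `toy_dynamics` function stands in for a trained neural network dynamics model.

```python
import random


def random_shooting(dynamics, reward, state, horizon=10, candidates=200, seed=0):
    """Return (first_action, predicted_return) of the best sampled action sequence."""
    rng = random.Random(seed)
    best_return, best_plan = float("-inf"), None
    for _ in range(candidates):
        plan = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, total = state, 0.0
        for a in plan:
            total += reward(s, a)   # score the imagined step
            s = dynamics(s, a)      # advance using the model, not the real environment
        if total > best_return:
            best_return, best_plan = total, plan
    return best_plan[0], best_return


def toy_dynamics(s, a):
    """Stand-in for a learned model: the action nudges a scalar state."""
    return s + 0.5 * a


def toy_reward(s, a):
    """Penalize squared distance of the state from the origin."""
    return -(s * s)


action, predicted_return = random_shooting(toy_dynamics, toy_reward, state=2.0)
```

In a model-predictive control loop, only `action` would be executed before replanning from the next observed state, which limits how far model errors can compound.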

World Models

Learning compressed latent representations of environment dynamics so an agent can “dream” – planning and even training entirely within an imagined version of the world.