Brain Drip
🎮
Module 01 (8 concepts)

Foundations

MDPs, reward signals, policies, and the RL framework.

01

Bellman Equations

The recursive decomposition of value into immediate reward plus discounted future value: the fundamental identity of RL.
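As an illustration of that decomposition, the state-value form of the identity can be sketched as below. The notation is an assumption, since this page does not fix one: \pi is the policy, p(s', r \mid s, a) the transition dynamics, and \gamma the discount factor, following a common textbook convention.

```latex
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,\bigl[\, r + \gamma\, V^{\pi}(s') \,\bigr]
```

The bracketed term is the immediate reward plus the discounted value of the next state, averaged over the policy's action choice and the environment's response.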

02

Exploration vs Exploitation

The core dilemma: exploit what you already know for reliable reward, or explore the unknown for potentially better outcomes.
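One simple way to trade off the two is epsilon-greedy action selection. The sketch below is only illustrative: the function name `epsilon_greedy`, the list of action-value estimates `q_values`, and the `epsilon` parameter are assumptions made for this example, not something this page defines.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimated value
    # (the first such action if several are tied).
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon=0.0` the agent always exploits its current estimates; with `epsilon=1.0` it always explores.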

03

Markov Decision Processes

The mathematical framework formalizing sequential decision-making with states, actions, transition probabilities, and rewards.

04

Policies

The agent's decision rule mapping states to actions: the central object that RL algorithms learn.

05

Return and Discount Factor

Cumulative future reward geometrically discounted by gamma: the objective every RL agent optimizes.
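A minimal sketch of that quantity for a finite reward sequence, assuming the common definition G = r_0 + gamma * r_1 + gamma^2 * r_2 + ... (the function name `discounted_return` and its arguments are hypothetical, chosen for this example):

```python
def discounted_return(rewards, gamma):
    """Sum the rewards, weighting the k-th reward by gamma**k."""
    g = 0.0
    # Work backwards so each step applies G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For example, `discounted_return([1.0, 1.0, 1.0], 0.5)` evaluates to 1 + 0.5 + 0.25 = 1.75. A gamma close to 0 makes the agent short-sighted, while a gamma close to 1 weights distant rewards almost as heavily as immediate ones.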

06

States, Actions, and Rewards

The three primitives of every RL problem: where you are, what you can do, and what you get for doing it.

07

Value Functions

Expected future return from a state (V) or state-action pair (Q): the backbone of most RL algorithms.

08

What Is Reinforcement Learning?

An agent learns to make sequential decisions by interacting with an environment and maximizing cumulative reward: the third paradigm of machine learning, alongside supervised and unsupervised learning.