Reinforcement Learning

Author

Chao Ma

Published

March 5, 2026

Course notes on reinforcement learning, starting with David Silver’s foundational RL lectures.


David Silver RL Course - Lecture 4: Model-Free Prediction
Model-free policy evaluation through Monte Carlo returns, first-visit vs. every-visit updates, TD learning, the bias-variance tradeoff, and TD(λ).

David Silver RL Course - Lecture 3: Planning by Dynamic Programming
Dynamic programming in known MDPs: optimal substructure, iterative policy evaluation, policy iteration, value iteration, and the classical gridworld examples.

David Silver RL Course - Lecture 2: Markov Decision Process
Markov property, transition matrices, Markov reward processes, return and discounting, Bellman equations, and the move from prediction to control in MDPs.

David Silver RL Course - Lecture 1: Introduction to Reinforcement Learning
What makes RL different from supervised learning, the agent-environment loop, Markov state, policy/value/model, and the core RL tradeoffs.