Reinforcement Learning
Course notes on reinforcement learning, starting with David Silver’s foundational RL lectures.
David Silver RL Course - Lecture 4: Model-Free Prediction
Model-free policy evaluation through Monte Carlo returns, first-visit vs every-visit updates, TD learning, the bias-variance tradeoff, and TD(lambda).
David Silver RL Course - Lecture 3: Planning by Dynamic Programming
Dynamic programming in known MDPs: optimal substructure, iterative policy evaluation, policy iteration, value iteration, and the classical gridworld examples.
David Silver RL Course - Lecture 2: Markov Decision Process
Markov property, transition matrices, Markov reward processes, return and discounting, Bellman equations, and the move from prediction to control in MDPs.
David Silver RL Course - Lecture 1: Introduction to Reinforcement Learning
What makes RL different from supervised learning, the agent-environment loop, Markov state, policy/value/model, and the core RL tradeoffs.