Information Theory
Notes on entropy, information, coding, and probabilistic structure, starting from the Oxford Math information theory lectures.
Oxford Information Theory Lecture 1: Defining Entropy and Information
Surprise as negative log-probability, entropy in bits, KL divergence as mismatch cost, mutual information as dependence, and conditional entropy through the chain rule.
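The quantities listed above can be sketched in a few lines of code. This is a minimal illustration, not from the lectures; all function names and the example distributions are my own, and probabilities are assumed to be given as plain lists.

```python
import math

def surprise(p):
    # Surprise (self-information) of an outcome with probability p, in bits.
    return -math.log2(p)

def entropy(dist):
    # Shannon entropy in bits: the expected surprise under dist.
    return sum(p * surprise(p) for p in dist if p > 0)

def kl(p, q):
    # KL divergence D(p || q): the extra bits paid, on average, for
    # coding samples from p with a code optimized for q.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_information(joint):
    # I(X;Y) = D(joint || product of marginals): dependence in bits.
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(
        joint[i][j] * math.log2(joint[i][j] / (px[i] * py[j]))
        for i in range(len(px))
        for j in range(len(py))
        if joint[i][j] > 0
    )

# A fair coin carries exactly 1 bit of entropy.
print(entropy([0.5, 0.5]))  # 1.0

# Independent variables share no information; perfectly correlated ones
# share all of it: here I(X;Y) = H(X) = 1 bit.
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # 1.0
```

The last two calls also illustrate the chain rule indirectly: since I(X;Y) = H(X) - H(X|Y), the correlated joint has H(X|Y) = 0 (knowing Y removes all uncertainty about X), while the independent joint leaves H(X|Y) = H(X).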