Papers in Deep Learning

Author: Chao Ma

Published: February 8, 2026

Research paper reading notes focused on key ideas, math intuition, and practical takeaways.


Generative Adversarial Nets. GAN training as a minimax game between discriminator and generator, with the optimal-discriminator derivation and the global optimum condition \(p_g=p_{data}\).
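The optimal-discriminator result can be checked numerically: at a single point \(x\) with densities \(a=p_{data}(x)\) and \(b=p_g(x)\), the discriminator's pointwise objective \(a\log D + b\log(1-D)\) is maximized at \(D^*=a/(a+b)\). A toy grid-search sketch (not from the paper; function names are illustrative):

```python
import math

def pointwise_value(a, b, d):
    # Contribution of one point x to the GAN value function:
    # p_data(x) * log D(x) + p_g(x) * log(1 - D(x))
    return a * math.log(d) + b * math.log(1 - d)

def best_d(a, b, steps=9999):
    # Brute-force search over D(x) in (0, 1) for the maximizer.
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(grid, key=lambda d: pointwise_value(a, b, d))

print(best_d(0.3, 0.7))  # close to a / (a + b) = 0.3
```

Since the argmax is \(a/(a+b)\) at every \(x\), the global optimum \(D^*(x)=\tfrac12\) is reached exactly when \(p_g=p_{data}\).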

Why TPU Is Fast for Dot Product (First-Gen TPU). Focus on the first-generation TPU inference design: 8-bit arithmetic, latency-first execution, and the systolic-array MMU that makes ASIC specialization much faster than general-purpose processors for matrix MAC workloads.
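The systolic-array idea, weights held stationary in a grid of cells while activations stream through and each cell performs one multiply-accumulate (MAC) per step, can be mimicked sequentially in plain Python. A toy model of the dataflow, not the actual TPU pipeline:

```python
def systolic_matmul(A, B):
    # Toy model of a weight-stationary systolic matrix unit:
    # B's weights are "pre-loaded" into the grid, rows of A stream in,
    # and partial sums accumulate down each column, one MAC per cell.
    n, k, m = len(A), len(A[0]), len(B[0])
    C = [[0] * m for _ in range(n)]
    for i in range(n):            # activation row streaming through
        for j in range(m):        # column of the array
            acc = 0
            for t in range(k):    # partial sum flowing down the column
                acc += A[i][t] * B[t][j]   # one MAC
            C[i][j] = acc
    return C

print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# → [[19, 22], [43, 50]]
```

The hardware win is that the real array does all these MACs in parallel with no instruction fetch or cache traffic per operation, which is exactly what 8-bit matrix workloads need.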

Transformer: Attention Is All You Need. The Transformer removes recurrence and centers the model on scaled dot-product attention, enabling parallel training and strong long-range dependency modeling.
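The core operation is \(\mathrm{Attention}(Q,K,V)=\mathrm{softmax}(QK^\top/\sqrt{d_k})\,V\). A minimal dependency-free sketch (illustrative, not the paper's multi-head implementation):

```python
import math

def scaled_dot_product_attention(Q, K, V):
    # softmax(Q K^T / sqrt(d_k)) V, with Q, K, V as lists of row vectors.
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        # Numerically stable softmax over the scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Output row: convex combination of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

Because every query attends to every key in one step, no recurrence is needed and all positions can be processed in parallel.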

Attention: The Origin of Transformer. Introduce learnable alignment scores and a dynamic context vector, replacing the fixed encoder bottleneck in seq2seq models.
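A Bahdanau-style additive alignment scores each encoder state against the decoder state with a small learned network, \(e_i = v^\top \tanh(W_s s + W_h h_i)\), then softmaxes the scores into weights and builds the context vector as their weighted sum. A sketch under those assumptions (weight names are illustrative):

```python
import math

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def additive_attention(s, encoder_states, W_s, W_h, v):
    # Alignment score e_i = v . tanh(W_s s + W_h h_i) for each h_i.
    Ws_s = matvec(W_s, s)
    scores = []
    for h in encoder_states:
        Wh_h = matvec(W_h, h)
        scores.append(sum(vi * math.tanh(a + b)
                          for vi, a, b in zip(v, Ws_s, Wh_h)))
    # Softmax the scores into alignment weights.
    m = max(scores)
    exps = [math.exp(e - m) for e in scores]
    z = sum(exps)
    alphas = [e / z for e in exps]
    # Context vector: weighted sum of encoder states, recomputed per step.
    context = [sum(a * h[j] for a, h in zip(alphas, encoder_states))
               for j in range(len(encoder_states[0]))]
    return alphas, context
```

The context vector is recomputed at every decoding step, so the decoder is no longer forced to rely on a single fixed encoding of the whole source sentence.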

Batch Normalization: Accelerating Deep Network Training. Normalize mini-batch activations with learnable scale and shift to stabilize training, improve conditioning, and speed convergence.
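For one feature, the transform is \(\hat{x}=(x-\mu_B)/\sqrt{\sigma_B^2+\epsilon}\) followed by \(y=\gamma\hat{x}+\beta\). A minimal per-feature sketch of the training-time forward pass (inference-time running statistics omitted):

```python
import math

def batch_norm(xs, gamma, beta, eps=1e-5):
    # Normalize one feature across the mini-batch to ~zero mean and
    # ~unit variance, then apply the learnable scale and shift.
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return [gamma * (x - mu) / math.sqrt(var + eps) + beta for x in xs]
```

The learnable \(\gamma,\beta\) matter because they let the layer recover the identity transform if plain normalization would hurt representational power.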

LoRA: Low-Rank Adaptation of Large Language Models. Freeze the base model and learn a low-rank update \(\Delta W=BA\) for selected layers, achieving strong performance with far fewer trainable parameters and easy deployment.
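The adapted forward pass is \(y=(W+\alpha BA)x = Wx+\alpha B(Ax)\), so the rank-\(r\) update is applied without ever materializing the full \(d_{out}\times d_{in}\) matrix \(BA\). A toy sketch (shapes: \(W\in\mathbb{R}^{d_{out}\times d_{in}}\), \(A\in\mathbb{R}^{r\times d_{in}}\), \(B\in\mathbb{R}^{d_{out}\times r}\); the scaling name `alpha` is illustrative):

```python
def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def lora_forward(x, W, A, B, alpha=1.0):
    # Frozen base path: y0 = W x (never updated during fine-tuning).
    base = matvec(W, x)
    # Low-rank path: Delta_W x = B (A x), the only trained parameters.
    delta = matvec(B, matvec(A, x))
    return [y0 + alpha * d for y0, d in zip(base, delta)]
```

Deployment is easy because \(\alpha BA\) can be merged into \(W\) after training, leaving inference latency unchanged.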