ML HW-SW Codesign
Notes on efficient AI systems where model design, compression, and hardware architecture are developed together.
Efficient AI Lecture 7: Neural Architecture Search (Part I) Classic efficient building blocks, cell-level NAS search spaces, elastic scaling dimensions, and the main architecture-search strategies from grid search to RL, differentiable search, and evolution.
Efficient AI Lecture 6: Quantization (Part II) Post-training quantization granularity, clipping and calibration, AdaRound, QAT with STE, and binary/ternary quantization methods for pushing precision lower while keeping accuracy loss under control.
Efficient AI Lecture 5: Quantization (Part I) Why low-bit arithmetic saves energy, how numeric formats trade off range and precision, and how K-means and linear quantization connect compression to hardware-friendly integer compute.
Efficient AI Lecture 4: Pruning and Sparsity (Part II) Layer-wise pruning ratios, automatic pruning with AMC and NetAdapt, fine-tuning after pruning, and the hardware systems that turn sparsity into real speed and energy gains.
Efficient AI Lecture 3: Pruning and Sparsity (Part I) Why memory dominates energy, how pruning is formulated with an L0 constraint, the hardware tradeoff between unstructured and structured sparsity, and the main pruning criteria from magnitude to second-order and regression-based methods.
Efficient AI Lecture 1: Introduction Why efficient AI needs both algorithmic compression and hardware specialization: Deep Compression, EIE, MCUNetV3, efficient LMs, and the hardware trends driving co-design.