Lecture 8: Norms of Vectors and Matrices

Matrix Methods
Norms
MIT 18.065
Data Analysis
Author

Chao Ma

Published

November 29, 2025

Overview

This lecture covers vector and matrix norms with applications to regularization and sparsity:

  • Review vector p-norms and the geometry of unit balls
  • Explain when norms are valid (triangle inequality) and the S-norm defined by \(v^T S v\)
  • Compare \(\ell_1\), \(\ell_2\), and the “\(\frac{1}{2}\)-norm” as regularizers when solving \(Ax = b\)
  • Introduce matrix norms (spectral, Frobenius, nuclear) and relate them to singular values

Vector Norms

Definition

The p-norm of a vector \(v \in \mathbb{R}^n\) is defined as:

\[ \|v\|_p = \sqrt[p]{|v_1|^p + |v_2|^p + \cdots + |v_n|^p} \]

Common values: \(p = 0, 1, 2, \infty\)

  • \(\|v\|_0\): Number of nonzero components (not a true norm)
  • \(\|v\|_1\): Sum of absolute values (Manhattan norm)
  • \(\|v\|_2\): Euclidean length
  • \(\|v\|_\infty\): Maximum absolute value
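As a quick numerical check of these definitions (the vector here is just an illustrative example), NumPy's `norm` computes all of the true p-norms directly, while the "zero norm" is a simple count:

```python
import numpy as np

v = np.array([3.0, -4.0, 0.0])

l0 = np.count_nonzero(v)          # number of nonzeros: 2 (not a true norm)
l1 = np.linalg.norm(v, 1)         # |3| + |-4| + |0| = 7
l2 = np.linalg.norm(v, 2)         # sqrt(9 + 16) = 5
linf = np.linalg.norm(v, np.inf)  # max(|3|, |4|, |0|) = 4

print(l0, l1, l2, linf)  # 2 7.0 5.0 4.0
```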

Geometry of Unit Balls

The unit ball \(\|v\|_p = 1\) in \(\mathbb{R}^2\) has different shapes for different \(p\):

Figure: Unit balls \(\|v\|_p = 1\) in \(\mathbb{R}^2\) for different values of \(p\). As \(p\) increases, the unit ball transitions from a diamond (\(p=1\)) to a circle (\(p=2\)) to a square (\(p=\infty\)).


When Is It a True Norm?

Triangle inequality requirement: \(\|x + y\| \leq \|x\| + \|y\|\)

  • When \(p \geq 1\): \(\|\cdot\|_p\) is a valid norm
  • When \(p < 1\): Not a true norm (triangle inequality fails)

The “\(\frac{1}{2}\)-Norm”

For \(\|x\|_{1/2}\), this is not a true norm since \(p < 1\), but it provides a very strong sparsity penalty that pushes many components of \(x\) to zero when minimized.

Intuition: A norm must satisfy the triangle inequality (going straight is never longer than going in two steps). For \(p < 1\), the unit ball is not convex, so the triangle inequality fails.

Figure: The “\(\frac{1}{2}\)-norm” unit ball is non-convex. The straight segment between two points on the ball leaves the ball, which is exactly how the triangle inequality fails.
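The failure of the triangle inequality for \(p = \frac{1}{2}\) is easy to exhibit numerically. With \(\|v\|_{1/2} = \big(\sum_i \sqrt{|v_i|}\big)^2\), take the unit vectors along the two axes (a standard counterexample, not from any particular library):

```python
import numpy as np

def half_norm(v):
    # "1/2-norm": (sum of sqrt(|v_i|))^2 -- not a true norm
    return np.sum(np.sqrt(np.abs(v))) ** 2

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

print(half_norm(x), half_norm(y), half_norm(x + y))  # 1.0 1.0 4.0
# 4 > 1 + 1: the triangle inequality fails for p = 1/2
```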


S-Norm

Let \(S\) be a symmetric positive definite matrix. The S-norm is defined as:

\[ \|v\|_S = \sqrt{v^T S v} \]

Special case: When \(S = I\) (identity matrix), we get \(\|v\|_2\) (Euclidean norm).

Example

\[ S = \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix} \]

The unit ball \(v^T S v = 1\) forms an ellipse whose axes are determined by the eigenvectors of \(S\).

Figure: The unit ball for the S-norm \(\sqrt{v^T S v} = 1\) forms an ellipse. The shape and orientation depend on the eigenvalues and eigenvectors of the symmetric positive definite matrix \(S\).
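A short sketch with the example matrix above: compute \(\|v\|_S = \sqrt{v^T S v}\), and recover the ellipse's axes from the eigendecomposition of \(S\) (for \(v^T S v = 1\), the semi-axis lengths are \(1/\sqrt{\lambda_i}\) along the eigenvectors):

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 3.0]])

def s_norm(v, S):
    # S-norm: sqrt(v^T S v), valid because S is symmetric positive definite
    return np.sqrt(v @ S @ v)

v = np.array([1.0, 0.0])
print(s_norm(v, S))  # sqrt(2) ~ 1.414

# Axes of the unit ball v^T S v = 1 come from the eigendecomposition of S
eigvals, eigvecs = np.linalg.eigh(S)
print(1 / np.sqrt(eigvals))  # semi-axis lengths along the eigenvectors
```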


Minimizing Norms: Regularization

When solving \(Ax = b\), we often want to minimize \(\|x\|_p\) to prefer certain solutions:

\(\ell_1\) Regularization

  • Property: Sparse solutions
  • Winner: Solution has many zero components (e.g., \([0, b]\) or \([a, 0]\))
  • Use case: Feature selection, compressed sensing

\(\ell_2\) Regularization

  • Property: Smooth, distributed solutions
  • Geometric interpretation: Find the point on the constraint line \(c_1 x_1 + c_2 x_2 = b\) that touches the smallest level set \(\|v\|_2 = c\) (a circle)
  • Use case: Ridge regression, preventing overfitting

Figure: Minimizing the \(\ell_1\) norm (diamond-shaped level sets) versus the \(\ell_2\) norm (circular level sets) subject to a linear constraint. The growing diamond first touches the constraint line at a point on an axis, while the growing circle touches it at a point with both components nonzero. This is why \(\ell_1\) regularization produces sparsity.
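The picture can be checked with a small computation. For a single constraint \(c^T x = b\) (the numbers here are illustrative), the minimum-\(\ell_2\) solution projects the origin onto the line, while the minimum-\(\ell_1\) solution puts all the weight on the component with the largest \(|c_i|\):

```python
import numpy as np

# Illustrative single constraint: 3*x1 + 4*x2 = 1
c = np.array([3.0, 4.0])
b = 1.0

# l2 minimizer: project the origin onto the line -> c * b / ||c||^2
x_l2 = c * b / (c @ c)              # [0.12, 0.16], both components nonzero

# l1 minimizer: all weight on the component with largest |c_i|
i = np.argmax(np.abs(c))
x_l1 = np.zeros(2)
x_l1[i] = b / c[i]                  # [0, 0.25] -- sparse

print(np.linalg.norm(x_l2, 1), np.linalg.norm(x_l1, 1))  # 0.28 vs 0.25
```

Both points satisfy the constraint, but the sparse one has the smaller \(\ell_1\) norm, matching the diamond-touches-the-axis picture.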


Matrix Norms

Spectral Norm: \(\|A\|_2 = \sigma_1\)

The spectral norm measures the maximum “blow-up” of a vector:

\[ \|A\|_2 = \max_{x \neq 0} \frac{\|Ax\|_2}{\|x\|_2} = \sigma_1 \]

where \(\sigma_1\) is the largest singular value of \(A\).

Winner: \(x = v_1\) (first right singular vector)
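Both claims are easy to verify numerically with the SVD (the matrix is an illustrative example): `np.linalg.norm(A, 2)` returns \(\sigma_1\), and the first right singular vector achieves the maximum blow-up:

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])

U, sigma, Vt = np.linalg.svd(A)
print(sigma[0], np.linalg.norm(A, 2))  # both equal sigma_1 ~ 6.708

# The "winner" is the first right singular vector v1: ||A v1|| = sigma_1
v1 = Vt[0]
print(np.linalg.norm(A @ v1) / np.linalg.norm(v1))
```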


Frobenius Norm: \(\|A\|_F\)

The Frobenius norm is the square root of the sum of all squared entries:

\[ \|A\|_F = \sqrt{a_{11}^2 + a_{12}^2 + \cdots + a_{mn}^2} \]

Connection to SVD:

\[ \|A\|_F = \sqrt{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_r^2} \]

Why? Because \(A = U \Sigma V^T\), where \(U\) and \(V\) are orthogonal matrices (their columns are orthonormal). Multiplying by an orthogonal matrix preserves the Frobenius norm, so \(\|A\|_F = \|\Sigma\|_F\), which is exactly \(\sqrt{\sigma_1^2 + \cdots + \sigma_r^2}\).
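A quick check that the entrywise and singular-value formulas agree (same illustrative matrix as above):

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])

frob_entries = np.sqrt(np.sum(A**2))        # sqrt(9 + 16 + 25) = sqrt(50)
sigma = np.linalg.svd(A, compute_uv=False)
frob_svd = np.sqrt(np.sum(sigma**2))        # sqrt(sigma_1^2 + sigma_2^2)

print(frob_entries, frob_svd, np.linalg.norm(A, 'fro'))  # all sqrt(50)
```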


Nuclear Norm: \(\|A\|_N\)

The nuclear norm (also called trace norm) is the sum of singular values:

\[ \|A\|_N = \sigma_1 + \sigma_2 + \cdots + \sigma_r \]

Use case: Low-rank matrix completion (convex relaxation of rank minimization)
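The nuclear norm is also a one-liner given the singular values; NumPy exposes it directly via `ord='nuc'` (same illustrative matrix as above):

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])

sigma = np.linalg.svd(A, compute_uv=False)
nuclear = sigma.sum()                       # sigma_1 + sigma_2

print(nuclear, np.linalg.norm(A, 'nuc'))    # both ~ 8.944
```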


Summary of Matrix Norms

Figure: Comparison of matrix norms: spectral norm \(\|A\|_2\) (largest singular value), Frobenius norm \(\|A\|_F\) (square root of the sum of squared singular values), and nuclear norm \(\|A\|_N\) (sum of singular values).

Norm Formula Geometric Meaning
Spectral \(\|A\|_2 = \sigma_1\) Maximum amplification
Frobenius \(\|A\|_F = \sqrt{\sigma_1^2 + \cdots + \sigma_r^2}\) Square root of the total energy (sum of squared entries)
Nuclear \(\|A\|_N = \sigma_1 + \cdots + \sigma_r\) Convex surrogate for rank

Key insight: All three matrix norms can be expressed in terms of singular values, connecting them to the fundamental SVD decomposition.