MIT 18.065 Lecture 10: Survey of Difficulties of Ax=b

Linear Algebra
Least Squares
Pseudo-Inverse
Ill-Conditioned Problems
Author

Chao Ma

Published

January 4, 2026

MIT 18.065 Course Page

Ax=b Difficulties

  • \(x=A^+b\)
  • size ok, condition( \(\frac{\sigma_1}{\sigma_r}\) ) ok, \(x=A\backslash b\)
  • \(m\gt n =r\),
    • A is overdetermined, there is no solution for x
    • use least square to find \(\hat{x}\)
    • \(A^\top A \hat{x}=A^\top b\)
  • \(m\lt n\)
    • A is underdetermined, x has many solutions
    • we pick the one with min norm
    • \(x=A^+b\)
  • columns in bad conditions
    • Gram-schmidt , factorize A to orthogonal
    • A=QR
  • Nearly singular
    • singular values very close to 0
    • the reverse will be quite big
    • add penalty to the solution
      • \(\min ||Ax-b||^2+\delta^2||x||^2\)
  • Problem too big
    • e.g. \(10^6\),iteratively
    • Krylov
  • way too big
    • e.g. \(10^{12}\)
    • randomize linear algebra
    • use probability to sample the matrix, work with samples

Survey of different difficulties when solving Ax=b

In deep learning, we are typically in an underdetermined regime (m < n), where infinitely many solutions exist. The key question is not whether a solution exists, but which solution the optimization algorithm selects. In practice, optimization methods such as (stochastic) gradient descent exhibit an implicit bias toward low-norm or simple solutions, even without explicit regularization.

Ill-Conditioned Problem

The solution is to solve \(\min ||Ax-b||_2^2+\delta^2||x||_2^2\).

Let \(A^*=\begin{bmatrix}A\\\delta I\end{bmatrix},b^*=\begin{bmatrix}b\\0\end{bmatrix},A^*x=b^*\).

so the least square solution of \(A^*x=b^*\) turns out to be \(\min||Ax-b||^2+\delta^2||x||^2\).

The normal equation is \((A^*)^\top A^* x=(A^*)^\top b^* \Rightarrow (A^\top A+\delta^2 I)x=A^\top b\).

\(x=(A^\top A+\delta^2I)^{-1}A^\top b\)

When \(\lim_{\delta \to 0}\), it turns out to be pseudo inverse solution.

Role of the Moore–Penrose Pseudo-Inverse in Different Regimes

Regime System Properties Existence of Solution What A^+ b Represents Key Interpretation
Square, full rank m = n, A invertible Unique exact solution \(A^{-1} b\) Pseudo-inverse reduces to the true inverse
Overdetermined \(m > n, \mathrm{rank}(A)=n\) No exact solution Least-squares solution Minimizes ||Ax - b||_2
Underdetermined \(m < n, \mathrm{rank}(A)=m\) Infinitely many solutions Minimum-norm solution Selects the smallest-\(\ell_2\) solution among all feasible ones
Rank-deficient \(\mathrm{rank}(A) < \min(m,n)\) Non-unique or unstable Stable, minimum-norm solution Removes null-space ambiguity
Ill-conditioned Small singular values Sensitive to noise Truncated / damped inverse Filters directions with little information
Penalty → 0 \(\text{Ridge with}\quad \lambda \to 0\) Well-defined limit \(A^+ b\) Pseudo-inverse is the zero-regularization limit
Gradient descent (init = 0) Linear least squares Implicitly chosen \(A^+ b\) Optimization bias toward minimum-norm