Command Palette

Search for a command to run...

Linear Methods of AI

Normal Equation System

What is Normal Equation System

Imagine you have a complex optimization problem, but it turns out there's an elegant shortcut. In least squares problems, instead of doing direct optimization, we can transform it into a system of equations that's easier to solve.

When we want to minimize

minxAxb22\min_x \|A \cdot x - b\|_2^2

it turns out the solution can be found by solving the following system of equations

ATAx=ATbA^T A \cdot x = A^T b

This equation is called the normal equation system because it involves the concept of orthogonality or "normal" (perpendicular) state in vector space.

Fundamental Relationship

There's a very interesting relationship between the minimization problem and this normal equation system. Vector x^Rn\hat{x} \in \mathbb{R}^n is a solution to the least squares problem if and only if that vector satisfies the normal equation system.

In other words, finding x^\hat{x} that makes Axb22\|A \cdot x - b\|_2^2 minimal is exactly the same as finding x^\hat{x} that satisfies ATAx^=ATbA^T A \cdot \hat{x} = A^T b.

Why This System Works

To understand why this relationship holds, we need to look at it from a geometric perspective.

When x^\hat{x} gives the minimum value for Ax^b22\|A\hat{x} - b\|_2^2, then the error vector Ax^bA\hat{x} - b must be orthogonal to all vectors in the column space of matrix AA.

This column space consists of all vectors that can be written as AxAx for xRnx \in \mathbb{R}^n. The orthogonality condition means

0=Ax,Ax^b0 = \langle Ax, A\hat{x} - b \rangle

for every vector xRnx \in \mathbb{R}^n. Using the properties of inner product, we can write

0=(Ax)T(Ax^b)=xT(ATAx^ATb)0 = (Ax)^T (A\hat{x} - b) = x^T (A^T A\hat{x} - A^T b)

Since this relationship must hold for all vectors xx, then

ATAx^ATb=0A^T A\hat{x} - A^T b = 0

This is what gives us the normal equation system.

Proof Using Pythagorean Theorem

We can also verify this result in a different way. Let x^\hat{x} be the solution of the normal equation system and xx be any vector in Rn\mathbb{R}^n.

Using the Pythagorean theorem, we can write

Axb22=Ax^bA(x^x)22\|Ax - b\|_2^2 = \|A\hat{x} - b - A(\hat{x} - x)\|_2^2
=Ax^b22+A(x^x)222Ax^b,A(x^x)= \|A\hat{x} - b\|_2^2 + \|A(\hat{x} - x)\|_2^2 - 2 \langle A\hat{x} - b, A(\hat{x} - x) \rangle
=Ax^b22+A(x^x)2202(x^x)T(ATAx^ATb)=0= \|A\hat{x} - b\|_2^2 + \underbrace{\|A(\hat{x} - x)\|_2^2}_{\geq 0} - 2 \underbrace{(\hat{x} - x)^T (A^T A\hat{x} - A^T b)}_{=0}

Since x^\hat{x} satisfies the normal equation system, then ATAx^ATb=0A^T A\hat{x} - A^T b = 0 and the squared norm is always non-negative. Therefore

Axb22=Ax^b22+A(x^x)22Ax^b22\|Ax - b\|_2^2 = \|A\hat{x} - b\|_2^2 + \|A(\hat{x} - x)\|_2^2 \geq \|A\hat{x} - b\|_2^2

This inequality proves that x^\hat{x} indeed gives the minimum value.

When Normal Equation System Can Be Solved

Not all normal equation systems can be solved easily. There are special conditions that must be met.

For matrix ARm×nA \in \mathbb{R}^{m \times n} with mnm \geq n, the symmetric matrix ATARn×nA^T A \in \mathbb{R}^{n \times n} can be inverted if and only if matrix AA has full rank, that is Rank(A)=n\text{Rank}(A) = n.

This condition is very important because it determines whether the normal equation system has a unique solution. When ATAA^T A can be inverted, the solution can be written explicitly as

x^=(ATA)1ATb\hat{x} = (A^T A)^{-1} A^T b

Proof of Invertible Condition

To understand when ATAA^T A can be inverted, we need to look at the relationship between null space (kernel) and rank.

If ATAA^T A can be inverted, then the null space of ATAA^T A contains only the zero vector. Since the null space of ATAA^T A includes the null space of AA, then AA also has only the zero vector in its null space. This means Rank(A)=n\text{Rank}(A) = n.

Conversely, if Rank(A)=n\text{Rank}(A) = n, then the equation Ax=0Ax = 0 has only the solution x=0x = 0. To see that ATAA^T A can be inverted, note that if ATAx=0A^T Ax = 0, then

0=x,ATAx=xTATAx=Ax,Ax0 = \langle x, A^T Ax \rangle = x^T A^T Ax = \langle Ax, Ax \rangle

Since the inner product is only zero when Ax=0Ax = 0, and we know this only happens when x=0x = 0, then ATAA^T A is indeed invertible.

Positive Definite Property

More than just being invertible, matrix ATAA^T A has a special property. When Rank(A)=n\text{Rank}(A) = n and x0x \neq 0, we have Ax0Ax \neq 0 and

xTATAx=Ax,Ax>0x^T A^T A x = \langle Ax, Ax \rangle > 0

This shows that ATAA^T A is a positive definite matrix. This property guarantees that the normal equation system not only has a unique solution, but is also numerically stable when solved with computational methods. Algorithms like Cholesky decomposition can be safely used to solve this system.