# Nakafa Learning Content

> For AI agents: use [llms.txt](https://nakafa.com/llms.txt) for the site index. Markdown versions are available by appending `.md` to content URLs or sending `Accept: text/markdown`.

URL: https://nakafa.com/en/subjects/ai-ds/linear-methods/normal-equation
Source: https://raw.githubusercontent.com/nakafaai/nakafa.com/refs/heads/main/packages/contents/material/lesson/ai-ds/linear-methods/normal-equation/en.mdx

Learn normal equations for least squares optimization. Learn closed-form solutions, orthogonality principles, and when systems are solvable.

---

## What is Normal Equation System

Imagine you have a complex optimization problem, but it turns out there's an elegant shortcut. In least squares problems, instead of doing direct optimization, we can transform it into a system of equations that's easier to solve.

When we want to minimize

```math
\min_x \|A \cdot x - b\|_2^2
```

it turns out the solution can be found by solving the following system of equations

```math
A^T A \cdot x = A^T b
```

This equation is called the **normal equation system** because it involves the concept of orthogonality or "normal" (perpendicular) state in vector space.

## Fundamental Relationship

There's a very interesting relationship between the minimization problem and this normal equation system. Vector $$\hat{x} \in \mathbb{R}^n$$ is a solution to the least squares problem if and only if that vector satisfies the normal equation system.

Visible text: There's a very interesting relationship between the minimization problem and this normal equation system. Vector is a solution to the least squares problem if and only if that vector satisfies the normal equation system.

In other words, finding $$\hat{x}$$ that makes $$\|A \cdot x - b\|_2^2$$ minimal is exactly the same as finding $$\hat{x}$$ that satisfies $$A^T A \cdot \hat{x} = A^T b$$.

Visible text: In other words, finding that makes minimal is exactly the same as finding that satisfies .

## Why This System Works

To understand why this relationship holds, we need to look at it from a geometric perspective.

When $$\hat{x}$$ gives the minimum value for $$\|A\hat{x} - b\|_2^2$$, then the error vector $$A\hat{x} - b$$ must be orthogonal to all vectors in the column space of matrix $$A$$.

Visible text: When gives the minimum value for , then the error vector must be orthogonal to all vectors in the column space of matrix .

This column space consists of all vectors that can be written as $$Ax$$ for $$x \in \mathbb{R}^n$$. The orthogonality condition means

Visible text: This column space consists of all vectors that can be written as for . The orthogonality condition means

```math
0 = \langle Ax, A\hat{x} - b \rangle
```

for every vector $$x \in \mathbb{R}^n$$. Using the properties of inner product, we can write

Visible text: for every vector . Using the properties of inner product, we can write

Component: MathContainer
Children:

```math
0 = (Ax)^T (A\hat{x} - b) = x^T (A^T A\hat{x} - A^T b)
```

Since this relationship must hold for all vectors $$x$$, then

Visible text: Since this relationship must hold for all vectors , then

```math
A^T A\hat{x} - A^T b = 0
```

This is what gives us the normal equation system.

## Proof Using Pythagorean Theorem

We can also verify this result in a different way. Let $$\hat{x}$$ be the solution of the normal equation system and $$x$$ be any vector in $$\mathbb{R}^n$$.

Visible text: We can also verify this result in a different way. Let be the solution of the normal equation system and be any vector in .

Using the Pythagorean theorem, we can write

Component: MathContainer
Children:

```math
\|Ax - b\|_2^2 = \|A\hat{x} - b - A(\hat{x} - x)\|_2^2
```

```math
= \|A\hat{x} - b\|_2^2 + \|A(\hat{x} - x)\|_2^2 - 2 \langle A\hat{x} - b, A(\hat{x} - x) \rangle
```

```math
= \|A\hat{x} - b\|_2^2 + \underbrace{\|A(\hat{x} - x)\|_2^2}_{\geq 0} - 2 \underbrace{(\hat{x} - x)^T (A^T A\hat{x} - A^T b)}_{=0}
```

Since $$\hat{x}$$ satisfies the normal equation system, then $$A^T A\hat{x} - A^T b = 0$$ and the squared norm is always non-negative. Therefore

Visible text: Since satisfies the normal equation system, then and the squared norm is always non-negative. Therefore

```math
\|Ax - b\|_2^2 = \|A\hat{x} - b\|_2^2 + \|A(\hat{x} - x)\|_2^2 \geq \|A\hat{x} - b\|_2^2
```

This inequality proves that $$\hat{x}$$ indeed gives the minimum value.

Visible text: This inequality proves that indeed gives the minimum value.

## When Normal Equation System Can Be Solved

Not all normal equation systems can be solved easily. There are special conditions that must be met.

For matrix $$A \in \mathbb{R}^{m \times n}$$ with $$m \geq n$$, the symmetric matrix $$A^T A \in \mathbb{R}^{n \times n}$$ can be inverted if and only if matrix $$A$$ has full rank, that is $$\text{Rank}(A) = n$$.

Visible text: For matrix with , the symmetric matrix can be inverted if and only if matrix has full rank, that is .

This condition is very important because it determines whether the normal equation system has a unique solution. When $$A^T A$$ can be inverted, the solution can be written explicitly as

Visible text: This condition is very important because it determines whether the normal equation system has a unique solution. When can be inverted, the solution can be written explicitly as

```math
\hat{x} = (A^T A)^{-1} A^T b
```

## Proof of Invertible Condition

To understand when $$A^T A$$ can be inverted, we need to look at the relationship between null space (kernel) and rank.

Visible text: To understand when can be inverted, we need to look at the relationship between null space (kernel) and rank.

If $$A^T A$$ can be inverted, then the null space of $$A^T A$$ contains only the zero vector. Since the null space of $$A^T A$$ includes the null space of $$A$$, then $$A$$ also has only the zero vector in its null space. This means $$\text{Rank}(A) = n$$.

Visible text: If can be inverted, then the null space of contains only the zero vector. Since the null space of includes the null space of , then also has only the zero vector in its null space. This means .

Conversely, if $$\text{Rank}(A) = n$$, then the equation $$Ax = 0$$ has only the solution $$x = 0$$. To see that $$A^T A$$ can be inverted, note that if $$A^T Ax = 0$$, then

Visible text: Conversely, if , then the equation has only the solution . To see that can be inverted, note that if , then

```math
0 = \langle x, A^T Ax \rangle = x^T A^T Ax = \langle Ax, Ax \rangle
```

Since the inner product is only zero when $$Ax = 0$$, and we know this only happens when $$x = 0$$, then $$A^T A$$ is indeed invertible.

Visible text: Since the inner product is only zero when , and we know this only happens when , then is indeed invertible.

## Positive Definite Property

More than just being invertible, matrix $$A^T A$$ has a special property. When $$\text{Rank}(A) = n$$ and $$x \neq 0$$, we have $$Ax \neq 0$$ and

Visible text: More than just being invertible, matrix has a special property. When and , we have and

```math
x^T A^T A x = \langle Ax, Ax \rangle > 0
```

This shows that $$A^T A$$ is a **positive definite** matrix. This property guarantees that the normal equation system not only has a unique solution, but is also numerically stable when solved with computational methods. Algorithms like Cholesky decomposition can be safely used to solve this system.

Visible text: This shows that is a **positive definite** matrix. This property guarantees that the normal equation system not only has a unique solution, but is also numerically stable when solved with computational methods. Algorithms like Cholesky decomposition can be safely used to solve this system.