# Nakafa Learning Content

> For AI agents: use [llms.txt](https://nakafa.com/llms.txt) for the site index. Markdown versions are available by appending `.md` to content URLs or sending `Accept: text/markdown`.

URL: https://nakafa.com/en/subjects/ai-ds/linear-methods/normal-equation-solution
Source: https://raw.githubusercontent.com/nakafaai/nakafa.com/refs/heads/main/packages/contents/material/lesson/ai-ds/linear-methods/normal-equation-solution/en.mdx

Implement normal equation solutions using QR decomposition and pseudoinverse. Learn numerical methods, polynomial fitting, and stability comparisons.

---

## Fundamental Properties

Normal equation systems $$A^T A x = A^T b$$ have special characteristics that distinguish them from ordinary linear systems. Imagine finding the best point on a line to represent scattered data points, this system provides a mathematical way to find that optimal solution.

Visible text: Normal equation systems have special characteristics that distinguish them from ordinary linear systems. Imagine finding the best point on a line to represent scattered data points, this system provides a mathematical way to find that optimal solution.

For a matrix $$A \in \mathbb{R}^{m \times n}$$ with $$m \geq n$$, the normal equation system $$A^T A x = A^T b$$ always has a solution. More specifically, this system has a unique solution exactly when matrix $$A$$ has full rank, that is when $$\text{Rank}(A) = n$$. Under this condition, the solution can be expressed as $$\hat{x} = (A^T A)^{-1} A^T b$$.

Visible text: For a matrix with , the normal equation system always has a solution. More specifically, this system has a unique solution exactly when matrix has full rank, that is when . Under this condition, the solution can be expressed as .

When matrix $$A$$ does not have full rank, the solution set of the normal equation system takes the form $$\hat{x} + \text{Kernel}(A)$$, where $$\hat{x}$$ is any particular solution of the system.

Visible text: When matrix does not have full rank, the solution set of the normal equation system takes the form , where is any particular solution of the system.

### Why the System is Always Solvable

The fundamental reason why normal equation systems always have a solution lies in the concept of orthogonal projection. The orthogonal projection of vector $$b$$ onto the column space $$\{Ax : x \in \mathbb{R}^n\}$$ always exists and is the solution to the linear least squares problem, which automatically is also a solution to the normal equation system.

Visible text: The fundamental reason why normal equation systems always have a solution lies in the concept of orthogonal projection. The orthogonal projection of vector onto the column space always exists and is the solution to the linear least squares problem, which automatically is also a solution to the normal equation system.

To understand why other solutions take the form $$\hat{x} + \text{Kernel}(A)$$, suppose $$\tilde{x}$$ is another solution of the system $$A^T A \tilde{x} = A^T b$$. Then $$\tilde{x}$$ is a solution to the normal equation system if and only if $$A^T A(\tilde{x} - \hat{x}) = 0$$, which is equivalent to $$(\tilde{x} - \hat{x})^T A^T A(\tilde{x} - \hat{x}) = 0$$, which is further equivalent to $$A(\tilde{x} - \hat{x}) = 0$$, or in other words $$\tilde{x} - \hat{x} \in \text{Kernel}(A)$$.

Visible text: To understand why other solutions take the form , suppose is another solution of the system . Then is a solution to the normal equation system if and only if , which is equivalent to , which is further equivalent to , or in other words .

## Moore-Penrose Pseudoinverse

For a matrix $$A \in \mathbb{R}^{m \times n}$$ with $$m \geq n$$ and $$\text{Rank}(A) = n$$, we can define the Moore-Penrose pseudoinverse as

Visible text: For a matrix with and , we can define the Moore-Penrose pseudoinverse as

```math
A^{\dagger} = (A^T A)^{-1} A^T
```

The Moore-Penrose pseudoinverse functions like the "best inverse" of a non-square matrix. It provides an optimal way to "cancel" linear transformations in the least squares context.

The Moore-Penrose pseudoinverse satisfies four Penrose axioms that uniquely determine its characteristics

Component: MathContainer
Children:

```math
AA^{\dagger}A = A
```

```math
A^{\dagger}AA^{\dagger} = A^{\dagger}
```

```math
(AA^{\dagger})^T = AA^{\dagger}
```

```math
(A^{\dagger}A)^T = A^{\dagger}A
```

These four properties are unique, meaning if a matrix $$B$$ satisfies all four axioms, then automatically $$B = A^{\dagger}$$. The Moore-Penrose pseudoinverse thus functions as the unique solution operator for linear least squares problems.

Visible text: These four properties are unique, meaning if a matrix satisfies all four axioms, then automatically . The Moore-Penrose pseudoinverse thus functions as the unique solution operator for linear least squares problems.

## Solution Using QR Decomposition

A more numerically stable approach to solving normal equation systems uses QR decomposition. For a matrix $$A \in \mathbb{R}^{m \times n}$$ with full rank and $$m \geq n$$, we can use the (thin) QR decomposition $$A = Q_1 R_1$$.

Visible text: A more numerically stable approach to solving normal equation systems uses QR decomposition. For a matrix with full rank and , we can use the (thin) QR decomposition .

With this decomposition, the normal equation system $$A^T A x = A^T b$$ can be solved through

Visible text: With this decomposition, the normal equation system can be solved through

```math
A^T A x = R_1^T Q_1^T Q_1 R_1 x = R_1^T R_1 x = R_1^T Q_1^T b = A^T b
```

Since $$R_1$$ is an invertible upper triangular matrix, this equation is equivalent to

Visible text: Since is an invertible upper triangular matrix, this equation is equivalent to

```math
R_1 x = Q_1^T b
```

This upper triangular system can be solved using back substitution, providing the solution $$x$$ directly and efficiently.

Visible text: This upper triangular system can be solved using back substitution, providing the solution directly and efficiently.

## Numerical Example

Let's apply this method to a concrete example. Suppose we have experimental data that we want to fit with a quadratic polynomial. We will use the following data

Component: MathContainer
Children:

```math
A = \begin{pmatrix} 9 & -3 & 1 \\ 4 & -2 & 1 \\ 1 & -1 & 1 \\ 0 & 0 & 1 \\ 1 & 1 & 1 \\ 4 & 2 & 1 \\ 9 & 3 & 1 \end{pmatrix}
```

```math
b = \begin{pmatrix} -2.2 \\ -4.2 \\ -4.2 \\ -1.8 \\ 1.8 \\ 8.2 \\ 15.8 \end{pmatrix}
```

Each row in matrix $$A$$ has the format $$[t_i^2, t_i, 1]$$ to find the coefficients of polynomial $$y = at^2 + bt + c$$, while vector $$b$$ contains the corresponding observation values.

Visible text: Each row in matrix has the format to find the coefficients of polynomial , while vector contains the corresponding observation values.

### QR Decomposition Process

The QR decomposition of matrix $$A$$ yields

Visible text: The QR decomposition of matrix yields

Component: MathContainer
Children:

```math
Q_1 = \begin{pmatrix} -0.64286 & -0.56695 & 0.16496 \\ -0.28571 & -0.37796 & -0.24744 \\ -0.07143 & -0.18898 & -0.49487 \\ -0.00000 & 0.00000 & -0.57735 \\ -0.07143 & 0.18898 & -0.49487 \\ -0.28571 & 0.37796 & -0.24744 \\ -0.64286 & 0.56695 & 0.16496 \end{pmatrix}
```

```math
R_1 = \begin{pmatrix} -14.00000 & -0.00000 & -2.00000 \\ 0.00000 & 5.29150 & 0.00000 \\ 0.00000 & 0.00000 & -1.73205 \end{pmatrix}
```

### Solution Steps

First, we compute $$Q_1^T b$$ to obtain

Visible text: First, we compute to obtain

```math
Q_1^T b = \begin{pmatrix} -9.7143 \\ 16.0257 \\ 3.4806 \end{pmatrix}
```

Next, we solve the upper triangular system $$R_1 x = Q_1^T b$$ using back substitution. Since $$R_1$$ is upper triangular, we start from the bottom equation.

Visible text: Next, we solve the upper triangular system using back substitution. Since is upper triangular, we start from the bottom equation.

From the third equation, $$-1.73205 x_3 = 3.4806$$, so $$x_3 = -2.00952$$.

Visible text: From the third equation, , so .

From the second equation, $$5.29150 x_2 = 16.0257$$, so $$x_2 = 3.02857$$.

Visible text: From the second equation, , so .

From the first equation, $$-14.00000 x_1 - 2.00000(-2.00952) = -9.7143$$, so $$x_1 = 0.98095$$.

Visible text: From the first equation, , so .

Thus, the complete solution is

```math
\hat{x} = \begin{pmatrix} 0.98095 \\ 3.02857 \\ -2.00952 \end{pmatrix}
```

### Fitting Results

Based on the obtained solution, the quadratic polynomial that best fits the data is

```math
y = 0.98095 \cdot t^2 + 3.02857 \cdot t - 2.00952
```

The following visualization shows how well this polynomial represents the original data

Component: LineEquation
Props:
- title: Quadratic Polynomial Fitting
- description: Polynomial curve generated from the normal equation system solution.
- cameraPosition: [15, 10, 15]
- data: (() => {
// Original data from matrix A (column t) and vector b
const originalData = [
[-3, -2.2],
[-2, -4.2],
[-1, -4.2],
[0, -1.8],
[1, 1.8],
[2, 8.2],
[3, 15.8]
];

// Polynomial coefficients from normal equation system solution
const a = 0.98095;
const b = 3.02857;
const c = -2.00952;

return [
{
points: Array.from({ length: 50 }, (_, i) => {
const t = -3 + (i * 6) / 49;
const y = a * t * t + b * t + c;
return { x: t, y: y, z: 0 };
}),
color: getColor("CYAN"),
smooth: true,
showPoints: false
},
{
points: originalData.map(([t, y]) => ({ x: t, y: y, z: 0 })),
color: getColor("ORANGE"),
smooth: false,
showPoints: true
}
];
})()

## Method Comparison

To solve normal equation systems, there are two main approaches that can be compared in terms of computation and numerical stability.

The Cholesky approach involves explicitly forming the matrix $$A^T A$$ first, then applying Cholesky decomposition since this matrix is positive definite. This method requires approximately $$n^2 \cdot m + \frac{1}{6}n^3 + O(n^2) + O(m \cdot n)$$ arithmetic operations. However, multiplication and decomposition can become sources of large error propagation, especially when $$m = n$$ where $$\text{cond}(A^T A) \approx \text{cond}(A)^2$$.

Visible text: The Cholesky approach involves explicitly forming the matrix first, then applying Cholesky decomposition since this matrix is positive definite. This method requires approximately arithmetic operations. However, multiplication and decomposition can become sources of large error propagation, especially when where .

The QR approach, conversely, can solve this problem with better numerical stability and comparable computational complexity. The main complexity is determined by $$n^2 \cdot m$$ operations for QR decomposition, making it comparable to the Cholesky approach. However, the significant advantage of QR lies in the fact that orthogonal transformations do not worsen the problem condition, unlike forming $$A^T A$$ in the Cholesky method.

Visible text: The QR approach, conversely, can solve this problem with better numerical stability and comparable computational complexity. The main complexity is determined by operations for QR decomposition, making it comparable to the Cholesky approach. However, the significant advantage of QR lies in the fact that orthogonal transformations do not worsen the problem condition, unlike forming in the Cholesky method.

The choice of the appropriate method depends on the data characteristics and the level of accuracy required in the specific application.