# Nakafa Learning Content

> For AI agents: use [llms.txt](https://nakafa.com/llms.txt) for the site index. Markdown versions are available by appending `.md` to content URLs or sending `Accept: text/markdown`.

URL: https://nakafa.com/en/subjects/ai-ds/linear-methods/regularization
Source: https://raw.githubusercontent.com/nakafaai/nakafa.com/refs/heads/main/packages/contents/material/lesson/ai-ds/linear-methods/regularization/en.mdx

Learn Tikhonov regularization for unstable linear systems: how to solve ill-conditioned problems, prevent overfitting, and stabilize parameter estimation.

---

## Problems in Linear Systems

When we deal with linear equation systems $$Ax = b$$ where $$A \in \mathbb{R}^{m \times n}$$ and $$b \in \mathbb{R}^m$$, challenging situations often arise. If $$m > n$$ and $$\text{Rank}(A|b) > \text{Rank}(A)$$, then the system $$Ax = b$$ has no exact solution: it is over-constrained, with more independent conditions than it can satisfy at once, and can only be solved approximately in the least squares sense.

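Another equally problematic situation occurs when matrix $$A$$ does not have full rank, i.e., $$\text{Rank}(A) < n$$. In this case the equation system is under-constrained, or has too much freedom: $$A^T A$$ is singular, the least squares solution is no longer unique, and some parameters cannot be identified from the data.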
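
To see the under-constrained case concretely, here is a minimal sketch in plain Python. The 3×2 matrix is a hypothetical example whose two columns are identical, so $$\text{Rank}(A) = 1 < n = 2$$; the helper names `gram` and `matvec` are made up for illustration.

```python
def gram(A):
    """Return A^T A for a matrix stored as a list of rows."""
    n = len(A[0])
    return [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]

def matvec(A, x):
    """Return the product A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Hypothetical example: two identical columns, so Rank(A) = 1 < n = 2.
A = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
G = gram(A)
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
print(G, det)  # [[14.0, 14.0], [14.0, 14.0]] 0.0  -> A^T A is singular

# Any x with the same sum x1 + x2 produces the same model output Ax,
# so the data cannot tell these parameter vectors apart.
print(matvec(A, [1.0, 0.0]) == matvec(A, [0.0, 1.0]))  # True
```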

Imagine trying to determine the position of an object from too little or from conflicting information. Regularization provides a way to stabilize such problems.

## Definition of Regularization Problem

To address the instability problem, we introduce a modified least squares problem

```math
\min_x \left( \|Ax - b\|_2^2 + \omega^2 \|x - x_0\|_2^2 \right)
```

where $$x_0 \in \mathbb{R}^n$$ is the initial value or prior estimate for the model parameters and $$\omega^2 \in \mathbb{R}_0^+$$ is the weighting factor. The additional term

```math
\omega^2 \|x - x_0\|_2^2
```

is called the Tikhonov regularization term.
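
As a minimal sketch (plain Python, hypothetical function name and example data), the regularized objective can be evaluated term by term: the squared data misfit plus $$\omega^2$$ times the squared distance to the prior estimate.

```python
def tikhonov_objective(A, b, x, omega, x0):
    """Evaluate ||Ax - b||_2^2 + omega^2 * ||x - x0||_2^2."""
    residual = [sum(a * xi for a, xi in zip(row, x)) - bi for row, bi in zip(A, b)]
    data_term = sum(r * r for r in residual)                          # ||Ax - b||_2^2
    prior_term = sum((xi - x0i) ** 2 for xi, x0i in zip(x, x0))       # ||x - x0||_2^2
    return data_term + omega ** 2 * prior_term

# Hypothetical data: 3 equations, 2 parameters, prior estimate x0 = 0.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 2.0]
x0 = [0.0, 0.0]
print(tikhonov_objective(A, b, [1.0, 1.0], 0.5, x0))  # 1.5 (data term 1.0 + 0.25 * 2.0)
```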

This regularization term is like giving a "preference" to the system to choose solutions that are not too far from the initial estimate $$x_0$$. The larger the value of $$\omega$$, the stronger this preference becomes.
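
The effect of $$\omega$$ is easiest to see with a single parameter ($$n = 1$$, so $$A$$ is one column $$a$$). Setting the derivative of $$\|ax - b\|_2^2 + \omega^2 (x - x_0)^2$$ to zero gives $$x = (a^T b + \omega^2 x_0) / (a^T a + \omega^2)$$. The sketch below uses that closed form with hypothetical numbers to show the solution moving from the plain least squares value toward $$x_0$$ as $$\omega$$ grows.

```python
def scalar_tikhonov(a, b, omega, x0):
    """Closed-form regularized minimizer for a single parameter x (n = 1)."""
    ata = sum(ai * ai for ai in a)
    atb = sum(ai * bi for ai, bi in zip(a, b))
    return (atb + omega ** 2 * x0) / (ata + omega ** 2)

a = [1.0, 1.0]
b = [2.0, 4.0]  # plain least squares (omega = 0) gives x = 3
x0 = 0.0        # hypothetical prior estimate
for omega in [0.0, 1.0, 10.0]:
    print(omega, scalar_tikhonov(a, b, omega, x0))
```

The printed solutions are 3.0, then 2.0, then roughly 0.059: the larger $$\omega$$, the closer the estimate is pulled toward $$x_0 = 0$$.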

## Interpretation of Regularization

Through the regularization term, the least squares problem not only minimizes the difference $$\|Ax - b\|$$ between model and data, but also minimizes the difference $$\|x - x_0\|$$ between parameters and the prior estimate $$x_0$$, weighted by $$\omega^2$$.

Note that the prior estimate $$x_0$$ is chosen by the researcher. The solution $$\hat{x}$$ then not only describes the behavior of the process being investigated, but also reflects the researcher's initial assumptions.
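
The influence of $$x_0$$ is strongest when the data leave a direction undetermined. In this minimal sketch (plain Python, hypothetical data and helper names), $$A$$ has identical columns, so the data only fix $$x_1 + x_2 = 2$$. The solution is computed from the regularized normal equations $$(A^T A + \omega^2 I) x = A^T b + \omega^2 x_0$$ derived in the matrix formulation; two different prior estimates then give two different solutions that both fit the data exactly.

```python
def solve(M, v):
    """Solve the square system M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [[float(c) for c in row] + [float(v[i])] for i, row in enumerate(M)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(aug[r][k]))  # pivot row
        aug[k], aug[p] = aug[p], aug[k]
        for r in range(k + 1, n):
            f = aug[r][k] / aug[k][k]
            for c in range(k, n + 1):
                aug[r][c] -= f * aug[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):  # back substitution
        s = sum(aug[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (aug[k][n] - s) / aug[k][k]
    return x

def tikhonov(A, b, omega, x0):
    """Solve (A^T A + omega^2 I) x = A^T b + omega^2 x0."""
    m, n = len(A), len(A[0])
    M = [[sum(A[k][i] * A[k][j] for k in range(m)) + (omega ** 2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    rhs = [sum(A[k][i] * b[k] for k in range(m)) + omega ** 2 * x0[i] for i in range(n)]
    return solve(M, rhs)

# Hypothetical rank-deficient example: only the sum x1 + x2 is determined by the data.
A = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
b = [2.0, 4.0, 6.0]  # consistent with x1 + x2 = 2

for x0 in ([1.0, 1.0], [2.0, 0.0]):  # two hypothetical prior estimates
    x = tikhonov(A, b, 0.1, x0)
    fit = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    print(x0, "->", x, "Ax =", fit)
```

Up to rounding, the first prior yields $$x \approx (1, 1)$$ and the second $$x \approx (2, 0)$$; in both cases $$Ax \approx b$$, so the split between the two parameters comes entirely from the researcher's choice of $$x_0$$.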

## Matrix Formulation

The regularization problem can be written in matrix form as

```math
\min_x \left\| \begin{pmatrix} Ax - b \\ \omega(x - x_0) \end{pmatrix} \right\|_2^2
```

```math
= \left\| \begin{pmatrix} A \\ \omega I \end{pmatrix} x - \begin{pmatrix} b \\ \omega x_0 \end{pmatrix} \right\|_2^2
```
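
The stacked form can be checked numerically. This minimal sketch (plain Python, hypothetical helper names and data) builds $$\begin{pmatrix} A \\ \omega I \end{pmatrix}$$ and $$\begin{pmatrix} b \\ \omega x_0 \end{pmatrix}$$ and confirms that the squared residual of the stacked system equals the original regularized objective.

```python
def stack(A, b, omega, x0):
    """Build the augmented matrix [A; omega*I] and vector [b; omega*x0]."""
    n = len(A[0])
    A_aug = [list(row) for row in A] + [
        [omega if i == j else 0.0 for j in range(n)] for i in range(n)]
    b_aug = list(b) + [omega * x0i for x0i in x0]
    return A_aug, b_aug

def sq_residual(A, b, x):
    """Return ||Ax - b||_2^2."""
    return sum((sum(a * xi for a, xi in zip(row, x)) - bi) ** 2 for row, bi in zip(A, b))

# Hypothetical data and trial point.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 2.0]
x0, omega, x = [0.0, 0.0], 0.5, [1.0, 1.0]

A_aug, b_aug = stack(A, b, omega, x0)
lhs = sq_residual(A_aug, b_aug, x)
rhs = sq_residual(A, b, x) + omega ** 2 * sum((xi - x0i) ** 2 for xi, x0i in zip(x, x0))
print(lhs, rhs)  # 1.5 1.5
```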

The corresponding normal equation system becomes

```math
\begin{pmatrix} A \\ \omega I \end{pmatrix}^T \begin{pmatrix} A \\ \omega I \end{pmatrix} x
```

```math
= \begin{pmatrix} A \\ \omega I \end{pmatrix}^T \begin{pmatrix} b \\ \omega x_0 \end{pmatrix}
```

or in simpler form

```math
(A^T A + \omega^2 I) x = A^T b + \omega^2 x_0
```
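
A minimal sketch of solving these regularized normal equations directly, in plain Python with a small hand-written Gaussian elimination (the helper names and data are hypothetical; a real application would normally use a linear algebra library instead):

```python
def solve(M, v):
    """Solve the square system M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [[float(c) for c in row] + [float(v[i])] for i, row in enumerate(M)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(aug[r][k]))  # pivot row
        aug[k], aug[p] = aug[p], aug[k]
        for r in range(k + 1, n):
            f = aug[r][k] / aug[k][k]
            for c in range(k, n + 1):
                aug[r][c] -= f * aug[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):  # back substitution
        s = sum(aug[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (aug[k][n] - s) / aug[k][k]
    return x

def tikhonov(A, b, omega, x0):
    """Solve (A^T A + omega^2 I) x = A^T b + omega^2 x0."""
    m, n = len(A), len(A[0])
    M = [[sum(A[k][i] * A[k][j] for k in range(m)) + (omega ** 2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    rhs = [sum(A[k][i] * b[k] for k in range(m)) + omega ** 2 * x0[i] for i in range(n)]
    return solve(M, rhs)

A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 2.0]
x0 = [0.0, 0.0]                  # hypothetical prior estimate
print(tikhonov(A, b, 0.0, x0))   # omega = 0: ordinary least squares, about [0.667, 1.667]
print(tikhonov(A, b, 1.0, x0))   # omega = 1: about [0.625, 1.125], pulled toward x0
```

Here $$A^T A + \omega^2 I = \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}$$ and $$A^T b = (3, 4)^T$$ for $$\omega = 1$$, so the exact solution is $$x = (5/8, 9/8)^T$$.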

## Properties of Regularization Solution

For $$\omega > 0$$, the normal equation system

```math
(A^T A + \omega^2 I) x = A^T b + \omega^2 x_0
```

of the regularization problem always has a unique solution. Regularization thus restores the identifiability of all parameters.

For $$\omega > 0$$, the matrix $$\begin{pmatrix} A \\ \omega I \end{pmatrix}$$ contains the $$n$$ linearly independent rows of the block $$\omega I$$, so it has full column rank $$n$$. Consequently, $$A^T A + \omega^2 I$$ is positive definite and therefore invertible, which makes the problem well-defined with a unique, stable solution.
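
The positive definiteness can be verified in one line: for $$\omega > 0$$ and any nonzero $$x \in \mathbb{R}^n$$,

```math
x^T (A^T A + \omega^2 I) x = \|Ax\|_2^2 + \omega^2 \|x\|_2^2 \geq \omega^2 \|x\|_2^2 > 0
```

so every eigenvalue of $$A^T A + \omega^2 I$$ is at least $$\omega^2$$, no matter how close to singular $$A^T A$$ itself is.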

## Individual Weights for Parameters

We can choose individual weighting factors $$\omega_i \geq 0$$ for each parameter $$i = 1, \ldots, n$$. In this case, the least squares problem becomes

```math
\min_x \|Ax - b\|_2^2 + \sum_{i=1}^n \omega_i^2 (x_i - x_{0i})^2
```

```math
= \|Ax - b\|_2^2 + \|\Omega(x - x_0)\|_2^2
```

```math
= \left\| \begin{pmatrix} A \\ \Omega \end{pmatrix} x - \begin{pmatrix} b \\ \Omega x_0 \end{pmatrix} \right\|_2^2
```

with diagonal matrix

```math
\Omega = \operatorname{diag}(\omega_1, \ldots, \omega_n) = \begin{pmatrix} \omega_1 & & 0 \\ & \ddots & \\ 0 & & \omega_n \end{pmatrix}
```

The weighting factors $$\omega_i$$ are chosen such that the matrix $$\begin{pmatrix} A \\ \Omega \end{pmatrix}$$ has full column rank $$n$$, so that the problem again has a unique solution.
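
Since $$\Omega$$ is diagonal, $$\Omega^T \Omega = \Omega^2$$, and the stacked form leads to the normal equations $$(A^T A + \Omega^2) x = A^T b + \Omega^2 x_0$$. The minimal sketch below (plain Python, hypothetical helper names and data) solves them for a rank-deficient $$A$$ where only the first parameter is regularized; that single nonzero weight already gives $$\begin{pmatrix} A \\ \Omega \end{pmatrix}$$ full column rank.

```python
def solve(M, v):
    """Solve the square system M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [[float(c) for c in row] + [float(v[i])] for i, row in enumerate(M)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(aug[r][k]))  # pivot row
        aug[k], aug[p] = aug[p], aug[k]
        for r in range(k + 1, n):
            f = aug[r][k] / aug[k][k]
            for c in range(k, n + 1):
                aug[r][c] -= f * aug[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):  # back substitution
        s = sum(aug[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (aug[k][n] - s) / aug[k][k]
    return x

def tikhonov_weighted(A, b, omegas, x0):
    """Solve (A^T A + Omega^2) x = A^T b + Omega^2 x0 with Omega = diag(omegas)."""
    m, n = len(A), len(A[0])
    M = [[sum(A[k][i] * A[k][j] for k in range(m)) + (omegas[i] ** 2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    rhs = [sum(A[k][i] * b[k] for k in range(m)) + omegas[i] ** 2 * x0[i] for i in range(n)]
    return solve(M, rhs)

A = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]  # Rank(A) = 1: only x1 + x2 is determined
b = [2.0, 4.0, 6.0]                        # consistent with x1 + x2 = 2
x0 = [0.5, 0.0]       # hypothetical prior; x0[1] is ignored because omega_2 = 0
omegas = [10.0, 0.0]  # regularize parameter 1 only
# With omegas = [0.0, 0.0] the matrix would be singular and this solver would divide by zero.
print(tikhonov_weighted(A, b, omegas, x0))  # about [0.5, 1.5]
```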

## Weight Selection Strategy

For parameters that are difficult to determine from the data, we choose large weighting factors $$\omega_i$$. Conversely, for parameters that the data already determine well, we can choose $$\omega_i = 0$$. Note, however, that each weighting factor $$\omega_i$$ can in general influence all parameters, because the parameters are coupled through $$A$$.
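
The coupling between weights and parameters shows up even in a tiny example. In this minimal sketch (plain Python, hypothetical helper names and data), only $$\omega_1$$ is nonzero, yet the estimate of $$x_2$$ changes too, because the third equation links $$x_1$$ and $$x_2$$.

```python
def solve(M, v):
    """Solve the square system M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [[float(c) for c in row] + [float(v[i])] for i, row in enumerate(M)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(aug[r][k]))  # pivot row
        aug[k], aug[p] = aug[p], aug[k]
        for r in range(k + 1, n):
            f = aug[r][k] / aug[k][k]
            for c in range(k, n + 1):
                aug[r][c] -= f * aug[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):  # back substitution
        s = sum(aug[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (aug[k][n] - s) / aug[k][k]
    return x

def tikhonov_weighted(A, b, omegas, x0):
    """Solve (A^T A + Omega^2) x = A^T b + Omega^2 x0 with Omega = diag(omegas)."""
    m, n = len(A), len(A[0])
    M = [[sum(A[k][i] * A[k][j] for k in range(m)) + (omegas[i] ** 2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    rhs = [sum(A[k][i] * b[k] for k in range(m)) + omegas[i] ** 2 * x0[i] for i in range(n)]
    return solve(M, rhs)

A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # the third row couples x1 and x2
b = [1.0, 2.0, 2.0]
x0 = [0.0, 0.0]  # hypothetical prior estimate
for omegas in ([0.0, 0.0], [10.0, 0.0]):
    print(omegas, tikhonov_weighted(A, b, omegas, x0))
```

Without regularization the estimate is about $$(0.667, 1.667)$$; with $$\omega_1 = 10$$ it becomes about $$(0.0099, 1.995)$$, so pulling $$x_1$$ toward its prior also shifts $$x_2$$.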

If we decide to fix a parameter $$x_i$$ to a specific value $$x_{0i}$$, i.e. to turn it into a constant, we can in principle set $$\omega_i = \infty$$. The same applies when we add inequality constraints $$l_i \leq x_i \leq u_i$$ to the problem: a constraint that is active in the solution, i.e. satisfied with equality $$x_i = l_i$$ or $$x_i = u_i$$, effectively fixes that parameter.
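
In practice, $$\omega_i = \infty$$ can be approximated by a very large finite weight, or realized exactly by removing $$x_i$$ from the unknowns and moving its fixed contribution to the right-hand side. The minimal sketch below (plain Python, hypothetical helper names such as `fix_parameter`, hypothetical data) compares both routes for fixing $$x_1 = 0$$.

```python
def solve(M, v):
    """Solve the square system M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [[float(c) for c in row] + [float(v[i])] for i, row in enumerate(M)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(aug[r][k]))  # pivot row
        aug[k], aug[p] = aug[p], aug[k]
        for r in range(k + 1, n):
            f = aug[r][k] / aug[k][k]
            for c in range(k, n + 1):
                aug[r][c] -= f * aug[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):  # back substitution
        s = sum(aug[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (aug[k][n] - s) / aug[k][k]
    return x

def tikhonov_weighted(A, b, omegas, x0):
    """Solve (A^T A + Omega^2) x = A^T b + Omega^2 x0 with Omega = diag(omegas)."""
    m, n = len(A), len(A[0])
    M = [[sum(A[k][i] * A[k][j] for k in range(m)) + (omegas[i] ** 2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    rhs = [sum(A[k][i] * b[k] for k in range(m)) + omegas[i] ** 2 * x0[i] for i in range(n)]
    return solve(M, rhs)

def fix_parameter(A, b, i, value):
    """Eliminate parameter i fixed at `value`: drop column i and move its contribution into b."""
    A_red = [row[:i] + row[i + 1:] for row in A]
    b_red = [bk - row[i] * value for row, bk in zip(A, b)]
    return A_red, b_red

A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 2.0]
x0 = [0.0, 0.0]  # hypothetical: we want to fix x1 at x0[0] = 0

# Route 1: a very large but finite weight on x1.
print(tikhonov_weighted(A, b, [1e3, 0.0], x0))  # about [1e-06, 2.0]

# Route 2: eliminate x1 exactly, then solve plain least squares for x2.
A_red, b_red = fix_parameter(A, b, 0, x0[0])
print(tikhonov_weighted(A_red, b_red, [0.0], [0.0]))  # [2.0]
```

Both routes agree closely, but a very large $$\omega_i$$ makes $$A^T A + \Omega^2$$ badly scaled, which is one reason eliminating the fixed parameter is often the more robust choice.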

Through regularization, the solution depends not only on the data, but also on the researcher's initial assumptions. This provides flexibility in integrating domain knowledge into the parameter estimation process.