
Linear Methods of AI

Regularization

Problems in Linear Systems

When we deal with linear equation systems $Ax = b$ where $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$, challenging situations often arise. If $m > n$ and $\text{Rank}(A|b) > \text{Rank}(A)$, then the system $Ax = b$ is unsolvable: it is over-constrained, carrying more restrictions than the parameters can satisfy.

Another, equally problematic situation occurs when the matrix $A$ does not have full rank, i.e., $\text{Rank}(A) < n$. In this case the equation system is under-constrained, with too much freedom: the least squares solution exists but is not unique.

Imagine trying to determine the position of an object from too little or conflicting information. Regularization provides a way to stabilize such ill-posed problems.
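Both failure modes are easy to reproduce numerically. The sketch below (assuming NumPy; the matrices are made-up illustrations, not taken from the text) shows an overdetermined, inconsistent system and a rank-deficient one:

```python
import numpy as np

# Overdetermined, inconsistent case: m > n and Rank(A|b) > Rank(A),
# so no x satisfies Ax = b exactly.
A_over = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])
b_over = np.array([1.0, 1.0, 3.0])
print(np.linalg.matrix_rank(A_over))                             # 2
print(np.linalg.matrix_rank(np.column_stack([A_over, b_over])))  # 3

# Rank-deficient case: Rank(A) < n, so A^T A is singular and the
# least squares solution is not unique.
A_def = np.array([[1.0, 1.0],
                  [2.0, 2.0]])
print(np.linalg.matrix_rank(A_def))    # 1
print(np.linalg.det(A_def.T @ A_def))  # ~0.0
```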

Definition of the Regularization Problem

To address the instability problem, we introduce a modified least squares problem

$$\min_x \left( \|Ax - b\|_2^2 + \omega^2 \|x - x_0\|_2^2 \right)$$

where $x_0 \in \mathbb{R}^n$ is the initial value or prior estimate for the model parameters and $\omega^2 \in \mathbb{R}_0^+$ is the weighting factor. The additional term

$$\omega^2 \|x - x_0\|_2^2$$

is called the Tikhonov regularization term.

This regularization term gives the system a "preference" for solutions that do not stray too far from the initial estimate $x_0$. The larger the value of $\omega$, the stronger this preference becomes.
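To make the two competing terms concrete, here is a minimal NumPy sketch of the objective (the function name regularized_objective and all inputs are illustrative, not from the text):

```python
import numpy as np

def regularized_objective(A, b, x, x0, omega):
    """Evaluate ||Ax - b||_2^2 + omega^2 * ||x - x0||_2^2."""
    data_term = np.sum((A @ x - b) ** 2)           # model-data mismatch
    prior_term = omega**2 * np.sum((x - x0) ** 2)  # deviation from the prior
    return data_term + prior_term
```

Increasing omega shifts the balance of the two terms toward the prior, exactly as described above.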

Interpretation of Regularization

Through the regularization term, the least squares problem not only minimizes the difference $\|Ax - b\|$ between model and data, but also minimizes the difference $\|x - x_0\|$ between the parameters and the prior estimate $x_0$, weighted by $\omega^2$.

Note that the prior estimate $x_0$ is chosen by the researcher. The solution $\hat{x}$ then not only describes the behavior of the process being investigated, but also reflects the researcher's initial assumptions.

Matrix Formulation

The regularization problem can be written in matrix form as

$$\min_x \left\| \begin{pmatrix} Ax - b \\ \omega(x - x_0) \end{pmatrix} \right\|_2^2 = \min_x \left\| \begin{pmatrix} A \\ \omega I \end{pmatrix} x - \begin{pmatrix} b \\ \omega x_0 \end{pmatrix} \right\|_2^2$$
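The stacked form can be handed directly to a standard least squares solver. A minimal sketch, assuming NumPy (the function name tikhonov_stacked is ours; A, b, x0, omega are placeholders):

```python
import numpy as np

def tikhonov_stacked(A, b, x0, omega):
    """Solve the regularization problem via the stacked system
    (A; omega*I) x ~ (b; omega*x0)."""
    n = A.shape[1]
    A_aug = np.vstack([A, omega * np.eye(n)])  # stack A on top of omega*I
    b_aug = np.concatenate([b, omega * x0])    # stack b on top of omega*x0
    x_hat, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
    return x_hat
```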

The corresponding normal equation system becomes

$$\begin{pmatrix} A \\ \omega I \end{pmatrix}^T \begin{pmatrix} A \\ \omega I \end{pmatrix} x = \begin{pmatrix} A \\ \omega I \end{pmatrix}^T \begin{pmatrix} b \\ \omega x_0 \end{pmatrix}$$

or in simpler form

$$(A^T A + \omega^2 I)\, x = A^T b + \omega^2 x_0$$
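Equivalently, the normal equations can be solved directly. A sketch, again with illustrative names:

```python
import numpy as np

def tikhonov_normal(A, b, x0, omega):
    """Solve (A^T A + omega^2 I) x = A^T b + omega^2 x0."""
    n = A.shape[1]
    lhs = A.T @ A + omega**2 * np.eye(n)
    rhs = A.T @ b + omega**2 * x0
    return np.linalg.solve(lhs, rhs)
```

This agrees with the stacked solve, but forming $A^T A$ squares the condition number, so the stacked formulation is generally the numerically safer choice.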

Properties of Regularization Solution

For $\omega > 0$, the normal equation system

$$(A^T A + \omega^2 I)\, x = A^T b + \omega^2 x_0$$

of the regularization problem always has a unique solution. Regularization thus restores the identifiability of all parameters.

The matrix $\begin{pmatrix} A \\ \omega I \end{pmatrix}$ contains $n$ linearly independent rows in the $\omega I$ block for $\omega > 0$ and thus achieves the maximum rank $n$. Consequently, the matrix $A^T A + \omega^2 I$ is positive definite for $\omega > 0$, ensuring that the problem is well-defined and has a stable solution.
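This can be checked numerically: for a rank-deficient $A$, a Cholesky factorization of $A^T A$ fails, while it succeeds for $A^T A + \omega^2 I$ with any $\omega > 0$. A small sketch with an illustrative matrix:

```python
import numpy as np

# Rank-deficient matrix: Rank(A) = 1 < n = 2.
A = np.array([[1.0, 1.0],
              [2.0, 2.0]])

# omega = 0: A^T A is singular, so the Cholesky factorization fails.
try:
    np.linalg.cholesky(A.T @ A)
except np.linalg.LinAlgError:
    print("A^T A is not positive definite")

# omega > 0: A^T A + omega^2 I is positive definite, Cholesky succeeds.
omega = 0.1
L = np.linalg.cholesky(A.T @ A + omega**2 * np.eye(2))
print(L)
```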

Individual Weights for Parameters

We can choose individual weighting factors $\omega_i \geq 0$ for each parameter $i = 1, \ldots, n$. In this case, the least squares problem becomes

$$\min_x \left( \|Ax - b\|_2^2 + \sum_{i=1}^n \omega_i^2 (x_i - x_{0i})^2 \right) = \min_x \left( \|Ax - b\|_2^2 + \|\Omega(x - x_0)\|_2^2 \right) = \min_x \left\| \begin{pmatrix} A \\ \Omega \end{pmatrix} x - \begin{pmatrix} b \\ \Omega x_0 \end{pmatrix} \right\|_2^2$$

with diagonal matrix

$$\Omega = \begin{pmatrix} \omega_1 & & 0 \\ & \ddots & \\ 0 & & \omega_n \end{pmatrix}$$

The weighting factors $\omega_i$ are chosen such that the matrix $\begin{pmatrix} A \\ \Omega \end{pmatrix}$ has full rank.
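A sketch of the per-parameter version, reusing the stacked formulation (the name tikhonov_weighted and all inputs are illustrative):

```python
import numpy as np

def tikhonov_weighted(A, b, x0, omegas):
    """Per-parameter regularization with Omega = diag(omega_1, ..., omega_n)."""
    Omega = np.diag(omegas)
    A_aug = np.vstack([A, Omega])            # (A; Omega)
    b_aug = np.concatenate([b, Omega @ x0])  # (b; Omega*x0)
    x_hat, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
    return x_hat
```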

Weight Selection Strategy

For parameters that are difficult to determine well, we choose large weighting factors $\omega_i$. Conversely, for parameters that can already be determined well, we can choose $\omega_i = 0$. Note, however, that every weighting factor $\omega_i$ can influence all parameters, since the normal equations couple them.

If we decide to fix a parameter to a specific value, i.e., to turn it into a constant, this corresponds in principle to setting $\omega_i = \infty$. The same applies when we add inequality conditions $l_i \leq x_i \leq u_i$ to the problem and they become active in the solution, i.e., hold with equality $x_i = l_i$ or $x_i = u_i$.
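Numerically, $\omega_i = \infty$ is of course unavailable, but a very large weight has the same practical effect: the corresponding parameter is pinned to its prior value while the others remain data-driven. An illustrative sketch (all values made up):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([2.0, 3.0, 5.0])
x0 = np.array([0.0, 1.0])

# omega_1 = 0 leaves the first parameter entirely to the data; increasing
# omega_2 pins the second parameter to its prior value x0[1] = 1.0.
for w2 in [0.0, 1.0, 1e6]:
    Omega = np.diag([0.0, w2])
    A_aug = np.vstack([A, Omega])
    b_aug = np.concatenate([b, Omega @ x0])
    x_hat, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
    print(f"omega_2 = {w2:g}: x = {x_hat}")
```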

Through regularization, the solution depends not only on the data, but also on the researcher's initial assumptions. This provides flexibility in integrating domain knowledge into the parameter estimation process.