
Linear Methods of AI

Regularization

Problems in Linear Systems

When we deal with linear equation systems $Ax = b$ where $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$, challenging situations often arise. If $m > n$ and $\text{Rank}(A|b) > \text{Rank}(A)$, then the system $Ax = b$ is unsolvable: it is over-constrained, carrying more restrictions than the parameters can satisfy.

Another, equally problematic situation occurs when the matrix $A$ does not have full rank, i.e., $\text{Rank}(A) < n$. In this case the equation system is under-constrained, with too much freedom: the least squares solution exists but is not unique.

Imagine trying to determine the position of an object from too little or conflicting information. Regularization provides a way to stabilize such ill-posed problems.
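Both failure modes are easy to reproduce numerically. The sketch below (assuming NumPy; the matrices are made-up illustrations, not taken from the text) shows an overdetermined, inconsistent system and a rank-deficient one:

```python
import numpy as np

# Overdetermined, inconsistent case: m > n and Rank(A|b) > Rank(A),
# so no x satisfies Ax = b exactly.
A_over = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])
b_over = np.array([1.0, 1.0, 3.0])
print(np.linalg.matrix_rank(A_over))                             # 2
print(np.linalg.matrix_rank(np.column_stack([A_over, b_over])))  # 3

# Rank-deficient case: Rank(A) < n, so A^T A is singular and the
# least squares solution is not unique.
A_def = np.array([[1.0, 1.0],
                  [2.0, 2.0]])
print(np.linalg.matrix_rank(A_def))    # 1
print(np.linalg.det(A_def.T @ A_def))  # ~0.0
```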

Definition of the Regularization Problem

To address the instability problem, we introduce a modified least squares problem

$$\min_x \left( \|Ax - b\|_2^2 + \omega^2 \|x - x_0\|_2^2 \right)$$

where $x_0 \in \mathbb{R}^n$ is the initial value or prior estimate for the model parameters and $\omega^2 \in \mathbb{R}_0^+$ is the weighting factor. The additional term

$$\omega^2 \|x - x_0\|_2^2$$

is called the Tikhonov regularization term.

This regularization term gives the system a "preference" for solutions that do not stray too far from the initial estimate $x_0$. The larger the value of $\omega$, the stronger this preference becomes.
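To make the two competing terms concrete, here is a minimal NumPy sketch of the objective (the function name regularized_objective and all inputs are illustrative, not from the text):

```python
import numpy as np

def regularized_objective(A, b, x, x0, omega):
    """Evaluate ||Ax - b||_2^2 + omega^2 * ||x - x0||_2^2."""
    data_term = np.sum((A @ x - b) ** 2)           # model-data mismatch
    prior_term = omega**2 * np.sum((x - x0) ** 2)  # deviation from the prior
    return data_term + prior_term
```

Increasing omega shifts the balance of the two terms toward the prior, exactly as described above.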

Interpretation of Regularization

Through the regularization term, the least squares problem not only minimizes the difference $\|Ax - b\|$ between model and data, but also minimizes the difference $\|x - x_0\|$ between the parameters and the prior estimate $x_0$, weighted by $\omega^2$.

Note that the prior estimate $x_0$ is chosen by the researcher. The solution $\hat{x}$ then not only describes the behavior of the process being investigated, but also reflects the researcher's initial assumptions.

Matrix Formulation

The regularization problem can be written in matrix form as

$$\min_x \left\| \begin{pmatrix} Ax - b \\ \omega(x - x_0) \end{pmatrix} \right\|_2^2 = \min_x \left\| \begin{pmatrix} A \\ \omega I \end{pmatrix} x - \begin{pmatrix} b \\ \omega x_0 \end{pmatrix} \right\|_2^2$$
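The stacked form can be handed directly to a standard least squares solver. A minimal sketch, assuming NumPy (the function name tikhonov_stacked is ours; A, b, x0, omega are placeholders):

```python
import numpy as np

def tikhonov_stacked(A, b, x0, omega):
    """Solve the regularization problem via the stacked system
    (A; omega*I) x ~ (b; omega*x0)."""
    n = A.shape[1]
    A_aug = np.vstack([A, omega * np.eye(n)])  # stack A on top of omega*I
    b_aug = np.concatenate([b, omega * x0])    # stack b on top of omega*x0
    x_hat, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
    return x_hat
```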

The corresponding normal equation system becomes

$$\begin{pmatrix} A \\ \omega I \end{pmatrix}^T \begin{pmatrix} A \\ \omega I \end{pmatrix} x = \begin{pmatrix} A \\ \omega I \end{pmatrix}^T \begin{pmatrix} b \\ \omega x_0 \end{pmatrix}$$

or in simpler form

$$(A^T A + \omega^2 I)\, x = A^T b + \omega^2 x_0$$
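Equivalently, the normal equations can be solved directly. A sketch, again with illustrative names:

```python
import numpy as np

def tikhonov_normal(A, b, x0, omega):
    """Solve (A^T A + omega^2 I) x = A^T b + omega^2 x0."""
    n = A.shape[1]
    lhs = A.T @ A + omega**2 * np.eye(n)
    rhs = A.T @ b + omega**2 * x0
    return np.linalg.solve(lhs, rhs)
```

This agrees with the stacked solve, but forming $A^T A$ squares the condition number, so the stacked formulation is generally the numerically safer choice.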

Properties of Regularization Solution

For $\omega > 0$, the normal equation system

$$(A^T A + \omega^2 I)\, x = A^T b + \omega^2 x_0$$

of the regularization problem always has a unique solution. Regularization thus restores the identifiability of all parameters.

The matrix $\begin{pmatrix} A \\ \omega I \end{pmatrix}$ contains $n$ linearly independent rows in the $\omega I$ block for $\omega > 0$ and thus achieves the maximum rank $n$. Consequently, the matrix $A^T A + \omega^2 I$ is positive definite for $\omega > 0$, ensuring that the problem is well-defined and has a stable solution.
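This can be checked numerically: for a rank-deficient $A$, a Cholesky factorization of $A^T A$ fails, while it succeeds for $A^T A + \omega^2 I$ with any $\omega > 0$. A small sketch with an illustrative matrix:

```python
import numpy as np

# Rank-deficient matrix: Rank(A) = 1 < n = 2.
A = np.array([[1.0, 1.0],
              [2.0, 2.0]])

# omega = 0: A^T A is singular, so the Cholesky factorization fails.
try:
    np.linalg.cholesky(A.T @ A)
except np.linalg.LinAlgError:
    print("A^T A is not positive definite")

# omega > 0: A^T A + omega^2 I is positive definite, Cholesky succeeds.
omega = 0.1
L = np.linalg.cholesky(A.T @ A + omega**2 * np.eye(2))
print(L)
```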

Individual Weights for Parameters

We can choose individual weighting factors $\omega_i \geq 0$ for each parameter $i = 1, \ldots, n$. In this case, the least squares problem becomes

$$\min_x \left( \|Ax - b\|_2^2 + \sum_{i=1}^n \omega_i^2 (x_i - x_{0i})^2 \right) = \min_x \left( \|Ax - b\|_2^2 + \|\Omega(x - x_0)\|_2^2 \right) = \min_x \left\| \begin{pmatrix} A \\ \Omega \end{pmatrix} x - \begin{pmatrix} b \\ \Omega x_0 \end{pmatrix} \right\|_2^2$$

with diagonal matrix

$$\Omega = \begin{pmatrix} \omega_1 & & 0 \\ & \ddots & \\ 0 & & \omega_n \end{pmatrix}$$

The weighting factors $\omega_i$ are chosen such that the matrix $\begin{pmatrix} A \\ \Omega \end{pmatrix}$ has full rank.
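A sketch of the per-parameter version, reusing the stacked formulation (the name tikhonov_weighted and all inputs are illustrative):

```python
import numpy as np

def tikhonov_weighted(A, b, x0, omegas):
    """Per-parameter regularization with Omega = diag(omega_1, ..., omega_n)."""
    Omega = np.diag(omegas)
    A_aug = np.vstack([A, Omega])            # (A; Omega)
    b_aug = np.concatenate([b, Omega @ x0])  # (b; Omega*x0)
    x_hat, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
    return x_hat
```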

Weight Selection Strategy

For parameters that are difficult to determine well, we choose large weighting factors $\omega_i$. Conversely, for parameters that can already be determined well, we can choose $\omega_i = 0$. Note, however, that every weighting factor $\omega_i$ can influence all parameters, since the normal equations couple them.

If we decide to fix a parameter to a specific value, i.e., to turn it into a constant, this corresponds in principle to setting $\omega_i = \infty$. The same applies when we add inequality conditions $l_i \leq x_i \leq u_i$ to the problem and they become active in the solution, i.e., hold with equality $x_i = l_i$ or $x_i = u_i$.
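Numerically, $\omega_i = \infty$ is of course unavailable, but a very large weight has the same practical effect: the corresponding parameter is pinned to its prior value while the others remain data-driven. An illustrative sketch (all values made up):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([2.0, 3.0, 5.0])
x0 = np.array([0.0, 1.0])

# omega_1 = 0 leaves the first parameter entirely to the data; increasing
# omega_2 pins the second parameter to its prior value x0[1] = 1.0.
for w2 in [0.0, 1.0, 1e6]:
    Omega = np.diag([0.0, w2])
    A_aug = np.vstack([A, Omega])
    b_aug = np.concatenate([b, Omega @ x0])
    x_hat, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
    print(f"omega_2 = {w2:g}: x = {x_hat}")
```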

Through regularization, the solution depends not only on the data, but also on the researcher's initial assumptions. This provides flexibility in integrating domain knowledge into the parameter estimation process.