Fundamental Properties
Normal equation systems have special characteristics that distinguish them from ordinary linear systems. Imagine finding the best point on a line to represent scattered data points, this system provides a mathematical way to find that optimal solution.
For a matrix with , the normal equation system always has a solution. More specifically, this system has a unique solution exactly when matrix has full rank, that is when . Under this condition, the solution can be expressed as .
When matrix does not have full rank, the solution set of the normal equation system takes the form , where is any particular solution of the system.
Why the System is Always Solvable
The fundamental reason why normal equation systems always have a solution lies in the concept of orthogonal projection. The orthogonal projection of vector onto the column space always exists and is the solution to the linear least squares problem, which automatically is also a solution to the normal equation system.
To understand why other solutions take the form , suppose is another solution of the system . Then is a solution to the normal equation system if and only if , which is equivalent to , which is further equivalent to , or in other words .
Moore-Penrose Pseudoinverse
For a matrix with and , we can define the Moore-Penrose pseudoinverse as
The Moore-Penrose pseudoinverse functions like the "best inverse" of a non-square matrix. It provides an optimal way to "cancel" linear transformations in the least squares context.
The Moore-Penrose pseudoinverse satisfies four Penrose axioms that uniquely determine its characteristics
These four properties are unique, meaning if a matrix satisfies all four axioms, then automatically . The Moore-Penrose pseudoinverse thus functions as the unique solution operator for linear least squares problems.
Solution Using QR Decomposition
A more numerically stable approach to solving normal equation systems uses QR decomposition. For a matrix with full rank and , we can use the (thin) QR decomposition .
With this decomposition, the normal equation system can be solved through
Since is an invertible upper triangular matrix, this equation is equivalent to
This upper triangular system can be solved using back substitution, providing the solution directly and efficiently.
Numerical Example
Let's apply this method to a concrete example. Suppose we have experimental data that we want to fit with a quadratic polynomial. We will use the following data
Each row in matrix has the format to find the coefficients of polynomial , while vector contains the corresponding observation values.
QR Decomposition Process
The QR decomposition of matrix yields
Solution Steps
First, we compute to obtain
Next, we solve the upper triangular system using back substitution. Since is upper triangular, we start from the bottom equation.
From the third equation, , so .
From the second equation, , so .
From the first equation, , so .
Thus, the complete solution is
Fitting Results
Based on the obtained solution, the quadratic polynomial that best fits the data is
The following visualization shows how well this polynomial represents the original data
Method Comparison
To solve normal equation systems, there are two main approaches that can be compared in terms of computation and numerical stability.
The Cholesky approach involves explicitly forming the matrix first, then applying Cholesky decomposition since this matrix is positive definite. This method requires approximately arithmetic operations. However, multiplication and decomposition can become sources of large error propagation, especially when where .
The QR approach, conversely, can solve this problem with better numerical stability and comparable computational complexity. The main complexity is determined by operations for QR decomposition, making it comparable to the Cholesky approach. However, the significant advantage of QR lies in the fact that orthogonal transformations do not worsen the problem condition, unlike forming in the Cholesky method.
The choice of the appropriate method depends on the data characteristics and the level of accuracy required in the specific application.