

Statistical Analysis

Fisher Information Matrix

The matrix $A^T A$ has a special name in the context of least squares problems: it is called the Fisher information matrix, after the statistician Ronald A. Fisher.

Imagine measuring how sharp the peak of a mountain is. The sharper the peak, the easier it is to determine the exact location of the peak. Similarly, the Fisher information matrix provides a measure of how well we can determine the optimal parameters.

Parameter Covariance Matrix

The matrix $C = (A^T A)^{-1}$ is the covariance matrix of the parameter estimator $\hat{x} = (A^T A)^{-1} A^T b$. This holds under the assumption that the components $b_i$ for $i = 1, \ldots, m$ are independent and carry standard normally distributed measurement errors.

With this assumption, the estimator $\hat{x}$ follows a multivariate normal distribution

$$\hat{x} \sim N(x_{\text{true}}, C)$$

where $x_{\text{true}} \in \mathbb{R}^n$ is the unknown true parameter (the expected value) and $C \in \mathbb{R}^{n \times n}$ is the covariance matrix.
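This distributional claim can be checked numerically. The sketch below uses a hypothetical straight-line design matrix and assumed true parameters (the names `t` and `x_true` and all values are illustrative, not from the text): it draws many noisy right-hand sides $b = A x_{\text{true}} + e$ with $e \sim N(0, I)$, solves the least squares problem each time, and compares the empirical mean and covariance of the estimates with $x_{\text{true}}$ and $C$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical straight-line model y = x0 + x1 * t
t = np.linspace(0.0, 1.0, 50)
A = np.column_stack([np.ones_like(t), t])   # design matrix, shape (m, n) = (50, 2)
x_true = np.array([1.0, 2.0])               # assumed true parameters
C = np.linalg.inv(A.T @ A)                  # predicted covariance of the estimator

# Draw many right-hand sides with standard normal errors, estimate x each time.
estimates = np.empty((20000, 2))
for k in range(estimates.shape[0]):
    b = A @ x_true + rng.standard_normal(t.size)
    estimates[k], *_ = np.linalg.lstsq(A, b, rcond=None)

print("mean of estimates:", estimates.mean(axis=0))                # approx. x_true
print("empirical covariance:\n", np.cov(estimates, rowvar=False))  # approx. C
print("predicted C:\n", C)
```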

The diagonal elements $c_{ii}$ describe the variances of the parameters, measuring how far the parameter estimates can deviate from their true values. From these values, confidence intervals for the parameters can be calculated. The off-diagonal elements $c_{ij}$ with $i \neq j$ are covariances that show how the uncertainties of two parameters are related. From these covariances, the correlations $c_{ij}/\sqrt{c_{ii} \cdot c_{jj}}$ between parameters can be obtained.
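A minimal sketch of these quantities, reusing the same hypothetical straight-line setup (all data assumed for illustration): standard deviations and approximate 95% confidence intervals come from the diagonal of $C$, correlations from its off-diagonal elements.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical straight-line data (assumed, for illustration only)
t = np.linspace(0.0, 1.0, 50)
A = np.column_stack([np.ones_like(t), t])
b = A @ np.array([1.0, 2.0]) + rng.standard_normal(t.size)

x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
C = np.linalg.inv(A.T @ A)

std = np.sqrt(np.diag(C))   # standard deviations of the parameter estimates
z = 1.96                    # 97.5% standard normal quantile, for a 95% interval
for i in range(len(x_hat)):
    print(f"x[{i}] = {x_hat[i]:.3f} +/- {z * std[i]:.3f}")

corr = C / np.outer(std, std)   # correlations c_ij / sqrt(c_ii * c_jj)
print("correlation matrix:\n", corr)
```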

What matters in parameter estimation is not only the estimator $\hat{x}$ itself, but also its statistical significance as described by the covariance matrix $C$. This is like a doctor who not only provides test results but also explains the level of confidence in those results. In statistics courses, these concepts are discussed in more detail.

QR Decomposition

The covariance matrix can be calculated using the reduced QR decomposition of $A$. If $A = QR$, then, because $Q^T Q = I$ for the reduced factor $Q$,

$$C = (A^T A)^{-1} = (R^T Q^T Q R)^{-1} = (R^T R)^{-1} = R^{-1} R^{-T}$$
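A short sketch of this route with an assumed design matrix; `numpy.linalg.qr` returns the reduced decomposition by default, and the result matches the direct inverse of $A^T A$.

```python
import numpy as np

# Assumed design matrix, for illustration
t = np.linspace(0.0, 1.0, 50)
A = np.column_stack([np.ones_like(t), t])

Q, R = np.linalg.qr(A)              # reduced QR: Q is (m, n), R is (n, n)

R_inv = np.linalg.inv(R)            # R is small (n x n) and upper triangular
C_qr = R_inv @ R_inv.T              # C = R^{-1} R^{-T}

C_direct = np.linalg.inv(A.T @ A)
print(np.allclose(C_qr, C_direct))  # True: both routes agree
```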

Weighted Least Squares

To account for differing measurement errors and give the measurement data appropriate weights, weighted least squares problems are commonly used:

$$\min_x \sum_{i=1}^m \frac{(h(t_i) \cdot x - y_i)^2}{\sigma_i^2} = \min_x \|Ax - b\|_2^2$$

This problem can be transformed into an ordinary least squares problem by defining

$$A = \Sigma^{-1} \begin{pmatrix} h(t_1) \\ \vdots \\ h(t_m) \end{pmatrix}, \qquad b = \Sigma^{-1} \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}$$

with

$$\Sigma^{-1} = \begin{pmatrix} 1/\sigma_1 & & \\ & \ddots & \\ & & 1/\sigma_m \end{pmatrix}$$

Here $\sigma_i^2$ is the variance of the measurement error in $y_i$; the errors are assumed to be independent and normally distributed. Additionally, it is assumed that the measurement errors have expected value $0$, so there are no systematic errors. After the scaling by $\Sigma^{-1}$, each component $b_i$ thus carries a standard normally distributed error.
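A minimal sketch of this transformation under an assumed straight-line model $h(t_i) = (1, t_i)$ and hypothetical error levels `sigma` (all values illustrative): scale the rows of the model matrix and the data by $1/\sigma_i$, then solve an ordinary least squares problem.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: h(t) = (1, t) with heteroscedastic errors (assumed values)
t = np.linspace(0.0, 1.0, 50)
H = np.column_stack([np.ones_like(t), t])     # rows are h(t_i)
sigma = np.linspace(0.5, 2.0, t.size)         # error standard deviations sigma_i
y = H @ np.array([1.0, 2.0]) + sigma * rng.standard_normal(t.size)

# Transformation: A = Sigma^{-1} H, b = Sigma^{-1} y (row-wise scaling)
A = H / sigma[:, None]
b = y / sigma

x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
C = np.linalg.inv(A.T @ A)                    # covariance of the weighted estimator
print("estimate:", x_hat)
print("standard deviations:", np.sqrt(np.diag(C)))
```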

In the weighted least squares objective, measurement values with large measurement errors receive weaker weights than measurement values with small measurement errors. The analogy is like listening to opinions from various sources: we give greater weight to more reliable sources and less weight to less accurate ones.