

Statistical Analysis

Fisher Information Matrix

The matrix $A^T A$ has a special name in the context of least squares problems: it is called the Fisher information matrix, after the statistician Ronald A. Fisher.

Imagine measuring how sharp the peak of a mountain is. The sharper the peak, the easier it is to determine the exact location of the peak. Similarly, the Fisher information matrix provides a measure of how well we can determine the optimal parameters.

Parameter Covariance Matrix

The matrix $C = (A^T A)^{-1}$ is the covariance matrix of the parameter estimator $\hat{x} = (A^T A)^{-1} A^T b$. This holds under the assumption that the components $b_i$ for $i = 1, \ldots, m$ are independent and carry standard normally distributed measurement errors.

With this assumption, the estimator $\hat{x}$ follows a multivariate normal distribution

$$\hat{x} \sim N(x_{\text{true}}, C)$$

where $x_{\text{true}} \in \mathbb{R}^n$ is the unknown true parameter (the expected value) and $C \in \mathbb{R}^{n \times n}$ is the covariance matrix.
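This distributional claim can be checked numerically. The sketch below uses a hypothetical straight-line design matrix and assumed true parameters (the names `t` and `x_true` and all values are illustrative, not from the text): it draws many noisy right-hand sides $b = A x_{\text{true}} + e$ with $e \sim N(0, I)$, solves the least squares problem each time, and compares the empirical mean and covariance of the estimates with $x_{\text{true}}$ and $C$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical straight-line model y = x0 + x1 * t
t = np.linspace(0.0, 1.0, 50)
A = np.column_stack([np.ones_like(t), t])   # design matrix, shape (m, n) = (50, 2)
x_true = np.array([1.0, 2.0])               # assumed true parameters
C = np.linalg.inv(A.T @ A)                  # predicted covariance of the estimator

# Draw many right-hand sides with standard normal errors, estimate x each time.
estimates = np.empty((20000, 2))
for k in range(estimates.shape[0]):
    b = A @ x_true + rng.standard_normal(t.size)
    estimates[k], *_ = np.linalg.lstsq(A, b, rcond=None)

print("mean of estimates:", estimates.mean(axis=0))                # approx. x_true
print("empirical covariance:\n", np.cov(estimates, rowvar=False))  # approx. C
print("predicted C:\n", C)
```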

The diagonal elements $c_{ii}$ describe the variances of the parameters, measuring how far the parameter estimates can deviate from their true values. From these values, confidence intervals for the parameters can be calculated. The off-diagonal elements $c_{ij}$ with $i \neq j$ are covariances that show how the uncertainties of two parameters are related. From these covariances, the correlations $c_{ij}/\sqrt{c_{ii} \cdot c_{jj}}$ between parameters can be obtained.
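A minimal sketch of these quantities, reusing the same hypothetical straight-line setup (all data assumed for illustration): standard deviations and approximate 95% confidence intervals come from the diagonal of $C$, correlations from its off-diagonal elements.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical straight-line data (assumed, for illustration only)
t = np.linspace(0.0, 1.0, 50)
A = np.column_stack([np.ones_like(t), t])
b = A @ np.array([1.0, 2.0]) + rng.standard_normal(t.size)

x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
C = np.linalg.inv(A.T @ A)

std = np.sqrt(np.diag(C))   # standard deviations of the parameter estimates
z = 1.96                    # 97.5% standard normal quantile, for a 95% interval
for i in range(len(x_hat)):
    print(f"x[{i}] = {x_hat[i]:.3f} +/- {z * std[i]:.3f}")

corr = C / np.outer(std, std)   # correlations c_ij / sqrt(c_ii * c_jj)
print("correlation matrix:\n", corr)
```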

What matters in parameter estimation is not only the estimator $\hat{x}$ itself, but also its statistical significance as described by the covariance matrix $C$. This is like a doctor who not only provides test results but also explains the level of confidence in those results. In statistics courses, these concepts are discussed in more detail.

QR Decomposition

The covariance matrix can be calculated using the reduced QR decomposition of $A$. If $A = QR$, then, because $Q^T Q = I$ for the reduced factor $Q$,

$$C = (A^T A)^{-1} = (R^T Q^T Q R)^{-1} = (R^T R)^{-1} = R^{-1} R^{-T}$$
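A short sketch of this route with an assumed design matrix; `numpy.linalg.qr` returns the reduced decomposition by default, and the result matches the direct inverse of $A^T A$.

```python
import numpy as np

# Assumed design matrix, for illustration
t = np.linspace(0.0, 1.0, 50)
A = np.column_stack([np.ones_like(t), t])

Q, R = np.linalg.qr(A)              # reduced QR: Q is (m, n), R is (n, n)

R_inv = np.linalg.inv(R)            # R is small (n x n) and upper triangular
C_qr = R_inv @ R_inv.T              # C = R^{-1} R^{-T}

C_direct = np.linalg.inv(A.T @ A)
print(np.allclose(C_qr, C_direct))  # True: both routes agree
```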

Weighted Least Squares

To account for differing measurement errors and give the measurement data appropriate weights, weighted least squares problems are commonly used:

$$\min_x \sum_{i=1}^m \frac{(h(t_i) \cdot x - y_i)^2}{\sigma_i^2} = \min_x \|Ax - b\|_2^2$$

This problem can be transformed into an ordinary least squares problem by defining

$$A = \Sigma^{-1} \begin{pmatrix} h(t_1) \\ \vdots \\ h(t_m) \end{pmatrix}, \qquad b = \Sigma^{-1} \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}$$

with

$$\Sigma^{-1} = \begin{pmatrix} 1/\sigma_1 & & \\ & \ddots & \\ & & 1/\sigma_m \end{pmatrix}$$

Here $\sigma_i^2$ is the variance of the measurement error in $y_i$; the errors are assumed to be independent and normally distributed. Additionally, it is assumed that the measurement errors have expected value $0$, so there are no systematic errors. After the scaling by $\Sigma^{-1}$, each component $b_i$ thus carries a standard normally distributed error.
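A minimal sketch of this transformation under an assumed straight-line model $h(t_i) = (1, t_i)$ and hypothetical error levels `sigma` (all values illustrative): scale the rows of the model matrix and the data by $1/\sigma_i$, then solve an ordinary least squares problem.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: h(t) = (1, t) with heteroscedastic errors (assumed values)
t = np.linspace(0.0, 1.0, 50)
H = np.column_stack([np.ones_like(t), t])     # rows are h(t_i)
sigma = np.linspace(0.5, 2.0, t.size)         # error standard deviations sigma_i
y = H @ np.array([1.0, 2.0]) + sigma * rng.standard_normal(t.size)

# Transformation: A = Sigma^{-1} H, b = Sigma^{-1} y (row-wise scaling)
A = H / sigma[:, None]
b = y / sigma

x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
C = np.linalg.inv(A.T @ A)                    # covariance of the weighted estimator
print("estimate:", x_hat)
print("standard deviations:", np.sqrt(np.diag(C)))
```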

In the weighted least squares objective, measurement values with large measurement errors receive weaker weights than measurement values with small measurement errors. The analogy is like listening to opinions from various sources: we give greater weight to more reliable sources and less weight to less accurate ones.