# Nakafa Learning Content

URL: https://nakafa.com/en/subjects/ai-ds/linear-methods/statistical-analysis
Source: https://raw.githubusercontent.com/nakafaai/nakafa.com/refs/heads/main/packages/contents/material/lesson/ai-ds/linear-methods/statistical-analysis/en.mdx

Learn Fisher information matrix, parameter covariance, weighted least squares, and confidence intervals for statistical models.

---

## Fisher Information Matrix

In least squares problems, the matrix $$A^T A$$ has a special name: the Fisher information matrix, named after the statistician Ronald A. Fisher.

Imagine measuring how sharp the peak of a mountain is. The sharper the peak, the easier it is to determine the exact location of the peak. Similarly, the Fisher information matrix provides a measure of how well we can determine the optimal parameters.
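
As a rough illustration of the peak analogy, the sketch below compares $$A^T A$$ for two hypothetical line fits $$x_0 + x_1 t$$. The sample points `t_spread` and `t_clustered` and the helper `fisher_information` are assumptions made for this example, not part of the lesson. Since the Hessian of $$\|Ax - b\|_2^2$$ is $$2 A^T A$$, the smallest eigenvalue of $$A^T A$$ indicates how flat the objective is in its least determined direction.

```python
import numpy as np

# Hypothetical sample points for fitting a line x_0 + x_1 * t
t_spread = np.array([0.0, 1.0, 2.0, 3.0, 4.0])        # well separated
t_clustered = np.array([1.9, 1.95, 2.0, 2.05, 2.1])   # nearly identical


def fisher_information(t):
    """Build A with rows (1, t_i) and return A^T A."""
    A = np.column_stack([np.ones_like(t), t])
    return A.T @ A


F_spread = fisher_information(t_spread)
F_clustered = fisher_information(t_clustered)

# eigvalsh returns eigenvalues in ascending order, so index 0 is the smallest.
# A small value means a flat objective, so the parameters are poorly determined.
min_curv_spread = np.linalg.eigvalsh(F_spread)[0]
min_curv_clustered = np.linalg.eigvalsh(F_clustered)[0]

print("smallest eigenvalue, spread points:   ", min_curv_spread)
print("smallest eigenvalue, clustered points:", min_curv_clustered)
```

In this setup the clustered sample points give a much smaller smallest eigenvalue, which matches the picture of a flat peak whose location is hard to pin down.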

## Parameter Covariance Matrix

The matrix $$C = (A^T A)^{-1}$$ is the covariance matrix of the parameter estimator $$\hat{x} = (A^T A)^{-1} A^T b$$. This holds under the assumption that the components $$b_i$$, $$i = 1, \ldots, m$$, are independent and normally distributed with variance $$1$$, that is, their errors follow a standard normal distribution.

With this assumption, the estimator $$\hat{x}$$ follows a multivariate normal distribution

```math
\hat{x} \sim N(x_{true}, C)
```

where $$x_{true} \in \mathbb{R}^n$$ is the unknown true parameter vector, which is the expected value of $$\hat{x}$$, and $$C \in \mathbb{R}^{n \times n}$$ is its covariance matrix.
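
The distribution of the estimator can be checked numerically with a small simulation. This is a minimal sketch, assuming a hypothetical line model with made-up values for `x_true`, a fixed random seed, and 20000 repetitions. It draws many right-hand sides $$b$$ with standard normal errors, solves each least squares problem, and compares the empirical mean and covariance of the estimates with $$x_{true}$$ and $$C$$.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical setup: line model with rows (1, t_i) and made-up true parameters
t = np.linspace(0.0, 4.0, 5)
A = np.column_stack([np.ones_like(t), t])
x_true = np.array([1.0, 2.0])

# Theoretical covariance of the estimator
C = np.linalg.inv(A.T @ A)

# Each row of B is one simulated b = A x_true + standard normal noise
n_trials = 20000
B = A @ x_true + rng.standard_normal((n_trials, A.shape[0]))

# Solve all least squares problems at once: each column of B.T is one right-hand side
X_hat = np.linalg.lstsq(A, B.T, rcond=None)[0].T  # shape (n_trials, 2)

emp_mean = X_hat.mean(axis=0)           # should be close to x_true
emp_cov = np.cov(X_hat, rowvar=False)   # should be close to C

print("empirical mean:      ", emp_mean)
print("empirical covariance:\n", emp_cov)
print("theoretical C:\n", C)
```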

The diagonal elements $$c_{ii}$$ are the variances of the individual parameter estimates: they measure how far an estimate can be expected to deviate from its true value. From these values, confidence intervals for the parameters can be calculated. The off-diagonal elements $$c_{ij}$$ with $$i \neq j$$ are covariances that show how the uncertainties of two parameters are related. From these covariances, the correlations $$c_{ij}/\sqrt{c_{ii} \cdot c_{jj}}$$ between parameters can be obtained.
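
The quantities above can be computed directly from $$C$$. The following is a minimal sketch, assuming a hypothetical line fit with made-up measurements `b` whose errors have unit variance. The interval uses the factor 1.96, the 97.5% quantile of the standard normal distribution, which gives an approximate 95% confidence interval under the normality assumption stated earlier.

```python
import numpy as np

# Hypothetical line fit: rows of A are (1, t_i); b holds made-up measurements
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
A = np.column_stack([np.ones_like(t), t])
b = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# Least squares estimate and its covariance matrix C = (A^T A)^{-1}
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
C = np.linalg.inv(A.T @ A)

# Standard deviations from the diagonal, correlations c_ij / sqrt(c_ii * c_jj)
std = np.sqrt(np.diag(C))
corr = C / np.outer(std, std)

# Approximate 95% confidence intervals: estimate +/- 1.96 standard deviations
lower = x_hat - 1.96 * std
upper = x_hat + 1.96 * std

for i in range(len(x_hat)):
    print(f"x_{i} = {x_hat[i]:.3f}, interval [{lower[i]:.3f}, {upper[i]:.3f}]")
print("correlation between x_0 and x_1:", corr[0, 1])
```

In this example the intercept and slope estimates are negatively correlated: an overestimated intercept tends to go together with an underestimated slope.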

What matters in parameter estimation is not only the estimator $$\hat{x}$$ itself, but also its statistical significance as described by the covariance matrix $$C$$. This is like a doctor who not only reports test results but also explains how much confidence to place in them. These concepts are discussed in more detail in statistics courses.

## QR Decomposition

The covariance matrix can be calculated using the reduced QR decomposition of $$A$$. If $$A = QR$$, then, because $$Q^T Q = I$$, we obtain

```math
C = (A^T A)^{-1}
```

```math
= (R^T Q^T QR)^{-1}
```

```math
= R^{-1} R^{-T}
```
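
The identity above can be checked numerically. This is a minimal sketch using NumPy's `np.linalg.qr` with `mode="reduced"`. The quadratic design matrix is a hypothetical example, and forming $$R^{-1}$$ through `np.linalg.solve` is one possible choice rather than the only way to do it.

```python
import numpy as np

# Hypothetical design matrix for a quadratic model with rows (1, t_i, t_i^2)
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
A = np.column_stack([np.ones_like(t), t, t**2])

# Reduced QR: Q is 5x3 with orthonormal columns, R is 3x3 upper triangular
Q, R = np.linalg.qr(A, mode="reduced")

# R^{-1} obtained by solving R X = I
R_inv = np.linalg.solve(R, np.eye(R.shape[0]))

# Covariance via the QR factors: C = R^{-1} R^{-T}
C_qr = R_inv @ R_inv.T

# Reference value computed directly from the normal equations matrix
C_direct = np.linalg.inv(A.T @ A)

print("max difference:", np.max(np.abs(C_qr - C_direct)))
```

Working with $$R$$ avoids forming $$A^T A$$ explicitly, which is generally preferable when $$A$$ is poorly conditioned.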

## Weighted Least Squares

To account for measurement errors of different sizes and to weight the measurement data appropriately, weighted least squares problems are commonly used:

```math
\min_x \sum_{i=1}^m \frac{(h(t_i) \cdot x - y_i)^2}{\sigma_i^2}
```

```math
= \min_x \|Ax - b\|_2^2
```

This problem can be transformed by defining

```math
A = \Sigma^{-1} \begin{pmatrix} h(t_1) \\ \vdots \\ h(t_m) \end{pmatrix}
```

```math
b = \Sigma^{-1} \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}
```

with

```math
\Sigma^{-1} = \begin{pmatrix} 1/\sigma_1 & 0 & \cdots \\ 0 & \ddots & \\ \vdots & & 1/\sigma_m \end{pmatrix}
```

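Here $$\sigma_i^2$$ is the variance of the measurement error in $$y_i$$. The measurement errors are assumed to be independent and normally distributed with expected value $$0$$, so there are no systematic errors. After scaling by $$1/\sigma_i$$, the error in each component $$b_i = y_i/\sigma_i$$ is standard normally distributed, which is the assumption under which $$C = (A^T A)^{-1}$$ is the covariance matrix of the estimator.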

In the weighted least squares objective, measurements with large errors receive smaller weights than measurements with small errors. Think of listening to opinions from several sources: we give greater weight to more reliable sources and smaller weight to less accurate ones.
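
The row scaling by $$\Sigma^{-1}$$ can be sketched as follows. The data `t`, `y`, and `sigma` are made up for illustration, and the model $$h(t_i) = (1, t_i)$$ is an assumption for this example. The result is compared with the weighted normal equations using $$W = \mathrm{diag}(1/\sigma_i^2)$$, which should give the same solution.

```python
import numpy as np

# Hypothetical measurements; the third one is much less precise than the others
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])
sigma = np.array([0.1, 0.1, 0.5, 0.1, 0.1])

# Unscaled model matrix with rows h(t_i) = (1, t_i)
H = np.column_stack([np.ones_like(t), t])

# Scale each row by 1/sigma_i: A = Sigma^{-1} H and b = Sigma^{-1} y
A = H / sigma[:, None]
b = y / sigma

# Ordinary least squares on the scaled problem solves the weighted problem
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)

# Covariance matrix of the estimator for the scaled problem
C = np.linalg.inv(A.T @ A)

# Cross-check with the weighted normal equations H^T W H x = H^T W y
W = np.diag(1.0 / sigma**2)
x_check = np.linalg.solve(H.T @ W @ H, H.T @ W @ y)

print("weighted estimate:", x_hat)
print("parameter standard deviations:", np.sqrt(np.diag(C)))
```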