Principal Component Analysis Example

Imagine you have a random vector $x \in \mathbb{R}^n$ that is normally distributed. This vector has zero expected value $0 \in \mathbb{R}^n$ and a positive definite covariance matrix $C \in \mathbb{R}^{n \times n}$. We can write it as a normal distribution like this:

$$x \sim N(0, C)$$

Each individual parameter $x_i$ represents a characteristic of the process we are observing. In practice, almost all entries of the covariance matrix $C$ can be non-zero, meaning the parameters are strongly correlated through the covariances in the off-diagonal elements.
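To make this concrete, here is a minimal sketch in Python with NumPy (the 3×3 covariance matrix below is a made-up example, not from the text): we sample $x \sim N(0, C)$ and check that the empirical covariance has non-zero off-diagonal entries.

```python
import numpy as np

rng = np.random.default_rng(0)

# A made-up positive definite covariance matrix with correlated entries.
C = np.array([[4.0, 1.5, 0.5],
              [1.5, 3.0, 1.0],
              [0.5, 1.0, 2.0]])

# Draw samples x ~ N(0, C).
X = rng.multivariate_normal(mean=np.zeros(3), cov=C, size=10_000)

# The empirical covariance has clearly non-zero off-diagonal entries,
# i.e. the coordinates x_i are correlated with one another.
print(np.cov(X, rowvar=False).round(2))
```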

Through principal component analysis, we can determine the main influence factors that affect the process.

Covariance Matrix Diagonalization

To identify the main influence factors, we diagonalize the covariance matrix $C$. Let $\lambda_1 \geq \ldots \geq \lambda_n > 0$ be the eigenvalues of $C$ with corresponding orthonormal eigenvectors $v_1, \ldots, v_n$.

Based on the spectral theorem, we can form the diagonal matrix and eigenvector matrix:

$$\Lambda = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix}, \qquad S = \begin{pmatrix} v_1 & \ldots & v_n \end{pmatrix}$$

Then the fundamental relationship holds:

$$\Lambda = S^T \cdot C \cdot S$$
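A sketch of this diagonalization with NumPy, reusing the made-up covariance matrix from above. Note that `np.linalg.eigh` returns eigenvalues in ascending order, so we reverse them to match the convention $\lambda_1 \geq \ldots \geq \lambda_n$.

```python
import numpy as np

C = np.array([[4.0, 1.5, 0.5],
              [1.5, 3.0, 1.0],
              [0.5, 1.0, 2.0]])

# eigh handles symmetric matrices and returns ascending eigenvalues;
# reverse to get lambda_1 >= ... >= lambda_n.
eigvals, eigvecs = np.linalg.eigh(C)
lam = eigvals[::-1]            # eigenvalues, descending
S = eigvecs[:, ::-1]           # orthonormal eigenvectors as columns

Lambda = np.diag(lam)

# Verify the fundamental relationship Lambda = S^T C S.
print(np.allclose(Lambda, S.T @ C @ S))   # True
```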

Transformation to New Coordinates

With respect to the basis $v_1, \ldots, v_n$, new coordinates are defined by $y = S^T x$. Remarkably, the variables $y_i$ are independent and normally distributed with variance $\lambda_i$:

$$y_i \sim N(0, \lambda_i), \quad i = 1, \ldots, n$$

These variables $y_i$ are called the principal components of $x$. The principal component with the largest variance, $y_1$ with variance $\lambda_1$, describes the main influence factor of the observed process.
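A self-contained sketch (same made-up $C$ as above) that transforms samples to the coordinates $y = S^T x$ and checks that their empirical covariance is approximately $\mathrm{diag}(\lambda_1, \ldots, \lambda_n)$:

```python
import numpy as np

rng = np.random.default_rng(0)
C = np.array([[4.0, 1.5, 0.5],
              [1.5, 3.0, 1.0],
              [0.5, 1.0, 2.0]])

eigvals, eigvecs = np.linalg.eigh(C)
lam, S = eigvals[::-1], eigvecs[:, ::-1]

# Samples of x ~ N(0, C), transformed to principal coordinates y = S^T x.
X = rng.multivariate_normal(np.zeros(3), C, size=50_000)
Y = X @ S                      # row k equals S^T x_k

# The empirical covariance of y is close to diag(lambda_1, ..., lambda_n):
# the y_i are uncorrelated (and, being jointly Gaussian, independent),
# each with variance lambda_i.
print(np.cov(Y, rowvar=False).round(2))
print(np.diag(lam).round(2))
```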

As an analogy, think of observing cloud movement in the sky. Many factors affect how the clouds move, but westerly winds might have the greatest influence. The first principal component is like that main wind direction: the factor that contributes most to the overall movement pattern.

Geometric Visualization

Geometrically, principal component analysis can be understood as finding the best directions in which to represent data. Imagine data scattered like a cloud of points in two-dimensional space. The principal components point in the directions where the data has maximum variability.

[Figure: Principal component analysis visualization in $\mathbb{R}^2$. Transformation from the original coordinates to the principal factor directions that capture maximum data variability.]

In the visualization above, Variable 1 and Variable 2 represent the original coordinates of your data. Meanwhile, Factor 1 and Factor 2 show the new principal component directions. Notice how the factor directions are not aligned with the original axes, but rather follow the actual data distribution pattern.

Factor 1 shows the direction with the greatest variability in the data, while Factor 2 shows the direction with the second greatest variability, perpendicular to Factor 1. This transformation helps us understand the data's structure, because the principal components capture the variability patterns that actually exist in the data.
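As a closing sketch (again with NumPy and a made-up 2×2 covariance), we can recover the Factor 1 and Factor 2 directions of a two-dimensional point cloud from its empirical covariance:

```python
import numpy as np

rng = np.random.default_rng(1)

# A made-up 2D point cloud whose spread is not aligned with the axes.
C = np.array([[3.0, 1.2],
              [1.2, 1.0]])
X = rng.multivariate_normal(np.zeros(2), C, size=2_000)

# Estimate the covariance from the data and diagonalize it.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
lam, S = eigvals[::-1], eigvecs[:, ::-1]

print("Factor 1 direction:", S[:, 0], "variance:", lam[0].round(2))
print("Factor 2 direction:", S[:, 1], "variance:", lam[1].round(2))
# Factor 2 is orthogonal to Factor 1:
print("dot product:", round(float(S[:, 0] @ S[:, 1]), 6))
```

Plotting these two direction vectors over a scatter plot of the points would reproduce the picture described above: Factor 1 runs along the long axis of the point cloud, Factor 2 across it.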