# Nakafa Learning Content

URL: https://nakafa.com/en/subjects/mathematics/statistics-regression/product-moment-correlation
Source: https://raw.githubusercontent.com/nakafaai/nakafa.com/refs/heads/main/packages/contents/material/lesson/mathematics/statistics-regression/product-moment-correlation/en.mdx

Calculate Pearson's correlation coefficient (r) to measure linear relationships. Learn formulas, scatter plot analysis, and interpret r values from -1 to +1.

---

## What Is Product Moment Correlation?

Product Moment Correlation, often called **Pearson Correlation** and denoted by **$$r$$**, is the most commonly used statistical measure of how strong the **linear relationship** (straight-line pattern) between two quantitative (numeric) variables is, and in which direction it runs.

The value of $$r$$ tells us whether the two variables tend to move in the same direction (positive), in opposite directions (negative), or whether there is no linear relationship at all.

## Correlation from Scatter Diagrams

The most intuitive way to understand the value of $$r$$ is by looking at how the data points are scattered on a diagram:

### Strong Positive Correlation

An $$r$$ value approaching $$+1$$ means both variables move in the same direction and the points lie close to an upward-sloping straight line.

Component: ScatterDiagram
Props:
- title: Example of Strong Positive Correlation
- description: Data points cluster very closely forming an upward-sloping straight line pattern.
- xAxisLabel: Variable X
- yAxisLabel: Variable Y
- datasets: [
{
name: "Data",
color: "var(--chart-1)",
points: [
{ x: 1, y: 2 },
{ x: 2, y: 3.1 },
{ x: 3, y: 3.9 },
{ x: 4, y: 5.2 },
{ x: 5, y: 6.1 },
{ x: 6, y: 6.8 },
{ x: 7, y: 8.1 },
{ x: 8, y: 9.0 },
],
},
]
- calculateRegressionLine: true
- showResiduals: true
- regressionLineStyle: { color: "var(--chart-4)" }

If your data points look like this (rising from bottom left to top right and tightly clustered), the $$r$$ value will be close to $$+1$$.

### Weak Positive Correlation

A positive $$r$$ that is closer to $$0$$ means both variables still tend to move in the same direction, but the relationship is less consistent.

Component: ScatterDiagram
Props:
- title: Example of Weak Positive Correlation
- description: Points tend to rise, but are more spread out from the straight line.
- xAxisLabel: Variable X
- yAxisLabel: Variable Y
- datasets: [
{
name: "Data",
color: "var(--chart-2)",
points: [
{ x: 1, y: 1 },
{ x: 2, y: 4 },
{ x: 3, y: 2 },
{ x: 4, y: 6 },
{ x: 5, y: 5 },
{ x: 6, y: 8 },
{ x: 7, y: 6 },
{ x: 8, y: 9 },
],
},
]
- calculateRegressionLine: true
- showResiduals: true
- regressionLineStyle: { color: "var(--chart-4)" }

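If the points still show an upward trend but are more scattered like this, the $$r$$ value is still positive but smaller than in the tightly clustered case. The more the points spread out around the line, the closer $$r$$ moves to $$0$$.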

### Strong Negative Correlation

An $$r$$ value approaching $$-1$$ means the two variables move in opposite directions and the points lie close to a downward-sloping straight line.

Component: ScatterDiagram
Props:
- title: Example of Strong Negative Correlation
- description: Data points cluster very closely forming a downward-sloping straight line pattern.
- xAxisLabel: Variable X
- yAxisLabel: Variable Y
- datasets: [
{
name: "Data",
color: "var(--chart-3)",
points: [
{ x: 1, y: 10 },
{ x: 2, y: 8.9 },
{ x: 3, y: 8.1 },
{ x: 4, y: 6.8 },
{ x: 5, y: 6.0 },
{ x: 6, y: 5.1 },
{ x: 7, y: 4.0 },
{ x: 8, y: 3.1 },
],
},
]
- calculateRegressionLine: true
- showResiduals: true
- regressionLineStyle: { color: "var(--chart-4)" }

If the points fall from top left to bottom right and are very tightly clustered, the $$r$$ value will be close to $$-1$$.

### No Linear Correlation

An $$r$$ value approaching $$0$$ means the two variables have little or no linear relationship, although they may still be related in a nonlinear way.

Component: ScatterDiagram
Props:
- title: Example of No Linear Correlation
- description: Data points are scattered randomly without forming a straight line pattern.
- xAxisLabel: Variable X
- yAxisLabel: Variable Y
- datasets: [
{
name: "Data",
color: "var(--chart-5)",
points: [
{ x: 1, y: 5 },
{ x: 2, y: 2 },
{ x: 3, y: 7 },
{ x: 4, y: 4 },
{ x: 5, y: 9 },
{ x: 6, y: 3 },
{ x: 7, y: 8 },
{ x: 8, y: 1 },
],
},
]
- calculateRegressionLine: true
- showResiduals: true
- regressionLineStyle: { color: "var(--chart-4)" }

When the points are scattered randomly without a clear linear pattern, the $$r$$ value will be close to $$0$$.
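
To put numbers on the four diagrams, the sketch below computes $$r$$ for each dataset shown above, using the deviation-based formula introduced later in this lesson. The function name `pearson_r` and the dictionary layout are illustrative choices, not part of the lesson's own tooling.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r computed from deviations around the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    ss_yy = sum((y - mean_y) ** 2 for y in ys)
    return ss_xy / sqrt(ss_xx * ss_yy)


# All four diagrams share the same x values.
xs = [1, 2, 3, 4, 5, 6, 7, 8]

# y values copied from the four scatter diagrams above.
diagrams = {
    "strong positive": [2, 3.1, 3.9, 5.2, 6.1, 6.8, 8.1, 9.0],
    "weak positive": [1, 4, 2, 6, 5, 8, 6, 9],
    "strong negative": [10, 8.9, 8.1, 6.8, 6.0, 5.1, 4.0, 3.1],
    "no linear correlation": [5, 2, 7, 4, 9, 3, 8, 1],
}

results = {name: pearson_r(xs, ys) for name, ys in diagrams.items()}
for name, r in results.items():
    print(f"{name:>22}: r = {r:+.3f}")
```

The values come out at roughly $$+0.999$$, $$+0.881$$, $$-0.999$$, and $$-0.050$$. The eight points in the second diagram still give a fairly high value: they are weaker than the first diagram rather than weak in absolute terms, which is worth keeping in mind when judging strength by eye from a small sample.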

## How Is the Correlation Coefficient Calculated?

The Pearson correlation coefficient ($$r$$) measures how closely two variables ($$X$$ and $$Y$$) move together, relative to how much each one varies on its own.

**Imagine this:**

1.  **Individual Variation:**

    Each variable ($$X$$ and $$Y$$) has its own variability. Some variables fluctuate a lot (large variation), while others are stable (small variation). This is measured by $$SS_{xx}$$ for $$X$$ and $$SS_{yy}$$ for $$Y$$ (formulas below).

2.  **Joint Variation (Covariance):**

    We also need to know how $$X$$ and $$Y$$ vary _together_. When $$X$$ increases, does $$Y$$ also tend to increase, or does it tend to decrease? This joint variation is captured by $$SS_{xy}$$, which equals the sample **covariance** multiplied by $$n - 1$$, so it has the same sign as the covariance.

    - If $$SS_{xy}$$ is large and positive: $$X$$ and $$Y$$ often move in the same direction.
    - If $$SS_{xy}$$ is large and negative: $$X$$ and $$Y$$ often move in opposite directions.
    - If $$SS_{xy}$$ is close to zero: there is no clear linear pattern of joint movement.

3.  **Standardizing the Measure:**

    The problem is that the value of $$SS_{xy}$$, like the covariance, depends heavily on the units of the data. For example, the value computed from height (cm) and weight (kg) changes if we measure height in meters and weight in grams, even though the relationship itself is the same.

    To overcome this, we **standardize** the measure by **dividing $$SS_{xy}$$ by a combined measure of the individual variations**, the square root of their product: $$\sqrt{SS_{xx} SS_{yy}}$$.

    ```math
    r = \frac{\text{How much X and Y vary together}}{\text{Combined measure of how much X and Y vary individually}} = \frac{SS_{xy}}{\sqrt{SS_{xx} SS_{yy}}}
    ```

The result of this division is **$$r$$**, the Pearson Correlation Coefficient. Because it's standardized, its value will always be between $$-1$$ and $$+1$$, regardless of the original data units. This allows us to compare the strength of linear relationships between different pairs of variables.

So, the value of $$r$$ is determined by comparing how strongly $$X$$ and $$Y$$ move together relative to how much they move individually.
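
The effect of standardizing can be checked with a small sketch. The heights and weights below are made-up values for five hypothetical people, and the helper names `ss` and `pearson_r` are illustrative. The same measurements are expressed in two unit systems to show that $$SS_{xy}$$ changes while $$r$$ does not.

```python
from math import sqrt


def ss(xs, ys):
    """Sum of products of deviations; ss(xs, xs) gives SS_xx."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


def pearson_r(xs, ys):
    """r = SS_xy / sqrt(SS_xx * SS_yy)."""
    return ss(xs, ys) / sqrt(ss(xs, xs) * ss(ys, ys))


# Hypothetical measurements for five people (illustrative only).
height_cm = [150, 160, 165, 172, 180]
weight_kg = [50, 56, 61, 68, 77]

# The same people, measured in meters and grams instead.
height_m = [h / 100 for h in height_cm]
weight_g = [w * 1000 for w in weight_kg]

ss_cm_kg = ss(height_cm, weight_kg)
ss_m_g = ss(height_m, weight_g)
r_cm_kg = pearson_r(height_cm, weight_kg)
r_m_g = pearson_r(height_m, weight_g)

print(f"SS_xy in cm and kg: {ss_cm_kg:.1f}   r = {r_cm_kg:.4f}")
print(f"SS_xy in m and g:   {ss_m_g:.1f}   r = {r_m_g:.4f}")
```

Changing the units multiplies $$SS_{xy}$$ by $$\frac{1}{100} \times 1000 = 10$$, but the same factor appears in $$\sqrt{SS_{xx} SS_{yy}}$$, so it cancels and $$r$$ stays the same apart from floating-point rounding.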

## Product Moment Correlation Formula

To calculate the value of $$r$$ precisely, we use formulas involving the **Sum of Squares**:

```math
r = \frac{SS_{xy}}{\sqrt{SS_{xx} SS_{yy}}}
```

**What are $$SS_{xy}$$, $$SS_{xx}$$, and $$SS_{yy}$$?**

These quantities measure how much the data vary:

1.  **$$SS_{xx}$$ (Sum of Squares for $$x$$):** Measures how spread out the $$x$$ values are around their mean.

    ```math
    SS_{xx} = \sum (x - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n}
    ```

2.  **$$SS_{yy}$$ (Sum of Squares for $$y$$):** Measures how spread out the $$y$$ values are around their mean.

    ```math
    SS_{yy} = \sum (y - \bar{y})^2 = \sum y^2 - \frac{(\sum y)^2}{n}
    ```

3.  **$$SS_{xy}$$ (Sum of Products of Deviations for $$x$$ and $$y$$):** Measures how $$x$$ and $$y$$ vary _together_.

    ```math
    SS_{xy} = \sum (x - \bar{x})(y - \bar{y}) = \sum xy - \frac{(\sum x)(\sum y)}{n}
    ```

Key:

- $$n$$: Number of data pairs $$(x, y)$$.
- $$\sum x$$, $$\sum y$$: Sum of all $$x$$ values and sum of all $$y$$ values.
- $$\sum x^2$$, $$\sum y^2$$: Sum of the squares of each $$x$$ value and of each $$y$$ value.
- $$\sum xy$$: Sum of the products of each $$(x, y)$$ pair.
- $$\bar{x}$$, $$\bar{y}$$: Mean of $$x$$ and $$y$$ values.
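
As a worked example, the shortcut formulas can be applied by hand to the eight points from the strong positive scatter diagram earlier in this lesson ($$x = 1, \dots, 8$$ and $$y = 2, 3.1, 3.9, 5.2, 6.1, 6.8, 8.1, 9.0$$). The final values are rounded.

```math
\begin{aligned}
n &= 8, \quad \sum x = 36, \quad \sum y = 44.2, \quad \sum x^2 = 204, \quad \sum y^2 = 285.92, \quad \sum xy = 240.7 \\
SS_{xx} &= 204 - \frac{36^2}{8} = 204 - 162 = 42 \\
SS_{yy} &= 285.92 - \frac{44.2^2}{8} = 285.92 - 244.205 = 41.715 \\
SS_{xy} &= 240.7 - \frac{(36)(44.2)}{8} = 240.7 - 198.9 = 41.8 \\
r &= \frac{41.8}{\sqrt{42 \times 41.715}} \approx \frac{41.8}{41.857} \approx 0.9986
\end{aligned}
```

The result is very close to $$+1$$, which matches the tight upward pattern in that diagram.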

By calculating these three $$SS$$ values and plugging them into the formula for $$r$$, we get the Product Moment Correlation Coefficient.
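
The same procedure can be sketched in code. This is a minimal illustration of the shortcut (raw-sum) formulas rather than a production routine: the function names `sums_of_squares` and `pearson_r` are assumptions made for this example, and the data are the eight points from the strong negative scatter diagram earlier in this lesson.

```python
from math import sqrt


def sums_of_squares(xs, ys):
    """Return (SS_xx, SS_yy, SS_xy) using the shortcut formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    ss_xx = sum_x2 - sum_x ** 2 / n
    ss_yy = sum_y2 - sum_y ** 2 / n
    ss_xy = sum_xy - sum_x * sum_y / n
    return ss_xx, ss_yy, ss_xy


def pearson_r(xs, ys):
    """r = SS_xy / sqrt(SS_xx * SS_yy), with basic input checks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    ss_xx, ss_yy, ss_xy = sums_of_squares(xs, ys)
    # A constant variable has no variation, so r is undefined.
    if ss_xx <= 0 or ss_yy <= 0:
        raise ValueError("r is undefined when a variable does not vary")
    return ss_xy / sqrt(ss_xx * ss_yy)


# Points from the strong negative scatter diagram.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [10, 8.9, 8.1, 6.8, 6.0, 5.1, 4.0, 3.1]

ss_xx, ss_yy, ss_xy = sums_of_squares(xs, ys)
r = pearson_r(xs, ys)
print(f"SS_xx = {ss_xx:.3f}, SS_yy = {ss_yy:.3f}, SS_xy = {ss_xy:.3f}")
print(f"r = {r:.4f}")
```

For these points $$SS_{xx} = 42$$, $$SS_{yy} \approx 40.68$$, and $$SS_{xy} \approx -41.3$$, giving $$r \approx -0.999$$. The shortcut form is convenient for hand calculation, but it subtracts two large numbers, so for data with very large values the deviation form $$\sum (x - \bar{x})(y - \bar{y})$$ tends to be more accurate in floating-point arithmetic.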

## Interpreting the Correlation Coefficient

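Once we have the value of $$r$$, we can interpret its strength and direction using the following general guidelines. The cutoffs are common conventions rather than fixed rules, and different fields may draw the lines differently: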

| Value of $$r$$      | Correlation Strength  | Description                                        |
| :------------------ | :-------------------- | :------------------------------------------------- |
| $$r = 1$$           | Perfect Positive      | All points lie exactly on an upward sloping line.  |
| $$0.7 \le r < 1$$   | Strong Positive       | Clear and strong positive linear relationship.     |
| $$0.3 < r < 0.7$$   | Moderate Positive     | Moderately visible positive linear relationship.   |
| $$0 < r \le 0.3$$   | Weak Positive         | Very low positive linear relationship.             |
| $$r = 0$$           | No Linear Correlation | No linear relationship at all.                     |
| $$-0.3 \le r < 0$$  | Weak Negative         | Very low negative linear relationship.             |
| $$-0.7 < r < -0.3$$ | Moderate Negative     | Moderately visible negative linear relationship.   |
| $$-1 < r \le -0.7$$ | Strong Negative       | Clear and strong negative linear relationship.     |
| $$r = -1$$          | Perfect Negative      | All points lie exactly on a downward sloping line. |
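
The table can be turned into a small lookup, sketched below. The function name `describe_correlation` is hypothetical, and the thresholds are copied directly from the guideline table above, so they inherit the same caveat of being rules of thumb.

```python
def describe_correlation(r):
    """Map a value of r to the strength label used in the table above."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "No Linear Correlation"

    direction = "Positive" if r > 0 else "Negative"
    size = abs(r)
    if size == 1:
        strength = "Perfect"
    elif size >= 0.7:
        strength = "Strong"
    elif size > 0.3:
        strength = "Moderate"
    else:
        strength = "Weak"
    return f"{strength} {direction}"


for value in (1, 0.88, 0.3, 0, -0.5, -0.7, -1):
    print(f"r = {value:+.2f} -> {describe_correlation(value)}")
```

Note how the boundary values behave: $$r = 0.3$$ is labeled weak and $$r = -0.7$$ is labeled strong, exactly as the inequalities in the table specify.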
