# Nakafa Learning Content

URL: https://nakafa.com/en/subjects/mathematics/statistics-regression/coefficient-of-determination
Source: https://raw.githubusercontent.com/nakafaai/nakafa.com/refs/heads/main/packages/contents/material/lesson/mathematics/statistics-regression/coefficient-of-determination/en.mdx

Learn how r² measures how well a regression line explains the variation in data, with visual examples and calculations of the coefficient of determination.

---

## What is the Coefficient of Determination?

After finding the best-fit linear regression line for our data, the next question is: **how well does that line actually represent or explain our data?**

The measure that answers this question is the **Coefficient of Determination**, denoted as **$$r^2$$** (read: r-squared).

Simply put, $$r^2$$ tells us the **proportion or percentage** of the variation (ups and downs in values) in the dependent variable (Y) that **can be explained** by the variation in the independent variable (X) using our linear regression model.

## Coefficient of Determination from a Scatter Diagram

The value of $$r^2$$ is closely related to how tightly the data points cluster around the regression line:

1. **High $$r^2$$ (approaching $$1$$ or $$100\%$$)**

   <ScatterDiagram
     title={
       <>
         High $$r^2$$
       </>
     }
     description="Data points are very close to the regression line."
     xAxisLabel="Variable X"
     yAxisLabel="Variable Y"
     datasets={[
       {
         name: "Data",
         color: "var(--chart-1)",
         points: [
           { x: 1, y: 2 },
           { x: 2, y: 3.1 },
           { x: 3, y: 3.9 },
           { x: 4, y: 5.2 },
           { x: 5, y: 6.1 },
           { x: 6, y: 6.8 },
           { x: 7, y: 8.1 },
           { x: 8, y: 9.0 },
         ],
       },
     ]}
     calculateRegressionLine
     showResiduals
     regressionLineStyle={{ color: "var(--chart-4)" }}
   />

   See how tightly the data points above cluster around the regression line? This indicates a high $$r^2$$ value: for these eight points, $$r^2 \approx 0.997$$, or about $$99.7\%$$. This means that almost all of the variation in $$Y$$ values _can be explained_ by the regression line (that is, by variable $$X$$).

2. **Low $$r^2$$ (approaching $$0$$ or $$0\%$$)**

   <ScatterDiagram
     title={
       <>
         Low $$r^2$$
       </>
     }
     description="Data points are scattered far from the regression line."
     xAxisLabel="Variable X"
     yAxisLabel="Variable Y"
     datasets={[
       {
         name: "Data",
         color: "var(--chart-2)",
         points: [
           { x: 1, y: 1 },
           { x: 2, y: 4 },
           { x: 3, y: 2 },
           { x: 4, y: 6 },
           { x: 5, y: 5 },
           { x: 6, y: 8 },
           { x: 7, y: 6 },
           { x: 8, y: 9 },
         ],
       },
     ]}
     calculateRegressionLine
     showResiduals
     regressionLineStyle={{ color: "var(--chart-4)" }}
   />

   Compare that with this diagram. The points are more spread out around the regression line (the residual lines are longer). This indicates a lower $$r^2$$ value: for these points, $$r^2 \approx 0.78$$, or about $$78\%$$, clearly below the first diagram. Points scattered even more loosely would push $$r^2$$ further toward $$0$$ (for example, to around $$0.40$$ or $$40\%$$). A lower $$r^2$$ means the regression line is _less effective_ at explaining the variation in $$Y$$ values: a larger share of that variation cannot be explained by $$X$$ through this model.
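
To check the two diagrams numerically, here is a minimal Python sketch. It fits the least-squares line to each set of plotted points and computes $$r^2$$ as the share of variation the line explains, $$1 - \frac{\text{sum of squared residuals}}{\text{total sum of squares of } Y}$$. This residual-based form is a standard equivalent of the correlation and Sum of Squares formulas in this lesson for a least-squares line with an intercept; the function names `fit_line` and `r_squared_from_residuals` are illustrative, not from any library.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    ss_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_mean) ** 2 for x in xs)
    b = ss_xy / ss_xx
    a = y_mean - b * x_mean
    return a, b


def r_squared_from_residuals(xs, ys):
    """r^2 = 1 - (sum of squared residuals) / (total sum of squares of y)."""
    a, b = fit_line(xs, ys)
    y_mean = sum(ys) / len(ys)
    # Residual = vertical distance from each point to the fitted line
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation of y around its mean
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


xs = [1, 2, 3, 4, 5, 6, 7, 8]
high_ys = [2, 3.1, 3.9, 5.2, 6.1, 6.8, 8.1, 9.0]  # points from the "High r^2" diagram
low_ys = [1, 4, 2, 6, 5, 8, 6, 9]                 # points from the "Low r^2" diagram

print(f"High r^2 diagram: {r_squared_from_residuals(xs, high_ys):.3f}")
print(f"Low r^2 diagram:  {r_squared_from_residuals(xs, low_ys):.3f}")
```

Run as is, it prints about 0.997 for the first diagram and 0.776 for the second: the longer residual lines in the second diagram leave more of the variation in $$Y$$ unexplained.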

## Calculating the Coefficient of Determination

The easiest way to calculate $$r^2$$ for a simple linear regression (one independent variable $$X$$) is by **squaring the Correlation Coefficient ($$r$$)** that we learned about earlier.

```math
r^2 = (r)^2
```

So, if you've already calculated the value of $$r$$, just square it!
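
As a quick worked example, suppose (hypothetically) that a data set has a correlation coefficient of $$r = -0.9$$. Squaring it gives:

```math
r^2 = (-0.9)^2 = 0.81
```

A data set with $$r = 0.9$$ would give the same $$r^2 = 0.81$$, because squaring removes the sign. So $$r^2$$ describes how well the line fits the data, but not whether the relationship is positive or negative.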

Since the value of $$r$$ is always between $$-1$$ and $$+1$$ ($$-1 \le r \le 1$$), the value of $$r^2$$ will always be between $$0$$ and $$1$$.

```math
0 \le r^2 \le 1
```

**Mathematically (using Sum of Squares):**

The value of $$r^2$$ can also be calculated directly using the Sum of Squares values used to calculate $$r$$:

```math
r^2 = \frac{(SS_{xy})^2}{SS_{xx} SS_{yy}}
```
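
The Sum of Squares formula can be sketched in Python as follows. It assumes the usual definitions $$SS_{xx} = \sum (x - \bar{x})^2$$, $$SS_{yy} = \sum (y - \bar{y})^2$$ and $$SS_{xy} = \sum (x - \bar{x})(y - \bar{y})$$; the helper names are hypothetical. It uses the points from the "Low $$r^2$$" diagram and computes $$r^2$$ both directly and by squaring $$r$$, so you can see that the two routes agree.

```python
import math


def sums_of_squares(xs, ys):
    """Return (SS_xx, SS_yy, SS_xy) using deviations from the means."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    ss_xx = sum((x - x_mean) ** 2 for x in xs)
    ss_yy = sum((y - y_mean) ** 2 for y in ys)
    ss_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return ss_xx, ss_yy, ss_xy


def r_squared(xs, ys):
    """Coefficient of determination: SS_xy^2 / (SS_xx * SS_yy)."""
    ss_xx, ss_yy, ss_xy = sums_of_squares(xs, ys)
    return ss_xy ** 2 / (ss_xx * ss_yy)


def correlation(xs, ys):
    """Correlation coefficient: SS_xy / sqrt(SS_xx * SS_yy)."""
    ss_xx, ss_yy, ss_xy = sums_of_squares(xs, ys)
    return ss_xy / math.sqrt(ss_xx * ss_yy)


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1, 4, 2, 6, 5, 8, 6, 9]  # points from the "Low r^2" diagram

r = correlation(xs, ys)
print(f"SS formula: r^2 = {r_squared(xs, ys):.4f}")
print(f"Squared r:  r^2 = {r ** 2:.4f}")
```

Both lines print 0.7755 for these points.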

## Interpretation as a Percentage

The value of $$r^2$$ is often converted into a percentage (by multiplying by $$100$$) for easier interpretation.

- If $$r^2 = 0.81$$, it means that **$$81\%$$** of the total variation in variable $$Y$$ can be explained by the variation in variable $$X$$ through the linear regression model.
- The remaining variation ($$1 - r^2$$, or $$19\%$$ in this example) is not explained by the model; it comes from other factors not included in it, such as other variables or _random error_.
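
The percentage split above is simple arithmetic; a minimal Python sketch (the function name `explained_split` is hypothetical) makes the two parts explicit for the example value $$r^2 = 0.81$$:

```python
def explained_split(r_squared):
    """Return (explained %, unexplained %) for a coefficient of determination."""
    if not 0 <= r_squared <= 1:
        raise ValueError("r^2 must lie between 0 and 1")
    explained = r_squared * 100   # share of Y's variation explained by X
    return explained, 100 - explained  # the rest comes from other factors


explained, unexplained = explained_split(0.81)
print(f"Explained by X: {explained:.0f}%, other factors: {unexplained:.0f}%")
```

It prints 81% explained and 19% left to other factors, matching the example above.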

The higher the percentage of $$r^2$$, the better our linear regression model is at explaining the relationship between $$X$$ and $$Y$$.
