# Nakafa Framework: LLM
URL: https://nakafa.com/en/subject/high-school/11/mathematics/statistics/coefficient-of-determination
Source: https://raw.githubusercontent.com/nakafaai/nakafa.com/refs/heads/main/packages/contents/subject/high-school/11/mathematics/statistics/coefficient-of-determination/en.mdx
Output docs content for large language models.
---
import { ScatterDiagram } from "@repo/design-system/components/contents/scatter-diagram";
export const metadata = {
  title: "Coefficient of Determination",
  description: "Learn how r² measures how well your regression line explains data variation. Master coefficient of determination with visual examples and calculations.",
  authors: [{ name: "Nabil Akbarazzima Fatih" }],
  date: "04/30/2025",
  subject: "Statistics",
};
## What is the Coefficient of Determination?
After finding the best-fit linear regression line for our data, the next question is: **how well does that line actually represent or explain our data?**
The measure that answers this question is the **Coefficient of Determination**, denoted as **$r^2$** (read: r-squared).
Simply put, $r^2$ tells us the **proportion or percentage** of the variation (ups and downs in values) in the dependent variable (Y) that **can be explained** by the variation in the independent variable (X) using our linear regression model.
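Written as a ratio, this idea looks like the following (a supplementary formulation in words rather than the lesson's own symbols):

$$r^2 = \frac{\text{variation in } Y \text{ explained by the regression line}}{\text{total variation in } Y}$$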
## Coefficient of Determination from a Scatter Diagram
The value of $r^2$ is closely related to how tightly the data points cluster around the regression line:
1. **High $r^2$ (approaching 1 or 100%)**

   <ScatterDiagram
     title="High r²"
     description="Data points are very close to the regression line."
     xAxisLabel="Variable X"
     yAxisLabel="Variable Y"
     datasets={[
       {
         name: "Data",
         color: "var(--chart-1)",
         points: [
           { x: 1, y: 2 },
           { x: 2, y: 3.1 },
           { x: 3, y: 3.9 },
           { x: 4, y: 5.2 },
           { x: 5, y: 6.1 },
           { x: 6, y: 6.8 },
           { x: 7, y: 8.1 },
           { x: 8, y: 9.0 },
         ],
       },
     ]}
     calculateRegressionLine
     showResiduals
     regressionLineStyle={{ color: "var(--chart-4)" }}
   />
   See how the data points above are packed very tightly around the regression line? This indicates a high $r^2$ value (for example, perhaps around 0.95 or 95%). It means that most of the variation in the Y values _can be explained_ well by the regression line (that is, by variable X).
2. **Low $r^2$ (approaching 0 or 0%)**

   <ScatterDiagram
     title="Low r²"
     description="Data points are scattered far from the regression line."
     xAxisLabel="Variable X"
     yAxisLabel="Variable Y"
     datasets={[
       {
         name: "Data",
         color: "var(--chart-2)",
         points: [
           { x: 1, y: 1 },
           { x: 2, y: 4 },
           { x: 3, y: 2 },
           { x: 4, y: 6 },
           { x: 5, y: 5 },
           { x: 6, y: 8 },
           { x: 7, y: 6 },
           { x: 8, y: 9 },
         ],
       },
     ]}
     calculateRegressionLine
     showResiduals
     regressionLineStyle={{ color: "var(--chart-4)" }}
   />
   Now compare this with the diagram above. Here the points are more spread out from the regression line (the residual lines are longer). This indicates a lower $r^2$ value (for example, perhaps around 0.40 or 40%). It means the regression line is _not very good_ at explaining the variation in the Y values; only a portion of the variation in Y can be explained by X through this model. (Both example datasets are computed numerically in the sketch right after this list.)
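To make the visual comparison concrete, here is a minimal TypeScript sketch (not part of the lesson; the function name `rSquared` and the residual-based route $r^2 = 1 - SS_{res}/SS_{tot}$ are choices made here for illustration). It fits a least-squares line to each plotted dataset and reports how much of the variation the line explains. The printed values are for the exact points in the diagrams, so they may differ from the illustrative percentages quoted above.

```ts
// Illustrative sketch (not from the lesson): fit a least-squares line and
// measure how much of the variation in y it explains.
type Point = { x: number; y: number };

function rSquared(points: Point[]): number {
  const n = points.length;
  const meanX = points.reduce((sum, p) => sum + p.x, 0) / n;
  const meanY = points.reduce((sum, p) => sum + p.y, 0) / n;

  // Sums of squares around the means.
  let ssXY = 0;
  let ssXX = 0;
  let ssTot = 0;
  for (const p of points) {
    ssXY += (p.x - meanX) * (p.y - meanY);
    ssXX += (p.x - meanX) ** 2;
    ssTot += (p.y - meanY) ** 2; // total variation in y
  }

  // Least-squares regression line y = a + b*x.
  const b = ssXY / ssXX;
  const a = meanY - b * meanX;

  // Unexplained variation: squared residuals around the line.
  let ssRes = 0;
  for (const p of points) {
    ssRes += (p.y - (a + b * p.x)) ** 2;
  }

  // r^2 = 1 - unexplained / total.
  return 1 - ssRes / ssTot;
}

// The exact points plotted in the two diagrams above.
const tight: Point[] = [
  { x: 1, y: 2 }, { x: 2, y: 3.1 }, { x: 3, y: 3.9 }, { x: 4, y: 5.2 },
  { x: 5, y: 6.1 }, { x: 6, y: 6.8 }, { x: 7, y: 8.1 }, { x: 8, y: 9.0 },
];
const scattered: Point[] = [
  { x: 1, y: 1 }, { x: 2, y: 4 }, { x: 3, y: 2 }, { x: 4, y: 6 },
  { x: 5, y: 5 }, { x: 6, y: 8 }, { x: 7, y: 6 }, { x: 8, y: 9 },
];

console.log(rSquared(tight));     // close to 1: the line explains almost everything
console.log(rSquared(scattered)); // noticeably lower: more unexplained variation
```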
## Calculating the Coefficient of Determination
The easiest way to calculate $r^2$ is by **squaring the correlation coefficient ($r$)** that we learned about earlier.
So, if you've already calculated the value of $r$, just square it!
Since the value of $r$ is always between -1 and +1 ($-1 \le r \le +1$), the value of $r^2$ will always be between 0 and 1.
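For instance (numbers chosen here purely for illustration), a negative correlation squares to the same positive value as its positive counterpart:

$$r = 0.9 \;\Rightarrow\; r^2 = 0.81 \qquad\qquad r = -0.9 \;\Rightarrow\; r^2 = (-0.9)^2 = 0.81$$

So $r^2$ measures the strength of the linear relationship regardless of its direction.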
**Mathematically (using Sum of Squares):**

The value of $r^2$ can also be calculated directly from the same Sum of Squares values used to calculate $r$:

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx} \cdot SS_{yy}}} \quad\Longrightarrow\quad r^2 = \frac{(SS_{xy})^2}{SS_{xx} \cdot SS_{yy}}$$
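As a quick check that this route gives the same answer, here is a short TypeScript sketch (again illustrative; `rSquaredFromSums` is a name chosen here) that evaluates the formula above directly:

```ts
// Illustrative sketch: r^2 straight from the sums of squares, i.e.
// r^2 = (SS_xy)^2 / (SS_xx * SS_yy). Same Point shape as before.
type Point = { x: number; y: number };

function rSquaredFromSums(points: Point[]): number {
  const n = points.length;
  const meanX = points.reduce((sum, p) => sum + p.x, 0) / n;
  const meanY = points.reduce((sum, p) => sum + p.y, 0) / n;

  let ssXY = 0;
  let ssXX = 0;
  let ssYY = 0;
  for (const p of points) {
    ssXY += (p.x - meanX) * (p.y - meanY);
    ssXX += (p.x - meanX) ** 2;
    ssYY += (p.y - meanY) ** 2;
  }

  return (ssXY * ssXY) / (ssXX * ssYY);
}
```

Calling `rSquaredFromSums` on either dataset from the diagrams returns the same value as the residual-based `rSquared` sketch earlier, since for a least-squares line the two formulas are algebraically equivalent.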
## Interpretation as a Percentage
The value of  is often converted into a percentage (by multiplying by 100) for easier interpretation.
- If $r^2 = 0.81$, it means that **81%** of the total variation in variable Y can be explained by the variation in variable X through the linear regression model.
- The remaining variation ($1 - r^2 = 0.19$, or 19% in this example) is explained by other factors not included in the model (could be other variables, or _random error_).
The higher the percentage of $r^2$, the better our linear regression model is at explaining the relationship between X and Y.
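A one-line conversion (illustrative values only) shows the split between explained and unexplained variation:

```ts
const r2 = 0.81; // e.g. a value returned by one of the sketches above
console.log(`Explained by X: ${(r2 * 100).toFixed(0)}%`);    // "Explained by X: 81%"
console.log(`Unexplained: ${((1 - r2) * 100).toFixed(0)}%`); // "Unexplained: 19%"
```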