# Nakafa Framework: LLM

URL: https://nakafa.com/en/subject/high-school/11/mathematics/statistics/coefficient-of-determination

Source: https://raw.githubusercontent.com/nakafaai/nakafa.com/refs/heads/main/packages/contents/subject/high-school/11/mathematics/statistics/coefficient-of-determination/en.mdx

Output docs content for large language models.

---

import { ScatterDiagram } from "@repo/design-system/components/contents/scatter-diagram";

export const metadata = {
  title: "Coefficient of Determination",
  description:
    "Learn how r² measures how well your regression line explains data variation. Master coefficient of determination with visual examples and calculations.",
  authors: [{ name: "Nabil Akbarazzima Fatih" }],
  date: "04/30/2025",
  subject: "Statistics",
};

## What is the Coefficient of Determination?

After finding the best-fit linear regression line for our data, the next question is: **how well does that line actually represent or explain our data?**

The measure that answers this question is the **Coefficient of Determination**, denoted **r²** (read: r-squared).

Simply put, r² tells us the **proportion or percentage** of the variation (the ups and downs in values) of the dependent variable (Y) that **can be explained** by the variation in the independent variable (X) using our linear regression model.

## Coefficient of Determination from a Scatter Diagram

The value of r² is closely related to how tightly the data points cluster around the regression line:

1. **High r² (approaching 1 or 100%)**

   <ScatterDiagram
     title="High r²"
     description="Data points are very close to the regression line."
     xAxisLabel="Variable X"
     yAxisLabel="Variable Y"
     datasets={[
       {
         name: "Data",
         color: "var(--chart-1)",
         points: [
           { x: 1, y: 2 },
           { x: 2, y: 3.1 },
           { x: 3, y: 3.9 },
           { x: 4, y: 5.2 },
           { x: 5, y: 6.1 },
           { x: 6, y: 6.8 },
           { x: 7, y: 8.1 },
           { x: 8, y: 9.0 },
         ],
       },
     ]}
     calculateRegressionLine
     showResiduals
     regressionLineStyle={{ color: "var(--chart-4)" }}
   />

   See how tightly the data points above cluster around the regression line? This indicates a high r² (for these points it works out to about 0.997, or 99.7%). In other words, almost all of the variation in the Y values _can be explained_ by the regression line (that is, by variable X).

2. **Low r² (approaching 0 or 0%)**

   <ScatterDiagram
     title="Low r²"
     description="Data points are scattered far from the regression line."
     xAxisLabel="Variable X"
     yAxisLabel="Variable Y"
     datasets={[
       {
         name: "Data",
         color: "var(--chart-2)",
         points: [
           { x: 1, y: 1 },
           { x: 2, y: 4 },
           { x: 3, y: 2 },
           { x: 4, y: 6 },
           { x: 5, y: 5 },
           { x: 6, y: 8 },
           { x: 7, y: 6 },
           { x: 8, y: 9 },
         ],
       },
     ]}
     calculateRegressionLine
     showResiduals
     regressionLineStyle={{ color: "var(--chart-4)" }}
   />

   Compare this with the first diagram. The points here are spread much farther from the regression line (the residual lines are longer). This indicates a lower r² (for these points, about 0.78, or 78%). This regression line explains a smaller share of the variation in the Y values; the rest is left to other factors.

## Calculating the Coefficient of Determination

The easiest way to calculate r² is to **square the Correlation Coefficient (r)** that we learned about earlier:

r² = (r)²

So, if you have already calculated the value of r, just square it! Since r is always between -1 and +1 (-1 ≤ r ≤ +1), the value of r² will always be between 0 and 1 (0 ≤ r² ≤ 1).
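The "just square it" rule is easy to check by hand. Here is a minimal Python sketch (not part of the lesson's interactive components; the variable names are our own) that computes r from the eight points in the first diagram and then squares it:

```python
# Points from the first (tightly clustered) scatter diagram above.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 3.1, 3.9, 5.2, 6.1, 6.8, 8.1, 9.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares used to compute the correlation coefficient r
ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
ss_xx = sum((xi - mean_x) ** 2 for xi in x)
ss_yy = sum((yi - mean_y) ** 2 for yi in y)

r = ss_xy / (ss_xx * ss_yy) ** 0.5  # correlation coefficient
r_squared = r ** 2                  # coefficient of determination

print(f"r = {r:.4f}, r² = {r_squared:.4f}")
```

For this tightly clustered data, r² comes out very close to 1, which matches what the first scatter diagram shows visually.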
**Mathematically (using Sum of Squares):**

The value of r² can also be calculated directly from the Sum of Squares values used to calculate r:

r² = (SS_xy)² / (SS_xx × SS_yy)

Here SS_xy, SS_xx, and SS_yy are the same sums of squares used for the correlation coefficient, so this expression is exactly what you get by squaring r.

## Interpretation as a Percentage

The value of r² is often converted into a percentage (by multiplying by 100) for easier interpretation.

- If r² = 0.81, it means that **81%** of the total variation in variable Y can be explained by the variation in variable X through the linear regression model.
- The remaining variation (1 - r² = 0.19, or 19% in this example) is explained by other factors not included in the model (could be other variables, or _random error_).

The higher the percentage of r², the better our linear regression model is at explaining the relationship between X and Y.
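The percentage reading of r² can also be seen as "explained variation divided by total variation": for a least-squares regression line, r² equals 1 minus the residual sum of squares over the total sum of squares. The sketch below (our own illustration, using the scattered points from the second diagram) fits the line, measures the leftover residual variation, and reports the explained percentage:

```python
# Points from the second (more scattered) diagram above.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1, 4, 2, 6, 5, 8, 6, 9]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
ss_xx = sum((xi - mean_x) ** 2 for xi in x)
ss_yy = sum((yi - mean_y) ** 2 for yi in y)

# Least-squares regression line: y = a + b * x
b = ss_xy / ss_xx
a = mean_y - b * mean_x

# Variation left unexplained: squared residuals around the line
ss_residual = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# For a least-squares line, 1 - SS_residual / SS_total equals r².
r_squared = 1 - ss_residual / ss_yy
print(f"{100 * r_squared:.0f}% of the variation in Y is explained by X")
```

Both routes agree: 1 - SS_residual / SS_total gives the same number as (SS_xy)² / (SS_xx × SS_yy), and multiplying by 100 turns it into the percentage interpretation described above.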