# Nakafa Framework: LLM

URL: https://nakafa.com/en/subject/high-school/11/mathematics/statistics/linear-regression-concept

Source: https://raw.githubusercontent.com/nakafaai/nakafa.com/refs/heads/main/packages/contents/subject/high-school/11/mathematics/statistics/linear-regression-concept/en.mdx

Output docs content for large language models.

---

import { ScatterDiagram } from "@repo/design-system/components/contents/scatter-diagram";

export const metadata = {
  title: "Linear Regression Concept",
  description: "Learn linear regression to create best-fit lines through data points. Understand prediction, slope calculations, and how to model variable relationships.",
  authors: [{ name: "Nabil Akbarazzima Fatih" }],
  date: "04/30/2025",
  subject: "Statistics",
};

## What Is Linear Regression?

With [Scatter Diagrams](/subject/high-school/11/mathematics/statistics/scatter-diagram/en), we can see the relationship between two variables (X data and Y data). If the points on the scatter diagram appear to follow a straight-line pattern (a linear correlation, whether positive or negative), we can try to draw the straight line that fits best through the middle of that cluster of points.

This line is called the **Linear Regression Line**. The process of finding this line is called **Linear Regression**.

## The "Best-Fit" Line

The Linear Regression Line is often called the _best-fit_ line. Why? Because, of all the straight lines that could be drawn through the data, this is the one whose position is "closest" to all the data points overall. The line summarizes the linear trend present in the data.

## Example of a Regression Line

Let's say we have data on study time (hours) and exam scores again. The points tend to rise (positive correlation). In the scatter diagram of this data, the straight line running through the points is the linear regression line.

The line shows the **general trend**: the longer the study time (as X increases), the higher the exam score (Y) tends to be, following the direction of the line.
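To make "closest to all the data points overall" concrete, the sketch below scores a few candidate lines by the sum of squared vertical distances between each point and the line, which is the measure the Least Squares Method minimizes. The study-hours data, the candidate lines, and the function name `sum_squared_vertical_distances` are hypothetical, chosen only for illustration. The code only compares hand-picked lines; it does not search for the best one.

```python
# Hypothetical data (not from the article): study hours (x) and exam scores (y)
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 58, 61, 67, 70, 76]


def sum_squared_vertical_distances(a, b, xs, ys):
    """Total of (y - y_hat)^2 over all points, where y_hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hand-picked candidate lines y_hat = a + b*x, given as (a, b) pairs
candidates = {
    "flat": (64.0, 0.0),
    "steep": (40.0, 7.0),
    "moderate": (47.5, 4.6),
}

# A smaller total means the line is "closer" to the points overall
for name, (a, b) in candidates.items():
    total = sum_squared_vertical_distances(a, b, hours, scores)
    print(f"{name:>8}: {total:.2f}")
```

With this data the "moderate" line has a far smaller total than the other two. The Least Squares Method goes one step further and finds the line whose total is the smallest possible.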
## What Is the Use of This Regression Line?

One of its main uses is **prediction**. For example, if a new student studies for 7 hours, we can use the regression line to estimate that student's exam score, even though our data contains no student who studied for exactly 7 hours.

## Mathematical Concept

The linear regression line (the _best-fit_ line) is found using a method called the **Least Squares Method**. The idea is to find the straight line that **minimizes the sum of the squared vertical distances** from each data point to the line, that is, the line that makes $\sum (y - \hat{y})^2$ as small as possible.

Mathematically, the linear regression line has the form:

$$
\hat{y} = a + bx
$$

Where:

- $\hat{y}$ (read: y-hat) is the **predicted value of $y$** given by the regression line.
- $x$ is the value of the independent variable.
- $b$ is the **slope** of the line, indicating how much $\hat{y}$ changes for each one-unit change in $x$.
- $a$ is the **y-intercept**, which is the predicted value of $y$ when $x = 0$.

The values of $b$ and $a$ are calculated from our data using the following formulas:

$$
b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}
$$

$$
a = \bar{y} - b\bar{x}
$$
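A minimal Python sketch of the least squares formulas for $\hat{y} = a + bx$ (slope $b$, intercept $a$), assuming the data comes as two equal-length lists of numbers. The helper names `fit_least_squares` and `predict` are hypothetical, and the study-hours data is made up for illustration; the last line shows the 7-hour prediction described in the section on uses.

```python
def fit_least_squares(xs, ys):
    """Return (a, b) for y_hat = a + b*x using the least squares formulas."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    sum_x = sum(xs)                                # sum of x
    sum_y = sum(ys)                                # sum of y
    sum_xy = sum(x * y for x, y in zip(xs, ys))    # sum of x*y
    sum_x2 = sum(x * x for x in xs)                # sum of x^2

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are equal; the slope is undefined")

    b = (n * sum_xy - sum_x * sum_y) / denominator  # slope
    x_bar = sum_x / n                               # mean of x
    y_bar = sum_y / n                               # mean of y
    a = y_bar - b * x_bar                           # intercept
    return a, b


def predict(a, b, x):
    """Predicted value y_hat = a + b*x."""
    return a + b * x


# Hypothetical data (not from the article): study hours and exam scores
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 58, 61, 67, 70, 76]

a, b = fit_least_squares(hours, scores)
print(f"y_hat = {a:.2f} + {b:.4f}x")
print("predicted score for 7 hours:", round(predict(a, b, 7), 1))
```

The guard on the denominator covers the case where every $x$ value is the same: the points then lie on a vertical line, and no single slope exists.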
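Formula key:

- $n$ is the number of data pairs.
- $\sum x$ is the sum of all $x$ values.
- $\sum y$ is the sum of all $y$ values.
- $\sum xy$ is the sum of the products of each $x$ and $y$ pair.
- $\sum x^2$ is the sum of the squares of each $x$ value.
- $\bar{x}$ is the mean of the $x$ values ($\bar{x} = \frac{\sum x}{n}$).
- $\bar{y}$ is the mean of the $y$ values ($\bar{y} = \frac{\sum y}{n}$).

With these formulas, we obtain the single straight line that best represents the linear relationship in our data, in the sense that no other straight line has a smaller sum of squared vertical distances to the points.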
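To see the formula key in action, here is a small worked calculation with made-up data (not from the article): the four pairs $(1, 2)$, $(2, 4)$, $(3, 5)$, $(4, 7)$, writing the line as $\hat{y} = a + bx$ with slope $b$ and intercept $a$.

$$
\begin{aligned}
n &= 4, \quad \sum x = 10, \quad \sum y = 18, \quad \sum xy = 2 + 8 + 15 + 28 = 53, \quad \sum x^2 = 30 \\
b &= \frac{4(53) - (10)(18)}{4(30) - 10^2} = \frac{212 - 180}{120 - 100} = \frac{32}{20} = 1.6 \\
\bar{x} &= \frac{10}{4} = 2.5, \qquad \bar{y} = \frac{18}{4} = 4.5 \\
a &= \bar{y} - b\bar{x} = 4.5 - 1.6(2.5) = 0.5 \\
\hat{y} &= 0.5 + 1.6x
\end{aligned}
$$

So, for this data, each one-unit increase in $x$ raises the predicted $y$ by $1.6$, and the predicted value at $x = 0$ is $0.5$.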