Least Squares Method

What Is the Least Squares Method?

Imagine we have a set of data from observations or experiments consisting of pairs of values (x, y). If we plot these data points on a scatter diagram, we sometimes see a pattern or trend that resembles a straight line.

The question is: out of the many straight lines we could draw through these points, which one best represents the entire dataset?

The Least Squares Method is a mathematical procedure used to find one unique straight line that is considered the best fit for the set of data points.

Minimizing Squared Errors

So how do we determine the "best fit" line? The main idea is to minimize the error, or residual, between each data point and the prediction line.

  1. Prediction Line: We try drawing a straight line ($\hat{y} = a + bx$) among the data points.

  2. Error (Residual): For each original data point ($y_i$), there will be a vertical distance to the predicted value on the line ($\hat{y}_i$). This distance is called the error or residual:

    $e_i = y_i - \hat{y}_i$
  3. Minimize Sum of Squared Errors: The Least Squares Method works by finding the straight line that makes the sum of the squares of all errors ($\sum e_i^2$) as small as possible. This is why it's called "Least Squares".

Why square the errors?

  • Squaring the errors makes all values positive, so errors above and below the line don't cancel each other out.
  • Larger errors contribute much more to the total sum (because they are squared), so this method strongly tries to minimize large errors.
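To make the objective concrete, here is a minimal Python sketch (using small hypothetical numbers, purely for illustration) that computes the residuals and the sum of squared errors for a candidate line. Comparing two candidate lines shows how a better-fitting line yields a smaller sum.

```python
import numpy as np

# Hypothetical (x, y) observations, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

def sum_of_squared_errors(a, b, x, y):
    """Sum of squared residuals for the candidate line y_hat = a + b*x."""
    y_hat = a + b * x         # predictions from the candidate line
    residuals = y - y_hat     # vertical distances e_i = y_i - y_hat_i
    return np.sum(residuals ** 2)

# The second candidate line follows this data more closely,
# so its sum of squared errors is much smaller.
print(sum_of_squared_errors(a=0.0, b=1.5, x=x, y=y))  # ~14.06
print(sum_of_squared_errors(a=0.0, b=2.0, x=x, y=y))  # ~0.11
```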

Visualization Example

For example, a company wants to see the relationship between the advertising costs they incur (in millions of rupiah) and the number of products sold (in thousands of units). The data they collected are plotted in the diagram below.

[Figure: Line resulting from the Least Squares Method (Ads vs. Sales). The line shows the linear trend relationship between advertising costs and sales, minimizing the squared errors.]

The straight line drawn on the diagram above is the best-fit line found using the Least Squares Method for this advertising cost and sales data. This line represents the general linear trend that most closely approximates all data points, and the dashed lines show the residuals being minimized.
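The exact figures from this example are not reproduced here, but a plot like the one described can be sketched with numpy and matplotlib. In the snippet below, the advertising and sales numbers are hypothetical placeholders, and `np.polyfit` with degree 1 is used as a convenient off-the-shelf least squares fit.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical advertising costs (millions of rupiah) and sales (thousands of units);
# placeholder values, not the data from the example above.
ads = np.array([2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
sales = np.array([4.5, 5.0, 6.2, 7.1, 7.8, 9.0])

# np.polyfit with degree 1 performs an ordinary least squares fit;
# it returns the coefficients highest power first: [slope, intercept]
b, a = np.polyfit(ads, sales, deg=1)
fitted = a + b * ads

plt.scatter(ads, sales, label="observations")
plt.plot(ads, fitted, color="red", label="least squares line")
# dashed vertical segments visualize the residuals being minimized
plt.vlines(ads, fitted, sales, linestyles="dashed", colors="gray")
plt.xlabel("Advertising cost (millions of rupiah)")
plt.ylabel("Products sold (thousands of units)")
plt.legend()
plt.show()
```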

Mathematical Basis

Mathematically, we are looking for the line with the equation:

$$\hat{y} = a + bx$$

Where the values of $a$ (intercept) and $b$ (slope) are chosen such that the value of:

$$\sum e_i^2 = \sum (y_i - \hat{y}_i)^2 = \sum \bigl(y_i - (a + bx_i)\bigr)^2$$

is minimized.

Through calculus (which we don't need to derive here), formulas are obtained for the values of $a$ and $b$ that satisfy this condition:

$$b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$$

$$a = \bar{y} - b\bar{x}$$

Formula key:

  • $n$ = Number of data pairs.
  • $\sum x$, $\sum y$ = Sum of all x and y values.
  • $\sum xy$ = Sum of the product of each x and y pair.
  • $\sum x^2$ = Sum of the square of each x value.
  • $\bar{x}$ = Mean of x ($\frac{\sum x}{n}$).
  • $\bar{y}$ = Mean of y ($\frac{\sum y}{n}$).
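As a sketch of how these formulas translate into code, the following Python function computes $a$ and $b$ directly from the sums defined above (the data values are hypothetical, for illustration only) and cross-checks the result against numpy's built-in degree-1 polynomial fit.

```python
import numpy as np

def least_squares_fit(x, y):
    """Compute intercept a and slope b using the summation formulas above."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)

    sum_x, sum_y = x.sum(), y.sum()
    sum_xy = (x * y).sum()
    sum_x2 = (x ** 2).sum()

    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = y.mean() - b * x.mean()   # a = y_bar - b * x_bar
    return a, b

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares_fit(x, y)
print(f"y_hat = {a:.3f} + {b:.3f}x")

# Cross-check against numpy's least squares fit of a degree-1 polynomial
slope, intercept = np.polyfit(x, y, deg=1)
print(intercept, slope)  # should match a and b up to floating point rounding
```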

Thus, the Least Squares Method provides a systematic and objective way to find the best straight line representing the linear trend in the data based on the principle of minimizing the sum of squared errors.