Measuring Spread with Variance and Standard Deviation
There's a powerful way to measure how spread out our data is: Variance and Standard Deviation.
These two measures tell us how far the data points typically lie from the group's mean (average).
- If the Variance or Standard Deviation value is small, it means the data points in the group tend to be uniform or similar, clustering close to the mean value.
- If the value is large, it means the data points are more varied or diverse, spreading further away from the mean value.
Formulas for Variance and Standard Deviation
Variance ($\sigma^2$)
Variance is the average of the squared differences of each data point from the mean. Confused? Simply put: calculate the difference between each data point and the mean, square the result, then average those squared differences.
The formula is:
$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$$
Where:
- $\sigma^2$ = Variance
- $x_i$ = Value of the i-th data point
- $\bar{x}$ = Mean (average) of the data
- $n$ = Number of data points
- $\sum$ = Sum all the calculated results
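The three steps described above (find the mean, square each difference, average the squares) translate directly into code. Below is a minimal Python sketch. The function name `variance` is our own, and it divides by $n$ to match the formula above.

```python
def variance(data):
    """Population variance: the average of the squared differences from the mean."""
    n = len(data)
    mean = sum(data) / n                              # step 1: the mean
    squared_diffs = [(x - mean) ** 2 for x in data]   # step 2: square each difference
    return sum(squared_diffs) / n                     # step 3: average the squares


print(variance([1, 2, 3, 4, 5]))  # 2.0
```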
Standard Deviation ($\sigma$)
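Standard deviation is more commonly used because it is expressed in the same units as the original data, whereas variance is in squared units (for ages: years rather than years squared). It's simple: just take the square root of the variance.
The formula is:
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$$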
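Taking the square root adds one line in code. This sketch defines `standard_deviation`, a name of our own choosing. It cross-checks the result against Python's `statistics.pstdev`, the population standard deviation, which divides by $n$ like the formula here. `statistics.stdev` divides by $n - 1$ instead, so it would not match.

```python
import math
import statistics


def standard_deviation(data):
    """Population standard deviation: the square root of the population variance."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    return math.sqrt(var)


data = [1, 2, 3, 4, 5]
print(standard_deviation(data))  # 1.4142135623730951 (the square root of 2)
print(statistics.pstdev(data))   # should agree up to floating-point rounding
```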
Comparing Variance in Two Age Groups
Let's use two groups of age data to see how variance and standard deviation work. These groups are interesting because they have the same mean ($\bar{x} = 16$) but very different spreads.
- Group 1 ($n = 12$): 13, 14, 15, 15, 16, 16, 17, 17, 17, 17, 17, 18
- Group 2 ($n = 12$): 1, 3, 4, 5, 7, 8, 12, 27, 28, 29, 32, 36
Now, let's calculate the Variance and Standard Deviation for each group.
Calculation for Group 1
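We calculate $(x_i - \bar{x})^2$ for each data point in Group 1 ($\bar{x} = 16$):
- $(13 - 16)^2 = (-3)^2 = 9$
- $(14 - 16)^2 = (-2)^2 = 4$
- $(15 - 16)^2 = (-1)^2 = 1$ (2 data points)
- $(16 - 16)^2 = 0^2 = 0$ (2 data points)
- $(17 - 16)^2 = 1^2 = 1$ (5 data points)
- $(18 - 16)^2 = 2^2 = 4$
Now sum all these squared results ($\sum (x_i - \bar{x})^2$):
$$\sum (x_i - \bar{x})^2 = 9 + 4 + (2 \times 1) + (2 \times 0) + (5 \times 1) + 4 = 24$$
Calculate the Variance:
$$\sigma^2 = \frac{24}{12} = 2$$
Calculate the Standard Deviation:
$$\sigma = \sqrt{2} \approx 1.41$$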
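The Group 1 hand calculation can be reproduced step by step in Python, printing each squared difference the way the list above does. This is a small illustrative script, not part of the original walkthrough.

```python
import math

group1 = [13, 14, 15, 15, 16, 16, 17, 17, 17, 17, 17, 18]
n = len(group1)
mean = sum(group1) / n  # 16.0

total = 0.0
for x in group1:
    sq = (x - mean) ** 2  # squared difference from the mean
    total += sq
    print(f"({x} - {mean:g})^2 = {sq:g}")

var = total / n           # population variance: divide by n
sd = math.sqrt(var)
print(f"sum = {total:g}, variance = {var:g}, std dev = {sd:.2f}")
# sum = 24, variance = 2, std dev = 1.41
```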
Calculation for Group 2
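We calculate $(x_i - \bar{x})^2$ for each data point in Group 2 ($\bar{x} = 16$):
- $(1 - 16)^2 = 225$
- $(3 - 16)^2 = 169$
- $(4 - 16)^2 = 144$
- $(5 - 16)^2 = 121$
- $(7 - 16)^2 = 81$
- $(8 - 16)^2 = 64$
- $(12 - 16)^2 = 16$
- $(27 - 16)^2 = 121$
- $(28 - 16)^2 = 144$
- $(29 - 16)^2 = 169$
- $(32 - 16)^2 = 256$
- $(36 - 16)^2 = 400$
Sum all these squared results ($\sum (x_i - \bar{x})^2$):
$$\sum (x_i - \bar{x})^2 = 225 + 169 + 144 + 121 + 81 + 64 + 16 + 121 + 144 + 169 + 256 + 400 = 1910$$
Calculate the Variance:
$$\sigma^2 = \frac{1910}{12} \approx 159.17$$
Calculate the Standard Deviation:
$$\sigma = \sqrt{159.17} \approx 12.62$$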
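The same steps for Group 2, written more compactly with a list comprehension. The printed values should match the hand calculation. Rounding to two decimals is only our display choice.

```python
import math

group2 = [1, 3, 4, 5, 7, 8, 12, 27, 28, 29, 32, 36]
n = len(group2)
mean = sum(group2) / n                        # 16.0
squared = [(x - mean) ** 2 for x in group2]   # one squared difference per data point
ss = sum(squared)                             # sum of squared differences
var = ss / n                                  # population variance: divide by n
sd = math.sqrt(var)

print(f"sum = {ss:g}, variance = {var:.2f}, std dev = {sd:.2f}")
# sum = 1910, variance = 159.17, std dev = 12.62
```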
Interpreting the Results
- Variance of Group 1 ($\sigma^2 = 2$) is much smaller than the Variance of Group 2 ($\sigma^2 \approx 159.17$).
- Standard Deviation of Group 1 ($\sigma \approx 1.41$) is also much smaller than the Standard Deviation of Group 2 ($\sigma \approx 12.62$).
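These results show that the ages in Group 1 are tightly clustered around the mean of 16, whereas the ages in Group 2 are widely spread around that same mean. The standard deviation puts the typical distance from the mean at roughly 1.4 years in Group 1 versus roughly 12.6 years in Group 2.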
Alternative Formula for Variance
There's another way to calculate variance that is sometimes easier with a calculator or computer, especially for large datasets. The formula is:
$$\sigma^2 = \frac{\sum_{i=1}^{n} x_i^2}{n} - \bar{x}^2$$
This formula says: "Square each data point and sum the results ($\sum x_i^2$), then divide by $n$. Then subtract the square of the mean ($\bar{x}^2$)."
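A minimal sketch of the alternative formula in Python. The name `variance_shortcut` is our own. It needs only two running totals, the sum of the values and the sum of their squares, so the data can be processed in a single pass.

```python
def variance_shortcut(data):
    """Population variance via (sum of squares / n) - mean^2."""
    n = 0
    total = 0.0      # running sum of x
    total_sq = 0.0   # running sum of x^2
    for x in data:   # one pass; works for any iterable, including generators
        n += 1
        total += x
        total_sq += x * x
    mean = total / n
    return total_sq / n - mean ** 2


print(variance_shortcut([1, 2, 3, 4, 5]))  # 2.0
```

One caveat: when the values are large and close together, subtracting two nearly equal numbers in floating point can lose precision. For such data, the deviation-based formula is the safer choice. For small examples like the ones here, both give the same answer.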
Let's recalculate the variance for Group 1 using this formula:
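- Calculate $\sum x_i^2$ for Group 1:
$$\sum x_i^2 = 13^2 + 14^2 + 2(15^2) + 2(16^2) + 5(17^2) + 18^2 = 169 + 196 + 450 + 512 + 1445 + 324 = 3096$$
- Calculate $\frac{\sum x_i^2}{n}$ for Group 1:
$$\frac{3096}{12} = 258$$
We also know $\bar{x} = 16$, so $\bar{x}^2 = 256$.
- Plug into the alternative formula:
$$\sigma^2 = 258 - 256 = 2$$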
The result is exactly the same as the first method ($\sigma^2 = 2$)!
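The three steps of this recalculation map directly onto Python. The standard library's `statistics.pvariance` (population variance, dividing by $n$) offers an independent cross-check. A small illustrative script:

```python
import statistics

group1 = [13, 14, 15, 15, 16, 16, 17, 17, 17, 17, 17, 18]
n = len(group1)

sum_of_squares = sum(x ** 2 for x in group1)     # step 1: sum of x_i^2
mean_of_squares = sum_of_squares / n             # step 2: divide by n
mean = sum(group1) / n                           # the mean, 16.0
shortcut_variance = mean_of_squares - mean ** 2  # step 3: subtract the squared mean

print(sum_of_squares, mean_of_squares, shortcut_variance)  # 3096 258.0 2.0
# Cross-check against the standard library's population variance (divides by n).
print(statistics.pvariance(group1))
```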