What is Mean?
Besides Median (middle value) and Mode (most frequent value), we have another important way to see the "center" of data, which is the Mean or Average.
The Mean is the value we get if we distribute the total sum of all data evenly among all data members. Imagine you and your friends each hold a different number of candies. You collect them all and then divide them equally among everyone. The amount each person ends up with is the Mean.
Calculating the Mean
Calculating the Mean is very simple:
Mean Formula:
$$\bar{x} = \frac{\sum x}{n}$$
Where:
- $\bar{x}$ (read "x bar") is the symbol for Mean.
- $\sum x$ (read "sigma x") means the total sum of all data values ($x$).
- $n$ is the number of data points.
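The formula can be sketched in a few lines of Python. This is only an illustration: the function name `mean` and the candy counts below are made up and are not part of the case study.

```python
def mean(values):
    """Return the arithmetic mean: the sum of all values divided by their count."""
    if not values:
        raise ValueError("mean requires at least one data point")
    return sum(values) / len(values)


# Hypothetical candy counts held by four friends (illustrative data only)
candies = [2, 4, 6, 8]
candy_mean = mean(candies)  # (2 + 4 + 6 + 8) / 4
print(candy_mean)
```

Here `sum(values)` plays the role of the total sum of all data values, and `len(values)` plays the role of the number of data points.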
Case Study
Let's look at an example to make it clearer.
OSIS Social Action
Initial Situation:
The Student Council (OSIS) of School A (10 members) collected used clothes that were still in good condition. The number of clothes collected by each member is:
3, 5, 7, 10, 5, 3, 4, 6, 9, 8
Finding the Initial Mean, Median, and Mode:
- Sort the data (for Median & Mode):
3, 3, 4, 5, 5, 6, 7, 8, 9, 10 (there are 10 data points, $n = 10$)
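- Calculate the Mean:
$$\bar{x} = \frac{\sum x}{n} = \frac{3 + 3 + 4 + 5 + 5 + 6 + 7 + 8 + 9 + 10}{10} = \frac{60}{10} = 6$$
So, the initial Mean = 6.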
- Find the Median:
The number of data points is even ($n = 10$). The middle data are at positions $\frac{n}{2} = 5$ and $\frac{n}{2} + 1 = 6$.
The value at position 5 is 5, the value at position 6 is 6.
So, the initial Median = $\frac{5 + 6}{2} = 5.5$.
- Find the Mode:
The most frequent values are 3 (2 times) and 5 (2 times).
So, the initial Mode = 3 and 5 (bimodal).
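The three results above can be double-checked with Python's standard `statistics` module. This is just one convenient way to verify the hand calculation. The variable names are chosen for illustration, and `multimode` requires Python 3.8 or newer.

```python
import statistics

# Number of clothes collected by each of the 10 OSIS members
clothes = [3, 5, 7, 10, 5, 3, 4, 6, 9, 8]

initial_mean = statistics.mean(clothes)        # total of 60 divided by 10 members
initial_median = statistics.median(clothes)    # average of the 5th and 6th sorted values
initial_modes = statistics.multimode(clothes)  # every value tied for most frequent

print("Mean:", initial_mean)
print("Median:", initial_median)
print("Mode(s):", initial_modes)
```

`multimode` is used instead of `mode` because this data set has two modes, and `multimode` returns all of them as a list.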
New Situation:
The next day, 2 other students joined and donated 20 and 22 items of clothing.
The new data becomes:
3, 3, 4, 5, 5, 6, 7, 8, 9, 10, 20, 22
Finding the New Mean, Median, and Mode:
- Sort the data:
3, 3, 4, 5, 5, 6, 7, 8, 9, 10, 20, 22 (there are 12 data points, $n = 12$)
- Calculate the New Mean:
$$\bar{x} = \frac{\sum x}{n} = \frac{60 + 20 + 22}{12} = \frac{102}{12} = 8.5$$
So, the new Mean = 8.5.
- Find the New Median:
The number of data points is even ($n = 12$). The middle data are at positions $\frac{n}{2} = 6$ and $\frac{n}{2} + 1 = 7$.
The value at position 6 is 6, the value at position 7 is 7.
So, the new Median = $\frac{6 + 7}{2} = 6.5$.
- Find the New Mode:
The most frequent values are still 3 (2 times) and 5 (2 times).
So, the new Mode = 3 and 5.
Impact of Adding Data
Let's observe the changes in the measures of central tendency from the initial to the new situation:
- Mean: Changed from 6 to 8.5 (increased by 2.5)
- Median: Changed from 5.5 to 6.5 (increased by 1)
- Mode: Remained 3 and 5 (unchanged)
What can we conclude? Adding two new data points (20 and 22), whose values are quite far from the initial data, had the most significant impact on the Mean. The Mean value was "pulled" by the larger new data values.
The Median also changed, but its change was not as large as the Mean's. The Mode didn't change at all.
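To see these shifts side by side, here is a small sketch that computes how much each measure moved after the two new donations were added. The variable names are illustrative only.

```python
import statistics

initial = [3, 5, 7, 10, 5, 3, 4, 6, 9, 8]
updated = initial + [20, 22]  # two more students joined

# How far each measure moved between the initial and the new situation
mean_shift = statistics.mean(updated) - statistics.mean(initial)
median_shift = statistics.median(updated) - statistics.median(initial)
modes_unchanged = statistics.multimode(updated) == statistics.multimode(initial)

print("Mean shift:", mean_shift)
print("Median shift:", median_shift)
print("Modes unchanged:", modes_unchanged)
```

The Mean moves the most because every value, including 20 and 22, contributes to the total sum, while the Median only depends on which values sit in the middle of the sorted data.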
Impact of Extreme Data (Outliers)
What if one of the new data points was very extreme? For example, the 12th student donated 100 items, not 22.
Data becomes: 3, 3, 4, 5, 5, 6, 7, 8, 9, 10, 20, 100 ($n = 12$)
- Calculate the Extreme Mean:
$$\bar{x} = \frac{\sum x}{n} = \frac{60 + 20 + 100}{12} = \frac{180}{12} = 15$$
The Mean becomes 15, far above both the initial Mean (6) and the previous new Mean (8.5).
- Find the Extreme Median:
Sorted data: 3, 3, 4, 5, 5, 6, 7, 8, 9, 10, 20, 100
The Median is still $\frac{6 + 7}{2} = 6.5$ (the same as in the previous new situation).
- Find the Extreme Mode:
Mode remains 3 and 5.
Very extreme data (called outliers) greatly affect the Mean, but hardly affect the Median and Mode. This is the weakness of the Mean; it is sensitive to outliers. Median and Mode are more "robust" against outliers.
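The sketch below makes this sensitivity easy to see by changing only the 12th donation and recomputing all three measures. The values 22 and 100 come from the examples above. The value 1000 is an extra, made-up case added purely for illustration.

```python
import statistics

# The first 11 donations stay the same in every scenario
base = [3, 3, 4, 5, 5, 6, 7, 8, 9, 10, 20]

results = {}
for last_donation in (22, 100, 1000):  # 1000 is a hypothetical extra case
    data = base + [last_donation]
    results[last_donation] = (
        statistics.mean(data),
        statistics.median(data),
        statistics.multimode(data),
    )

for last_donation, (mean_value, median_value, modes) in results.items():
    print(last_donation, mean_value, median_value, modes)
```

Only the Mean changes from one scenario to the next. The Median and Mode stay the same because the outlier sits at the end of the sorted data and appears only once.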
So, the Mean is a simple arithmetic average, but we need to be careful if there are data points with values very different from the rest, as they can make the Mean less representative.