Measures of Center

Mean, median, and mode

Introduction

Measures of center describe the "typical" or "middle" value in a dataset. They help us answer: "What is a representative value?" The three main measures — mean, median, and mode — each have different properties and appropriate uses.

The Mean

Definition

Mean ($\bar{x}$): The arithmetic average

Formula: $\bar{x} = \frac{\sum x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$

Where:

  • $\sum x_i$ = sum of all values
  • $n$ = number of observations

Calculating the Mean

Example 1: Test scores: 85, 90, 78, 92, 88

$\bar{x} = \frac{85 + 90 + 78 + 92 + 88}{5} = \frac{433}{5} = 86.6$

Mean test score = 86.6 points

Example 2: Heights (in inches): 64, 67, 65, 70, 64

$\bar{x} = \frac{64 + 67 + 65 + 70 + 64}{5} = \frac{330}{5} = 66$

Mean height = 66 inches
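
The two calculations above can be checked with a few lines of Python. This is only a minimal sketch: the helper name `mean` and the variable names are chosen here for illustration and are not part of the lesson.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)


test_scores = [85, 90, 78, 92, 88]   # Example 1
heights_in = [64, 67, 65, 70, 64]    # Example 2, in inches

print(mean(test_scores))  # 86.6
print(mean(heights_in))   # 66.0
```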

Properties of the Mean

Uses all data:

  • Every value contributes
  • Change any value, mean changes
  • Adding up all deviations from mean = 0

Balance point:

  • If data were on a number line with equal weights, mean is where it would balance
  • Sum of distances below mean = sum of distances above mean
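
Both properties can be verified numerically. The sketch below reuses the heights from Example 2; the variable names are illustrative assumptions.

```python
heights_in = [64, 67, 65, 70, 64]
x_bar = sum(heights_in) / len(heights_in)  # 66.0

# Deviations from the mean always add up to zero.
deviations = [x - x_bar for x in heights_in]
print(deviations)       # [-2.0, 1.0, -1.0, 4.0, -2.0]
print(sum(deviations))  # 0.0

# Balance point: total distance below the mean equals total distance above it.
below = sum(x_bar - x for x in heights_in if x < x_bar)
above = sum(x - x_bar for x in heights_in if x > x_bar)
print(below, above)     # 5.0 5.0
```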

Sensitive to outliers:

  • Extreme values pull mean toward them
  • One very high/low value can change mean substantially

Example showing outlier effect:

Without outlier: 10, 12, 11, 13, 12
$\bar{x} = \frac{58}{5} = 11.6$

With outlier: 10, 12, 11, 13, 12, 100
$\bar{x} = \frac{158}{6} \approx 26.3$

The outlier (100) dramatically increased the mean from 11.6 to 26.3!

When to Use the Mean

Appropriate when:
✓ Distribution is roughly symmetric
✓ No extreme outliers
✓ Need to use all data values
✓ Want mathematical properties (use in further calculations)

Not appropriate when:
❌ Distribution is heavily skewed
❌ Outliers present
❌ Want resistant measure
❌ Data is ordinal (ranked) only

The Median

Definition

Median: The middle value when data is ordered

  • 50th percentile
  • Splits data in half
  • Half values below, half above

Finding the Median

Step 1: Order data from smallest to largest

Step 2: Find middle position

If $n$ is odd: Median = middle value
Position = $\frac{n+1}{2}$

If $n$ is even: Median = average of two middle values
Positions = $\frac{n}{2}$ and $\frac{n}{2} + 1$

Examples

Example 1 (odd n): Scores: 78, 85, 90, 82, 88

Step 1: Order: 78, 82, 85, 88, 90
Step 2: $n = 5$ (odd), position = $\frac{5+1}{2} = 3$
Median = 85 (the 3rd value)

Example 2 (even n): Scores: 78, 85, 90, 82, 88, 92

Step 1: Order: 78, 82, 85, 88, 90, 92
Step 2: $n = 6$ (even), positions = 3 and 4
Step 3: Values are 85 and 88
Median = $\frac{85 + 88}{2} = 86.5$
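
The sort-then-pick-the-middle procedure can be written as a short function. This is a sketch under the lesson's definition; the name `median` is an illustrative choice. Note that the 1-based positions in the text become 0-based indexes in Python.

```python
def median(values):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    ordered = sorted(values)          # Step 1: order the data
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty dataset is undefined")
    mid = n // 2                      # Step 2: locate the middle
    if n % 2 == 1:
        return ordered[mid]           # position (n + 1) / 2, counted from 1
    # positions n/2 and n/2 + 1, counted from 1
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([78, 85, 90, 82, 88]))      # 85
print(median([78, 85, 90, 82, 88, 92]))  # 86.5
```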

Properties of the Median

Resistant to outliers:

  • Position-based, not value-based
  • Extreme values don't affect it much
  • More stable measure for skewed data

Example: Data: 10, 12, 11, 13, 12 → ordered: 10, 11, 12, 12, 13 → Median = 12
With outlier: 10, 12, 11, 13, 12, 100 → ordered: 10, 11, 12, 12, 13, 100 → Median = (12 + 12) / 2 = 12

The outlier didn't change the median!
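
To see resistance side by side with the mean's sensitivity, the same data can be run through Python's standard `statistics` module. This is a small illustrative check, not part of the original lesson.

```python
import statistics

data = [10, 12, 11, 13, 12]
with_outlier = data + [100]

mean_before = statistics.mean(data)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(data)
median_after = statistics.median(with_outlier)

print(round(mean_before, 1), round(mean_after, 1))  # 11.6 26.3
print(median_before, median_after)                  # 12 12.0
```

The mean more than doubles, while the median stays at 12.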

50-50 split:

  • Half the data ≤ median
  • Half the data ≥ median
  • Useful for understanding data distribution

Not affected by exact values:

  • Only needs order and middle position
  • Works well for ordinal data (rankings)

When to Use the Median

Appropriate when:
✓ Distribution is skewed
✓ Outliers are present
✓ Want resistant measure
✓ Data is ordinal (ordered categories)
✓ Interested in "typical" individual

Examples where median is better:

  • Income (right-skewed, few very high earners)
  • Home prices (right-skewed, few very expensive homes)
  • Reaction times (right-skewed, occasional very slow responses)

The Mode

Definition

Mode: The most frequently occurring value

  • Can have one mode (unimodal)
  • Can have multiple modes (bimodal, multimodal)
  • Can have no mode (all values occur once)

Finding the Mode

Count frequency of each value, identify most common

Example 1: Scores: 85, 90, 85, 92, 88, 85

  • 85 appears 3 times
  • 90, 92, 88 each appear once
  • Mode = 85

Example 2: Scores: 85, 90, 85, 92, 90, 88

  • 85 appears twice
  • 90 appears twice
  • Modes = 85 and 90 (bimodal)

Example 3: Scores: 85, 90, 92, 88, 82

  • All values appear once
  • No mode
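
The counting procedure can be sketched with `collections.Counter`. The function name `modes` is an illustrative assumption, and it follows this lesson's convention of reporting no mode when every value occurs once (the standard library's `statistics.multimode` instead returns every value in that case).

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means 'no mode'."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: no mode under the lesson's convention
    return sorted(v for v, c in counts.items() if c == top)


print(modes([85, 90, 85, 92, 88, 85]))  # [85]
print(modes([85, 90, 85, 92, 90, 88]))  # [85, 90]
print(modes([85, 90, 92, 88, 82]))      # []
```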

When to Use the Mode

Appropriate when:
✓ Categorical data
✓ Want most common value
✓ Describing bimodal distributions

Examples:

  • "The most common car color is white" (mode of categorical data)
  • "The distribution is bimodal with peaks at 65 and 72" (describing shape)

Not very useful for:
❌ Continuous numerical data (values rarely repeat)
❌ Summarizing center of distribution

Comparing Mean and Median

Relationship to Distribution Shape

Symmetric distribution: $\text{Mean} \approx \text{Median}$

Both measures give similar values, either can be used

Right-skewed distribution: $\text{Mean} > \text{Median}$

Mean pulled right by high values in tail
Median more representative of "typical" value

Left-skewed distribution: $\text{Mean} < \text{Median}$

Mean pulled left by low values in tail
Median more representative of "typical" value
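
The three cases can be illustrated with small made-up samples. The numbers below are hypothetical, chosen only so that each list has the stated shape; they do not come from the lesson.

```python
import statistics

# Hypothetical samples, one per distribution shape.
samples = {
    "symmetric":    [1, 2, 3, 4, 4, 4, 5, 6, 7],
    "right-skewed": [1, 2, 2, 3, 3, 3, 4, 5, 9, 18],
    "left-skewed":  [2, 11, 15, 16, 17, 17, 17, 18, 18, 19],
}

summary = {
    name: (statistics.mean(values), statistics.median(values))
    for name, values in samples.items()
}

for name, (m, med) in summary.items():
    print(f"{name}: mean = {float(m)}, median = {float(med)}")
```

For these samples the symmetric list has mean and median both equal to 4, the right-skewed list has mean 5 above median 3, and the left-skewed list has mean 15 below median 17.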

Visual Representation

Symmetric: Mean and median at same location (center of distribution)

Right-skewed: Mean to the right of median (toward tail)

Left-skewed: Mean to the left of median (toward tail)

Choosing Between Mean and Median

Use Mean when:

  • Distribution is symmetric
  • No outliers or extreme skewness
  • Want to use all data
  • Need for further calculations (variance, hypothesis tests)

Use Median when:

  • Distribution is skewed
  • Outliers are present
  • Want resistant measure
  • Ordinal data
  • Interested in "typical" individual rather than arithmetic average

Real-world example: Income

Town income data:

  • Median income: 45,000 dollars
  • Mean income: 75,000 dollars

Mean is much higher because a few very wealthy residents pull it up. The median of 45,000 dollars better represents the "typical" resident's income.

Weighted Mean

Definition

Weighted Mean: When values have different importance or frequency

Formula: $\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i}$

Where:

  • $w_i$ = weight for each value
  • $x_i$ = data value

Example: Course Grade

Your course grade is calculated as:

  • Tests: 60% of grade (weight = 0.60)
  • Homework: 25% of grade (weight = 0.25)
  • Final: 15% of grade (weight = 0.15)

Scores:

  • Test average: 85
  • Homework average: 92
  • Final exam: 78

Weighted mean (the weights sum to 1, so the denominator $\sum w_i$ is 1):
$\bar{x}_w = 0.60(85) + 0.25(92) + 0.15(78) = 51 + 23 + 11.7 = 85.7$

Course grade = 85.7%

Note: Cannot just average 85, 92, and 78 because they have different weights!
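
The course-grade calculation can be reproduced with a small helper. This is a sketch; the name `weighted_mean` is an illustrative assumption. It divides by the sum of the weights, so it also works when the weights do not add up to 1.

```python
def weighted_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


scores = [85, 92, 78]          # tests, homework, final
weights = [0.60, 0.25, 0.15]

grade = weighted_mean(scores, weights)
plain = sum(scores) / len(scores)

print(round(grade, 1))  # 85.7
print(round(plain, 1))  # 85.0, the unweighted average gives a different answer
```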

Trimmed Mean

Definition

Trimmed Mean: Mean calculated after removing extreme values

Common: 5% trimmed mean (remove lowest 5% and highest 5%)

Purpose

  • More resistant than regular mean
  • Still uses most of data
  • Compromise between mean and median

Example

Data (ordered): 10, 12, 13, 14, 15, 16, 17, 18, 19, 100

Regular mean: $\frac{234}{10} = 23.4$ (affected by outlier 100)

10% trimmed mean: Remove lowest 10% (10) and highest 10% (100)
$\frac{12+13+14+15+16+17+18+19}{8} = \frac{124}{8} = 15.5$

Trimmed mean (15.5) more representative than regular mean (23.4)
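
A trimmed mean can be sketched as follows. The function name and the choice to round the number of trimmed values down are assumptions made for this illustration; textbooks and software packages differ on how to handle a trim count that is not a whole number.

```python
def trimmed_mean(values, percent):
    """Mean after dropping the lowest and highest `percent` % of the ordered data."""
    ordered = sorted(values)
    k = len(ordered) * percent // 100      # values to drop from EACH end (rounded down)
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("trimming removed every value")
    return sum(kept) / len(kept)


data = [10, 12, 13, 14, 15, 16, 17, 18, 19, 100]

print(trimmed_mean(data, 0))   # 23.4 (regular mean)
print(trimmed_mean(data, 10))  # 15.5
```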

Common Mistakes

Using mean with skewed data
Use median instead!

Forgetting to order data for median
Always sort first!

Reporting mode for continuous data
Usually not meaningful when values don't repeat

Not specifying units
Always include units (inches, dollars, points, etc.)

Confusing which measure to use
Consider shape and outliers

Calculating mean of percentages
May need weighted mean if groups are different sizes
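
The last mistake is easiest to see with numbers. The group sizes and pass rates below are hypothetical, invented only for this sketch.

```python
# Hypothetical groups: (number of students, pass rate in percent)
groups = [(10, 90.0), (90, 60.0)]

# Naive approach: average the two percentages directly.
naive = sum(rate for _, rate in groups) / len(groups)

# Correct approach: weight each percentage by its group size.
total_students = sum(size for size, _ in groups)
overall = sum(size * rate for size, rate in groups) / total_students

print(naive)    # 75.0
print(overall)  # 63.0
```

The simple average (75%) overstates the overall pass rate (63%) because the small group counts as much as the large one.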

Quick Reference

Mean:

  • Formula: $\bar{x} = \frac{\sum x_i}{n}$
  • When: Symmetric, no outliers
  • Property: Uses all data, sensitive to extremes
  • Symbol: $\bar{x}$ (sample), $\mu$ (population)

Median:

  • Method: Middle value when ordered
  • When: Skewed, outliers present
  • Property: Resistant, 50-50 split
  • Symbol: M or $\tilde{x}$

Mode:

  • Method: Most frequent value
  • When: Categorical data, describe shape
  • Property: Can have multiple or none

Relationship to shape:

  • Symmetric: Mean ≈ Median
  • Right-skewed: Mean > Median
  • Left-skewed: Mean < Median

Remember: The best measure of center depends on the distribution's shape and the presence of outliers. When in doubt, report both mean and median!

📚 Practice Problems

Problem 1 (easy)

Question:

Calculate the mean and median for this dataset: 8, 12, 15, 15, 18, 20, 22

Solution:

Step 1: Calculate the mean
Mean = sum of all values / number of values
Sum = 8 + 12 + 15 + 15 + 18 + 20 + 22 = 110
Number of values (n) = 7
Mean = 110 / 7 ≈ 15.71

Step 2: Calculate the median
Data is already in order: 8, 12, 15, 15, 18, 20, 22
n = 7 (odd number)
Median position = (n + 1) / 2 = (7 + 1) / 2 = 4, so the median is the 4th value
Median = 15

Step 3: Verify
Count: 1st, 2nd, 3rd, 4th, 5th, 6th, 7th
Values: 8, 12, 15, [15], 18, 20, 22 (the bracketed 4th value is the median)

Answer: Mean ≈ 15.71, Median = 15

Problem 2 (hard)

Question:

A dataset has a mean of 50 and a median of 50. If you add a new value of 100 to the dataset, will the mean or median change more? Explain your reasoning.

Solution:

Step 1: Understand the initial condition
Mean = 50, Median = 50
This suggests a roughly symmetric distribution, with the data balanced around 50

Step 2: Analyze effect on MEAN
The mean uses ALL values in its calculation
New mean = (sum of old values + 100) / (n + 1)

Adding 100 (which is 50 above the current mean):

  • Pulls the mean UP
  • Amount depends on sample size
  • But definitely increases

If n = 9 (10 values total after adding 100):

  • Old sum = 9 × 50 = 450
  • New sum = 450 + 100 = 550
  • New mean = 550 / 10 = 55
  • Change: +5 points

Step 3: Analyze effect on MEDIAN
The median only depends on the MIDDLE position(s)
Adding one value:

  • Changes sample size from n to n+1
  • May shift which value(s) are in middle
  • But only by one position

If n was odd (say 9): the old median was the 5th value
Now n is even (10): the new median is the average of the 5th and 6th values
The value 100 goes to the end of the ordered list and does not become a middle value
The median moves only halfway toward the next larger value (for example, to 50.5 or 51 if the 6th value is 51 or 52)

Step 4: Compare magnitude of changes
Mean: increases by 50 / (n + 1), which is +5 for n = 9
Median: increases by at most half the gap between the middle values, which is small when the data are clustered near 50

The mean is SENSITIVE to extreme values
The median is RESISTANT to extreme values

Answer: For typical data clustered around 50, the MEAN will change more. It uses every value, so adding 100 (far above 50) pulls it up by 50 / (n + 1). The median is resistant: it depends only on the middle positions, so one extreme value shifts it by at most half the gap between the middle values.
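
The reasoning can be tried on a concrete case. The nine values below are a hypothetical dataset, invented here only because it has mean 50 and median 50; other datasets with the same mean and median will give different median shifts.

```python
import statistics

# Hypothetical dataset with n = 9, mean 50, and median 50.
data = [42, 44, 46, 48, 50, 52, 54, 56, 58]
extended = data + [100]

mean_change = statistics.mean(extended) - statistics.mean(data)
median_change = statistics.median(extended) - statistics.median(data)

print(float(mean_change), float(median_change))  # 5.0 1.0
```

Here the mean rises by 5 while the median rises by only 1, half the gap between 50 and the next value, 52.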

Problem 3 (medium)

Question:

Five students scored: 85, 90, 88, 92, and 95 on a test. A sixth student who was absent takes the test and scores 40. How does this affect the mean and median?

Solution:

Step 1: Calculate original statistics
Original data (ordered): 85, 88, 90, 92, 95
n = 5

Original mean = (85 + 88 + 90 + 92 + 95) / 5 = 450 / 5 = 90
Original median = 3rd value = 90

Step 2: Add the new score
New data (ordered): 40, 85, 88, 90, 92, 95
n = 6

New mean = (40 + 85 + 88 + 90 + 92 + 95) / 6 = 490 / 6 ≈ 81.67
New median = average of 3rd and 4th values = (88 + 90) / 2 = 89

Step 3: Calculate changes
Mean: 90 → 81.67
Change = -8.33 points (decreased by 9.3%)

Median: 90 → 89
Change = -1 point (decreased by 1.1%)

Step 4: Explain the difference
Mean is NOT RESISTANT: it is affected greatly by outliers. The score of 40 is much lower than the others, pulling the mean down significantly.

Median is RESISTANT: it depends only on the middle values. Adding one value only shifts the middle position slightly.

Answer: The mean dropped from 90 to 81.67 (a decrease of 8.33). The median dropped from 90 to 89 (a decrease of 1). The mean was much more affected by the outlier than the median.
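
These figures, including the percentage changes, can be checked with the standard `statistics` module. The helper `pct_change` is an illustrative name introduced for this sketch.

```python
import statistics

original = [85, 90, 88, 92, 95]
updated = original + [40]


def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100


mean_old, mean_new = statistics.mean(original), statistics.mean(updated)
median_old, median_new = statistics.median(original), statistics.median(updated)

print(f"mean:   {mean_old:.2f} -> {mean_new:.2f} ({pct_change(mean_old, mean_new):+.1f}%)")
print(f"median: {median_old:.2f} -> {median_new:.2f} ({pct_change(median_old, median_new):+.1f}%)")
# mean:   90.00 -> 81.67 (-9.3%)
# median: 90.00 -> 89.00 (-1.1%)
```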

Problem 4 (medium)

Question:

A company has 10 employees with salaries: $30k, $32k, $35k, $35k, $38k, $40k, $42k, $45k, $48k, and the CEO makes $300k. Which measure of center (mean or median) better represents the "typical" employee salary? Explain.

Solution:

Step 1: Calculate both measures
Data: 30, 32, 35, 35, 38, 40, 42, 45, 48, 300 (in thousands)
n = 10

Mean = (30 + 32 + 35 + 35 + 38 + 40 + 42 + 45 + 48 + 300) / 10 = 645 / 10 = $64.5k

Median = average of 5th and 6th values = (38 + 40) / 2 = $39k

Step 2: Compare to actual data
9 employees make $30k-$48k (most around $35k-$45k)
1 employee (CEO) makes $300k

Mean ($64.5k): Higher than what 9 out of 10 employees make!
Median ($39k): Right in the middle of what most employees make

Step 3: Determine which is better
The mean is heavily influenced by the CEO's salary
$64.5k doesn't represent what a "typical" employee makes
Most employees make much less than $64.5k

The median is resistant to the outlier
$39k represents the middle of employee salaries
Half make more, half make less

Step 4: Make recommendation
Median is better here because:

  1. Data is strongly skewed right (one extreme value)
  2. Mean is misleading (inflated by CEO)
  3. Median represents actual middle of employee salaries
  4. If asked "what's a typical salary?" - $39k is more accurate

Answer: MEDIAN ($39k) better represents typical salary. The mean ($64.5k) is inflated by the CEO's $300k salary. With skewed data and outliers, median is the better measure of center.
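
A quick check of the salary figures, including how many employees fall below each measure, might look like this. The variable names are illustrative.

```python
import statistics

salaries_k = [30, 32, 35, 35, 38, 40, 42, 45, 48, 300]  # thousands of dollars

mean_k = statistics.mean(salaries_k)
median_k = statistics.median(salaries_k)

# How many employees earn less than each measure of center?
below_mean = sum(1 for s in salaries_k if s < mean_k)
below_median = sum(1 for s in salaries_k if s < median_k)

print(mean_k, median_k)          # 64.5 39.0
print(below_mean, below_median)  # 9 5
```

Nine of the ten salaries sit below the mean, while the median splits the group five and five.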

Problem 5 (hard)

Question:

For what type of distributions should you use the mean vs. median as the measure of center? Provide examples.

Solution:

USE THE MEAN when:

  1. Distribution is symmetric

    • Mean and median will be approximately equal
    • Mean uses all data points (more information)
    • Example: Heights, test scores (when roughly normal)
  2. No outliers or extreme values

    • Mean won't be distorted
    • All values contribute equally
    • Example: Temperatures in summer months
  3. You want a measure that uses all data

    • Mean incorporates every value
    • More sensitive to changes
    • Example: Quality control where all measurements matter
  4. Normal distribution

    • Mean is the best measure
    • Optimal statistical properties
    • Example: IQ scores, measurement errors

USE THE MEDIAN when:

  1. Distribution is skewed

    • Median is much less affected by skew than the mean
    • Better represents "typical" value
    • Example: Income (right-skewed), home prices
  2. Outliers are present

    • Median is resistant/robust
    • Not influenced by extreme values
    • Example: Salaries with CEO, test scores with one failure
  3. Ordinal data

    • When data is ranked/ordered but differences aren't equal
    • Can find middle rank
    • Example: Satisfaction ratings (1-5 scale)
  4. Open-ended distributions

    • When highest/lowest values are unknown
    • Example: Income ">$200k", age "65+"

SUMMARY TABLE:

Situation                → Measure to use
Symmetric, no outliers   → Use MEAN
Skewed or outliers       → Use MEDIAN
Want all data used       → Use MEAN
Want resistant measure   → Use MEDIAN

Answer: Use mean for symmetric distributions without outliers (normal data). Use median for skewed distributions or data with outliers (income, housing prices). Median is resistant; mean uses all data.