Measures of Spread

Range, IQR, standard deviation, and variance

Introduction

While measures of center tell us the "typical" value, measures of spread (also called measures of variability or dispersion) tell us how spread out or variable the data is. Two datasets can have the same mean but very different spreads!

Example:

  • Class A scores: 70, 72, 73, 74, 75 (Mean = 72.8, very consistent)
  • Class B scores: 50, 60, 73, 80, 100 (Mean = 72.6, highly variable)

Both classes have similar means, but Class B has much more spread!
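To check this comparison numerically, here is a minimal Python sketch using the standard-library `statistics` module. The variable names `class_a` and `class_b` are illustrative only; the range and standard deviation it prints are defined later in this lesson.

```python
from statistics import mean, stdev

class_a = [70, 72, 73, 74, 75]
class_b = [50, 60, 73, 80, 100]

for name, scores in (("Class A", class_a), ("Class B", class_b)):
    spread = max(scores) - min(scores)  # range: max minus min
    print(f"{name}: mean={mean(scores):.1f}, range={spread}, sd={stdev(scores):.2f}")
```

The two means come out nearly equal, while both spread measures are several times larger for Class B.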

Range

Definition

Range: The difference between the maximum and minimum values

Formula: $Range = Max - Min$

Calculating Range

Example 1: Test scores: 68, 75, 82, 91, 88

  • Max = 91
  • Min = 68
  • Range = 91 - 68 = 23 points

Example 2: Temperatures (°F): 45, 52, 58, 51, 62, 48

  • Max = 62
  • Min = 45
  • Range = 62 - 45 = 17°F

Properties of Range

Advantages:
✓ Very easy to calculate and understand
✓ Gives a sense of total spread
✓ Useful for quick assessment

Disadvantages:
❌ Only uses two values (ignores all others)
❌ Extremely sensitive to outliers
❌ Doesn't tell us about the distribution between min and max
❌ Tends to increase with sample size (larger samples are more likely to include extreme values)

Example of outlier sensitivity:

Without outlier: 10, 12, 13, 14, 15
Range = 15 - 10 = 5

With outlier: 10, 12, 13, 14, 15, 50
Range = 50 - 10 = 40

One outlier dramatically changed the range!
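The same comparison can be sketched in a few lines of Python. The helper name `data_range` is hypothetical, chosen here only for illustration.

```python
def data_range(values):
    """Return max minus min for a non-empty sequence of numbers."""
    return max(values) - min(values)

without_outlier = [10, 12, 13, 14, 15]
with_outlier = without_outlier + [50]

print(data_range(without_outlier))  # 5
print(data_range(with_outlier))     # 40
```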

When to Use Range

Appropriate for:

  • Quick, rough sense of spread
  • Knowing the extreme values matters
  • Quality control (acceptable range of values)

Not appropriate when:

  • Outliers present
  • Need precise measure of variability
  • Comparing datasets of different sizes

Interquartile Range (IQR)

Definition

IQR: The range of the middle 50% of data

Formula: $IQR = Q3 - Q1$

Where:

  • Q1 = First quartile (25th percentile)
  • Q3 = Third quartile (75th percentile)

Finding Quartiles and IQR

Step 1: Order data from smallest to largest

Step 2: Find median (Q2)

Step 3: Find median of lower half = Q1

Step 4: Find median of upper half = Q3

Step 5: Calculate IQR = Q3 - Q1

Example

Data: 12, 15, 17, 19, 20, 22, 25, 28, 30, 35, 40

Step 1: Already ordered

Step 2: Median (Q2) = 22 (middle value, n=11)

Step 3: Lower half: 12, 15, 17, 19, 20
Q1 = 17 (median of lower half)

Step 4: Upper half: 25, 28, 30, 35, 40
Q3 = 30 (median of upper half)

Step 5: IQR = 30 - 17 = 13

Interpretation: The middle 50% of data spans 13 units
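The five steps can be sketched in Python as follows. This is one possible implementation of the median-of-halves method; the function name `quartiles` is hypothetical, and it assumes that for an odd number of values the overall median is left out of both halves, as in the worked example.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    Assumption: when n is odd, the overall median is excluded from
    both halves, matching the worked example above.
    """
    data = sorted(values)                               # Step 1: order the data
    n = len(data)
    half = n // 2
    lower = data[:half]                                 # values below the median position
    upper = data[half + 1:] if n % 2 else data[half:]   # values above it
    return median(lower), median(data), median(upper)   # Steps 2-4

data = [12, 15, 17, 19, 20, 22, 25, 28, 30, 35, 40]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1                                           # Step 5
print(q1, q2, q3, iqr)  # 17 22 30 13
```

Software that computes quartiles by interpolating between data points may report slightly different values for Q1 and Q3, so it is worth checking which convention a tool uses.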

Properties of IQR

Advantages:
✓ Resistant to outliers (uses middle 50% only)
✓ More stable than range
✓ Useful with skewed data
✓ Basis for outlier detection (1.5 × IQR rule)

Disadvantages:
❌ Ignores 50% of data (lowest 25%, highest 25%)
❌ Less mathematically useful than standard deviation
❌ Harder to calculate than range

Using IQR to Identify Outliers

1.5 × IQR Rule:

Lower fence: $Q1 - 1.5 \times IQR$
Upper fence: $Q3 + 1.5 \times IQR$

Outliers: Values below lower fence or above upper fence

Example (from previous):

  • Q1 = 17, Q3 = 30, IQR = 13
  • Lower fence = 17 - 1.5(13) = 17 - 19.5 = -2.5
  • Upper fence = 30 + 1.5(13) = 30 + 19.5 = 49.5
  • Any values < -2.5 or > 49.5 are outliers
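A minimal sketch of the fence calculation, assuming Q1 and Q3 have already been found. The function names `iqr_fences` and `find_outliers` are hypothetical.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) for the k x IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, q1, q3):
    """Return the values that fall outside the 1.5 x IQR fences."""
    lower, upper = iqr_fences(q1, q3)
    return [x for x in values if x < lower or x > upper]

data = [12, 15, 17, 19, 20, 22, 25, 28, 30, 35, 40]
print(iqr_fences(17, 30))           # (-2.5, 49.5)
print(find_outliers(data, 17, 30))  # []
```

The empty list confirms that none of the values in the example dataset falls outside the fences.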

When to Use IQR

Appropriate when:
✓ Distribution is skewed
✓ Outliers are present
✓ Want resistant measure
✓ Describing boxplots

Paired with: Median (both resistant measures)

Variance and Standard Deviation

Why We Need Them

Range and IQR don't use all data values. Variance and standard deviation measure average distance from the mean using ALL data points.

Variance ($s^2$)

Definition: Average squared deviation from the mean

Formula (sample variance): $s^2 = \frac{\sum(x_i - \bar{x})^2}{n-1}$

Steps to calculate:

  1. Find mean ($\bar{x}$)
  2. Find each deviation: $(x_i - \bar{x})$
  3. Square each deviation: $(x_i - \bar{x})^2$
  4. Sum squared deviations: $\sum(x_i - \bar{x})^2$
  5. Divide by $n-1$

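Note: We divide by $n-1$ (not $n$) for sample variance. This is called Bessel's correction. Dividing by $n$ tends to underestimate the population variance, because deviations are measured from the sample mean rather than the true population mean; dividing by $n-1$ corrects for this.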
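A small simulation can make the effect of the divisor visible. The sketch below is illustrative only: it assumes a normal population with variance 1, samples of size 5, and a fixed random seed, none of which come from the lesson itself.

```python
import random

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample mean."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample)

rng = random.Random(0)        # fixed seed so the run is repeatable
n, trials = 5, 20_000
total_div_n = total_div_n_minus_1 = 0.0

for _ in range(trials):
    sample = [rng.gauss(0, 1) for _ in range(n)]   # population variance is 1
    ss = sum_sq_dev(sample)
    total_div_n += ss / n
    total_div_n_minus_1 += ss / (n - 1)

avg_div_n = total_div_n / trials
avg_div_n_minus_1 = total_div_n_minus_1 / trials
print(f"average when dividing by n:   {avg_div_n:.3f}")
print(f"average when dividing by n-1: {avg_div_n_minus_1:.3f}")
```

Averaged over many samples, dividing by $n-1$ should land close to the true variance of 1, while dividing by $n$ should land near $(n-1)/n = 0.8$.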

Standard Deviation ($s$)

Definition: Square root of variance

Formula (sample standard deviation): $s = \sqrt{\frac{\sum(x_i - \bar{x})^2}{n-1}}$

Why take square root?

  • Variance is in squared units (points², dollars²)
  • Standard deviation returns to original units (points, dollars)
  • More interpretable!

Example Calculation

Data: 10, 12, 14, 16, 18

Step 1: Find mean

$\bar{x} = \frac{10+12+14+16+18}{5} = \frac{70}{5} = 14$

Step 2: Find deviations and square them

| $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ |
|-------|-----------------|---------------------|
| 10 | -4 | 16 |
| 12 | -2 | 4 |
| 14 | 0 | 0 |
| 16 | 2 | 4 |
| 18 | 4 | 16 |

Step 3: Sum squared deviations

$\sum(x_i - \bar{x})^2 = 16 + 4 + 0 + 4 + 16 = 40$

Step 4: Calculate variance

$s^2 = \frac{40}{5-1} = \frac{40}{4} = 10$

Step 5: Calculate standard deviation

$s = \sqrt{10} \approx 3.16$

Interpretation: A typical value lies about 3.16 units from the mean.
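The calculation above can be reproduced step by step in Python. This is a sketch rather than the only way to do it; the final line cross-checks the hand calculation against the standard-library `statistics.variance` and `statistics.stdev`, which also divide by $n-1$.

```python
from math import isclose, sqrt
from statistics import stdev, variance

data = [10, 12, 14, 16, 18]

mean = sum(data) / len(data)                    # Step 1: 14.0
squared_devs = [(x - mean) ** 2 for x in data]  # Step 2: 16, 4, 0, 4, 16
ss = sum(squared_devs)                          # Step 3: 40.0
s2 = ss / (len(data) - 1)                       # Step 4: 10.0
s = sqrt(s2)                                    # Step 5: about 3.16

print(s2, round(s, 2))  # 10.0 3.16

# Cross-check against the standard library (sample versions, n - 1 divisor)
print(variance(data) == s2, isclose(stdev(data), s))  # True True
```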

Properties of Standard Deviation

Interpretation:

  • Typical distance from mean
  • Larger SD = more spread out
  • Smaller SD = more clustered around mean
  • SD = 0 only when all values are identical

Properties:

  • Always ≥ 0
  • Same units as original data
  • Sensitive to outliers (because we square deviations)
  • Used in many statistical procedures

Empirical Rule (for roughly normal distributions):

  • About 68% of data within 1 SD of mean
  • About 95% of data within 2 SD of mean
  • About 99.7% of data within 3 SD of mean
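The Empirical Rule can be checked approximately by simulation. The sketch below assumes normally distributed data with mean 100 and SD 15 and a fixed random seed; these values are arbitrary choices for illustration.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)       # fixed seed so the run is repeatable
values = [rng.gauss(100, 15) for _ in range(50_000)]
m, s = mean(values), stdev(values)

def share_within(k):
    """Fraction of values within k standard deviations of the mean."""
    return sum(abs(x - m) <= k * s for x in values) / len(values)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

The printed shares should come out close to 68%, 95%, and 99.7%, with small differences from run to run if the seed is changed.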

When to Use Standard Deviation

Appropriate when:
✓ Distribution is roughly symmetric
✓ No extreme outliers
✓ Want to use all data
✓ Need for statistical inference
✓ Describing normal distributions

Paired with: Mean (both use all data, both sensitive to outliers)

Not appropriate when:
❌ Distribution is heavily skewed
❌ Outliers present
❌ Want resistant measure

Choosing the Right Measure

Decision Framework

Distribution Shape:

Symmetric, no outliers:

  • Center: Mean
  • Spread: Standard deviation
  • "The mean is [value] with a standard deviation of [value]"

Skewed or outliers present:

  • Center: Median
  • Spread: IQR
  • "The median is [value] with an IQR of [value]"
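One way to turn this framework into code is sketched below. It is a simplified heuristic, not a standard procedure: it only checks for outliers with the 1.5 × IQR rule (it does not test for skewness), it uses the median-of-halves quartiles from earlier in this lesson, and the function name `describe` is hypothetical.

```python
from statistics import mean, median, stdev

def describe(values):
    """Pick (center, spread) based on whether 1.5 x IQR outliers are present.

    Assumption: outliers are the only trigger for switching to
    median/IQR; skewness without outliers is not detected here.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1 = median(data[:half])
    q3 = median(data[half + 1:] if n % 2 else data[half:])
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    if any(x < low or x > high for x in data):
        return {"center": ("median", median(data)), "spread": ("IQR", iqr)}
    return {"center": ("mean", mean(data)), "spread": ("SD", stdev(data))}

print(describe([10, 12, 13, 14, 15]))      # no outliers: mean and SD
print(describe([10, 12, 13, 14, 15, 50]))  # 50 is an outlier: median and IQR
```

In practice a histogram or boxplot is still the best way to judge shape; a rule like this is only a first pass.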

Comparison Table

| Measure | Resistant? | Uses All Data? | Units |
|----------------------|------------|-----------------|----------|
| Range | No | No (only 2) | Original |
| IQR | Yes | No (middle 50%) | Original |
| Variance | No | Yes | Squared |
| Standard Deviation | No | Yes | Original |

Effect of Transformations

Adding/Subtracting a Constant

Adding $c$ to every value:

  • Range: Unchanged
  • IQR: Unchanged
  • SD: Unchanged

Example: Add 50 to every test score

  • Original SD = 5 points
  • New SD = 5 points
  • Every score (and the mean) shifted by 50, but the spread and the units did not change!

Multiplying/Dividing by a Constant

Multiplying every value by $c$:

  • Range: Multiplied by $|c|$
  • IQR: Multiplied by $|c|$
  • SD: Multiplied by $|c|$
  • Variance: Multiplied by $c^2$

Example: Convert heights from inches to centimeters (multiply by 2.54)

  • Original SD = 3 inches
  • New SD = 3 × 2.54 = 7.62 cm
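Both transformation rules can be verified numerically. The heights below are made-up values for illustration, and `math.isclose` is used because floating-point results may differ in the last digits.

```python
from math import isclose
from statistics import stdev, variance

heights_in = [62, 65, 66, 68, 71]            # made-up heights in inches

shifted = [x + 50 for x in heights_in]       # add a constant
scaled = [x * 2.54 for x in heights_in]      # inches to centimeters

# Adding a constant leaves the SD unchanged
print(isclose(stdev(shifted), stdev(heights_in)))                 # True
# Multiplying by c multiplies the SD by |c| and the variance by c^2
print(isclose(stdev(scaled), 2.54 * stdev(heights_in)))           # True
print(isclose(variance(scaled), 2.54**2 * variance(heights_in)))  # True
```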

Coefficient of Variation

Definition

Coefficient of Variation (CV): Ratio of standard deviation to mean

Formula: $CV = \frac{s}{\bar{x}} \times 100\%$

Purpose

Compare variability across different units or scales

Example:

  • Heights: Mean = 66 inches, SD = 3 inches
    CV = (3/66) × 100% = 4.5%

  • Weights: Mean = 150 lbs, SD = 20 lbs
    CV = (20/150) × 100% = 13.3%

Weights are more variable relative to their mean than heights!
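A minimal sketch of the CV calculation from summary statistics follows. The function name `coefficient_of_variation` is hypothetical, and the guard against a zero mean is an added assumption, since the ratio is undefined in that case.

```python
def coefficient_of_variation(sd, mean):
    """Return the CV as a percentage; undefined when the mean is 0."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return sd / mean * 100

print(round(coefficient_of_variation(3, 66), 1))    # 4.5
print(round(coefficient_of_variation(20, 150), 1))  # 13.3
```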

When to Use CV

✓ Comparing datasets with different units
✓ Comparing datasets with very different means
✓ Wanting relative (not absolute) measure of spread

Common Mistakes

❌ Using SD with skewed data
Use IQR instead!

❌ Forgetting units
Range, IQR, SD all have units!

❌ Confusing variance and SD
Variance is in squared units; SD is in the original units

❌ Dividing by $n$ instead of $n-1$
Sample SD uses $n-1$ (degrees of freedom)

❌ Reporting spread without center
Always report both!

❌ Comparing SDs of very different datasets
Consider CV for fair comparison

Quick Reference

Range:

  • Formula: $Max - Min$
  • When: Quick assessment
  • Property: Sensitive to outliers

IQR:

  • Formula: $Q3 - Q1$
  • When: Skewed data, outliers
  • Property: Resistant

Standard Deviation:

  • Formula: $s = \sqrt{\frac{\sum(x_i - \bar{x})^2}{n-1}}$
  • When: Symmetric, no outliers
  • Property: Uses all data

Choosing:

  • Symmetric → Mean & SD
  • Skewed → Median & IQR

Outlier Rule:

  • Outliers beyond $Q1 - 1.5 \times IQR$ or $Q3 + 1.5 \times IQR$

Remember: Spread is just as important as center! Two datasets can have the same mean but completely different spreads. Always report both center AND spread when describing data!

Practice Problems

Problem 1 (easy)

Question:

Calculate the range for this dataset: 45, 52, 48, 61, 55, 49, 58

Solution

Step 1: Identify minimum and maximum

Data: 45, 52, 48, 61, 55, 49, 58

Minimum value = 45
Maximum value = 61

Step 2: Calculate range

Range = Maximum - Minimum
Range = 61 - 45
Range = 16

Step 3: Interpret

  • The data spans 16 units
  • This is the difference between the highest and lowest values
  • Range is a simple measure of spread, but it is affected by outliers

Answer: Range = 16

Problem 2 (easy)

Question:

Given this five-number summary: Min=20, Q1=35, Median=50, Q3=65, Max=90. Calculate the IQR and range.

Solution

Step 1: Calculate IQR

IQR = Q3 - Q1
IQR = 65 - 35
IQR = 30

Step 2: Calculate Range

Range = Max - Min
Range = 90 - 20
Range = 70

Step 3: Compare the two measures

  • IQR = 30 (middle 50% of data spans 30 units)
  • Range = 70 (all data spans 70 units)

Step 4: Interpret

  • IQR is resistant to outliers (only uses middle 50%)
  • Range is sensitive to outliers (uses extremes)
  • IQR is better for skewed data

Answer: IQR = 30, Range = 70

Problem 3 (medium)

Question:

Two classes took the same test. Both have a mean of 75. Class A has a standard deviation of 5, and Class B has a standard deviation of 15. What does this tell you about the two classes?

Solution

Step 1: Understand standard deviation

  • SD measures the typical distance from the mean
  • Higher SD = more spread out
  • Lower SD = more clustered around mean

Step 2: Analyze Class A (SD = 5)

  • Scores tightly clustered around mean of 75
  • Most students scored close to 75
  • Typical deviation from mean: about 5 points
  • Likely range: roughly 65-85 (most within 2 SD)
  • Very consistent performance

Step 3: Analyze Class B (SD = 15)

  • Scores widely spread around mean of 75
  • More variability in performance
  • Typical deviation from mean: about 15 points
  • Likely range: roughly 45-105 (most within 2 SD), though actual scores cannot exceed the test's maximum
  • Very inconsistent performance

Step 4: Compare the classes

  • Class A: Homogeneous, similar ability levels, consistent
  • Class B: Heterogeneous, mixed ability levels, varied

Possible explanations for Class B:

  • Some students very prepared, others not
  • Wider range of abilities
  • Some students may have guessed more
  • More diverse backgrounds/preparation

Step 5: Teaching implications

  • Class A: Whole-class instruction may work well
  • Class B: May need differentiated instruction

Answer: Class A (SD=5) has students performing very similarly, with most scores close to 75. Class B (SD=15) has much more variability: some students did very well, others poorly. Both classes have the same average, but Class B is much more spread out.

Problem 4 (medium)

Question:

Calculate the standard deviation for this small dataset: 2, 4, 6, 8, 10

Solution

Step 1: Calculate the mean

Mean = (2 + 4 + 6 + 8 + 10) / 5 = 30 / 5 = 6

Step 2: Calculate deviations from mean

| Value | Deviation from mean |
|-------|---------------------|
| 2 | 2 - 6 = -4 |
| 4 | 4 - 6 = -2 |
| 6 | 6 - 6 = 0 |
| 8 | 8 - 6 = 2 |
| 10 | 10 - 6 = 4 |

Step 3: Square the deviations

(-4)² = 16
(-2)² = 4
(0)² = 0
(2)² = 4
(4)² = 16

Step 4: Find average of squared deviations (variance)

For sample: divide by (n - 1) = 4
Variance = (16 + 4 + 0 + 4 + 16) / 4 = 40 / 4 = 10

Step 5: Take square root (standard deviation)

SD = √10 ≈ 3.16

Step 6: Interpret

On average, values deviate about 3.16 units from the mean of 6.
Makes sense: values are 2, 4, 6, 8, 10 (deviations range from -4 to +4)

Note: We used (n-1) because this is sample data. For a population, we'd use n.

Answer: s ≈ 3.16

Problem 5 (hard)

Question:

Compare and contrast range, IQR, and standard deviation as measures of spread. When should you use each?

Solution

RANGE:

Definition: Maximum - Minimum

Advantages:

  • Very easy to calculate
  • Easy to understand
  • Shows total spread

Disadvantages:

  • Uses only 2 values (ignores all others)
  • Extremely sensitive to outliers
  • Doesn't show where data is concentrated

When to use:

  • Quick rough measure
  • When outliers aren't a concern
  • Small datasets

INTERQUARTILE RANGE (IQR):

Definition: Q3 - Q1 (middle 50% spread)

Advantages:

  • Resistant to outliers
  • Shows spread of middle 50%
  • Good with skewed data
  • Used to identify outliers

Disadvantages:

  • Ignores outer 50% of data
  • Doesn't use all information
  • Less precise than SD

When to use:

  • Skewed distributions
  • Data with outliers
  • Paired with median
  • Five-number summary

STANDARD DEVIATION (SD):

Definition: $s = \sqrt{\frac{\sum(x_i - \bar{x})^2}{n-1}}$ (average distance from mean)

Advantages:

  • Uses ALL data values
  • Mathematically precise
  • Best for normal distributions
  • Standard in statistics
  • Used in inference

Disadvantages:

  • Not resistant to outliers
  • Hard to calculate by hand
  • Less intuitive
  • Assumes interval data

When to use:

  • Symmetric distributions
  • Normal distributions
  • No major outliers
  • Paired with mean
  • Statistical inference

SUMMARY TABLE:

| Question | Range | IQR | SD |
|-----------------------|-------|--------|-----|
| Resistant? | No | Yes | No |
| Uses all data? | No | No | Yes |
| Easy to calculate? | Yes | Medium | No |
| Best for skewed data? | No | Yes | No |
| Best for normal data? | No | No | Yes |

PAIRING:

  • Mean + SD (symmetric data, no outliers)
  • Median + IQR (skewed data, outliers present)

Answer: Use range for quick estimates. Use IQR for skewed data or outliers (resistant). Use SD for normal distributions (uses all data, best statistical properties). Match with mean (SD) or median (IQR).