Box Plots
Create and interpret box-and-whisker plots
What is a Box Plot?
A box plot (also called box-and-whisker plot) is a visual way to display the distribution of data using five key numbers.
Purpose:
- Show spread of data
- Identify center of data
- Spot outliers
- Compare multiple data sets
Visual: A box with lines (whiskers) extending from each side
The Five-Number Summary
Box plots are based on five key values:
1. Minimum: Smallest value
2. Q1 (First Quartile): 25th percentile
3. Median (Q2): 50th percentile (middle value)
4. Q3 (Third Quartile): 75th percentile
5. Maximum: Largest value
Example data: 2, 4, 6, 8, 10, 12, 14, 16, 18
Minimum: 2
Q1: 5 (median of the lower half 2, 4, 6, 8; about 25% of data below this)
Median: 10 (middle value)
Q3: 15 (median of the upper half 12, 14, 16, 18; about 75% of data below this)
Maximum: 18
Finding the Five-Number Summary
Step 1: Order the data (smallest to largest)
Step 2: Find the median (Q2)
- If odd number of values: middle value
- If even number of values: average of two middle values
Step 3: Find Q1
- Median of lower half (below Q2)
Step 4: Find Q3
- Median of upper half (above Q2)
Step 5: Find minimum and maximum
- Smallest and largest values
Example 1: 3, 7, 8, 10, 12, 15, 18, 20, 21
Already ordered
n = 9 (odd)
Median (Q2): 5th value = 12
Lower half: 3, 7, 8, 10
Q1: Average of 7 and 8 = 7.5
Upper half: 15, 18, 20, 21
Q3: Average of 18 and 20 = 19
Five-number summary: Min: 3, Q1: 7.5, Median: 12, Q3: 19, Max: 21
Example 2: 5, 8, 10, 12, 15, 18
n = 6 (even)
Median: Average of 10 and 12 = 11
Lower half: 5, 8, 10
Q1: 8
Upper half: 12, 15, 18
Q3: 15
Five-number summary: Min: 5, Q1: 8, Median: 11, Q3: 15, Max: 18
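The five steps above can be sketched in Python. This is a minimal sketch, not a standard library routine: the function name five_number_summary is hypothetical, it assumes at least two values, and it uses the median-of-halves method from this lesson (the middle value is left out of both halves when n is odd).

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method.

    Assumes at least two values so that both halves are non-empty.
    """
    data = sorted(values)              # Step 1: order the data
    n = len(data)
    half = n // 2
    lower = data[:half]                # values before the median position
    upper = data[n - half:]            # values after it (skips the middle value when n is odd)
    # Steps 2-5: median, Q1, Q3, then minimum and maximum
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Example 1 and Example 2 from this section
print(five_number_summary([3, 7, 8, 10, 12, 15, 18, 20, 21]))
print(five_number_summary([5, 8, 10, 12, 15, 18]))
```

Sorting inside the function guards against the most common mistake, which is computing quartiles on unordered data.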
Drawing a Box Plot
Step 1: Draw a number line with appropriate scale
Step 2: Mark the five-number summary above the line
Step 3: Draw a box from Q1 to Q3
Step 4: Draw a vertical line at the median inside the box
Step 5: Draw whiskers from box to min and max
Example: Five-number summary: 2, 5, 8, 12, 16
- Number line from 0 to 20
- Box from 5 to 12
- Line at 8 inside box
- Left whisker from 5 to 2
- Right whisker from 12 to 16
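The drawing steps can also be sketched as a rough text rendering. The function ascii_box_plot below is a hypothetical helper, not part of any library; it assumes whole-number values and uses one character per unit on the number line.

```python
def ascii_box_plot(minimum, q1, med, q3, maximum, lo=0, hi=20):
    """Return a one-line text box plot, one character per unit on [lo, hi].

    Assumes all five values are integers within [lo, hi].
    """
    row = [" "] * (hi - lo + 1)        # Step 1: the number line
    for x in range(minimum, maximum + 1):
        row[x - lo] = "-"              # Step 5: whiskers span min..max
    for x in range(q1, q3 + 1):
        row[x - lo] = "="              # Step 3: box spans Q1..Q3
    row[minimum - lo] = "|"            # whisker end caps
    row[maximum - lo] = "|"
    row[q1 - lo] = "["                 # box edges
    row[q3 - lo] = "]"
    row[med - lo] = "M"                # Step 4: median marker
    return "".join(row)

# Five-number summary 2, 5, 8, 12, 16 on a number line from 0 to 20
print(ascii_box_plot(2, 5, 8, 12, 16))
```

In the printed line, "[" and "]" mark Q1 and Q3, "M" marks the median, and "|" marks the minimum and maximum.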
Parts of a Box Plot
The Box:
- Left edge: Q1
- Right edge: Q3
- Line inside: Median
- Width of box: Interquartile Range (IQR)
The Whiskers:
- Left whisker: From Q1 to minimum
- Right whisker: From Q3 to maximum
- Show range of lower and upper 25% of data
Important: 50% of data is inside the box!
Interquartile Range (IQR)
IQR = Q3 - Q1
Meaning: Middle 50% of data spread
Example: Q1 = 6, Q3 = 14
IQR = 14 - 6 = 8
Use: Measure of spread (variation)
Larger IQR = more spread out data
Smaller IQR = more concentrated data
Reading Information from Box Plots
1. Center (Median): Where is the line inside the box?
2. Spread (Range and IQR): How far do whiskers extend? How wide is the box?
3. Symmetry: Is median in center of box? Are whiskers equal length?
4. Skewness:
If right whisker longer → right-skewed (positive skew)
If left whisker longer → left-skewed (negative skew)
Example: Box plot with:
- Longer right whisker
- Median closer to Q1
This is right-skewed (tail to the right)
Most data on lower end
Outliers in Box Plots
Outlier: Value unusually far from the rest
Rule: A value is an outlier if:
- Less than Q1 - 1.5(IQR), OR
- Greater than Q3 + 1.5(IQR)
Example: Q1 = 8, Q3 = 16, IQR = 8
Lower boundary: 8 - 1.5(8) = 8 - 12 = -4
Upper boundary: 16 + 1.5(8) = 16 + 12 = 28
Any value below -4 or above 28 is an outlier
Displaying outliers:
- Mark with individual points (dots or asterisks)
- Draw whiskers to last non-outlier value
Example data: 5, 7, 9, 11, 13, 15, 40
Q1 = 7, Q3 = 15, IQR = 8, so the upper boundary is 15 + 1.5(8) = 27
40 is an outlier (40 > 27)
- Draw whisker to 15 (last non-outlier)
- Mark 40 as separate point
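The 1.5 × IQR rule and the whisker placement can be sketched as follows. The helper names outlier_fences and split_outliers are hypothetical, and Q1 = 7 and Q3 = 15 are the quartiles of the example data under the median-of-halves method.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) boundaries from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def split_outliers(data, q1, q3):
    """Separate values into (inside, outliers) using the boundaries."""
    low, high = outlier_fences(q1, q3)
    inside = [x for x in data if low <= x <= high]
    outliers = [x for x in data if x < low or x > high]
    return inside, outliers

data = [5, 7, 9, 11, 13, 15, 40]
inside, outliers = split_outliers(data, q1=7, q3=15)

# Whiskers stop at the most extreme values that are not outliers.
whisker_ends = (min(inside), max(inside))

print("boundaries:", outlier_fences(7, 15))
print("outliers:", outliers)
print("whisker ends:", whisker_ends)
```

A value exactly on a boundary is kept inside here, which matches the rule as stated: only values strictly below or above the boundaries count as outliers.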
Modified Box Plot
Standard box plot: Whiskers extend to min and max
Modified box plot: Whiskers extend to last non-outlier
- Outliers shown as individual points
- More accurate representation when outliers present
Use modified when: Data contains outliers
Comparing Box Plots
Multiple box plots on same scale
Can compare:
1. Centers: Which median is higher?
2. Spreads: Which IQR is larger? Which range is larger?
3. Symmetry: Which is more symmetric?
4. Outliers: Which has outliers?
Example: Compare test scores for two classes
Class A: Median = 75, IQR = 10
Class B: Median = 80, IQR = 20
Analysis:
- Class B has higher median (higher typical score)
- Class A has smaller IQR (more consistent)
- Class B is more variable (its middle 50% of scores spans twice as wide a range)
Advantages of Box Plots
1. Show five-number summary visually
2. Easy to compare multiple groups
3. Clearly identify outliers
4. Show skewness
5. Good for large data sets
6. Compact display
Disadvantages of Box Plots
1. Don't show individual values (except outliers)
2. Don't show frequency (how many at each value)
3. Don't show gaps in data
4. Can hide multiple modes (bimodal data)
5. Arbitrary outlier rule (1.5 IQR is convention)
Better for: Overall distribution and comparison
Not as good for: Detailed frequency information
Creating Box Plot from Frequency Table
Example:
Value | Frequency
------|----------
10    | 2
15    | 3
20    | 4
25    | 2
30    | 1
Step 1: List all values in order
10, 10, 15, 15, 15, 20, 20, 20, 20, 25, 25, 30
Step 2: Find five-number summary
n = 12
Min: 10
Q1: 15 (median of first 6: average of 15 and 15)
Median: Average of 6th and 7th = (20 + 20)/2 = 20
Q3: 22.5 (median of last 6: average of 20 and 25)
Max: 30
Step 3: Draw box plot using these values
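Expanding a frequency table by hand is easy to get wrong, so a short check can help. This is a minimal sketch: the dictionary freq is just one assumed way to hold the table, and the quartiles use the median-of-halves method.

```python
from statistics import median

# Frequency table: value -> how many times it occurs
freq = {10: 2, 15: 3, 20: 4, 25: 2, 30: 1}

# Step 1: list every value as many times as its frequency, in order
data = [value for value, count in sorted(freq.items()) for _ in range(count)]

# Step 2: five-number summary from the expanded list
n = len(data)
half = n // 2
lower, upper = data[:half], data[n - half:]
summary = (data[0], median(lower), median(data), median(upper), data[-1])

print(data)
print(summary)
```

For this table the expanded list has 12 values, and the 6th and 7th values are both 20.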
Percentiles and Box Plots
Box plot divides data into four parts (quartiles):
0% to 25%: Below Q1 (left whisker)
25% to 50%: Q1 to Median (left half of box)
50% to 75%: Median to Q3 (right half of box)
75% to 100%: Above Q3 (right whisker)
Each section contains 25% of the data!
Example: If there are 20 data points:
- 5 values below Q1
- 5 values from Q1 to median
- 5 values from median to Q3
- 5 values above Q3
Skewness from Box Plots
Symmetric:
- Median in center of box
- Equal whisker lengths
- Data evenly distributed
Right-skewed (positively skewed):
- Right whisker longer than left
- Median closer to Q1
- Tail extends to the right
- Example: Income data (few very high earners)
Left-skewed (negatively skewed):
- Left whisker longer than right
- Median closer to Q3
- Tail extends to the left
- Example: Test scores (few very low scores)
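The whisker comparison can be sketched as a rough classifier. The function describe_skew and its 10% tolerance are assumptions made for illustration; there is no standard cutoff for when whiskers count as "roughly equal".

```python
def describe_skew(summary, tolerance=0.1):
    """Classify skew from a five-number summary by comparing whisker lengths.

    Whiskers count as equal when they differ by at most `tolerance`
    times the range (an arbitrary choice for this sketch).
    """
    minimum, q1, _, q3, maximum = summary
    left = q1 - minimum                # left whisker length
    right = maximum - q3               # right whisker length
    span = maximum - minimum
    if span == 0 or abs(right - left) <= tolerance * span:
        return "roughly symmetric"
    return "right-skewed" if right > left else "left-skewed"

print(describe_skew((5, 12, 18, 25, 40)))   # left whisker 7, right whisker 15
print(describe_skew((0, 10, 12, 14, 16)))   # left whisker 10, right whisker 2
print(describe_skew((2, 5, 8, 12, 16)))     # left whisker 3, right whisker 4
```

This looks only at the whiskers; the position of the median inside the box is a second clue and should be checked as well.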
Real-World Applications
1. Comparing groups:
- Test scores across different classes
- Salaries across different companies
- Heights across different age groups
2. Quality control:
- Identify defective products (outliers)
- Monitor consistency (IQR)
3. Scientific data:
- Compare experimental results
- Analyze measurement variation
4. Sports statistics:
- Compare player performance
- Analyze team statistics
5. Business:
- Sales data across regions
- Customer satisfaction scores
Example Problem: Complete Analysis
Data: Daily temperatures (°F) for two weeks
68, 70, 72, 74, 75, 76, 78, 80, 81, 82, 83, 85, 88, 90
Find five-number summary:
Min: 68
Q1: 74 (median of the lower seven values, 68 through 78)
Median: 79 (average of 78 and 80)
Q3: 83 (median of the upper seven values, 80 through 90)
Max: 90
Find IQR: IQR = 83 - 74 = 9
Check for outliers:
Lower boundary: 74 - 1.5(9) = 74 - 13.5 = 60.5
Upper boundary: 83 + 1.5(9) = 83 + 13.5 = 96.5
No outliers (all data between 60.5 and 96.5)
Describe distribution:
- Roughly symmetric (right whisker of 7 is only slightly longer than left whisker of 6)
- No outliers
- IQR of 9 shows moderate variation
- Median of 79 is typical temperature
Double Box Plots
Two box plots on same scale for comparison
Example: Boys vs. Girls test scores
Boys: Min 60, Q1 70, Med 78, Q3 85, Max 92
Girls: Min 65, Q1 75, Med 82, Q3 88, Max 95
Draw both on same number line (vertically stacked)
Compare:
- Girls have higher median (82 vs 78)
- Girls have slightly smaller IQR (13 vs 15)
- Girls have higher minimum and maximum
- Overall, girls performed better
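The numbers behind this comparison can be tabulated with a few lines of code. This is a minimal sketch; the helper name spread_stats is hypothetical.

```python
def spread_stats(summary):
    """Return the median, IQR, and range for a five-number summary."""
    lo, q1, med, q3, hi = summary
    return {"median": med, "iqr": q3 - q1, "range": hi - lo}

groups = {
    "Boys": (60, 70, 78, 85, 92),
    "Girls": (65, 75, 82, 88, 95),
}

for name, summary in groups.items():
    print(name, spread_stats(summary))
```

Here the boys' IQR is 85 - 70 = 15 and the girls' IQR is 88 - 75 = 13, so the girls' middle 50% is slightly less spread out.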
Common Mistakes to Avoid
- Not ordering data first: Must arrange in order before finding quartiles!
- Confusing median and mean: Box plot uses median, not mean
- Wrong quartile calculation: Different methods exist, be consistent
- Misidentifying outliers: Use 1.5 IQR rule correctly
- Drawing to scale incorrectly: Number line must be evenly spaced
- Forgetting to label: Always label number line and title graph
- Misreading whiskers: Whiskers go to actual min/max (or last non-outlier)
Box Plot vs Other Displays
Box Plot vs Histogram:
- Box plot: Shows five-number summary, quartiles
- Histogram: Shows frequency, shape of distribution
Box Plot vs Dot Plot:
- Box plot: Summary, good for large data
- Dot plot: Individual values, good for small data
Box Plot vs Stem-and-Leaf:
- Box plot: Visual summary
- Stem-and-leaf: Preserves actual values
Use box plot when: Comparing groups, showing quartiles, large data sets
Technology for Box Plots
Graphing calculators:
- TI-84: 2nd → STAT PLOT → select the modified box plot type
- Enter data in lists
- Adjust window
- TRACE to see five-number summary
Software:
- Excel: Insert → Chart → Box and Whisker
- Google Sheets: Similar feature
- Online tools: Many free box plot generators
Advantages: Quick, accurate, can handle large data sets
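Programming languages are another option. As a minimal sketch, Python's standard library provides statistics.quantiles, which also shows why hand calculations should be checked against a known method: its "exclusive" and "inclusive" methods interpolate differently from each other and need not agree with the median-of-halves method used in this lesson. The data are from Example 1 earlier.

```python
from statistics import quantiles

data = [3, 7, 8, 10, 12, 15, 18, 20, 21]

# n=4 asks for the three cut points Q1, Q2, Q3.
exclusive = quantiles(data, n=4, method="exclusive")
inclusive = quantiles(data, n=4, method="inclusive")

print("exclusive:", exclusive)
print("inclusive:", inclusive)
```

For this data set the exclusive method gives Q1 = 7.5 and Q3 = 19, the same as the hand calculation, while the inclusive method gives Q1 = 8 and Q3 = 18. That agreement is not guaranteed for other data sets, so state which method was used when reporting quartiles.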
Quick Reference
Five-Number Summary: Min, Q1, Median, Q3, Max
IQR: Q3 - Q1 (middle 50% spread)
Outlier Rule: Below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR)
Box: From Q1 to Q3 (contains middle 50%)
Whiskers: From box to min and max (or last non-outlier)
Median line: Inside box
Skewness:
- Right-skewed: Right whisker longer
- Left-skewed: Left whisker longer
- Symmetric: Whiskers roughly equal
Practice Tips
- Always order data first
- Practice finding quartiles with odd and even data sets
- Draw to scale carefully
- Label all parts clearly
- Check for outliers using 1.5 IQR rule
- Compare multiple box plots for practice
- Understand what each part represents
- Relate to percentiles (25%, 50%, 75%)
- Practice reading and creating box plots
- Connect to real-world contexts
- Use technology to verify hand calculations
- Remember: 50% of data is in the box!
- Practice identifying skewness
- Work with both standard and modified box plots
Box plots are powerful tools for understanding data distribution and making comparisons. Master this skill and you'll have a valuable technique for analyzing data in statistics, science, and many other fields!
📚 Practice Problems
Problem 1 (easy)
Question:
Find the five-number summary for: 3, 7, 8, 12, 13, 15, 18, 21, 23
Solution:
Step 1: Arrange data in order (already done):
3, 7, 8, 12, 13, 15, 18, 21, 23
Step 2: Find the minimum and maximum:
Minimum = 3
Maximum = 23
Step 3: Find the median (Q2): There are 9 values, so the median is the 5th value.
Median (Q2) = 13
Step 4: Find Q1 (median of lower half):
Lower half: 3, 7, 8, 12
Q1 = (7 + 8)/2 = 7.5
Step 5: Find Q3 (median of upper half):
Upper half: 15, 18, 21, 23
Q3 = (18 + 21)/2 = 19.5
Five-number summary: Min = 3, Q1 = 7.5, Q2 = 13, Q3 = 19.5, Max = 23
Problem 2 (easy)
Question:
Calculate the interquartile range (IQR) for a data set with Q1 = 12 and Q3 = 28.
Solution:
Step 1: Recall the IQR formula: IQR = Q3 - Q1
Step 2: Substitute the values: IQR = 28 - 12
Step 3: Calculate: IQR = 16
Step 4: Interpret: The IQR is 16, which means the middle 50% of the data spans 16 units. This measures the spread of the middle half of the data.
Answer: IQR = 16
Problem 3 (medium)
Question:
For a data set with Q1 = 20, Q3 = 35, determine if a value of 60 is an outlier.
Solution:
Step 1: Calculate the IQR: IQR = Q3 - Q1 = 35 - 20 = 15
Step 2: Calculate the outlier boundaries using the 1.5 × IQR rule:
Lower boundary = Q1 - 1.5(IQR) = 20 - 1.5(15) = 20 - 22.5 = -2.5
Upper boundary = Q3 + 1.5(IQR) = 35 + 1.5(15) = 35 + 22.5 = 57.5
Step 3: Check if 60 is outside these boundaries: 60 > 57.5, so 60 is above the upper boundary.
Step 4: Conclusion: Yes, 60 is an outlier because it exceeds the upper boundary.
Any value below -2.5 or above 57.5 would be considered an outlier.
Answer: Yes, 60 is an outlier
Problem 4 (medium)
Question:
A box plot shows Min = 5, Q1 = 12, Q2 = 18, Q3 = 25, Max = 40. Describe the distribution.
Solution:
Step 1: Calculate the IQR: IQR = Q3 - Q1 = 25 - 12 = 13
Step 2: Compare distances from median to quartiles:
Distance from Q2 to Q1: 18 - 12 = 6
Distance from Q2 to Q3: 25 - 18 = 7
These are roughly equal (6 ≈ 7)
Step 3: Compare whisker lengths:
Lower whisker (Q1 to Min): 12 - 5 = 7
Upper whisker (Q3 to Max): 40 - 25 = 15
The upper whisker is about twice as long.
Step 4: Determine skewness: The box is fairly symmetric around the median, but the upper whisker is much longer than the lower whisker, so the distribution is right-skewed (positively skewed), with the skew coming from the upper tail.
Step 5: Additional observations:
- The box (IQR = 13) shows where the middle 50% of data lies
- Range = 40 - 5 = 35
- No outliers: the boundaries are 12 - 1.5(13) = -7.5 and 25 + 1.5(13) = 44.5, and both Min and Max fall between them
Answer: The middle 50% of data (from 12 to 25) is approximately symmetric about the median, but the longer upper whisker gives the distribution a right skew.
Problem 5 (hard)
Question:
Create a box plot for: 2, 4, 6, 7, 9, 10, 12, 15, 18, 20, 24. Identify any outliers.
Solution:
Step 1: Data is already in order. Find five-number summary:
Min = 2
Q1 = 6 (median of lower half: 2, 4, 6, 7, 9)
Q2 = 10 (median of all: 6th value)
Q3 = 18 (median of upper half: 12, 15, 18, 20, 24)
Max = 24
Step 2: Calculate IQR:
IQR = Q3 - Q1 = 18 - 6 = 12
Step 3: Calculate outlier boundaries:
Lower: Q1 - 1.5(IQR) = 6 - 1.5(12) = 6 - 18 = -12
Upper: Q3 + 1.5(IQR) = 18 + 1.5(12) = 18 + 18 = 36
Step 4: Check for outliers: All values (2, 4, 6, 7, 9, 10, 12, 15, 18, 20, 24) are between -12 and 36. No outliers exist.
Step 5: Draw the box plot:
- Draw a number line from 0 to 25
- Draw a box from Q1 (6) to Q3 (18)
- Draw a vertical line at the median Q2 (10) inside the box
- Draw a whisker from the box to Min (2)
- Draw a whisker from the box to Max (24)
Answer: Five-number summary: 2, 6, 10, 18, 24. No outliers.