Scatter Plots
Create and interpret scatter plots
How do you visualize the relationship between two variables? Scatter plots reveal patterns, trends, and correlations in data! They're essential tools for data analysis in science, business, sports, and everyday life.
What Is a Scatter Plot?
A scatter plot displays pairs of numerical data as points on a coordinate plane.
Purpose:
- Show relationship between two variables
- Identify patterns or trends
- Detect correlations
- Spot outliers
Structure:
- x-axis: Independent variable (what you control or choose)
- y-axis: Dependent variable (what you measure or observe)
- Points: Each represents one data pair (x, y)
Creating a Scatter Plot
Steps:
- Collect data pairs (x, y)
- Choose appropriate scale for axes
- Label axes with variable names and units
- Plot each point
- Don't connect the points!
- Give the plot a title
Example: Study time vs. test scores
| Study Hours (x) | Test Score (y) |
|-----------------|----------------|
| 1 | 65 |
| 2 | 70 |
| 3 | 75 |
| 4 | 85 |
| 5 | 90 |
Plot points: (1, 65), (2, 70), (3, 75), (4, 85), (5, 90)
Title: "Study Time vs. Test Scores"
x-axis: Hours of Study
y-axis: Test Score (%)
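As a rough illustration of the steps above, this Python sketch stores the study-time data as (x, y) pairs and prints a coarse text-grid plot. The function name `text_scatter` and the grid settings are assumptions made for this example only; a real plot would normally be drawn on graph paper or with plotting software.

```python
# Study-time data from the table above, stored as (x, y) pairs.
hours = [1, 2, 3, 4, 5]
scores = [65, 70, 75, 85, 90]
points = list(zip(hours, scores))


def text_scatter(points, y_min, y_max, y_step):
    """Return a coarse text plot with one row per y_step and one column per integer x.

    Assumes integer x-values starting at 1, which holds for this data set only.
    """
    x_values = range(1, max(x for x, _ in points) + 1)
    rows = []
    for y in range(y_max, y_min - 1, -y_step):
        cells = []
        for x in x_values:
            # Mark the cell if some point has this x and a y in [y, y + y_step).
            hit = any(px == x and y <= py < y + y_step for px, py in points)
            cells.append("*" if hit else ".")
        rows.append(f"{y:>3} | " + " ".join(cells))
    # Bottom row: x-axis labels aligned under the columns.
    rows.append(" " * 6 + " ".join(str(x) for x in x_values))
    return "\n".join(rows)


plot = text_scatter(points, y_min=60, y_max=90, y_step=5)
print("Study Time vs. Test Scores")
print(plot)
```

Each `*` is one data pair, and the marks are left unconnected. They climb from lower left to upper right, which is the pattern the next section calls a positive correlation.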
Types of Correlation
Correlation describes the relationship between variables.
Positive Correlation:
- As x increases, y increases
- Points trend upward from left to right
- Example: Study time vs. test scores
Negative Correlation:
- As x increases, y decreases
- Points trend downward from left to right
- Example: Absences vs. test scores
No Correlation:
- No clear pattern
- Points scattered randomly
- Example: Shoe size vs. test scores
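One way to check the direction of a trend numerically is to look at the sign of the summed products of deviations from the means. The sketch below is a minimal illustration; the function name `correlation_direction` is hypothetical. It reports direction only, so a very weak pattern would still be labeled positive or negative. The second data set is the TV and sleep table from the Example Analysis section.

```python
def correlation_direction(xs, ys):
    """Classify the trend direction from the sign of the summed deviation products."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Positive when x and y tend to be above (or below) their means together.
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if co_dev > 0:
        return "positive"
    if co_dev < 0:
        return "negative"
    return "none"


study = correlation_direction([1, 2, 3, 4, 5], [65, 70, 75, 85, 90])
tv = correlation_direction([1, 2, 3, 4, 5, 6], [8.5, 8, 7.5, 7, 6, 5.5])
print(study)  # positive
print(tv)     # negative
```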
Strength of Correlation
Strong Correlation:
- Points close to forming a line
- Clear pattern
- Easy to predict y from x
Moderate Correlation:
- Some scatter, but pattern visible
- General trend exists
Weak Correlation:
- Points very scattered
- Barely visible pattern
- Hard to predict
No Correlation:
- Completely random scatter
- No pattern at all
Describing Scatter Plots
Complete description includes:
- Type: Positive, negative, or no correlation
- Strength: Strong, moderate, or weak
- Form: Linear or non-linear
- Outliers: Any unusual points
Example descriptions:
"Strong positive linear correlation"
- Points close to a line
- Clear upward trend
"Moderate negative linear correlation"
- General downward trend
- Some scatter
"No correlation"
- Random scatter
- No pattern
Line of Best Fit (Trend Line)
A line of best fit (or trend line) is a straight line that best represents the data.
Purpose:
- Shows overall trend
- Helps make predictions
- Represents relationship simply
Characteristics:
- Goes through the "middle" of the data
- Roughly equal points above and below
- Keeps the overall distance between the line and the points as small as possible
Drawing a trend line:
- Look at overall pattern
- Draw line through middle of points
- Balance points above and below
- Line should follow the trend
Note: Use a ruler to draw a straight line!
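Software usually finds the line of best fit by calculation rather than by eye. The sketch below uses the least-squares method, which is one common choice and an assumption here, since this lesson describes drawing the line by hand. The function name `least_squares_line` is hypothetical, and the data is the study-time table from earlier.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


hours = [1, 2, 3, 4, 5]
scores = [65, 70, 75, 85, 90]
slope, intercept = least_squares_line(hours, scores)
print(f"y = {slope:.1f}x + {intercept:.1f}")  # y = 6.5x + 57.5

# Count the points that lie above and below the fitted line.
above = sum(1 for x, y in zip(hours, scores) if y > slope * x + intercept)
below = sum(1 for x, y in zip(hours, scores) if y < slope * x + intercept)
print(above, below)  # 2 2
```

Two points sit above the fitted line, two sit below, and one lies on it, which matches the "balance points above and below" guideline. A hand-drawn line will usually differ a little from the calculated one.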
Making Predictions
Use the trend line to predict values!
Interpolation:
- Predicting within the data range
- More reliable
- Example: Data from x = 1 to 10, predict for x = 5
Extrapolation:
- Predicting outside the data range
- Less reliable (trend may not continue)
- Example: Data from x = 1 to 10, predict for x = 15
Example: Trend line equation: y = 5x + 60
Predict test score for 6 hours of study: y = 5(6) + 60 = 30 + 60 = 90
Prediction: 90%
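The prediction step can be sketched in Python with a check on whether the x-value lies inside the data range. The function name `predict` is hypothetical. The example above does not state its data range, so the 1 to 5 hour range is an assumption taken from the study-time table.

```python
def predict(x, slope, intercept, x_min, x_max):
    """Return (predicted y, kind), where kind says whether x is inside the data range."""
    y = slope * x + intercept
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y, kind


# Trend line y = 5x + 60; assumed data range of 1 to 5 hours.
print(predict(6, 5, 60, x_min=1, x_max=5))    # (90, 'extrapolation')
print(predict(2.5, 5, 60, x_min=1, x_max=5))  # (72.5, 'interpolation')
```

Under that assumption, the 6-hour prediction falls just outside the data, so it deserves a little more caution than the 2.5-hour one.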
Outliers
An outlier is a point that doesn't fit the pattern.
Characteristics:
- Far from other points
- Far from trend line
- Unusual data value
Possible causes:
- Measurement error
- Recording error
- Unusual circumstance
- Genuine unusual case
Example: In study time vs. test scores, point (5, 40) would be an outlier
- High study time but low score
- Doesn't fit positive correlation
- Might indicate student was sick on test day
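"Far from trend line" can be made concrete by measuring each point's vertical distance from the line. The sketch below is one simple approach, not a standard rule. The function name `flag_outliers` and the cutoff of 15 points are assumptions for illustration, and the trend line y = 5x + 60 is borrowed from the Making Predictions example.

```python
def flag_outliers(points, slope, intercept, threshold):
    """Return the points whose vertical distance from the trend line exceeds threshold."""
    return [
        (x, y)
        for x, y in points
        if abs(y - (slope * x + intercept)) > threshold
    ]


# Study-time data plus the unusual point (5, 40).
points = [(1, 65), (2, 70), (3, 75), (4, 85), (5, 90), (5, 40)]

# threshold=15 is a hypothetical cutoff chosen for this example.
outliers = flag_outliers(points, slope=5, intercept=60, threshold=15)
print(outliers)  # [(5, 40)]
```

The point (5, 40) lies 45 points below the line, while every other point is within 5 points of it. A flagged point still needs a human decision about whether it is an error or a genuine unusual case.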
Reading Scatter Plots
Example: Temperature vs. Ice Cream Sales
Scatter plot shows positive correlation.
What it tells us:
- Warmer temperatures → more ice cream sales
- As x (temperature) increases, y (sales) increases
- Relationship is approximately linear
- Strong correlation (points close to line)
What it DOESN'T tell us:
- Causation (does temperature cause sales? Or vice versa? Or both influenced by summer?)
- Exact sales for each temperature (just general trend)
Correlation vs. Causation
IMPORTANT: Correlation ≠ Causation!
Correlation: Two variables are related
Causation: One variable CAUSES the other
Example 1: Ice cream sales vs. drowning incidents
- Correlation: Both increase in summer
- Causation: Ice cream doesn't cause drowning!
- Confounding variable: Hot weather (summer)
Example 2: Study time vs. test scores
- Correlation: Yes, positive
- Causation: Likely yes - studying helps scores
- Makes logical sense!
Golden rule: Correlation suggests possible relationship, but doesn't prove cause!
Real-World Applications
Education:
- Study time vs. grades
- Class attendance vs. performance
- Practice problems completed vs. test scores
Sports:
- Training hours vs. performance
- Height vs. vertical jump
- Speed vs. distance
Health:
- Exercise vs. heart rate
- Age vs. bone density
- Screen time vs. sleep quality
Business:
- Advertising spending vs. sales
- Price vs. demand
- Experience vs. salary
Science:
- Temperature vs. chemical reaction rate
- Fertilizer amount vs. plant growth
- Pressure vs. volume (gases)
Example Analysis
Data: Hours of TV per day vs. Hours of Sleep
| TV Hours (x) | Sleep Hours (y) |
|--------------|-----------------|
| 1 | 8.5 |
| 2 | 8 |
| 3 | 7.5 |
| 4 | 7 |
| 5 | 6 |
| 6 | 5.5 |
Analysis:
- Type: Negative correlation
- Strength: Strong (points close to line)
- Form: Linear
- Interpretation: More TV watching associated with less sleep
- Outliers: None visible
- Trend line: Approximately y = -0.5x + 9
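Prediction: For 7 hours of TV: y = -0.5(7) + 9 = -3.5 + 9 = 5.5 hours of sleep. Since x = 7 lies just outside the data range (1 to 6 hours), this is an extrapolation and should be treated with some caution.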
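To see how well the approximate trend line fits, compare it with each data point. This short sketch computes the difference between each observed sleep value and the value given by y = -0.5x + 9. The helper name `trend` is hypothetical.

```python
tv_hours = [1, 2, 3, 4, 5, 6]
sleep_hours = [8.5, 8, 7.5, 7, 6, 5.5]


def trend(x):
    """Approximate trend line from the analysis: y = -0.5x + 9."""
    return -0.5 * x + 9


# Observed value minus the trend-line value at each x.
residuals = [y - trend(x) for x, y in zip(tv_hours, sleep_hours)]
print(residuals)  # [0.0, 0.0, 0.0, 0.0, -0.5, -0.5]
print(trend(7))   # 5.5
```

The line passes exactly through the first four points and sits half an hour above the last two, so "approximately" is the right word. It slightly overestimates sleep at the high end of the data.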
Non-Linear Patterns
Not all scatter plots are linear!
Curved patterns:
- Quadratic (parabola shape)
- Exponential (rapid increase/decrease)
- Other curves
When to note:
- If pattern is clearly curved, mention it!
- "Non-linear relationship"
- May need different type of model (beyond Grade 8)
Example: Distance fallen vs. time (gravity)
- Curved pattern (quadratic)
- A straight line is not a good fit for this data
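A quick numeric check for a curved pattern is to look at how much y changes for each equal step in x. The sketch below uses the free-fall formula d = 0.5 · g · t² with g ≈ 9.8 m/s² and air resistance ignored, which is an assumption about the falling-object example. If the relationship were linear, the step-to-step changes would all be equal.

```python
G = 9.8  # m/s^2, approximate gravitational acceleration (air resistance ignored)

times = [0, 1, 2, 3, 4]  # seconds
# Distance fallen from rest: d = 0.5 * G * t^2, rounded to one decimal place.
distances = [round(0.5 * G * t ** 2, 1) for t in times]
# Change in distance from each second to the next.
first_diffs = [round(b - a, 1) for a, b in zip(distances, distances[1:])]

print(distances)    # [0.0, 4.9, 19.6, 44.1, 78.4]
print(first_diffs)  # [4.9, 14.7, 24.5, 34.3]

# A linear pattern would give identical differences.
looks_linear = len(set(first_diffs)) == 1
print(looks_linear)  # False
```

The distance gained each second keeps growing, so the points bend upward instead of following a straight line.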
Common Mistakes to Avoid
❌ Mistake 1: Connecting the dots
- Wrong: Draw lines between consecutive points
- Right: Plot points separately, then draw trend line
❌ Mistake 2: Forcing a correlation
- Sometimes there really is NO correlation
- Random scatter is a valid pattern (or lack of pattern!)
❌ Mistake 3: Assuming causation from correlation
- Correlation doesn't prove cause-and-effect
- Look for confounding variables
❌ Mistake 4: Extrapolating too far
- Predictions far outside data range are unreliable
- Trends may not continue indefinitely
❌ Mistake 5: Ignoring outliers
- Outliers are important!
- They might be errors OR interesting exceptions
Creating Good Scatter Plots
Best practices:
1. Choose appropriate scales:
- Include all data points
- Don't waste space
- Use convenient intervals
2. Label clearly:
- Both axes with variable names
- Include units
- Give descriptive title
3. Plot accurately:
- Use graph paper or technology
- Precise point placement
- Double-check coordinates
4. Don't force patterns:
- Describe what you see
- Be honest about weak correlations
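Choosing a scale can also be sketched in code. The hypothetical helper `axis_range` below rounds the smallest value down and the largest value up to a multiple of a chosen interval, so every point fits without much wasted space. It assumes a whole-number interval.

```python
import math


def axis_range(values, step):
    """Return (low, high, ticks) covering all values with tick marks every `step`."""
    low = math.floor(min(values) / step) * step
    high = math.ceil(max(values) / step) * step
    ticks = list(range(low, high + step, step))
    return low, high, ticks


scores = [65, 70, 75, 85, 90]
print(axis_range(scores, 10))  # (60, 90, [60, 70, 80, 90])
```

For the test scores, an axis from 60 to 90 in steps of 10 shows every point while leaving out the empty region below 60.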
Using Technology
Graphing calculators and software can:
- Plot points automatically
- Calculate line of best fit (regression line)
- Find correlation coefficient (r)
- Make predictions easily
Correlation coefficient (r):
- Number from -1 to 1
- r = 1: Perfect positive correlation
- r = -1: Perfect negative correlation
- r = 0: No linear correlation
- |r| ≥ 0.7: Strong correlation (a rule of thumb; cutoffs vary by field)
- 0.3 ≤ |r| < 0.7: Moderate correlation
- |r| < 0.3: Weak correlation
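For readers curious about what the calculator does, here is a minimal sketch of the Pearson correlation coefficient together with the rule-of-thumb strength labels. The function names `pearson_r` and `strength` are hypothetical, and assigning the boundary values 0.3 and 0.7 to the stronger label is an assumption.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def strength(r):
    """Label |r| with the rule-of-thumb cutoffs (boundaries go to the stronger label)."""
    size = abs(r)
    if size >= 0.7:
        return "strong"
    if size >= 0.3:
        return "moderate"
    return "weak"


# Study-time data and TV/sleep data from the examples in this lesson.
r_study = pearson_r([1, 2, 3, 4, 5], [65, 70, 75, 85, 90])
r_tv = pearson_r([1, 2, 3, 4, 5, 6], [8.5, 8, 7.5, 7, 6, 5.5])
print(round(r_study, 3), strength(r_study))  # 0.991 strong
print(round(r_tv, 3), strength(r_tv))        # -0.992 strong
```

Both example data sets have |r| close to 1, which agrees with describing them as strong correlations. The sign of r gives the direction.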
Problem-Solving Strategy
Analyzing scatter plots:
- Look at overall pattern
- Identify type of correlation
- Assess strength
- Note any outliers
- Draw or identify trend line
- Describe in complete sentences
Making predictions:
- Find or draw trend line
- Identify equation if given
- Substitute x-value
- Calculate y-value
- State prediction with units
Quick Reference
Parts of Scatter Plot:
- x-axis: Independent variable
- y-axis: Dependent variable
- Points: Data pairs
- Trend line: Line of best fit
Types of Correlation:
- Positive: ↗ (as x ↑, y ↑)
- Negative: ↘ (as x ↑, y ↓)
- None: random scatter
Strength:
- Strong: points close to line
- Moderate: some scatter
- Weak: very scattered
- None: random
Predictions:
- Interpolation: within data range (more reliable)
- Extrapolation: outside data range (less reliable)
Practice Tips
Tip 1: Look for real patterns
- Don't force a correlation if it's not there
- Weak/no correlation is a valid observation!
Tip 2: Consider the context
- Does the relationship make sense?
- Could there be a confounding variable?
Tip 3: Check outliers carefully
- Might be errors to fix
- Or interesting special cases to investigate
Tip 4: Use descriptive language
- "Strong positive linear correlation"
- "Moderate negative correlation with outlier at (x, y)"
- Be specific!
Summary
Scatter plots display relationships between two numerical variables:
Key features:
- Points represent data pairs
- x-axis: independent variable
- y-axis: dependent variable
- Don't connect the points!
Correlation types:
- Positive: both increase together
- Negative: one increases, other decreases
- None: no pattern
Analysis includes:
- Type and strength of correlation
- Form (linear or non-linear)
- Outliers
- Trend line for predictions
Important notes:
- Correlation ≠ causation
- Interpolation is more reliable than extrapolation
- Outliers tell stories too!
Scatter plots are powerful tools for visualizing data, identifying trends, and making predictions in countless real-world situations!
📚 Practice Problems
Problem 1 (easy)
❓ Question:
A scatter plot shows hours studied on the x-axis and test scores on the y-axis. As hours increase, scores increase. What type of correlation is this?
💡 Solution:
When both variables increase together, the correlation is positive.
The points trend upward from left to right.
Answer: Positive correlation
Problem 2 (easy)
❓ Question:
A scatter plot shows temperature and heating costs. As temperature increases, heating costs decrease. What type of correlation is this?
💡 Solution:
When one variable increases and the other decreases, the correlation is negative.
The points trend downward from left to right.
Answer: Negative correlation
Problem 3 (medium)
❓ Question:
A scatter plot has trend line equation y = 3x + 10. Predict y when x = 7.
💡 Solution:
Substitute x = 7 into the equation:
y = 3(7) + 10
y = 21 + 10
y = 31
Answer: y = 31
Problem 4 (medium)
❓ Question:
Data shows ice cream sales and sunglasses sales both increase in summer. Is this correlation or causation?
💡 Solution:
Both increase together (positive correlation), but ice cream sales don't CAUSE sunglasses sales.
Both are caused by warm weather - a confounding variable.
This is correlation but NOT causation.
Answer: Correlation, not causation
Problem 5 (hard)
❓ Question:
A scatter plot shows strong positive correlation between study time (1-10 hours) and test scores. The trend line is y = 5x + 50. Is it reasonable to predict a score of 200 for 30 hours of study?
💡 Solution:
Using the equation: y = 5(30) + 50 = 200
However, this is EXTRAPOLATION (outside data range of 1-10 hours).
Also, test scores likely have a maximum (100%), so 200 is unrealistic.
The trend may not continue beyond the data range.
Answer: No, not reasonable - extrapolation is unreliable and exceeds realistic test scores