Tests for Means
One-sample and two-sample t-tests
Hypothesis Tests for Means
One-Sample t-Test
Test: Does the sample provide evidence that the population mean differs from a claimed value?
Hypotheses:
- H₀: μ = μ₀
- Hₐ: μ ≠ μ₀ (or μ > μ₀ or μ < μ₀)
Conditions:
- Random sample
- Population approximately normal OR n ≥ 30 (CLT)
- n < 10% of population
Test Statistic: t = (x̄ - μ₀)/(s/√n)
df = n - 1
P-Value for t-Test
Use t-distribution with df = n - 1
Two-sided: 2 × P(t ≥ |t_observed|)
Right-tailed: P(t ≥ t_observed)
Left-tailed: P(t ≤ t_observed)
Calculator: tcdf
Example 1: One-Sample t-Test
Company claims mean wait time is 5 minutes. Sample: n = 25, x̄ = 5.8, s = 1.5. Test at α = 0.05.
STATE:
- μ = true mean wait time
- H₀: μ = 5
- Hₐ: μ ≠ 5
- α = 0.05
PLAN:
- One-sample t-test
- Random: Assume ✓
- Normal: n = 25, assume roughly normal ✓
- Independent: 25 < 10% of all customers ✓
DO:
t = (5.8 - 5)/(1.5/√25) = 0.8/0.3 ≈ 2.67
df = 24
P-value = 2 × P(t ≥ 2.67) ≈ 2(0.0067) ≈ 0.013
CONCLUDE: P-value = 0.013 < 0.05, reject H₀. Sufficient evidence mean wait time differs from 5 minutes.
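For readers who want to check the arithmetic by machine, here is a minimal sketch (assuming Python with SciPy is available; the calculator's tcdf plays the same role as stats.t.sf below) that reproduces the Example 1 numbers from the summary statistics:

```python
from math import sqrt
from scipy import stats

# Summary statistics from Example 1 (one-sample t-test)
n, xbar, s, mu0 = 25, 5.8, 1.5, 5.0

se = s / sqrt(n)                           # standard error = 1.5/5 = 0.3
t = (xbar - mu0) / se                      # t ≈ 2.67
df = n - 1                                 # 24
p_two_sided = 2 * stats.t.sf(abs(t), df)   # ≈ 0.013

print(f"t = {t:.2f}, df = {df}, p = {p_two_sided:.3f}")
```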
Two-Sample t-Test
Compare two independent groups:
Hypotheses:
- H₀: μ₁ = μ₂ (or μ₁ - μ₂ = 0)
- Hₐ: μ₁ ≠ μ₂ (or μ₁ > μ₂ or μ₁ < μ₂)
Test Statistic: t = (x̄₁ - x̄₂)/√(s₁²/n₁ + s₂²/n₂)
df: Use calculator (Welch's approximation) or conservative min(n₁-1, n₂-1)
Note: Do NOT pool (unlike proportions)
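As a rough illustration of the unpooled statistic and the Welch df approximation, here is a small sketch; welch_df and two_sample_t are hypothetical helper names, not standard library functions:

```python
from math import sqrt

def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximation for the two-sample t degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

def two_sample_t(xbar1, s1, n1, xbar2, s2, n2):
    """Unpooled (Welch) two-sample t statistic."""
    return (xbar1 - xbar2) / sqrt(s1**2 / n1 + s2**2 / n2)
```

With the Example 2 numbers below, welch_df(8, 30, 10, 28) comes out near 52 and two_sample_t(85, 8, 30, 80, 10, 28) comes out near 2.09.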
Conditions for Two-Sample t-Test
Both groups:
- Random/independent samples
- Each approximately normal OR both n ≥ 30
- Each n < 10% of population
Example 2: Two-Sample t-Test
Compare new vs old teaching method:
- New: n₁ = 30, x̄₁ = 85, s₁ = 8
- Old: n₂ = 28, x̄₂ = 80, s₂ = 10
STATE:
- μ₁ = mean score with new method
- μ₂ = mean score with old method
- H₀: μ₁ = μ₂
- Hₐ: μ₁ > μ₂
- α = 0.05
PLAN:
- Two-sample t-test
- Conditions: Both n ≥ 30, random, independent ✓
DO:
t = (85 - 80)/√(8²/30 + 10²/28) = 5/2.39 ≈ 2.09
df ≈ 51.7 (calculator gives the exact Welch value)
P-value = P(t ≥ 2.09) ≈ 0.021
CONCLUDE: P-value = 0.021 < 0.05, reject H₀. Sufficient evidence new method produces higher scores.
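If SciPy is available, the same result can be checked from the summary statistics alone; scipy.stats.ttest_ind_from_stats reports a two-sided p-value, so it is halved here for the one-sided alternative (appropriate because the observed difference is in the predicted direction):

```python
from scipy import stats

# Example 2 summary statistics: new vs. old teaching method (Welch, unpooled)
res = stats.ttest_ind_from_stats(
    mean1=85, std1=8, nobs1=30,
    mean2=80, std2=10, nobs2=28,
    equal_var=False,                 # do NOT pool
)
print(f"t = {res.statistic:.2f}, one-sided p = {res.pvalue / 2:.3f}")  # ≈ 2.09, ≈ 0.021
```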
t vs z
Use t-test when:
- Population σ unknown (almost always!)
- Using sample s
Use z-test when:
- Population σ known (rare)
- Proportions (different formula)
For large n: t ≈ z (distributions nearly identical)
Checking Normality
Small samples (n < 15):
- Data must be close to normal
- Check with dotplot, boxplot, or normal probability plot (see the sketch after this list)
- No outliers, roughly symmetric
Medium samples (15 ≤ n < 30):
- Can tolerate slight skew
- No extreme outliers
Large samples (n ≥ 30):
- CLT applies
- Can proceed unless severe outliers/skew
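As a quick illustration of those graphical checks, here is a sketch with made-up data (it assumes Matplotlib and SciPy are installed):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical small sample (n < 15), where normality matters most
data = np.array([5.1, 4.8, 5.6, 5.0, 4.9, 5.3, 5.2, 4.7, 5.4, 5.0])

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
axes[0].boxplot(data, vert=False)                 # look for outliers and strong skew
axes[0].set_title("Boxplot")
stats.probplot(data, dist="norm", plot=axes[1])   # points near the line => roughly normal
plt.tight_layout()
plt.show()
```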
Robustness
t-procedures fairly robust to normality if:
- n reasonably large
- No extreme outliers
- Not severely skewed
Less robust with:
- Very small n
- Extreme outliers (affect x̄ and s)
One-Sided vs Two-Sided
Choose before seeing data!
Two-sided: Looking for any difference
One-sided: Specific direction predicted
One-sided has more power (for that direction) but:
- Can't detect effect in other direction
- Generally less conservative
Calculator Commands (TI-83/84)
One-sample: STAT → TESTS → 2:T-Test
- μ₀, x̄, s, n, direction
- Calculate
Two-sample: STAT → TESTS → 4:2-SampTTest
- x̄₁, s₁, n₁, x̄₂, s₂, n₂
- Pooled: No
- Calculate
Relationship to CI
For a two-sided test at level α:
Equivalent question: does the (1 - α) confidence interval contain μ₀?
- If yes → fail to reject
- If no → reject
CI more informative: Shows range of plausible values
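One way to see the equivalence is to build the interval directly; the sketch below (assuming SciPy) reuses the Example 1 summary statistics and checks whether μ₀ falls inside the 95% interval:

```python
from math import sqrt
from scipy import stats

# Example 1 summary statistics
n, xbar, s, mu0, alpha = 25, 5.8, 1.5, 5.0, 0.05

t_star = stats.t.ppf(1 - alpha / 2, df=n - 1)    # critical value, df = 24
margin = t_star * s / sqrt(n)
low, high = xbar - margin, xbar + margin

# The two-sided test at level alpha rejects H0 exactly when mu0 is outside the CI
print(f"95% CI = ({low:.2f}, {high:.2f}); reject H0: {not (low <= mu0 <= high)}")
```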
Common Mistakes
❌ Using z when should use t
❌ Pooling variances in two-sample t-test
❌ Not checking normality with small samples
❌ Confusing one-sample with paired
❌ Using wrong df
Practical Significance
Statistical significance ≠ practical importance
Example: Large sample (n = 10,000) finds mean = 100.2 vs claimed 100
- Might be statistically significant
- But is 0.2 difference practically important?
Always consider:
- Effect size (magnitude of difference)
- Context (what matters in practice)
- Cost/benefit
Quick Reference
One-sample: t = (x̄ - μ₀)/(s/√n), df = n - 1
Two-sample: t = (x̄₁ - x̄₂)/√(s₁²/n₁ + s₂²/n₂), df from calculator (Welch) or conservative min(n₁ - 1, n₂ - 1)
Conditions: Random, approximately normal (or n ≥ 30), independent
Use t (not z) when σ unknown
Remember: t-tests are workhorses of statistics. Check conditions, especially normality for small samples. Use calculator for exact P-values and df!
📚 Practice Problems
Problem 1 (Easy)
❓ Question:
A machine fills bottles with mean 500 mL. A sample of 25 bottles has x̄ = 497 mL, s = 6 mL. Test at α = 0.05 if mean fill is less than 500 mL. Assume normality.
💡 Solution
Step 1: Set up hypotheses
Claim: μ = 500 mL; suspect: μ < 500 mL (underfilling)
H₀: μ = 500
Hₐ: μ < 500 (one-tailed, left)
Step 2: Check conditions (n = 25)
RANDOM: Assume random sample ✓
NORMAL: Population normal (given) ✓; with n = 25 < 30, this assumption is needed
INDEPENDENT: Assume 25 ≤ 0.10N ✓
Use t-test (σ unknown)
Step 3: Calculate test statistic
df = n - 1 = 24
SE = s/√n = 6/√25 = 6/5 = 1.2
t = (x̄ - μ₀)/SE = (497 - 500)/1.2 = -3/1.2 = -2.50
Step 4: Find p-value
Left-tailed test with df = 24, t = -2.50
From t-table: P(t < -2.50) ≈ 0.01
p-value ≈ 0.01
Step 5: Make decision
p-value ≈ 0.01, α = 0.05
Is 0.01 < 0.05? YES
REJECT H₀
Step 6: State conclusion
At the α = 0.05 significance level, there is sufficient evidence that the mean fill is less than 500 mL. The machine appears to be underfilling bottles.
Step 7: Practical interpretation
Sample mean: 497 mL; target: 500 mL; difference: -3 mL
This 3 mL shortage is:
- Statistically significant
- Not just random variation
- Machine needs adjustment
Answer:
H₀: μ = 500, Hₐ: μ < 500
Test statistic: t = -2.50 (df = 24)
P-value: ≈ 0.01
Decision: Reject H₀ at α = 0.05
Conclusion: Mean fill is significantly less than 500 mL
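For those who want to verify this solution numerically, a short sketch (assuming SciPy):

```python
from math import sqrt
from scipy import stats

# Problem 1 summary statistics (bottle filling)
n, xbar, s, mu0 = 25, 497, 6, 500

t = (xbar - mu0) / (s / sqrt(n))    # -2.50
p_left = stats.t.cdf(t, df=n - 1)   # left-tailed p-value, about 0.01
print(f"t = {t:.2f}, p = {p_left:.4f}")
```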
Problem 2 (Easy)
❓ Question:
Explain when to use a t-test versus a z-test for testing a mean.
💡 Solution
Step 1: The key difference
Z-TEST: population SD (σ) is KNOWN
T-TEST: population SD (σ) is UNKNOWN (use s)
Step 2: When to use a Z-TEST for a mean
Conditions:
- σ is known (rare!)
- Random sample
- Normal population OR n ≥ 30
Test statistic: z = (x̄ - μ₀)/(σ/√n)
Use z-table for p-value
Step 3: When to use a T-TEST for a mean
Conditions:
- σ is unknown (almost always!)
- Random sample
- Normal population (if n < 30) OR n ≥ 30 (can use CLT)
- Independent observations
Test statistic: t = (x̄ - μ₀)/(s/√n)
Use t-table with df = n - 1
Step 4: Why σ is rarely known
In practice:
- If we knew σ, we'd probably know μ
- Population parameters rarely known
- Must estimate from sample
- Use s as estimate of σ
Real-world: Almost always use t-test!
Step 5: Small vs large samples
SMALL SAMPLE (n < 30): Must use T-TEST:
- Need normality assumption
- T-distribution accounts for uncertainty in s
- Heavier tails than z
- More conservative
LARGE SAMPLE (n ≥ 30): Can use T-TEST (preferred):
- CLT applies
- t-distribution → normal
- Still use t because σ unknown
Could use Z-TEST if σ known (very rare)
Step 6: T-distribution properties
Depends on df = n - 1:
- Heavier tails when df small
- More probability in extremes
- Approaches normal as df → ∞
Examples:
- df = 5: very heavy tails
- df = 30: close to normal
- df = 100: essentially normal
Step 7: Comparison of 95% critical values
z*: always 1.96
t* depends on df:
- df = 5: t* = 2.571 (much larger!)
- df = 10: t* = 2.228
- df = 20: t* = 2.086
- df = 30: t* = 2.042
- df = ∞: t* → 1.96
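These critical values can be reproduced with any t-distribution quantile function; a minimal sketch assuming SciPy:

```python
from scipy import stats

# 95% two-sided critical values: t* = t(0.975, df)
for df in (5, 10, 20, 30, 1000):
    print(f"df = {df:>4}: t* = {stats.t.ppf(0.975, df):.3f}")
print(f"z* (normal): {stats.norm.ppf(0.975):.3f}")   # 1.960
```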
Step 8: Decision flowchart for means
Is σ known?
├─ YES → Z-test (rare); need: random, normal population or n ≥ 30
└─ NO → T-test (almost always); need: random, normal population (if n < 30), independent
Step 9: Common scenarios
SCENARIO 1: n = 20, s = 5, σ unknown → T-TEST (df = 19); need normality assumption
SCENARIO 2: n = 100, s = 12, σ unknown → T-TEST (df = 99); CLT applies, t ≈ z
SCENARIO 3: n = 50, σ = 8 known → Z-TEST (but this is very rare!)
SCENARIO 4: n = 15, s = 3, σ unknown, skewed population → t-test not appropriate (n too small, population not normal)
Step 10: Summary table
- When to use: Z-TEST when σ is known (rare); T-TEST when σ is unknown
- Test statistic: z = (x̄ - μ₀)/(σ/√n) vs. t = (x̄ - μ₀)/(s/√n)
- Reference distribution: z-table vs. t-table (df = n - 1)
- Sample size requirement: n ≥ 30 if population not normal (z) vs. n ≥ 30 OR normal population (t)
Answer: USE T-TEST when:
- σ is unknown (use sample s)
- Almost all real-world situations
- Need: random, normal (n<30) or n≥30, independent
- Test statistic: t = (x̄ - μ₀)/(s/√n)
- Use t-distribution with df = n-1
USE Z-TEST when:
- σ is KNOWN (very rare!)
- Population SD given in problem
- Test statistic: z = (x̄ - μ₀)/(σ/√n)
- Use standard normal distribution
In practice, almost always use t-test because σ is rarely known!
Problem 3 (Medium)
❓ Question:
Students claim they study an average of 20 hours per week. A random sample of 36 students shows x̄ = 18.5 hours, s = 6 hours. Test at α = 0.01 if the mean is less than 20.
💡 Solution
Step 1: Set up hypotheses
Claim: μ = 20 hours; test: μ < 20 (students study less)
H₀: μ = 20
Hₐ: μ < 20 (one-tailed, left)
Step 2: Check conditions (n = 36)
RANDOM: Random sample (given) ✓
NORMAL: n = 36 ≥ 30, CLT applies ✓
INDEPENDENT: Assume 36 ≤ 0.10N ✓
Use t-test (σ unknown)
Step 3: Calculate SE
SE = s/√n = 6/√36 = 6/6 = 1
Step 4: Calculate test statistic
df = n - 1 = 35
t = (x̄ - μ₀)/SE = (18.5 - 20)/1 = -1.5/1 = -1.50
Step 5: Find p-value
Left-tailed test with df = 35, t = -1.50
From t-table: P(t < -1.50) ≈ 0.07
p-value ≈ 0.07
Step 6: Make decision
p-value = 0.07, α = 0.01
Is 0.07 < 0.01? NO
FAIL TO REJECT H₀
Step 7: State conclusion
At the α = 0.01 significance level, there is insufficient evidence that students study less than 20 hours per week.
The observed difference could reasonably occur by chance.
Step 8: Additional interpretation
Sample mean: 18.5 hours; claimed mean: 20 hours; difference: -1.5 hours
While students in sample study less:
- Not statistically significant at α = 0.01
- p-value (0.07) >> α (0.01)
- Could be sampling variation
- Cannot reject claim
Step 9: What if α = 0.10?
If we had used α = 0.10, then p = 0.07 < 0.10 and we would reject H₀!
But with the strict α = 0.01, stronger evidence is needed; this result is not convincing enough.
Answer:
H₀: μ = 20, Hₐ: μ < 20
Test statistic: t = -1.50 (df = 35)
P-value: 0.07
Decision: Fail to reject H₀ at α = 0.01
Conclusion: Insufficient evidence that the mean is less than 20 hours
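A quick numerical check of this solution (assuming SciPy):

```python
from math import sqrt
from scipy import stats

# Problem 3 summary statistics (study hours)
n, xbar, s, mu0 = 36, 18.5, 6, 20

t = (xbar - mu0) / (s / sqrt(n))    # -1.50
p_left = stats.t.cdf(t, df=n - 1)   # about 0.07 > alpha = 0.01 -> fail to reject H0
print(f"t = {t:.2f}, p = {p_left:.3f}")
```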
Problem 4 (Medium)
❓ Question:
A researcher tests H₀: μ = 100 vs Hₐ: μ ≠ 100 with n = 16, x̄ = 105, s = 8. Find the test statistic and p-value. What conclusion at α = 0.05?
💡 Solution
Step 1: Identify test type
H₀: μ = 100, Hₐ: μ ≠ 100 (TWO-TAILED!)
σ unknown → use a t-test
Step 2: Check conditions (assume met)
RANDOM: Assume ✓
NORMAL: n = 16 < 30, need normality assumption ✓
INDEPENDENT: Assume ✓
Step 3: Calculate SE
SE = s/√n = 8/√16 = 8/4 = 2
Step 4: Calculate test statistic
df = n - 1 = 15
t = (x̄ - μ₀)/SE = (105 - 100)/2 = 5/2 = 2.50
Step 5: Find p-value (TWO-TAILED!)
df = 15, t = 2.50
From t-table: P(t > 2.50) ≈ 0.012
TWO-TAILED p-value: p = 2 × 0.012 = 0.024
Step 6: Make decision
p-value = 0.024, α = 0.05
Is 0.024 < 0.05? YES
REJECT H₀
Step 7: State conclusion
At the α = 0.05 significance level, there is sufficient evidence that the true mean differs from 100.
The sample data suggests μ ≠ 100.
Step 8: Direction of difference
x̄ = 105 > 100, so the evidence suggests μ > 100
But we tested two-tailed:
- Could be μ > 100 or μ < 100
- Data suggests higher
- A two-sided test counts evidence in either direction
Step 9: Strength of evidence
p = 0.024 is fairly small
Interpretation:
- If μ really = 100
- Only 2.4% chance of x̄ this far from 100
- Fairly unlikely under H₀
- Moderate evidence against H₀
Step 10: What if one-tailed?
If we had tested Hₐ: μ > 100, the one-tailed p would be 0.012, which is even stronger evidence!
But the problem states ≠ (two-tailed), so we must use p = 0.024.
Step 11: Connection to CI
95% CI for μ: t* = 2.131 (df = 15)
CI = 105 ± 2.131(2) = 105 ± 4.26 = (100.74, 109.26)
100 is NOT in this interval: Confirms rejection at α = 0.05!
Answer:
Test statistic: t = 2.50 (df = 15)
P-value: 0.024 (two-tailed)
Decision: Reject H₀ at α = 0.05
Conclusion: Sufficient evidence that μ ≠ 100
The mean differs significantly from 100, appearing to be higher based on x̄ = 105.
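The test statistic, two-sided p-value, and confidence interval from this solution can all be checked in a few lines (assuming SciPy):

```python
from math import sqrt
from scipy import stats

# Problem 4 summary statistics
n, xbar, s, mu0 = 16, 105, 8, 100
df, se = n - 1, s / sqrt(n)

t = (xbar - mu0) / se                                 # 2.50
p_two = 2 * stats.t.sf(abs(t), df)                    # about 0.024

t_star = stats.t.ppf(0.975, df)                       # about 2.131
low, high = xbar - t_star * se, xbar + t_star * se    # about (100.74, 109.26); excludes 100
print(f"t = {t:.2f}, p = {p_two:.3f}, 95% CI = ({low:.2f}, {high:.2f})")
```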
Problem 5 (Hard)
❓ Question:
Two students test H₀: μ = 50. Student A uses α = 0.05, Student B uses α = 0.01. Both get the same data: t = 2.15, df = 20. What decision does each make? Explain why decisions differ.
💡 Solution
Step 1: Find the p-value (same for both)
df = 20, t = 2.15
Need to know: one-tailed or two-tailed? Assume TWO-TAILED (testing μ ≠ 50)
From t-table: P(t > 2.15) ≈ 0.022
Two-tailed p-value: p = 2 × 0.022 = 0.044
Step 2: Student A's decision (α = 0.05)
p-value = 0.044, α = 0.05
Is 0.044 < 0.05? YES
Student A: REJECT H₀
Conclusion: Sufficient evidence at 0.05 level that μ ≠ 50
Step 3: Student B's decision (α = 0.01)
p-value = 0.044, α = 0.01
Is 0.044 < 0.01? NO
Student B: FAIL TO REJECT H₀
Conclusion: Insufficient evidence at 0.01 level that μ ≠ 50
Step 4: Why different decisions?
Same data, same p-value, different standards!
p = 0.044 is:
- Small enough for α = 0.05 (less stringent)
- NOT small enough for α = 0.01 (more stringent)
Step 5: Understanding α as a threshold
Think of α as an "evidence requirement"
α = 0.05: Need p < 0.05
- Willing to accept 5% error rate
- Less strict
- Easier to reject H₀
α = 0.01: Need p < 0.01
- Want stronger evidence
- More strict
- Harder to reject H₀
Step 6: Interpret the p-value
p = 0.044 = 4.4%
Meaning: "If H₀ true, 4.4% chance of results this extreme"
Student A thinks:
- 4.4% is rare enough (< 5%)
- Unlikely under H₀
- Reject H₀
Student B thinks:
- 4.4% not rare enough (not < 1%)
- Not convincing enough
- Don't reject H₀
Step 7: Neither is "wrong"! Both are correct for their chosen α!
Different standards:
- Student A uses conventional α = 0.05
- Student B uses stricter α = 0.01
Appropriate α depends on context
Step 8: When to use different α levels
α = 0.10 (liberal):
- Preliminary research
- Exploratory studies
- Don't want to miss potential effects
α = 0.05 (standard):
- Most common
- Good balance
- Convention in many fields
α = 0.01 (conservative):
- High-stakes decisions
- Medical treatments
- Want strong evidence
α = 0.001 (very conservative):
- Particle physics
- Exceptional claims
- Need extraordinary evidence
Step 9: This is a borderline case!
p = 0.044 is close to 0.05
Barely significant at α = 0.05; not significant at α = 0.01
Shows importance of:
- Choosing α BEFORE seeing data
- Reporting actual p-value
- Not treating 0.05 as magic cutoff
Step 10: Better reporting
Instead of just "significant" or "not significant":
Report: "t(20) = 2.15, p = 0.044"
Lets reader judge:
- Evidence is moderate
- Borderline significance
- Just below conventional α
- Not overwhelming evidence
Step 11: Practical advice
When results are borderline:
- Report exact p-value
- Don't just say "significant"
- Consider practical significance
- May need more data
- Be cautious in conclusions
Step 12: What both students agree on
Both agree:
- Data shows t = 2.15
- This is fairly unusual if H₀ true
- Evidence leans against H₀
- But evidence is not overwhelming
Disagree:
- Is evidence strong ENOUGH?
- Depends on chosen standard
Answer:
STUDENT A (α = 0.05): REJECT H₀
p = 0.044 < 0.05 → significant
Sufficient evidence that μ ≠ 50
STUDENT B (α = 0.01): FAIL TO REJECT H₀
p = 0.044 > 0.01 → Not significant
Insufficient evidence at stricter standard
WHY DIFFERENT? α is the threshold for decision. Same p-value (0.044) meets less strict standard (0.05) but not stricter standard (0.01). Both decisions are correct for their chosen significance level. This shows that α = 0.05 is not a magic cutoff - it's a conventional standard that can be adjusted based on context and consequences of errors.
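A short sketch (assuming SciPy) that reproduces the shared p-value and both students' decisions:

```python
from scipy import stats

# Problem 5: same data, two different significance levels
t, df = 2.15, 20
p_two = 2 * stats.t.sf(t, df)       # two-tailed p-value, about 0.044

for alpha in (0.05, 0.01):
    decision = "reject H0" if p_two < alpha else "fail to reject H0"
    print(f"alpha = {alpha}: p = {p_two:.3f} -> {decision}")
```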