Tests for Means

One-sample and two-sample t-tests

Hypothesis Tests for Means

One-Sample t-Test

Test: Does the sample provide evidence that the population mean differs from the claimed value?

Hypotheses:

  • H₀: μ = μ₀
  • Hₐ: μ ≠ μ₀ (or μ > μ₀ or μ < μ₀)

Conditions:

  • Random sample
  • Population approximately normal OR n ≥ 30 (CLT)
  • n < 10% of population

Test Statistic:

t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}

df = n - 1

P-Value for t-Test

Use t-distribution with df = n - 1

Two-sided: 2 · P(t ≥ |observed t|)
Right-sided: P(t ≥ observed t)
Left-sided: P(t ≤ observed t)

Calculator: tcdf
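For readers working outside the TI calculator, the same tail areas can be computed in Python with scipy.stats (an assumption; the notes themselves use tcdf). The numbers below are purely illustrative.

```python
from scipy import stats

# Illustrative values, not taken from an example in these notes
t_obs, df = 2.0, 14

p_two_sided = 2 * stats.t.sf(abs(t_obs), df)  # 2 * P(T >= |t_obs|)
p_right = stats.t.sf(t_obs, df)               # P(T >= t_obs)
p_left = stats.t.cdf(t_obs, df)               # P(T <= t_obs)

print(p_two_sided, p_right, p_left)
```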

Example 1: One-Sample t-Test

Company claims mean wait time is 5 minutes. Sample: n = 25, x̄ = 5.8, s = 1.5. Test at α = 0.05.

STATE:

  • μ = true mean wait time
  • H₀: μ = 5
  • Hₐ: μ ≠ 5
  • α = 0.05

PLAN:

  • One-sample t-test
  • Random: Assume ✓
  • Normal: n = 25, assume roughly normal ✓
  • Independent: 25 < 10% of all customers ✓

DO:

t = \frac{5.8 - 5}{1.5/\sqrt{25}} = \frac{0.8}{0.3} \approx 2.67

df = 24

P-value = 2 × P(t ≥ 2.67) ≈ 2(0.0067) ≈ 0.013

CONCLUDE: P-value = 0.013 < 0.05, reject H₀. Sufficient evidence mean wait time differs from 5 minutes.
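A minimal Python check of Example 1's arithmetic, assuming scipy is available (the notes themselves use a TI calculator):

```python
from math import sqrt
from scipy import stats

# Example 1 summary statistics
n, xbar, s, mu0 = 25, 5.8, 1.5, 5.0

t = (xbar - mu0) / (s / sqrt(n))   # (5.8 - 5) / 0.3 ≈ 2.67
df = n - 1                         # 24
p = 2 * stats.t.sf(abs(t), df)     # two-sided P-value ≈ 0.013

print(round(t, 2), round(p, 3))
```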

Two-Sample t-Test

Compare two independent groups:

Hypotheses:

  • H₀: μ₁ = μ₂ (or μ₁ - μ₂ = 0)
  • Hₐ: μ₁ ≠ μ₂ (or μ₁ > μ₂ or μ₁ < μ₂)

Test Statistic:

t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}

df: Use calculator (Welch's approximation) or conservative min(n₁-1, n₂-1)

Note: Do NOT pool (unlike proportions)
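For reference, the Welch (Satterthwaite) approximation the calculator uses for df is:

df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}

The conservative choice min(n₁ - 1, n₂ - 1) is never larger than this value, so it yields heavier tails and a larger (safer) P-value.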

Conditions for Two-Sample t-Test

Both groups:

  • Random/independent samples
  • Each approximately normal OR both n ≥ 30
  • Each n < 10% of population

Example 2: Two-Sample t-Test

Compare new vs old teaching method:

  • New: n₁ = 30, x̄₁ = 85, s₁ = 8
  • Old: n₂ = 28, x̄₂ = 80, s₂ = 10

STATE:

  • μ₁ = mean score with new method
  • μ₂ = mean score with old method
  • H₀: μ₁ = μ₂
  • Hₐ: μ₁ > μ₂
  • α = 0.05

PLAN:

  • Two-sample t-test
  • Conditions: Both n ≥ 30, random, independent ✓

DO:

t = \frac{85 - 80}{\sqrt{\frac{64}{30} + \frac{100}{28}}} = \frac{5}{\sqrt{2.13 + 3.57}} = \frac{5}{2.39} \approx 2.09

df ≈ 51.7 by Welch's approximation (calculator gives this exact value)

P-value = P(t ≥ 2.09) ≈ 0.021

CONCLUDE: P-value = 0.021 < 0.05, reject H₀. Sufficient evidence new method produces higher scores.
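A sketch of Example 2's computation, including the Welch df, assuming Python with scipy (the notes rely on the calculator for the exact df):

```python
from math import sqrt
from scipy import stats

# Example 2 summary statistics (new vs old method)
n1, xbar1, s1 = 30, 85.0, 8.0
n2, xbar2, s2 = 28, 80.0, 10.0

v1, v2 = s1**2 / n1, s2**2 / n2
t = (xbar1 - xbar2) / sqrt(v1 + v2)                        # ≈ 2.09
df = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))  # Welch df ≈ 51.7
p = stats.t.sf(t, df)                                      # one-sided P-value ≈ 0.021

print(round(t, 2), round(df, 1), round(p, 3))
```

If scipy is available, stats.ttest_ind_from_stats(85, 8, 30, 80, 10, 28, equal_var=False) should give the same unpooled statistic directly (it reports a two-sided P-value by default).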

t vs z

Use t-test when:

  • Population σ unknown (almost always!)
  • Using sample s

Use z-test when:

  • Population σ known (rare)
  • Proportions (different formula)

For large n: t ≈ z (distributions nearly identical)
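A quick illustration of that convergence, assuming Python with scipy: the two-sided 5% critical value t* approaches z* = 1.96 as df grows.

```python
from scipy import stats

z_star = stats.norm.ppf(0.975)        # ≈ 1.96
for df in (5, 10, 30, 100, 1000):
    t_star = stats.t.ppf(0.975, df)   # two-sided 5% critical value
    print(df, round(t_star, 3))
print("z*:", round(z_star, 3))
```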

Checking Normality

Small samples (n < 15):

  • Data must be close to normal
  • Check with dotplot, boxplot, normal probability plot
  • No outliers, roughly symmetric

Medium samples (15 ≤ n < 30):

  • Can tolerate slight skew
  • No extreme outliers

Large samples (n ≥ 30):

  • CLT applies
  • Can proceed unless severe outliers/skew

Robustness

t-procedures fairly robust to normality if:

  • n reasonably large
  • No extreme outliers
  • Not severely skewed

Less robust with:

  • Very small n
  • Extreme outliers (affect x̄ and s)

One-Sided vs Two-Sided

Choose before seeing data!

Two-sided: Looking for any difference
One-sided: Specific direction predicted

One-sided has more power (for that direction) but:

  • Can't detect effect in other direction
  • Generally less conservative

Calculator Commands (TI-83/84)

One-sample: STAT → TESTS → 2:T-Test

  • μ₀, x̄, s, n, direction
  • Calculate

Two-sample: STAT → TESTS → 4:2-SampTTest

  • x̄₁, s₁, n₁, x̄₂, s₂, n₂
  • Pooled: No
  • Calculate

Relationship to CI

For two-sided test at α:

Equivalent: (1-α) CI contains μ₀?

  • If yes → fail to reject
  • If no → reject

CI more informative: Shows range of plausible values

Common Mistakes

❌ Using z when should use t
❌ Pooling variances in two-sample t-test
❌ Not checking normality with small samples
❌ Confusing one-sample with paired
❌ Using wrong df

Practical Significance

Statistical significance ≠ practical importance

Example: Large sample (n = 10,000) finds mean = 100.2 vs claimed 100

  • Might be statistically significant
  • But is 0.2 difference practically important?

Always consider:

  • Effect size (magnitude of difference)
  • Context (what matters in practice)
  • Cost/benefit

Quick Reference

One-sample: t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, df = n - 1

Two-sample: t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}

Conditions: Random, approximately normal (or n ≥ 30), independent

Use t (not z) when σ unknown

Remember: t-tests are workhorses of statistics. Check conditions, especially normality for small samples. Use calculator for exact P-values and df!

📚 Practice Problems

Problem 1 (easy)

Question:

A machine fills bottles with mean 500 mL. A sample of 25 bottles has x̄ = 497 mL, s = 6 mL. Test at α = 0.05 if mean fill is less than 500 mL. Assume normality.

Solution:

Step 1: Set up hypotheses
Claim: μ = 500 mL
Suspect: μ < 500 mL (underfilling)

H₀: μ = 500
Hₐ: μ < 500 (one-tailed, left)

Step 2: Check conditions
n = 25

RANDOM: Assume random sample ✓
NORMAL: Population normal (given) ✓

  • With n = 25 < 30, need this assumption

INDEPENDENT: Assume 25 ≤ 0.10N ✓

Use t-test (σ unknown)

Step 3: Calculate test statistic
df = n - 1 = 24

SE = s/√n = 6/√25 = 6/5 = 1.2

t = (x̄ - μ₀)/SE = (497 - 500)/1.2 = -3/1.2 = -2.50

Step 4: Find p-value
Left-tailed test, df = 24, t = -2.50

From t-table: P(t < -2.50) ≈ 0.01

p-value ≈ 0.01

Step 5: Make decision
p-value ≈ 0.01, α = 0.05

Is 0.01 < 0.05? YES

REJECT H₀

Step 6: State conclusion
At the α = 0.05 significance level, there is sufficient evidence that the mean fill is less than 500 mL. The machine appears to be underfilling bottles.

Step 7: Practical interpretation
Sample mean: 497 mL
Target: 500 mL
Difference: -3 mL

This 3 mL shortage is:

  • Statistically significant
  • Not just random variation
  • Machine needs adjustment

Answer:
H₀: μ = 500, Hₐ: μ < 500
Test statistic: t = -2.50 (df = 24)
P-value: ≈ 0.01
Decision: Reject H₀ at α = 0.05
Conclusion: Mean fill is significantly less than 500 mL
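The same decision can also be reached with the rejection-region (critical value) approach; a sketch assuming Python with scipy:

```python
from math import sqrt
from scipy import stats

# Problem 1: H0: mu = 500 vs Ha: mu < 500; n = 25, x̄ = 497, s = 6, alpha = 0.05
n, xbar, s, mu0, alpha = 25, 497.0, 6.0, 500.0, 0.05

t = (xbar - mu0) / (s / sqrt(n))    # -2.50
t_crit = stats.t.ppf(alpha, n - 1)  # left-tail critical value ≈ -1.711
p = stats.t.cdf(t, n - 1)           # left-tailed P-value ≈ 0.010

print(round(t, 2), round(t_crit, 3), round(p, 3))
# Reject H0 because t < t_crit (equivalently, p < alpha)
```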

Problem 2 (easy)

Question:

Explain when to use a t-test versus a z-test for testing a mean.

Solution:

Step 1: The key difference
Z-TEST: Population SD (σ) is KNOWN
T-TEST: Population SD (σ) is UNKNOWN (use s)

Step 2: When to use Z-TEST for mean
Conditions:

  1. σ is known (rare!)
  2. Random sample
  3. Normal population OR n ≥ 30

Test statistic: z = (x̄ - μ₀)/(σ/√n)

Use z-table for p-value

Step 3: When to use T-TEST for mean
Conditions:

  1. σ is unknown (almost always!)
  2. Random sample
  3. Normal population (if n < 30) OR n ≥ 30 (can use CLT)
  4. Independent observations

Test statistic: t = (x̄ - μ₀)/(s/√n)

Use t-table with df = n - 1

Step 4: Why σ is rarely known
In practice:

  • If we knew σ, we'd probably know μ
  • Population parameters rarely known
  • Must estimate from sample
  • Use s as estimate of σ

Real-world: Almost always use t-test!

Step 5: Small vs large samples

SMALL SAMPLE (n < 30): Must use T-TEST:

  • Need normality assumption
  • T-distribution accounts for uncertainty in s
  • Heavier tails than z
  • More conservative

LARGE SAMPLE (n ≥ 30): Can use T-TEST (preferred):

  • CLT applies
  • t-distribution → normal
  • Still use t because σ unknown

Could use Z-TEST if σ known (very rare)

Step 6: T-distribution properties
Depends on df = n - 1:

  • Heavier tails when df small
  • More probability in extremes
  • Approaches normal as df → ∞

Examples:
  • df = 5: very heavy tails
  • df = 30: close to normal
  • df = 100: essentially normal

Step 7: Comparison for 95% critical values
Z*: always 1.96

T* depends on df:
  • df = 5: t* = 2.571 (much larger!)
  • df = 10: t* = 2.228
  • df = 20: t* = 2.086
  • df = 30: t* = 2.042
  • df = ∞: t* → 1.96

Step 8: Decision flowchart for means
Is σ known?
├─ YES → Z-test (rare)
│         Need: random, normal or n ≥ 30
│
└─ NO  → T-test (almost always)
          Need: random, normal (if n < 30), independent

Step 9: Common scenarios

SCENARIO 1: n = 20, s = 5, σ unknown
→ T-TEST (df = 19); need normality assumption

SCENARIO 2: n = 100, s = 12, σ unknown
→ T-TEST (df = 99); CLT applies, t ≈ z

SCENARIO 3: n = 50, σ = 8 known
→ Z-TEST (but this is very rare!)

SCENARIO 4: n = 15, s = 3, σ unknown, skewed population
→ T-TEST not appropriate! n too small, population not normal

Step 10: Summary table

               │ Z-TEST              │ T-TEST
  When to use  │ σ known (rare)      │ σ unknown
  Test stat    │ z = (x̄-μ₀)/(σ/√n)   │ t = (x̄-μ₀)/(s/√n)
  Reference    │ z-table             │ t-table (df = n-1)
  Sample size  │ n ≥ 30 if not       │ n ≥ 30 OR normal
               │ normal              │ population

Answer: USE T-TEST when:

  • σ is unknown (use sample s)
  • Almost all real-world situations
  • Need: random, normal (n<30) or n≥30, independent
  • Test statistic: t = (x̄ - μ₀)/(s/√n)
  • Use t-distribution with df = n-1

USE Z-TEST when:

  • σ is KNOWN (very rare!)
  • Population SD given in problem
  • Test statistic: z = (x̄ - μ₀)/(σ/√n)
  • Use standard normal distribution

In practice, almost always use t-test because σ is rarely known!
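To see why the t-distribution's heavier tails matter for small n, compare the P-value a z-test would report with the t-test P-value for the same observed statistic; a sketch assuming Python with scipy and purely illustrative numbers:

```python
from scipy import stats

# Same observed statistic, small sample: t is more conservative than z
stat, n = 2.2, 10
df = n - 1

p_t = 2 * stats.t.sf(stat, df)   # two-sided, t with df = 9, ≈ 0.055
p_z = 2 * stats.norm.sf(stat)    # two-sided, standard normal, ≈ 0.028

print(round(p_t, 3), round(p_z, 3))
# Treating s as if it were a known sigma understates the P-value for small n.
```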

Problem 3 (medium)

Question:

Students claim they study an average of 20 hours per week. A random sample of 36 students shows x̄ = 18.5 hours, s = 6 hours. Test at α = 0.01 if the mean is less than 20.

Solution:

Step 1: Set up hypotheses
Claim: μ = 20 hours
Test: μ < 20 (students study less)

H₀: μ = 20
Hₐ: μ < 20 (one-tailed, left)

Step 2: Check conditions
n = 36

RANDOM: Random sample (given) ✓
NORMAL: n = 36 ≥ 30, CLT applies ✓
INDEPENDENT: Assume 36 ≤ 0.10N ✓

Use t-test (σ unknown)

Step 3: Calculate SE
SE = s/√n = 6/√36 = 6/6 = 1

Step 4: Calculate test statistic
df = n - 1 = 35

t = (x̄ - μ₀)/SE = (18.5 - 20)/1 = -1.5/1 = -1.50

Step 5: Find p-value
Left-tailed test, df = 35, t = -1.50

From t-table: P(t < -1.50) ≈ 0.07

p-value ≈ 0.07

Step 6: Make decision
p-value = 0.07, α = 0.01

Is 0.07 < 0.01? NO

FAIL TO REJECT H₀

Step 7: State conclusion
At the α = 0.01 significance level, there is insufficient evidence that students study less than 20 hours per week.

The observed difference could reasonably occur by chance.

Step 8: Additional interpretation
Sample mean: 18.5 hours
Claimed mean: 20 hours
Difference: -1.5 hours

While students in sample study less:

  • Not statistically significant at α = 0.01
  • p-value (0.07) >> α (0.01)
  • Could be sampling variation
  • Cannot reject claim

Step 9: What if α = 0.10?
If we used α = 0.10: p = 0.07 < 0.10, so we would reject H₀!

But with the strict α = 0.01 we need stronger evidence, and this result is not convincing enough.

Answer:
H₀: μ = 20, Hₐ: μ < 20
Test statistic: t = -1.50 (df = 35)
P-value: 0.07
Decision: Fail to reject H₀ at α = 0.01
Conclusion: Insufficient evidence that the mean is less than 20 hours

Problem 4 (medium)

Question:

A researcher tests H₀: μ = 100 vs Hₐ: μ ≠ 100 with n = 16, x̄ = 105, s = 8. Find the test statistic and p-value. What conclusion at α = 0.05?

Solution:

Step 1: Identify test type
H₀: μ = 100
Hₐ: μ ≠ 100 (TWO-TAILED!)
σ unknown → use t-test

Step 2: Check conditions (assume met)
RANDOM: Assume ✓
NORMAL: n = 16 < 30, need normality assumption ✓
INDEPENDENT: Assume ✓

Step 3: Calculate SE
SE = s/√n = 8/√16 = 8/4 = 2

Step 4: Calculate test statistic
df = n - 1 = 15

t = (x̄ - μ₀)/SE = (105 - 100)/2 = 5/2 = 2.50

Step 5: Find p-value (TWO-TAILED!)
df = 15, t = 2.50

From t-table: P(t > 2.50) ≈ 0.012

TWO-TAILED p-value: p = 2 × 0.012 = 0.024

Step 6: Make decision
p-value = 0.024, α = 0.05

Is 0.024 < 0.05? YES

REJECT H₀

Step 7: State conclusion
At the α = 0.05 significance level, there is sufficient evidence that the true mean differs from 100.

The sample data suggests μ ≠ 100.

Step 8: Direction of difference
x̄ = 105 > 100
Evidence suggests μ > 100

But we tested two-tailed:

  • Could be μ > 100 or μ < 100
  • Data suggests higher
  • Significant in either direction

Step 9: Strength of evidence
p = 0.024 is fairly small

Interpretation:

  • If μ really = 100
  • Only 2.4% chance of x̄ this far from 100
  • Fairly unlikely under H₀
  • Moderate evidence against H₀

Step 10: What if one-tailed?
If we had tested Hₐ: μ > 100: one-tailed p = 0.012, even stronger evidence!

But the problem states ≠ (two-tailed), so we must use p = 0.024.

Step 11: Connection to CI
95% CI for μ: t* = 2.131 (df = 15)
CI = 105 ± 2.131(2) = 105 ± 4.26 = (100.74, 109.26)

100 is NOT in this interval: Confirms rejection at α = 0.05!

Answer:
Test statistic: t = 2.50 (df = 15)
P-value: 0.024 (two-tailed)
Decision: Reject H₀ at α = 0.05
Conclusion: Sufficient evidence that μ ≠ 100

The mean differs significantly from 100, appearing to be higher based on x̄ = 105.
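A sketch verifying Step 11's test-versus-interval agreement, assuming Python with scipy:

```python
from math import sqrt
from scipy import stats

# Problem 4: H0: mu = 100 vs Ha: mu != 100; n = 16, x̄ = 105, s = 8, alpha = 0.05
n, xbar, s, mu0, alpha = 16, 105.0, 8.0, 100.0, 0.05

se = s / sqrt(n)                               # 2.0
df = n - 1                                     # 15
t = (xbar - mu0) / se                          # 2.50
p = 2 * stats.t.sf(abs(t), df)                 # ≈ 0.024

t_star = stats.t.ppf(1 - alpha / 2, df)        # ≈ 2.131
ci = (xbar - t_star * se, xbar + t_star * se)  # ≈ (100.74, 109.26)

print(round(p, 3), tuple(round(x, 2) for x in ci))
# p < alpha and mu0 = 100 lies outside the 95% CI: the two views agree.
```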

Problem 5 (hard)

Question:

Two students test H₀: μ = 50. Student A uses α = 0.05, Student B uses α = 0.01. Both get the same data: t = 2.15, df = 20. What decision does each make? Explain why decisions differ.

Solution:

Step 1: Find the p-value (same for both)
df = 20, t = 2.15

Need to know: one-tailed or two-tailed?
Assume TWO-TAILED (testing μ ≠ 50)

From t-table: P(t > 2.15) ≈ 0.022

Two-tailed p-value: p = 2 × 0.022 = 0.044

Step 2: Student A's decision (α = 0.05)
p-value = 0.044, α = 0.05

Is 0.044 < 0.05? YES

Student A: REJECT H₀

Conclusion: Sufficient evidence at 0.05 level that μ ≠ 50

Step 3: Student B's decision (α = 0.01)
p-value = 0.044, α = 0.01

Is 0.044 < 0.01? NO

Student B: FAIL TO REJECT H₀

Conclusion: Insufficient evidence at 0.01 level that μ ≠ 50

Step 4: Why different decisions?
Same data, same p-value, different standards!

p = 0.044 is:

  • Small enough for α = 0.05 (less stringent)
  • NOT small enough for α = 0.01 (more stringent)

Step 5: Understanding α as threshold
Think of α as an "evidence requirement"

α = 0.05: Need p < 0.05

  • Willing to accept 5% error rate
  • Less strict
  • Easier to reject H₀

α = 0.01: Need p < 0.01

  • Want stronger evidence
  • More strict
  • Harder to reject H₀

Step 6: Interpret p-value
p = 0.044 = 4.4%

Meaning: "If H₀ true, 4.4% chance of results this extreme"

Student A thinks:

  • 4.4% is rare enough (< 5%)
  • Unlikely under H₀
  • Reject H₀

Student B thinks:

  • 4.4% not rare enough (not < 1%)
  • Not convincing enough
  • Don't reject H₀

Step 7: Neither is "wrong"!
Both are correct for their chosen α!

Different standards:

  • Student A uses conventional α = 0.05
  • Student B uses stricter α = 0.01

Appropriate α depends on context

Step 8: When to use different α levels

α = 0.10 (liberal):

  • Preliminary research
  • Exploratory studies
  • Don't want to miss potential effects

α = 0.05 (standard):

  • Most common
  • Good balance
  • Convention in many fields

α = 0.01 (conservative):

  • High-stakes decisions
  • Medical treatments
  • Want strong evidence

α = 0.001 (very conservative):

  • Particle physics
  • Exceptional claims
  • Need extraordinary evidence

Step 9: This is a borderline case!
p = 0.044 is close to 0.05

Barely significant at α = 0.05
Not significant at α = 0.01

Shows importance of:

  1. Choosing α BEFORE seeing data
  2. Reporting actual p-value
  3. Not treating 0.05 as magic cutoff

Step 10: Better reporting
Instead of just "significant" or "not":

Report: "t(20) = 2.15, p = 0.044"

Lets reader judge:

  • Evidence is moderate
  • Borderline significance
  • Just below conventional α
  • Not overwhelming evidence

Step 11: Practical advice
When results are borderline:

  • Report exact p-value
  • Don't just say "significant"
  • Consider practical significance
  • May need more data
  • Be cautious in conclusions

Step 12: What both students agree on
Both agree:

  • Data shows t = 2.15
  • This is fairly unusual if H₀ true
  • Evidence leans against H₀
  • But evidence is not overwhelming

Disagree:

  • Is evidence strong ENOUGH?
  • Depends on chosen standard

Answer:
STUDENT A (α = 0.05): REJECT H₀
p = 0.044 < 0.05 → significant
Sufficient evidence that μ ≠ 50

STUDENT B (α = 0.01): FAIL TO REJECT H₀
p = 0.044 > 0.01 → not significant
Insufficient evidence at the stricter standard

WHY DIFFERENT? α is the threshold for the decision. The same p-value (0.044) meets the less strict standard (0.05) but not the stricter standard (0.01). Both decisions are correct for their chosen significance level. This shows that α = 0.05 is not a magic cutoff - it's a conventional standard that can be adjusted based on context and the consequences of errors.
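A minimal check of the two decisions, assuming Python with scipy:

```python
from scipy import stats

# Problem 5: same data for both students
t_obs, df = 2.15, 20
p = 2 * stats.t.sf(t_obs, df)   # two-tailed P-value ≈ 0.044

for alpha in (0.05, 0.01):
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"alpha = {alpha}: p = {p:.3f} -> {decision}")
```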