Confidence Intervals for Proportions

Estimating population proportions

What is a Confidence Interval?

Confidence Interval (CI): Range of plausible values for population parameter

Form: statistic ± margin of error

Interpretation: We are C% confident the interval contains the true parameter

Example: 95% CI for p: (0.52, 0.58)
We are 95% confident that the true population proportion is between 0.52 and 0.58

One-Sample CI for Proportion

Formula:

$$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Where:

  • p̂ = sample proportion
  • z* = critical value (from confidence level)
  • n = sample size

Critical Values

Common confidence levels:

| Confidence Level | z* |
|------------------|-----|
| 90% | 1.645 |
| 95% | 1.96 |
| 99% | 2.576 |

Higher confidence → wider interval
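The table values can be reproduced rather than memorized. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `critical_value` is made up for illustration and is not part of the original notes.

```python
from statistics import NormalDist

def critical_value(confidence: float) -> float:
    """Return z* so that the central `confidence` area of N(0, 1) lies in (-z*, z*)."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    upper_tail = (1 - confidence) / 2          # area left in each tail
    return NormalDist().inv_cdf(1 - upper_tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z* = {critical_value(level):.3f}")
# 90%: z* = 1.645
# 95%: z* = 1.960
# 99%: z* = 2.576
```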

Example 1: Simple CI

Survey: 400 voters, 220 support candidate

$$\hat{p} = \frac{220}{400} = 0.55$$

95% CI:

$$SE = \sqrt{\frac{0.55(0.45)}{400}} = \sqrt{0.0006188} \approx 0.0249$$

$$CI = 0.55 \pm 1.96(0.0249) = 0.55 \pm 0.049$$

$$(0.501, 0.599)$$

Interpretation: We are 95% confident between 50.1% and 59.9% of voters support the candidate.
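The same arithmetic can be checked in a few lines of Python. This is a minimal sketch using only the standard library; the function name `one_prop_ci` is an illustrative choice, and it computes z* exactly instead of using the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_ci(successes: int, n: int, confidence: float = 0.95):
    """Normal-approximation interval: p_hat +/- z* * sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)         # standard error
    me = z_star * se                           # margin of error
    return p_hat - me, p_hat + me

low, high = one_prop_ci(220, 400, 0.95)
print(f"({low:.3f}, {high:.3f})")              # (0.501, 0.599)
```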

Conditions for CI

Random: Random sample
Normal: np̂ ≥ 10 and n(1-p̂) ≥ 10
Independent: n ≤ 10% of population

Check ALL before proceeding!
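As a sketch, the three checks can be written as a small helper. The name `check_conditions` and its parameters are hypothetical. Randomness cannot be verified from the counts, so it is passed in as a flag, and the 10% check is reported as `None` (not checked) when the population size is unknown.

```python
def check_conditions(successes, n, population_size=None, random_sample=True):
    """Return the status of the Random, Normal, and Independent conditions."""
    failures = n - successes
    if population_size is None:
        independent = None                     # cannot check without N
    else:
        independent = n <= 0.10 * population_size
    return {
        "random": random_sample,               # must come from the study design
        "normal": successes >= 10 and failures >= 10,  # n*p_hat and n*(1 - p_hat)
        "independent": independent,
    }

# 220 of 400 voters; the population size of 50,000 is an assumed figure
print(check_conditions(220, 400, population_size=50_000))
```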

Margin of Error

Margin of Error (ME):

$$ME = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Factors affecting ME:

  • Larger z* (higher confidence) → larger ME
  • Larger n → smaller ME
  • p̂ near 0.5 → larger ME (maximum variability)
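To see the three factors at work, the sketch below changes one input at a time from a baseline. The baseline values (p̂ = 0.5, n = 400, 95%) and the function name `margin_of_error` are arbitrary choices for illustration.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p_hat: float, n: int, confidence: float = 0.95) -> float:
    """ME = z* * sqrt(p_hat(1 - p_hat)/n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * sqrt(p_hat * (1 - p_hat) / n)

base = margin_of_error(0.5, 400, 0.95)
print(f"baseline (p_hat=0.5, n=400, 95%): {base:.4f}")
print(f"higher confidence (99%):          {margin_of_error(0.5, 400, 0.99):.4f}")
print(f"larger sample (n=1600):           {margin_of_error(0.5, 1600, 0.95):.4f}")
print(f"p_hat away from 0.5 (p_hat=0.2):  {margin_of_error(0.2, 400, 0.95):.4f}")
```

Because ME is proportional to 1/√n, quadrupling the sample size halves the margin of error.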

Sample Size for Desired ME

To achieve margin of error m:

$$n = \left(\frac{z^*}{m}\right)^2 \hat{p}(1-\hat{p})$$

Conservative approach (if no estimate): Use p̂ = 0.5

$$n = \left(\frac{z^*}{m}\right)^2 (0.25)$$

Example: Want ME = 0.03 with 95% confidence

$$n = \left(\frac{1.96}{0.03}\right)^2 (0.25) = (65.33)^2(0.25) \approx 1067.1 \;\Rightarrow\; n = 1068 \text{ (round up)}$$

Need at least 1068 people!
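The calculation, including the round-up step, might be sketched as follows. The function name `sample_size` and the optional `z_star` override are illustrative assumptions. The override lets you plug in the rounded table value (1.96) instead of the exact critical value.

```python
from math import ceil
from statistics import NormalDist

def sample_size(margin, confidence=0.95, p_guess=0.5, z_star=None):
    """Smallest n with z* * sqrt(p(1 - p)/n) <= margin, always rounded up."""
    if z_star is None:
        z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z_star / margin) ** 2 * p_guess * (1 - p_guess))

print(sample_size(0.03, z_star=1.96))          # 1068
```

Rounding up rather than to the nearest integer keeps the margin of error at or below the target.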

Interpreting Confidence Level

95% confidence means:

  • If we repeated sampling many times and built 95% CI each time
  • About 95% of intervals would contain true p
  • About 5% would miss true p

NOT:

  • "95% chance p is in our interval" (p is fixed!)
  • "95% of data is in interval"

Our specific interval either contains p or it doesn't (we just don't know which)
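The repeated-sampling idea can be illustrated with a small simulation. This is a sketch under assumed values: the true p of 0.55, n = 400, 2000 repetitions, and the fixed seed are arbitrary, and `coverage` is a made-up helper name.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage(true_p, n, confidence=0.95, repetitions=2000, seed=0):
    """Fraction of simulated intervals that contain the true proportion."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(repetitions):
        # Draw one random sample of size n and count the successes
        successes = sum(rng.random() < true_p for _ in range(n))
        p_hat = successes / n
        me = z_star * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - me <= true_p <= p_hat + me:
            hits += 1
    return hits / repetitions

print(coverage(true_p=0.55, n=400))            # should land near 0.95
```

Each simulated interval either captures 0.55 or misses it. The 95% describes how often the procedure succeeds across samples, not any single interval.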

Increasing Confidence

Want higher confidence (say 99% instead of 95%):

  • Use larger z* (2.576 instead of 1.96)
  • Interval becomes wider
  • Trade-off: More confidence but less precision

Example 2: With Interpretation

Survey of 500 students: 285 have jobs

$$\hat{p} = \frac{285}{500} = 0.57$$

Conditions:

  • Random: Assume random sample ✓
  • Normal: 500(0.57) = 285 ≥ 10, 500(0.43) = 215 ≥ 10 ✓
  • Independent: 500 < 10% of all students (assume) ✓

90% CI:

$$SE = \sqrt{\frac{0.57(0.43)}{500}} \approx 0.0221$$

$$CI = 0.57 \pm 1.645(0.0221) = 0.57 \pm 0.036$$

$$(0.534, 0.606)$$

Interpretation: We are 90% confident that between 53.4% and 60.6% of all students have jobs.

Common Mistakes

❌ Saying "95% of data in interval"
❌ Saying "95% chance p in interval"
❌ Not checking conditions
❌ Using t* instead of z* for proportions
❌ Rounding p̂ too early

Two-Sample CI for Difference in Proportions

Comparing two groups:

$$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

Conditions: Each group meets conditions separately

Interpretation: If the interval contains 0, the data do not show a statistically significant difference between the two proportions
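A minimal sketch of the two-sample interval follows. The counts (120 of 200 versus 90 of 200) are invented purely for illustration, and `two_prop_ci` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_ci(x1, n1, x2, n2, confidence=0.95):
    """Normal-approximation interval for p1 - p2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

# Made-up data: 120 of 200 in group 1, 90 of 200 in group 2
low, high = two_prop_ci(120, 200, 90, 200)
print(f"({low:.3f}, {high:.3f})")
print("contains 0:", low <= 0 <= high)
```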

Calculator Commands (TI-83/84)

STAT → TESTS → A:1-PropZInt

Enter:

  • x (count of successes)
  • n (sample size)
  • C-Level (confidence level as decimal)

Calculate → gives interval

Relationship to Hypothesis Testing

If testing H₀: p = p₀ at significance level α:

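Equivalent (two-sided test): Check if the 100(1-α)% CI contains p₀. For proportions the match is approximate, because the test's standard error uses p₀ while the CI's uses p̂.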

  • If p₀ in CI → fail to reject H₀
  • If p₀ not in CI → reject H₀
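The sketch below compares the two routes on Example 1's survey (220 of 400) against a hypothetical null value p₀ = 0.50. The test shown is the usual two-sided one-proportion z test, which is an assumption about which test is meant. Both function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def ci_contains(p0, successes, n, alpha=0.05):
    """True if the 100(1 - alpha)% interval for p contains p0."""
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    p_hat = successes / n
    me = z_star * sqrt(p_hat * (1 - p_hat) / n)    # SE built from p_hat
    return p_hat - me <= p0 <= p_hat + me

def z_test_rejects(p0, successes, n, alpha=0.05):
    """Two-sided one-proportion z test of H0: p = p0."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)     # SE built from p0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha

print("p0 in CI:    ", ci_contains(0.50, 220, 400))     # False
print("test rejects:", z_test_rejects(0.50, 220, 400))  # True
```

Here the two agree: 0.50 falls just outside (0.501, 0.599) and the test rejects H₀. Because the two standard errors differ slightly, borderline cases can disagree.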

Quick Reference

Formula: $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

Conditions: Random, np̂ ≥ 10 and n(1-p̂) ≥ 10, n < 10%N

Common z*: 1.645 (90%), 1.96 (95%), 2.576 (99%)

Sample size: $n = \left(\frac{z^*}{m}\right)^2 p(1-p)$

Remember: Higher confidence → wider interval. Larger sample → narrower interval. Always check conditions and interpret in context!

📚 Practice Problems

Problem 1 (easy)

Question:

In a random sample of 400 voters, 220 support a proposition. Construct a 95% confidence interval for the true proportion of voters who support the proposition.

Solution:

Step 1: Identify the information

  • n = 400 (sample size)
  • x = 220 (number of successes)
  • p̂ = 220/400 = 0.55 (sample proportion)
  • Confidence level: 95%

Step 2: Check conditions for a proportion CI

  • Random: Sample is random ✓
  • Normal: np̂ = 400(0.55) = 220 ≥ 10 ✓ and n(1-p̂) = 400(0.45) = 180 ≥ 10 ✓
  • Independent: n ≤ 0.10N, i.e. 400 ≤ 10% of all voters (assume yes) ✓

All conditions met!

Step 3: Find the critical value

95% confidence → α = 0.05, so z* = 1.96 (from the table for a 95% CI)

Step 4: Calculate the standard error

SE = √[p̂(1-p̂)/n] = √[0.55(0.45)/400] = √[0.2475/400] = √0.00061875 ≈ 0.0249

Step 5: Calculate the margin of error

ME = z* × SE = 1.96 × 0.0249 ≈ 0.0488

Step 6: Construct the confidence interval

CI = p̂ ± ME = 0.55 ± 0.049 = (0.501, 0.599)

Or: (0.50, 0.60) rounded

Step 7: Interpret the interval

We are 95% confident that the true proportion of voters who support the proposition is between 0.50 and 0.60 (or 50% and 60%).

This means:

  • If we repeated this sampling process many times
  • About 95% of intervals would contain true p
  • This specific interval either contains p or doesn't
  • But the process is reliable 95% of the time

Answer: 95% CI for p: (0.50, 0.60)

We are 95% confident that between 50% and 60% of all voters support the proposition.

Problem 2 (easy)

Question:

A quality control inspector finds 8 defects in a sample of 200 items. Construct a 90% confidence interval for the defect rate.

Solution:

Step 1: Calculate the sample proportion

  • n = 200
  • x = 8
  • p̂ = 8/200 = 0.04

Step 2: Check conditions

  • Random: Assume random sample ✓
  • Normal: np̂ = 200(0.04) = 8 < 10 ✗ and n(1-p̂) = 200(0.96) = 192 ≥ 10 ✓

Condition fails! But let's proceed with caution. (In practice, one might use an exact binomial method.)

Step 3: Find z* for 90% confidence

90% confidence → z* = 1.645

Step 4: Calculate SE

SE = √[p̂(1-p̂)/n] = √[0.04(0.96)/200] = √[0.0384/200] = √0.000192 ≈ 0.0139

Step 5: Calculate ME

ME = 1.645 × 0.0139 ≈ 0.023

Step 6: Construct CI

CI = 0.04 ± 0.023 = (0.017, 0.063) = (1.7%, 6.3%)

Step 7: Interpret with caution

We are 90% confident the true defect rate is between 1.7% and 6.3%.

Note: This interval may not be as reliable since np̂ < 10.

Answer: 90% CI: (0.017, 0.063) or (1.7%, 6.3%)

Caution: The success-failure condition is marginally violated (only 8 successes), so this normal-based interval may not be fully reliable.
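For comparison, here is a rough sketch of one exact binomial method, the Clopper-Pearson interval, found by bisection on the binomial CDF using only the standard library. The solution above does not say which exact method it has in mind, so this choice is an assumption, and the helper names are made up.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def solve_increasing(f, lo=0.0, hi=1.0, iterations=60):
    """Bisection for the root of a function that increases from negative to positive."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_ci(x, n, confidence=0.95):
    """Clopper-Pearson interval: each tail probability equals alpha/2 at a bound."""
    alpha = 1 - confidence
    # Lower bound: P(X >= x | p) = alpha/2 (taken as 0 when x = 0)
    lower = 0.0 if x == 0 else solve_increasing(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper bound: P(X <= x | p) = alpha/2 (taken as 1 when x = n)
    upper = 1.0 if x == n else solve_increasing(
        lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper

low, high = exact_ci(8, 200, confidence=0.90)
print(f"({low:.3f}, {high:.3f})")
```

Unlike the normal-based interval, this one is not symmetric around p̂ = 0.04, which reflects the skew of the binomial distribution when the success count is small.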

Problem 3 (medium)

Question:

A researcher wants to estimate the proportion of defective items with a margin of error no more than 0.03 at 90% confidence. How large a sample is needed if no prior estimate exists?

Solution:

Step 1: Identify what we need

  • ME = 0.03
  • Confidence level = 90% → z* = 1.645
  • No prior estimate → use p̂ = 0.5

Step 2: Use the sample size formula

n = (z*)²p̂(1-p̂)/ME²

Step 3: Calculate

n = (1.645)²(0.5)(0.5)/(0.03)² = 2.706(0.25)/0.0009 = 0.6765/0.0009 ≈ 751.67

Step 4: Round UP

Always round UP to ensure ME is no larger than desired: n = 752

Step 5: Why use p̂ = 0.5?

  • The product p̂(1-p̂) is maximized at p̂ = 0.5
  • This gives the most conservative (largest) sample size
  • Guarantees ME ≤ 0.03 regardless of true p

Answer: n = 752

Need a sample of at least 752 items to achieve a margin of error no more than 0.03 at 90% confidence.

Problem 4 (medium)

Question:

Two polls: Poll A (n=500, p̂=0.52) and Poll B (n=1000, p̂=0.51). Both use 95% confidence. Which poll has a smaller margin of error? Calculate both.

Solution:

Step 1: Recall the margin of error formula

ME = z*√[p̂(1-p̂)/n]

For 95% CI: z* = 1.96

Step 2: Calculate ME for Poll A (p̂ = 0.52, n = 500)

ME_A = 1.96√[0.52(0.48)/500] = 1.96√[0.2496/500] = 1.96√0.0004992 = 1.96(0.0223) ≈ 0.044

Step 3: Calculate ME for Poll B (p̂ = 0.51, n = 1000)

ME_B = 1.96√[0.51(0.49)/1000] = 1.96√[0.2499/1000] = 1.96√0.0002499 = 1.96(0.0158) ≈ 0.031

Step 4: Compare

  • Poll A: ME ≈ 0.044 or 4.4%
  • Poll B: ME ≈ 0.031 or 3.1%

Poll B has the smaller margin of error!

Step 5: Why is Poll B better?

  • Larger sample size (1000 vs 500)
  • ME ∝ 1/√n
  • Doubling n reduces ME by a factor of √2 ≈ 1.41

Check: 500 × 2 = 1000, so ME_A/ME_B ≈ √(1000/500) = √2 ≈ 1.41, and 0.044/0.031 ≈ 1.42 ✓

Step 6: Effect of p̂

  • Poll B also has p̂ closer to 0.5
  • But this increases ME only slightly
  • The effect of the larger n dominates

Answer: Poll B has smaller ME (0.031 vs 0.044)

Poll B's larger sample size (1000 vs 500) gives more precision despite having p̂ closer to 0.5.

Problem 5 (hard)

Question:

Explain why we can't construct a valid confidence interval for a proportion when the sample proportion is 0 or 1.

Solution:

Step 1: Recall the CI formula

CI = p̂ ± z*√[p̂(1-p̂)/n]

SE = √[p̂(1-p̂)/n]

Step 2: What happens when p̂ = 0?

SE = √[0(1)/n] = 0, so CI = 0 ± 0 = (0, 0)

This says we're 100% certain p = 0. Unreasonable from a sample!

Step 3: What happens when p̂ = 1?

SE = √[1(0)/n] = 0, so CI = 1 ± 0 = (1, 1)

This says we're 100% certain p = 1. Also unreasonable!

Step 4: Normal approximation fails

Need: np̂ ≥ 10 AND n(1-p̂) ≥ 10

  • When p̂ = 0: np̂ = 0 < 10 ✗
  • When p̂ = 1: n(1-p̂) = 0 < 10 ✗

Can't use the normal-based method!

Step 5: What to do instead

Use the Wilson score interval, Agresti-Coull, or exact binomial methods. These give reasonable intervals even with extreme values.

Answer: When p̂ = 0 or 1, SE = 0, giving a degenerate interval. Normal approximation conditions fail. Alternative methods should be used.
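As an illustration of one of these alternatives, here is a sketch of the Wilson score interval applied to a sample with no successes. The sample size of 50 is an arbitrary choice and `wilson_ci` is an illustrative name. The bounds are clamped to [0, 1] to absorb floating-point round-off.

```python
from math import sqrt
from statistics import NormalDist

def wilson_ci(successes, n, confidence=0.95):
    """Wilson score interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return max(0.0, center - half_width), min(1.0, center + half_width)

# 0 successes in 50 trials: the standard formula would give (0, 0)
low, high = wilson_ci(0, 50)
print(f"({low:.3f}, {high:.3f})")
```

The interval still starts at 0 but now has a positive upper bound, so it no longer claims certainty that p = 0.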