Type I and Type II Errors

Understanding testing errors and power

Type I and Type II Errors

The Four Possible Outcomes

| Decision \ Reality | H₀ True | H₀ False |
|--------------------|---------|----------|
| Fail to reject H₀ | ✓ Correct | Type II Error |
| Reject H₀ | Type I Error | ✓ Correct |

Type I Error: Reject H₀ when it's actually true (false positive)

Type II Error: Fail to reject H₀ when it's actually false (false negative)

Type I Error (α)

Definition: Rejecting true null hypothesis

Probability: α (significance level)

Example: Medical test

  • H₀: Patient healthy
  • Type I: Diagnose disease when patient is healthy

Consequences: False alarm, unnecessary treatment, wasted resources

Control: Set α before testing (0.05, 0.01, etc.)

Type II Error (β)

Definition: Failing to reject false null hypothesis

Probability: β (depends on true parameter value, sample size, α)

Example: Medical test

  • H₀: Patient healthy
  • Type II: Miss disease in sick patient

Consequences: Miss real effect, fail to treat, potential harm

Control: Increase sample size, increase α (trade-off!)

Power

Power: Probability of correctly rejecting false H₀

Power = 1 - β

Higher power = better test (more likely to detect real effect)

Factors increasing power:

  1. Larger sample size (n)
  2. Larger effect size (further from H₀)
  3. Less variability (smaller σ)
  4. Higher α (but increases Type I risk)
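All four factors appear directly in the power formula for a one-sided z-test. A minimal sketch in stdlib Python (the baseline numbers are illustrative, not from the text) that varies one factor at a time:

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(effect, sigma, n, alpha):
    """Power of a one-sided z-test (H0: mu = mu0 vs Ha: mu > mu0)
    when the true mean exceeds mu0 by `effect`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # critical z value
    shift = effect * sqrt(n) / sigma           # standardized true shift
    return NormalDist().cdf(shift - z_alpha)   # P(reject H0 | Ha true)

base = power_one_sided_z(effect=5, sigma=15, n=25, alpha=0.05)
print(f"baseline power:      {base:.3f}")
print(f"larger n (n=100):    {power_one_sided_z(5, 15, 100, 0.05):.3f}")
print(f"larger effect (10):  {power_one_sided_z(10, 15, 25, 0.05):.3f}")
print(f"smaller sigma (10):  {power_one_sided_z(5, 10, 25, 0.05):.3f}")
print(f"larger alpha (0.10): {power_one_sided_z(5, 15, 25, 0.10):.3f}")
```

Each variation raises power above the baseline, matching the list above.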

Example: Coin Testing

Test if coin is fair:

  • H₀: p = 0.5 (fair)
  • Hₐ: p ≠ 0.5 (biased)
  • Flip 20 times, α = 0.05

Type I Error:

  • Coin actually fair (p = 0.5)
  • Get unusual result (like 15 heads)
  • Reject H₀ (conclude biased)
  • Error: Called fair coin biased

Type II Error:

  • Coin actually biased (say p = 0.7)
  • Get result that looks reasonable for fair coin (like 11 heads)
  • Fail to reject H₀
  • Error: Failed to detect biased coin

Calculating Type I Error Probability

Type I Error probability = α (by design)

Example: If α = 0.05, P(Type I Error) = 0.05

Interpretation: If H₀ is actually true, we will (incorrectly) reject it 5% of the time

Calculating Power (Advanced)

Requires:

  • Specific alternative value
  • Sample size
  • Variability
  • α

Example: Test H₀: μ = 100 vs Hₐ: μ > 100

  • α = 0.05, n = 25, σ = 15
  • True μ = 106

Power calculation:

  1. Find critical value for rejection
  2. Find probability of exceeding it when μ = 106
  3. This is the power

Typically use software for exact power calculations
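For this particular example the three steps above fit in a short script; a sketch using Python's stdlib `statistics.NormalDist`:

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, alpha = 100, 15, 25, 0.05
mu_true = 106
se = sigma / sqrt(n)                               # 15 / 5 = 3

# Step 1: critical sample mean for rejecting H0 (one-sided, mu > 100)
crit = mu0 + NormalDist().inv_cdf(1 - alpha) * se  # about 104.93

# Step 2: probability the sample mean exceeds it when the true mu = 106
power = 1 - NormalDist(mu_true, se).cdf(crit)

print(f"critical value: {crit:.2f}")   # ~104.93
print(f"power:          {power:.3f}")  # ~0.64
```

So even with a true mean a full 6 points above the null value, this test detects the effect only about 64% of the time.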

Trade-offs

Decreasing α (stricter):

  • ↓ Type I Error risk
  • ↑ Type II Error risk
  • ↓ Power

Increasing α:

  • ↑ Type I Error risk
  • ↓ Type II Error risk
  • ↑ Power

Can't minimize both simultaneously with fixed n!

Solution: Increase n (decreases both error types)

Choosing α

Common practice: α = 0.05

More conservative (α = 0.01): When Type I Error very costly

  • Example: Approving new drug (don't want false positive)

Less conservative (α = 0.10): When Type II Error very costly

  • Example: Screening test (don't want to miss cases)

Balance: Consider consequences of each error type

Real-World Examples

Criminal Trial:

  • H₀: Defendant innocent
  • Type I: Convict innocent person (false conviction)
  • Type II: Acquit guilty person (false acquittal)
  • System prioritizes avoiding Type I (innocent until proven guilty)

Medical Screening:

  • H₀: Patient disease-free
  • Type I: False positive (unnecessary worry, follow-up tests)
  • Type II: False negative (miss disease, delayed treatment)
  • Balance depends on disease severity

Quality Control:

  • H₀: Process working properly
  • Type I: Stop working process (wasted time, money)
  • Type II: Miss defective process (bad products shipped)

Relationship Between Errors

For fixed n:

  • Lowering α → higher β (inverse relationship)
  • Can't have both low α and low β

Increasing n:

  • Can lower both α and β
  • Only way to improve both

Increasing effect size:

  • β decreases (easier to detect large effects)
  • α unchanged (still set by us)

Power Analysis for Sample Size

Before study: Determine n needed for desired power

Typical goal: Power = 0.80 (80% chance of detecting effect)

Requires specifying:

  • Minimum important effect size
  • Desired α
  • Estimated variability
  • Desired power

Software: G*Power, R, online calculators
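As an alternative to those tools, the one-sided z-test case can be sketched in a few lines of stdlib Python (the effect size and σ below are illustrative assumptions):

```python
from math import ceil
from statistics import NormalDist

def n_for_power(effect, sigma, alpha=0.05, power=0.80):
    """Smallest n for a one-sided z-test to reach the desired power
    at the given minimum important effect size."""
    z_a = NormalDist().inv_cdf(1 - alpha)  # 1.645 for alpha = 0.05
    z_b = NormalDist().inv_cdf(power)      # 0.842 for power = 0.80
    return ceil(((z_a + z_b) * sigma / effect) ** 2)

# e.g. to detect a 6-point shift when sigma = 15:
print(n_for_power(effect=6, sigma=15))
```

Halving the minimum important effect roughly quadruples the required n, which is why the effect-size choice dominates sample-size planning.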

Common Misconceptions

❌ "P-value is probability of Type I Error"

  • No! α is P(Type I Error)
  • P-value is P(data at least this extreme | H₀ true), computed from the observed sample

❌ "Can eliminate both error types"

  • No! Trade-off exists (for fixed n)

❌ "Type II Error is 1 - α"

  • No! β depends on the specific alternative value; 1 - α is the probability of correctly failing to reject a true H₀

❌ "High power means H₀ is false"

  • No! Power is property of test, not evidence about H₀

Practical Advice

Before study:

  1. Consider consequences of each error type
  2. Choose α appropriately
  3. Do power analysis to determine n

After study:

  1. Report P-value (not just "significant" or "not")
  2. Consider practical significance, not just statistical
  3. Recognize limitations (Type II error possible if fail to reject)

Quick Reference

Type I Error (α):

  • Reject true H₀
  • P(Type I) = α
  • False positive

Type II Error (β):

  • Fail to reject false H₀
  • P(Type II) = β
  • False negative

Power = 1 - β:

  • Probability of detecting real effect
  • Increase with: larger n, larger effect, smaller σ, larger α

Trade-off:

  • Can't minimize both errors with fixed n
  • Increase n to reduce both

Remember: All hypothesis tests risk errors. Understanding and balancing these risks is key to good statistical practice!

📚 Practice Problems

Problem 1 (easy)

Question:

Define Type I and Type II errors. What are the consequences of each in the context of testing a new medical treatment?

💡 Solution

Step 1: Define Type I Error

TYPE I ERROR: Reject H₀ when H₀ is actually TRUE

  • "False positive"
  • Conclude effect exists when it doesn't
  • Probability = α (significance level)

Step 2: Define Type II Error

TYPE II ERROR: Fail to reject H₀ when H₀ is FALSE

  • "False negative"
  • Miss a real effect
  • Probability = β (depends on true parameter)

Step 3: Medical treatment context

Testing new treatment effectiveness:

  • H₀: Treatment has no effect
  • Hₐ: Treatment is effective

Step 4: Type I Error consequences (medical)

Type I Error = Reject H₀ when true. Means: conclude treatment works when it DOESN'T.

Consequences:

  • Approve ineffective treatment
  • Patients get useless treatment
  • Waste money on ineffective drug
  • False hope for patients
  • Delay in finding real treatment
  • Potential side effects with no benefit

Example: Approve sugar pill thinking it cures disease

Step 5: Type II Error consequences (medical)

Type II Error = Fail to reject H₀ when false. Means: conclude no effect when treatment DOES work.

Consequences:

  • Reject effective treatment
  • Patients denied beneficial treatment
  • Miss opportunity to help people
  • Effective drug never reaches market
  • People continue suffering unnecessarily

Example: Reject life-saving drug due to small sample

Step 6: Which is worse? (Depends on context!) In medical testing:

Type I often considered worse:

  • Do no harm principle
  • Better safe than sorry
  • Can't give ineffective/harmful treatment

But Type II also serious:

  • People miss out on cure
  • Disease continues unchecked

Step 7: The tradeoff Cannot minimize both simultaneously!

Lower α (reduce Type I):

  • Less likely false positive
  • But MORE likely Type II error
  • More conservative

Higher α (reduce Type II):

  • Less likely to miss real effect
  • But MORE likely Type I error
  • More liberal

Step 8: Decision table

| | H₀ True (no effect) | H₀ False (has effect) |
|---|---|---|
| Reject H₀ | Type I ✗ (α) | Correct ✓ (Power) |
| Fail to reject H₀ | Correct ✓ (1 - α) | Type II ✗ (β) |

Step 9: Summary TYPE I ERROR:

  • Reject true H₀
  • False positive
  • P(Type I) = α
  • Medical: Approve bad treatment

TYPE II ERROR:

  • Fail to reject false H₀
  • False negative
  • P(Type II) = β
  • Medical: Reject good treatment

Answer: TYPE I ERROR: Rejecting H₀ when it's true (false positive). In medicine: concluding treatment works when it doesn't, leading to approval of ineffective treatments. Probability = α.

TYPE II ERROR: Failing to reject H₀ when it's false (false negative). In medicine: concluding no effect when treatment actually works, denying patients effective treatment. Probability = β.

Problem 2 (easy)

Question:

In hypothesis testing with α = 0.05, explain what this significance level represents in terms of Type I error.

💡 Solution

Step 1: Recall Type I error

Type I Error: Reject H₀ when H₀ is TRUE (false positive)

Step 2: Connection to α

α = P(Type I Error) = P(Reject H₀ | H₀ is true)

Step 3: What α = 0.05 means α = 0.05 = 5%

Interpretation: "If we repeated this test many times when H₀ is actually true, we would incorrectly reject H₀ about 5% of the time."

Step 4: Long-run interpretation Imagine 100 tests where H₀ is TRUE:

  • About 95 tests: Correctly fail to reject H₀ ✓
  • About 5 tests: Incorrectly reject H₀ ✗

Those 5 incorrect rejections = Type I errors

Step 5: Single test interpretation For ONE test with α = 0.05:

If we reject H₀:

  • Either we made correct decision (H₀ false)
  • OR we made Type I error (H₀ true)
  • If H₀ true, had 5% chance of this error

We accept this 5% risk!

Step 6: Why 5%?

α = 0.05 is a convention. It balances:

  • Not too lenient (avoiding false positives)
  • Not too strict (not missing real effects)

Other common values:

  • α = 0.01 (more conservative, less Type I)
  • α = 0.10 (less conservative, more Type I)

Step 7: Choosing α Depends on consequences:

When Type I error is serious:

  • Use smaller α (like 0.01)
  • Example: Drug approval
  • Better safe than sorry

When Type I error less serious:

  • Can use larger α (like 0.10)
  • Example: Preliminary research
  • Don't want to miss potential findings

Step 8: Example scenario

Testing whether a coin is unfair, with α = 0.05:

  • H₀: p = 0.5 (coin is fair)
  • Hₐ: p ≠ 0.5 (coin is unfair)

Type I error:

  • Conclude coin is unfair when it's actually fair
  • Accuse someone of cheating when coin is fair
  • Will happen 5% of time even with fair coin!
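That long-run rate can be checked by simulation. A sketch assuming the exact-binomial rejection region {0..5, 15..20} for 20 flips, whose actual Type I rate is about 0.041, slightly under the nominal 0.05:

```python
import random

random.seed(1)
trials, n = 100_000, 20
# Exact-binomial two-sided rejection region for H0: p = 0.5, n = 20
reject = set(range(6)) | set(range(15, 21))

false_alarms = 0
for _ in range(trials):
    heads = sum(random.random() < 0.5 for _ in range(n))  # fair coin!
    if heads in reject:
        false_alarms += 1

rate = false_alarms / trials
print(f"Type I rate: {rate:.4f}")  # close to 0.041
```

Every one of those rejections is a false accusation against a fair coin: Type I errors happen even when nothing is wrong.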

Step 9: Cannot eliminate Type I errors As long as α > 0:

  • Some chance of Type I error
  • Inherent in hypothesis testing
  • Due to random sampling variation

Only way to eliminate:

  • Set α = 0 (never reject H₀)
  • But then can't detect any real effects!

Step 10: Summary interpretation α = 0.05 means:

  1. If H₀ is true, 5% chance we'll reject it
  2. Maximum Type I error rate we're willing to accept
  3. In long run, 5% of true nulls will be rejected
  4. Trade-off: protecting against false positives while still detecting real effects

Answer: α = 0.05 means there is a 5% probability of making a Type I error - incorrectly rejecting H₀ when it is actually true. In the long run, if H₀ is true and we repeat the test many times, we would reject it about 5% of the time just by chance. This is the maximum false positive rate we are willing to accept.

Problem 3 (medium)

Question:

A significance test has α = 0.05 and power = 0.80. What is β? Interpret what power = 0.80 means in context.

💡 Solution

Step 1: Relationship between Power and β

Power = 1 - β, so β = 1 - Power

Step 2: Calculate β

β = 1 - Power = 1 - 0.80 = 0.20

Step 3: Define Power

Power = P(Reject H₀ | H₀ is false)

  • Probability of correctly rejecting false H₀
  • Probability of detecting real effect
  • Sensitivity of the test
  • "True positive rate"

Step 4: Define β

β = P(Type II Error) = P(Fail to reject H₀ | H₀ is false)

  • Probability of missing real effect
  • "False negative rate"

Step 5: Interpret Power = 0.80 "If there really is an effect (H₀ is false), we have an 80% chance of detecting it (rejecting H₀)."

OR

"If the true parameter is different from the null value, our test will correctly identify this 80% of the time."

Step 6: Interpret β = 0.20 "If there really is an effect, we have a 20% chance of missing it (failing to reject H₀)."

Type II error happens 20% of time when effect exists.

Step 7: Complete error picture

α = 0.05 (Type I error rate), β = 0.20 (Type II error rate), Power = 0.80

When H₀ TRUE:

  • 5% chance: Type I error ✗
  • 95% chance: Correct decision ✓

When H₀ FALSE:

  • 20% chance: Type II error ✗
  • 80% chance: Correct decision ✓ (Power!)

Step 8: Is 80% power good? Power = 0.80 is often considered:

  • Adequate for most studies
  • Good balance of resources and detection
  • Standard goal in research

Power = 0.90 or higher:

  • Excellent
  • More likely to detect effects
  • May require larger sample

Power = 0.50:

  • Poor (coin flip!)
  • Likely to miss real effects

Step 9: Factors affecting power Higher power when:

  • Larger sample size (n ↑)
  • Larger effect size (true difference from H₀)
  • Larger α (but more Type I errors!)
  • Less variability (σ ↓)

Step 10: Practical interpretation With power = 0.80:

If treatment truly works:

  • 80% chance study will show it works
  • 20% chance study will miss the effect

Better than:

  • 50% power (might as well flip coin)

Not as good as:

  • 95% power (almost always detect)
  • But 95% might require huge sample

Step 11: Power analysis Before study: Calculate needed n for desired power

After study: If fail to reject H₀, check power

  • High power + no rejection = probably no effect
  • Low power + no rejection = inconclusive (might have missed effect)

Answer: β = 0.20 (20%)

POWER = 0.80 means: If there truly is an effect (H₀ is false), we have an 80% probability of correctly detecting it and rejecting H₀. This is considered adequate power for most studies.

β = 0.20 means: There is a 20% chance of Type II error - missing a real effect that exists.

Problem 4 (medium)

Question:

A researcher increases sample size from 50 to 200. How does this affect the probabilities of Type I error (α), Type II error (β), and power?

💡 Solution

Step 1: Effect on Type I error (α) ANSWER: NO CHANGE

α is set by researcher:

  • Choose α before collecting data
  • α = 0.05, 0.01, etc.
  • Independent of sample size
  • Controlled directly

α stays the same regardless of n!

Step 2: Why α doesn't change

α = P(Reject H₀ | H₀ true)

This is our decision threshold:

  • We set it (like 0.05)
  • Not affected by n
  • Design choice, not data-driven

Example: with n = 50, α = 0.05; with n = 200, α is still 0.05.

Step 3: Effect on Type II error (β) ANSWER: DECREASES

β = P(Fail to reject H₀ | H₀ false)

Larger n → smaller β:

  • More data = more information
  • Easier to detect real effect
  • Less likely to miss true difference

Step 4: Why β decreases with larger n

The standard error decreases: SE = σ/√n

  • n = 50: SE = σ/√50 ≈ 0.141σ
  • n = 200: SE = σ/√200 ≈ 0.071σ

Smaller SE:

  • Test statistic more precise
  • Better able to distinguish from H₀
  • Less overlap between null and true distribution

Step 5: Effect on Power ANSWER: INCREASES

Power = 1 - β

Since β decreases: Power must increase!

Larger n → Higher power:

  • More likely to detect effect
  • More sensitive test
  • Better discrimination

Step 6: Numerical example (hypothetical values)

Suppose initially: n = 50, β = 0.40, Power = 0.60

After increasing to n = 200: β ≈ 0.10 (decreased!) and Power ≈ 0.90 (increased!)

Step 7: The relationship

When the sample size quadruples (50 → 200), SE halves: SE ∝ 1/√n, and √200/√50 = √4 = 2, so SE is reduced by a factor of 2.

Power substantially increases, often from ~60% to ~90%: a much better chance of detecting the effect.

Step 8: Summary table

| | n = 50 | n = 200 | Change |
|---|---|---|---|
| α (Type I) | 0.05 | 0.05 | None |
| β (Type II) | 0.40 | 0.10 | Decrease |
| Power | 0.60 | 0.90 | Increase |

Step 9: Why this matters

A larger sample size gives:

  ✓ More powerful test
  ✓ Better able to detect effects
  ✓ Lower risk of Type II error
  ✓ No increase in Type I error (α stays the same)
  ✗ More expensive/time-consuming

Step 10: Trade-offs Want high power? → Need large n

Limited resources? → Accept lower power → Risk missing real effects

BUT: α is always controlled at chosen level!

Step 11: General principle Increasing sample size:

  • α: Unchanged (set by researcher)
  • β: Decreases (less likely to miss effect)
  • Power: Increases (more likely to detect effect)
  • SE: Decreases (more precision)
  • Cost: Increases

Step 12: Mathematical explanation For detecting difference d from H₀:

Power depends on: (d - 0)/(σ/√n) = d√n/σ

As n increases:

  • Numerator increases
  • Easier to detect d
  • Higher power
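A quick sketch of this formula in stdlib Python, using an assumed true effect of 0.3 standard deviations (illustrative numbers, not from the problem):

```python
from math import sqrt
from statistics import NormalDist

def power(n, effect_sd=0.3, alpha=0.05):
    """One-sided z-test power when the true mean sits `effect_sd`
    standard deviations above the null value (hypothetical effect)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return NormalDist().cdf(effect_sd * sqrt(n) - z_alpha)

p50, p200 = power(50), power(200)
print(f"n =  50: power = {p50:.3f}")   # ~0.68
print(f"n = 200: power = {p200:.3f}")  # ~0.995
```

Quadrupling n doubles the d√n/σ term, pushing power from moderate to near-certain detection while α stays fixed at 0.05.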

Answer: TYPE I ERROR (α): NO CHANGE - α is set by the researcher and doesn't depend on sample size. Still 0.05 (or whatever chosen level).

TYPE II ERROR (β): DECREASES - Larger sample provides more information, making it easier to detect a real effect. Less likely to miss true difference.

POWER: INCREASES - Since Power = 1 - β and β decreases, power must increase. With n = 200 instead of 50, much more likely to correctly detect real effects.

Increasing sample size makes the test more powerful without increasing Type I error rate!

Problem 5 (hard)

Question:

In criminal trials, the null hypothesis is "defendant is innocent." Type I error is convicting an innocent person, Type II error is acquitting a guilty person. If we want to minimize Type I errors (protecting innocent people), what happens to Type II errors? Explain the relationship.

💡 Solution

Step 1: Set up hypotheses

  • H₀: Defendant is innocent
  • Hₐ: Defendant is guilty

Step 2: Identify errors TYPE I ERROR: Reject H₀ when true

  • Convict innocent person
  • "False positive"
  • Wrong conviction
  • P(Type I) = α

TYPE II ERROR: Fail to reject H₀ when false

  • Acquit guilty person
  • "False negative"
  • Guilty goes free
  • P(Type II) = β

Step 3: Current legal system "Innocent until proven guilty" "Beyond reasonable doubt"

This means:

  • Very small α (low Type I error)
  • Prefer to free guilty than convict innocent
  • Better that 10 guilty go free than 1 innocent convicted

Step 4: How to minimize Type I errors To reduce α (Type I error rate):

  1. Require stronger evidence

    • Need overwhelming proof
    • Higher standard (beyond reasonable doubt)
    • Make it harder to reject H₀
  2. Use smaller α

    • α = 0.01 or even 0.001
    • Not just α = 0.05

Step 5: What happens to Type II errors As we minimize Type I (reduce α): → Type II errors INCREASE (β increases)

WHY?

  • Making it harder to convict
  • Stricter evidence requirements
  • More guilty people will be acquitted
  • More "false negatives"

Step 6: The fundamental tradeoff Cannot minimize BOTH simultaneously!

Lower α (protect innocent): → Higher β (more guilty go free)

Lower β (catch guilty): → Higher α (more innocent convicted)

Step 7: Visualization

Convicting (rejecting H₀) can be made easier or harder by raising or lowering the standard of evidence:

  • Easier to convict: Type I HIGH (convict innocent), Type II LOW (free guilty)
  • Harder to convict: Type I LOW, Type II HIGH

As the standard gets stricter (protecting the innocent):

  • Type I decreases ✓
  • Type II increases ✗

Step 8: Numeric example (hypothetical values)

Standard: α = 0.05, β = 0.20

Protect innocent more: α = 0.01 → stricter standard → β might increase to 0.40 → more guilty people acquitted

Be more aggressive: α = 0.10 → easier to convict → β might decrease to 0.10 → but more innocent convicted

Step 9: Why the tradeoff exists Same evidence, different thresholds:

EVIDENCE SCALE: 0 (clearly innocent) to 100 (clearly guilty)

If threshold = 80 (high standard):

  • Few innocents convicted (low α) ✓
  • Many guilty acquitted (high β) ✗

If threshold = 50 (lower standard):

  • More innocents convicted (high α) ✗
  • Fewer guilty acquitted (low β) ✓

Can't avoid tradeoff!
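The threshold picture can be made concrete with hypothetical evidence-score distributions; the normal curves below are assumptions chosen purely to illustrate the tradeoff, not a model of real trials:

```python
from statistics import NormalDist

# Hypothetical evidence scores on the 0-100 scale from the text
# (means and spread are assumptions, for illustration only):
innocent = NormalDist(mu=40, sigma=10)
guilty = NormalDist(mu=70, sigma=10)

thresholds = (50, 65, 80)
alphas = [1 - innocent.cdf(t) for t in thresholds]  # P(convict | innocent)
betas = [guilty.cdf(t) for t in thresholds]         # P(acquit  | guilty)

for t, a, b in zip(thresholds, alphas, betas):
    print(f"threshold {t}: alpha = {a:.3f}, beta = {b:.3f}")
```

With the same two fixed distributions (the same "evidence quality"), raising the threshold always drives α down and β up at the same time, which is the tradeoff in miniature.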

Step 10: Societal choice Legal system chooses:

  • Minimize Type I (protect innocent)
  • Accept higher Type II (guilty go free)

Values statement: "Better that 10 guilty escape than 1 innocent suffer"

Different context might choose differently!

Step 11: Ways to reduce BOTH errors Only solution: MORE EVIDENCE

  • Better investigation
  • More witnesses
  • Better forensics
  • Like increasing sample size in statistics!

With better evidence:

  • Can maintain strict standard (low α)
  • While also convicting more guilty (lower β)

But requires resources!

Step 12: Statistical parallel In hypothesis testing:

Conservative approach (α = 0.01):

  • Hard to reject H₀
  • Low Type I error ✓
  • High Type II error ✗
  • Might miss real effects

Liberal approach (α = 0.10):

  • Easy to reject H₀
  • High Type I error ✗
  • Low Type II error ✓
  • Might claim false effects

Balanced approach (α = 0.05):

  • Compromise
  • Moderate both errors

Step 13: The iron law For fixed sample size/evidence:

α and β are inversely related:

  • Decrease α → Increase β
  • Decrease β → Increase α

Only way to decrease both:

  • Increase sample size
  • Get more evidence
  • Costs more resources

Answer: If we minimize Type I errors (convicting innocent), Type II errors INCREASE (more guilty acquitted).

This is the fundamental tradeoff: You cannot simultaneously minimize both error types with fixed evidence. As we make it harder to convict (protecting innocent people), we inevitably let more guilty people go free.

The legal system accepts this tradeoff, explicitly choosing to minimize Type I errors even though it means higher Type II errors. We'd rather free guilty people than convict innocent ones.

The only way to reduce BOTH errors is to gather MORE evidence (analogous to increasing sample size in statistics), but this requires more resources.