Bias in Sampling and Surveys

Types of bias and how to minimize them

What is Bias?

Bias: Systematic tendency to over- or under-estimate population parameter.

Key point: Bias ≠ random error. Bias is consistent, predictable deviation in one direction.

Unbiased method: On average, gives correct answer
Biased method: Systematically off, doesn't improve with larger sample
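
One standard way to make this distinction precise is the decomposition of mean squared error. The symbols below are notation introduced here for illustration (they do not appear elsewhere in these notes): $\hat{\theta}$ is the estimate produced by the method and $\theta$ is the true population parameter.

```latex
\operatorname{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta
\qquad
\operatorname{MSE}(\hat{\theta}) = \operatorname{Var}(\hat{\theta}) + \operatorname{Bias}(\hat{\theta})^{2}
```

For typical estimators the variance term shrinks as the sample grows, but the bias term does not, which is why a larger sample does not rescue a biased method.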

Types of Sampling Bias

1. Selection Bias

Definition: Some members of population systematically more/less likely to be selected.

Causes:

  • Non-random sampling method
  • Convenience sampling
  • Judgment/purposive sampling

Examples:

  • Survey only people at shopping mall (excludes non-shoppers)
  • Online poll (excludes those without internet)
  • Call only landlines (excludes cell-phone-only households)

Result: Sample not representative of population

Solution: Use random sampling methods
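
As a minimal sketch of what "random sampling" means in practice, the snippet below draws a simple random sample from a sampling frame using Python's standard library. The frame of 1,000 ID numbers and the function name `simple_random_sample` are hypothetical, chosen only for illustration.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame, each equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical frame: ID numbers for a population of 1,000 people.
frame = list(range(1, 1001))
sample = simple_random_sample(frame, 50, seed=1)
print(len(sample), len(set(sample)))
```

Every unit on the frame has the same chance of selection, so no group is favored by the selection step itself. The sample can still be biased if the frame is incomplete or if selected people do not respond.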

2. Undercoverage

Definition: Some groups in population left out of sampling frame.

Sampling frame: List from which sample is drawn

Examples:

  • Phone directory excludes unlisted numbers
  • Email list excludes those without email
  • Voter registration list excludes unregistered voters

Result: Missing groups lead to biased estimates

Solution: Use complete, up-to-date sampling frame that covers entire population

3. Voluntary Response Bias

Definition: Individuals choose whether to participate.

Characteristics:

  • Self-selection
  • Those with strong opinions more likely to respond
  • Usually overrepresents extreme views

Examples:

  • Online polls where anyone can vote
  • Call-in surveys
  • Mail-back questionnaires (without follow-up)
  • Social media polls

Result: Respondents not representative (tend to have stronger, more extreme opinions)

Solution: Use probability sampling where researcher selects participants

4. Nonresponse Bias

Definition: Selected individuals don't respond, and non-respondents differ from respondents.

Types:

  • Unit nonresponse: Entire survey not completed
  • Item nonresponse: Specific questions skipped

Examples:

  • Mail survey with 20% response rate
  • Phone survey where people don't answer
  • Web survey where people start but don't finish

Result: If non-respondents differ systematically from respondents, estimates are biased

Solutions:

  • Follow up with non-respondents
  • Make survey convenient/appealing
  • Keep it short
  • Offer incentives (if appropriate)
  • Compare respondent characteristics to population
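
The last item in the list above, comparing respondents to the population, might be checked with something like the sketch below. The group labels and all figures are invented for illustration, and `representation_gaps` is a hypothetical helper, not a standard function.

```python
def representation_gaps(population_shares, respondent_counts):
    """Return respondent share minus population share for each group."""
    total = sum(respondent_counts.values())
    return {
        group: respondent_counts.get(group, 0) / total - share
        for group, share in population_shares.items()
    }

# Hypothetical figures for illustration only.
population_shares = {"under_30": 0.25, "30_to_59": 0.50, "60_plus": 0.25}
respondent_counts = {"under_30": 20, "30_to_59": 100, "60_plus": 80}

gaps = representation_gaps(population_shares, respondent_counts)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")
```

A large negative gap (here, respondents under 30) flags a group that is underrepresented among respondents. Matching on these characteristics does not prove the absence of nonresponse bias, since respondents and non-respondents may still differ on the survey topic itself.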

Response Bias

Definition: Responses are systematically incorrect due to how question is asked or answered.

1. Question Wording Bias

Loaded/leading questions suggest a particular answer:

  • "Don't you agree that...?"
  • "Like most Americans, do you support...?"

Emotionally charged language:

  • "Should innocent babies be protected?" vs "Should abortion be legal?"

Solution: Use neutral, clear language

2. Question Order Bias

Earlier questions influence later responses

Example:

  • Q1: "How satisfied are you with the president?"
  • Q2: "How satisfied are you with the economy?"

Q1 may influence Q2 answers

Solution: Randomize question order or carefully consider order effects
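
One possible way to randomize question order is to shuffle the list separately for each respondent, as in the sketch below. The third question and the function name `randomized_order` are hypothetical additions for illustration.

```python
import random

def randomized_order(questions, respondent_id):
    """Return a per-respondent shuffled copy; the original list is unchanged."""
    rng = random.Random(respondent_id)  # seeding by ID makes the order reproducible
    shuffled = list(questions)
    rng.shuffle(shuffled)
    return shuffled

questions = [
    "How satisfied are you with the president?",
    "How satisfied are you with the economy?",
    "How satisfied are you with local schools?",  # hypothetical third item
]
order_for_17 = randomized_order(questions, respondent_id=17)
print(order_for_17)
```

Randomizing does not remove order effects for any single respondent; it spreads them across respondents so they do not push the overall result in one direction.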

3. Response Option Bias

Limited or unbalanced options can bias results

Example:

  • Only offering "Yes" or "No" when "Unsure" is valid
  • 4 positive options, 1 negative option

Solution: Offer balanced, complete response options including "no opinion" when appropriate

4. Social Desirability Bias

Respondents give socially acceptable answers rather than truthful ones

Examples:

  • Overreporting voting, recycling, charitable donations
  • Underreporting illegal behavior, prejudice, embarrassing habits

Solutions:

  • Anonymous surveys
  • Neutral wording
  • Indirect questioning
  • Validation against records when possible

5. Interviewer Bias

Interviewer characteristics or behavior influence responses

Examples:

  • Gender, race, age of interviewer affects responses to sensitive topics
  • Interviewer tone, body language suggests preferred answer
  • Recording errors

Solutions:

  • Standardize interviewer training
  • Use self-administered surveys when possible
  • Monitor interviewer performance

6. Recall Bias

Inaccurate memory of past events

Examples:

  • "How many times did you exercise last month?" (people forget)
  • "What did you eat for lunch 3 days ago?"

Solution: Ask about recent, specific time periods; verify with records when possible

Other Survey Issues

1. Overcoverage

Sampling frame includes units not in target population

Example: List includes deceased people, duplicates, or out-of-scope units

Solution: Clean and update sampling frame regularly
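
Cleaning a frame could look roughly like the sketch below, which drops duplicate entries and out-of-scope units. The record layout (`id`, `status`) and the function name `clean_frame` are assumptions made for this example.

```python
def clean_frame(records, in_scope):
    """Drop duplicate IDs and units outside the target population."""
    seen = set()
    cleaned = []
    for record in records:
        if record["id"] in seen or not in_scope(record):
            continue
        seen.add(record["id"])
        cleaned.append(record)
    return cleaned

# Hypothetical frame records; the field names are assumptions for illustration.
records = [
    {"id": 1, "status": "active"},
    {"id": 2, "status": "deceased"},
    {"id": 1, "status": "active"},   # duplicate entry
    {"id": 3, "status": "active"},
]
frame = clean_frame(records, in_scope=lambda r: r["status"] == "active")
print([r["id"] for r in frame])
```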

2. Measurement Error

Inaccurate measurements of response variable

Causes:

  • Poor question design
  • Respondent misunderstanding
  • Recording errors
  • Equipment problems

Solution: Pilot test survey, train data collectors, use validated measures

3. Processing Error

Errors in data entry, coding, or analysis

Solution: Double-check data entry, use data validation, verify calculations
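
One common way to double-check data entry is to key the same forms twice and compare the results. The sketch below assumes that approach; the field names and the helper `double_entry_mismatches` are hypothetical.

```python
def double_entry_mismatches(first_pass, second_pass):
    """Return (row, field, value1, value2) for every disagreement between two keyings."""
    mismatches = []
    for row, (a, b) in enumerate(zip(first_pass, second_pass)):
        for field in a:
            if a[field] != b.get(field):
                mismatches.append((row, field, a[field], b.get(field)))
    return mismatches

# Hypothetical data: the same two paper forms keyed in by two different people.
first_pass = [{"age": 34, "answer": "yes"}, {"age": 51, "answer": "no"}]
second_pass = [{"age": 34, "answer": "yes"}, {"age": 15, "answer": "no"}]
print(double_entry_mismatches(first_pass, second_pass))
```

Each mismatch points to a row that should be checked against the original form.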

Reducing Bias: Best Practices

Sampling:
✓ Use probability sampling (random selection)
✓ Ensure complete, accurate sampling frame
✓ Maximize response rate
✓ Follow up with non-respondents
✓ Compare respondent characteristics to population

Survey Design:
✓ Use clear, neutral question wording
✓ Avoid leading or loaded questions
✓ Offer balanced, complete response options
✓ Consider question order effects
✓ Pilot test before full implementation

Data Collection:
✓ Train interviewers/data collectors
✓ Standardize procedures
✓ Consider anonymity for sensitive topics
✓ Verify data accuracy
✓ Document procedures

Impact of Bias

Key insight: Large sample doesn't fix bias!

  • Unbiased small sample > Biased large sample
  • Bias is systematic - doesn't average out
  • Statistical adjustments after the fact (e.g., weighting) are imperfect and can't fully remove bias

Example: 1936 Literary Digest poll

  • Mailed 10 million ballots (huge sample!)
  • Predicted Landon would beat Roosevelt
  • Roosevelt won in landslide
  • Problem: Undercoverage and nonresponse bias (sampled from phone books and car registrations during Depression; only 24% responded)
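
The point that sample size does not fix bias can be illustrated with a small simulation. This is a sketch with invented numbers, not a reconstruction of the 1936 poll: it assumes a population of 100,000 in which the people on the sampling frame support a candidate at a lower rate than the people left off it.

```python
import random

rng = random.Random(0)

# Hypothetical population of 100,000 voters (numbers invented for illustration).
# 30,000 are on the sampling frame and 40% of them support candidate A;
# 70,000 are off the frame and 70% of them support candidate A.
on_frame = [1] * 12_000 + [0] * 18_000
off_frame = [1] * 49_000 + [0] * 21_000
population = on_frame + off_frame
true_support = sum(population) / len(population)

# Huge sample, but drawn only from the incomplete frame.
big_biased = rng.sample(on_frame, 20_000)
# Small simple random sample from the whole population.
small_random = rng.sample(population, 500)

big_estimate = sum(big_biased) / len(big_biased)
small_estimate = sum(small_random) / len(small_random)

print(f"true support:        {true_support:.3f}")
print(f"biased, n = 20,000:  {big_estimate:.3f}")
print(f"random, n = 500:     {small_estimate:.3f}")
```

The large sample lands close to the support rate among people on the frame (40%), far from the population value of 61%, while the small random sample has only random error around the true value.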

Identifying Bias in Studies

When evaluating study, ask:

  1. How were participants selected? (Random? Convenient?)
  2. What's the sampling frame? (Complete? Current?)
  3. What's the response rate? (High? Low?)
  4. How are questions worded? (Neutral? Leading?)
  5. Who conducted the survey? (Potential conflicts of interest?)
  6. How were data collected? (Method may introduce bias)

Quick Reference

Selection Bias: Non-random sampling
Undercoverage: Incomplete sampling frame
Voluntary Response: Self-selection
Nonresponse: Non-respondents differ from respondents

Question Wording: Leading/loaded questions
Social Desirability: Giving "acceptable" answers
Interviewer Bias: Interviewer influences responses
Recall Bias: Inaccurate memory

Key Principle: Use random selection, neutral questions, high response rate, careful measurement

Remember: No amount of sophisticated analysis can fix a biased sample. Preventing bias through good design is essential. When evaluating studies, always look for potential sources of bias before trusting the conclusions!

Practice Problems

Problem 1 (easy)

Question:

Identify the type of bias in each scenario:

  a) A phone survey calls only landlines during business hours
  b) A survey asks: "Don't you agree that the mayor is doing a terrible job?"
  c) People with strong opinions are more likely to respond to an online poll

Solution

Step 1: Identify bias types

  • Undercoverage bias
  • Response bias (includes question wording, social desirability)
  • Nonresponse bias
  • Voluntary response bias

Step 2: Analyze scenario (a)

Phone survey: landlines during business hours

Problem: Systematically excludes certain groups

  • Young people (mostly use cell phones)
  • Working people (not home during business hours)
  • Lower income (may not have landlines)

Type: UNDERCOVERAGE BIAS

  • Some groups in population have no chance of selection
  • Sample not representative

Step 3: Analyze scenario (b)

Question: "Don't you agree the mayor is doing a terrible job?"

Problems:

  • Leading/loaded question
  • Suggests a "correct" answer
  • Uses negative language ("terrible")
  • Pressures respondent

Type: RESPONSE BIAS (Question Wording Bias)

  • Question influences how people answer
  • Doesn't measure true opinions

Step 4: Analyze scenario (c)

Online poll: strong opinions more likely to respond

Problem:

  • People with extreme views participate more
  • Moderate people skip it
  • Not representative of population opinions

Type: VOLUNTARY RESPONSE BIAS (also called self-selection bias)

  • Respondents choose to participate
  • Those who respond differ from those who don't
  • Overrepresents extreme views

Answer:

  a) Undercoverage bias (excludes cell-phone-only households and people away from home during business hours)
  b) Response bias - question wording (leading question)
  c) Voluntary response bias (self-selection of strong opinions)

Problem 2 (easy)

Question:

A survey finds that 90% of people believe they are better than average drivers. What type of bias might explain this result?

Solution

Step 1: Identify the paradox

  • 90% think they're better than average
  • If "average" means the median, this is impossible: at most 50% can be above the median
  • So what's going on?

Step 2: Type of bias - Social Desirability Bias

Definition: People answer in ways that make them look good

  • Want to present themselves positively
  • Don't want to admit flaws
  • Especially for socially valued traits

Step 3: Why driving ability triggers this bias

Good driving is socially valued:

  • Nobody wants to admit being bad driver
  • Being good driver = responsible, skilled, careful
  • Admitting you're below average = admitting you're dangerous

Psychological factors:

  • Self-serving bias (we view ourselves positively)
  • Selective memory (remember our good driving, forget mistakes)
  • Different standards (we judge ourselves by intentions, others by actions)

Step 4: How this manifests in surveys

What people think: "I'm a careful, skilled driver"
What they say: "Better than average"

Even bad drivers think:

  • "I'm careful" (even if slow)
  • "Others are reckless" (go too fast)
  • "I've never had accident" (been lucky)

Step 5: Other examples of social desirability bias

People overreport:

  • Voting ("Did you vote?") - people say yes even if they didn't
  • Charity donations - claim to donate more
  • Exercise - claim to exercise more
  • Healthy eating - claim better diet
  • Reading - claim to read more books

People underreport:

  • Illegal behavior
  • Embarrassing habits
  • Socially undesirable opinions
  • Income (if seen as bragging)

Step 6: How to reduce social desirability bias

Strategies:

  1. Anonymous surveys

    • No judgment possible
    • More honest responses
  2. Indirect questioning

    • "How many of your friends..."
    • Less personal threat
  3. Randomized response technique

    • Statistical method ensuring privacy
    • Can't identify individual responses
  4. Behavioral measures instead of self-report

    • Observe actual behavior
    • Don't rely on what people say
  5. Validate against objective data

    • Check survey responses against records
    • Driving: check actual accident rates
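
The randomized response technique in the list above can be sketched with one classic design (Warner's), which is an assumption here since the notes do not name a specific design. A randomizing device, such as a spinner, privately tells each respondent whether to answer the statement "I have the trait" or its negation, so the interviewer never knows which statement a "yes" refers to. If the device picks the direct statement with probability p and the true prevalence is π, then P(yes) = pπ + (1 - p)(1 - π), which can be solved for π. The function name and the figures below are hypothetical.

```python
def warner_estimate(yes_share, p_direct):
    """Estimate sensitive-trait prevalence under Warner's randomized response design.

    yes_share: observed proportion of "yes" answers.
    p_direct:  probability the randomizing device assigns the direct statement
               ("I have the trait") rather than its negation; must not be 0.5.
    """
    if p_direct == 0.5:
        raise ValueError("p_direct = 0.5 makes the prevalence unidentifiable")
    return (yes_share - (1 - p_direct)) / (2 * p_direct - 1)

# Hypothetical survey: 35% of answers are "yes"; the device picks the
# direct statement 75% of the time.
print(round(warner_estimate(0.35, 0.75), 3))
```

With these figures the estimated prevalence works out to 0.2. The price of the added privacy is a noisier estimate than a direct question would give with the same number of respondents.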

Step 7: The driving example specifically

Better measures than self-report:

  • Actual accident rates
  • Traffic violations
  • Driving test scores
  • Insurance company data

These would give more accurate picture than survey

Answer: Social desirability bias - people answer in ways that make them look good. Nobody wants to admit being a below-average driver, so people systematically overestimate their abilities. This leads to an implausible result: if "average" means the median, at most 50% of drivers can be above it, not 90%. The bias is common for socially valued traits like driving skill, voting, charity, and healthy behavior.

Problem 3 (medium)

Question:

A college sends an email survey to all 5,000 students about campus dining. Only 200 students respond, and 80% are dissatisfied. Can the college conclude that 80% of all students are dissatisfied? Why or why not?

Solution

Step 1: Identify the issue

  • Response rate: 200/5,000 = 4% (very low!)
  • Result: 80% of respondents dissatisfied

Step 2: The problem - Nonresponse Bias

Who responds to surveys?

  • People with strong opinions
  • People who are dissatisfied (more motivated)
  • People who care deeply about the issue

Who doesn't respond?

  • People who are satisfied (no complaints)
  • People who are indifferent
  • Busy people
  • People who don't check email

Step 3: Why 80% is likely biased upward

Those who responded (200 students):

  • Probably have complaints about dining
  • Motivated by dissatisfaction
  • Not representative of all 5,000

Those who didn't respond (4,800 students):

  • Might be satisfied (no reason to complain)
  • Might be neutral
  • Don't care enough to respond

Result: Sample overrepresents dissatisfied students

Step 4: Cannot conclude 80% of all students dissatisfied

The 80% reflects:

  • 80% of the 200 who chose to respond
  • NOT 80% of all 5,000 students

True dissatisfaction rate unknown:

  • Could be much lower
  • Satisfied students less likely to respond
  • Respondents self-selected from those contacted (nonresponse bias)

Step 5: Better survey design

To get accurate result:

  1. Use random sample of students
  2. Follow up with non-respondents
  3. Offer incentives for participation
  4. Make survey easy and quick
  5. Use multiple contact methods
  6. Aim for high response rate (>60-70%)

Step 6: Calculate scenario

Possible reality:

  • 200 respondents: 160 dissatisfied (80%)
  • 4,800 non-respondents: 960 dissatisfied (20%)
  • Total: 1,120 / 5,000 = 22.4% actually dissatisfied

The 80% would be very misleading!
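
The same arithmetic can be repeated for other assumptions about the non-respondents. The sketch below uses only the figures from the problem (5,000 students, 200 respondents, 160 dissatisfied); the non-respondent rates tried in the loop are hypothetical, since the true rate is unknown.

```python
total_students = 5_000
respondents = 200
dissatisfied_respondents = 160          # 80% of 200

non_respondents = total_students - respondents   # 4,800

def overall_rate(nonrespondent_rate):
    """Overall dissatisfaction if the given share of non-respondents is dissatisfied."""
    dissatisfied = dissatisfied_respondents + nonrespondent_rate * non_respondents
    return dissatisfied / total_students

for rate in (0.0, 0.2, 0.8, 1.0):
    print(f"non-respondents {rate:.0%} dissatisfied -> overall {overall_rate(rate):.1%}")
```

Without information about the 4,800 non-respondents, the data are consistent with anything from 3.2% to 99.2% overall dissatisfaction, which shows how little a 4% response rate pins down.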

Answer: NO, cannot conclude 80% of all students are dissatisfied. Only 4% responded (200/5,000), creating severe nonresponse bias. Dissatisfied students are more motivated to respond, so the 80% likely overestimates true dissatisfaction. The 80% applies only to those who chose to respond, not to all students.

Problem 4 (medium)

Question:

Compare these two survey questions about tax policy: Question A: "Should taxes be increased to fund essential public services like schools and hospitals?" Question B: "Should the government take more of your hard-earned money in taxes?" How might each question bias responses? What would be a more neutral wording?

Solution

Step 1: Analyze Question A

"Should taxes be increased to fund essential public services like schools and hospitals?"

Bias: Toward YES (supporting tax increase)

Why it's biased:

  • Uses positive framing: "essential public services"
  • Mentions sympathetic examples: "schools and hospitals"
  • Implies taxes are necessary for good things
  • No mention of downsides

How it influences:

  • People don't want to oppose schools and hospitals
  • Feels wrong to say no to "essential" services
  • Guilt/social pressure to agree

Expected result: Overestimates support for tax increase

Step 2: Analyze Question B

"Should the government take more of your hard-earned money in taxes?"

Bias: Toward NO (opposing tax increase)

Why it's biased:

  • Uses negative framing: "take" (implies theft)
  • Emotional language: "your hard-earned money"
  • Suggests government is taking what's yours
  • No mention of benefits

How it influences:

  • People resist having money "taken"
  • "Hard-earned" makes it personal
  • Government sounds greedy/unfair

Expected result: Overestimates opposition to tax increase

Step 3: Compare the two

Same policy question, opposite biases:

  • Question A: Might show majority support (say, 60-70%)
  • Question B: Might show minority support (say, 30-40%)
  • Same people, different wording! (Percentages are illustrative, not from an actual poll)

This shows power of question wording

Step 4: Neutral wording options

Option 1 (Simple): "Do you support or oppose increasing taxes?"

Pro: Very neutral
Con: Might not give enough context

Option 2 (Balanced): "Do you support or oppose increasing taxes? Revenue would fund public services, but your take-home pay would decrease."

Pro: Mentions both sides
Con: Which to mention first?

Option 3 (Adds a middle option): "Do you support increasing taxes, oppose increasing taxes, or are you unsure?"

Pro: Neutral language; allows a "no opinion" response

Option 4 (Two questions):

  Q1: "What is your opinion on the current tax level: too high, about right, or too low?"
  Q2: "If taxes changed, which public services would you prioritize/cut?"

Pro: Separates the questions and avoids loaded language

Step 5: General principles for neutral questions

DO:
✓ Use neutral language
✓ Avoid emotional words
✓ Present both sides if context needed
✓ Allow "unsure" option
✓ Keep it simple and clear

DON'T:
✗ Use loaded words ("take," "hard-earned," "essential")
✗ Suggest a correct answer
✗ Use only positive or negative framing
✗ Make assumptions
✗ Use double-barreled questions

Answer: Question A biases toward YES (positive framing: "essential services," "schools and hospitals"). Question B biases toward NO (negative framing: "take your hard-earned money"). Both are leading questions that will produce different results. Neutral wording: "Do you support or oppose increasing taxes?" - simple, balanced, no emotional language.

Problem 5 (hard)

Question:

What is undercoverage bias? Give three examples and explain how it affects survey results.

Solution

Step 1: Define undercoverage bias

Undercoverage: When some groups in the population have NO CHANCE or LOWER CHANCE of being in the sample

Result:

  • Sample systematically unrepresentative
  • Missing perspectives from excluded groups

Step 2: Example 1 - Literary Digest Poll 1936

Historical disaster:

  • Magazine surveyed people from phone books and car registration lists
  • Predicted Landon would beat Roosevelt for president
  • Roosevelt won in landslide!

What went wrong:

  • 1936: During the Depression, phone and car owners were disproportionately wealthy
  • Survey undercovered poor and middle-class voters
  • These groups voted differently than wealthy
  • Sample was systematically biased

Impact: Completely wrong prediction

Step 3: Example 2 - Online surveys about internet usage

Survey: "Complete this online survey about your internet habits"

Problem:

  • Must have internet access to take survey
  • Undercoverage: People without internet excluded
  • These people have different internet habits (none!)

Impact on results:

  • Overestimates internet usage
  • Misses perspectives of disconnected populations
  • Can't generalize to whole population

Who's excluded:

  • Elderly without computers
  • Poor without internet access
  • Rural areas with limited connectivity

Step 4: Example 3 - Workplace satisfaction survey (business hours only)

Survey: Call employees during 9am-5pm

Problem:

  • Misses night shift workers
  • Misses part-time workers
  • Misses field workers who aren't at desk

Impact:

  • Night shift may have different satisfaction
  • Part-timers may have different concerns
  • Office-only perspective

Result: Biased view of employee satisfaction

Step 5: How undercoverage affects results

Systematically excludes groups → Biased estimates

Example effects:

  • Missing young people → results skew toward older respondents' views
  • Missing poor people → underestimate financial struggles
  • Missing minorities → miss diverse perspectives
  • Missing rural people → urban-biased results

The direction of bias depends on how excluded groups differ

Step 6: Preventing undercoverage

Strategies:

  1. Use comprehensive sampling frame

    • Lists that include whole population
    • Multiple lists if needed
  2. Use multiple contact methods

    • Phone, email, mail, in-person
    • Reach different groups different ways
  3. Stratified sampling

    • Ensure all subgroups included
    • Sample from each stratum
  4. Adjust for known undercoverage

    • Weight responses to match population
    • Statistical correction (imperfect)
  5. Know your sampling frame limitations

    • Be aware who's excluded
    • State limitations in conclusions
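
The weighting adjustment in strategy 4 can be sketched as post-stratification: each group's sample result is weighted by that group's known share of the population rather than its share of the sample. All numbers below are invented for illustration, and `poststratified_mean` is a hypothetical helper name.

```python
def poststratified_mean(population_shares, sample_means):
    """Weight each group's sample mean by its known population share."""
    return sum(population_shares[g] * sample_means[g] for g in population_shares)

# Hypothetical numbers for illustration only.
population_shares = {"landline": 0.4, "cell_only": 0.6}   # known from an outside source
sample_counts = {"landline": 160, "cell_only": 40}        # cell-only households undercovered
sample_means = {"landline": 0.30, "cell_only": 0.70}      # share answering "yes" in each group

n = sum(sample_counts.values())
unweighted = sum(sample_counts[g] * sample_means[g] for g in sample_counts) / n
weighted = poststratified_mean(population_shares, sample_means)
print(f"unweighted: {unweighted:.2f}, weighted: {weighted:.2f}")
```

Here the unweighted estimate is 0.38 and the weighted estimate is 0.54. The correction is imperfect, as the list notes: it assumes the sampled members of each group resemble the members who were missed, and it cannot help with a group that is entirely absent from the sample.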

Answer: Undercoverage occurs when some population groups have no/low chance of selection. Examples: (1) 1936 Literary Digest used phone books, excluding poor voters who voted differently; (2) Online surveys exclude those without internet; (3) Daytime-only calls miss night shift workers. Effect: Sample systematically unrepresentative, leading to biased estimates that don't reflect the full population.