Bias in Sampling and Surveys
Types of bias and how to minimize them
What is Bias?
Bias: Systematic tendency to over- or under-estimate population parameter.
Key point: Bias ≠ random error. Bias is consistent, predictable deviation in one direction.
Unbiased method: On average, gives correct answer
Biased method: Systematically off, doesn't improve with larger sample
Types of Sampling Bias
1. Selection Bias
Definition: Some members of population systematically more/less likely to be selected.
Causes:
- Non-random sampling method
- Convenience sampling
- Judgment/purposive sampling
Examples:
- Survey only people at shopping mall (excludes non-shoppers)
- Online poll (excludes those without internet)
- Call only landlines (excludes cell-phone-only households)
Result: Sample not representative of population
Solution: Use random sampling methods
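Here is a minimal sketch of simple random sampling in Python. It assumes the sampling frame is a plain list of IDs; the frame and the helper name `simple_random_sample` are hypothetical. `random.Random.sample` draws without replacement, so every member of the frame has the same chance of selection. This removes selection bias only relative to the frame. People missing from the frame still can't be chosen.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n members from the sampling frame without replacement.

    Every member of the frame has the same chance of selection,
    which is what protects against selection bias.
    """
    if n > len(frame):
        raise ValueError("sample size exceeds size of sampling frame")
    rng = random.Random(seed)  # seeded generator so the draw is reproducible
    return rng.sample(frame, n)

# Hypothetical sampling frame of 1,000 student IDs
frame = [f"S{i:04d}" for i in range(1000)]
sample = simple_random_sample(frame, 50, seed=42)
print(len(sample), len(set(sample)))  # 50 50 (50 distinct IDs)
```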
2. Undercoverage
Definition: Some groups in population left out of sampling frame.
Sampling frame: List from which sample is drawn
Examples:
- Phone directory excludes unlisted numbers
- Email list excludes those without email
- Voter registration list excludes unregistered voters
Result: Missing groups lead to biased estimates
Solution: Use complete, up-to-date sampling frame that covers entire population
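How large can undercoverage bias get? The sketch below assumes we know (hypothetically) what share of the population is missing from the frame and how that group differs. Even a perfect sample from the frame only estimates the covered group's value. The bias therefore equals the excluded share times the gap between the two groups. The function name and the numbers are illustrative.

```python
def coverage_bias(share_excluded, mean_covered, mean_excluded):
    """Bias of an estimate based only on the covered part of the population.

    A perfect sample from the frame estimates mean_covered, while the
    population value is a weighted mix of both groups:
        population = (1 - w) * mean_covered + w * mean_excluded
        bias       = mean_covered - population
                   = w * (mean_covered - mean_excluded)
    """
    population = (1 - share_excluded) * mean_covered + share_excluded * mean_excluded
    return mean_covered - population

# Hypothetical numbers: 30% of the population is missing from the frame;
# 55% of covered people support a policy versus 35% of excluded people.
bias = coverage_bias(0.30, 0.55, 0.35)
print(f"{bias:.2f}")  # 0.06 -> estimate is 6 percentage points too high
```

The bias vanishes only if nobody is excluded or the excluded group doesn't differ. The same arithmetic applies to nonresponse bias, with the nonresponse rate in place of the excluded share.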
3. Voluntary Response Bias
Definition: Individuals choose whether to participate.
Characteristics:
- Self-selection
- Those with strong opinions more likely to respond
- Usually overrepresents extreme views
Examples:
- Online polls where anyone can vote
- Call-in surveys
- Mail-back questionnaires (without follow-up)
- Social media polls
Result: Respondents not representative (tend to have stronger, more extreme opinions)
Solution: Use probability sampling where researcher selects participants
4. Nonresponse Bias
Definition: Selected individuals don't respond, and non-respondents differ from respondents.
Types:
- Unit nonresponse: Entire survey not completed
- Item nonresponse: Specific questions skipped
Examples:
- Mail survey with 20% response rate
- Phone survey where people don't answer
- Web survey where people start but don't finish
Result: If non-respondents differ systematically from respondents, estimates are biased
Solutions:
- Follow up with non-respondents
- Make survey convenient/appealing
- Keep it short
- Offer incentives (if appropriate)
- Compare respondent characteristics to population
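Comparing respondents to the population takes only a few lines. The sketch below assumes the population share of some characteristic is known, for example from a census. The age groups, counts, shares, and the 5-point threshold are all made up for illustration. A large gap suggests that nonrespondents differ from respondents. A small gap doesn't prove the absence of bias, because respondents can still differ on things that weren't measured.

```python
def compare_to_population(respondent_counts, population_shares, threshold=0.05):
    """Flag groups over- or under-represented among respondents.

    respondent_counts: {group: number of respondents}
    population_shares: {group: known population share (shares sum to 1)}
    Returns {group: respondent share - population share} for every group
    whose gap is at least `threshold` in absolute value.
    """
    total = sum(respondent_counts.values())
    flagged = {}
    for group, pop_share in population_shares.items():
        resp_share = respondent_counts.get(group, 0) / total
        gap = resp_share - pop_share
        if abs(gap) >= threshold:
            flagged[group] = round(gap, 3)
    return flagged

# Hypothetical mail survey: older people responded far more often.
respondents = {"18-29": 40, "30-49": 120, "50+": 240}
population = {"18-29": 0.25, "30-49": 0.32, "50+": 0.43}
print(compare_to_population(respondents, population))
```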
Response Bias
Definition: Responses are systematically incorrect due to how question is asked or answered.
1. Question Wording Bias
Loaded/leading questions suggest a particular answer:
- "Don't you agree that...?"
- "Like most Americans, do you support...?"
Emotionally charged language:
- "Should innocent babies be protected?" vs "Should abortion be legal?"
Solution: Use neutral, clear language
2. Question Order Bias
Earlier questions influence later responses
Example:
- Q1: "How satisfied are you with the president?"
- Q2: "How satisfied are you with the economy?"
Q1 may influence Q2 answers
Solution: Randomize question order or carefully consider order effects
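Here is a sketch of per-respondent randomization. It assumes questions are stored as a list of strings. The function `question_order_for` and its seeding scheme are illustrative choices, not a standard. Seeding the generator with the respondent ID makes each order reproducible, so answers can later be compared according to which question came first.

```python
import random

def question_order_for(respondent_id, questions, base_seed=2024):
    """Return a shuffled copy of the questions for one respondent.

    Seeding with the respondent ID makes each respondent's order
    reproducible, so order effects can be analyzed later.
    """
    rng = random.Random(f"{base_seed}-{respondent_id}")
    order = list(questions)  # copy so the master list is left untouched
    rng.shuffle(order)
    return order

questions = [
    "How satisfied are you with the president?",
    "How satisfied are you with the economy?",
]
for respondent_id in (101, 102, 103):
    print(respondent_id, question_order_for(respondent_id, questions))
```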
3. Response Option Bias
Limited or unbalanced options can bias results
Example:
- Only offering "Yes" or "No" when "Unsure" is valid
- 4 positive options, 1 negative option
Solution: Offer balanced, complete response options including "no opinion" when appropriate
4. Social Desirability Bias
Respondents give socially acceptable answers rather than truthful ones
Examples:
- Overreporting voting, recycling, charitable donations
- Underreporting illegal behavior, prejudice, embarrassing habits
Solutions:
- Anonymous surveys
- Neutral wording
- Indirect questioning
- Validation against records when possible
5. Interviewer Bias
Interviewer characteristics or behavior influence responses
Examples:
- Interviewer's gender, race, or age affects responses to sensitive topics
- Interviewer tone, body language suggests preferred answer
- Recording errors
Solutions:
- Standardize interviewer training
- Use self-administered surveys when possible
- Monitor interviewer performance
6. Recall Bias
Inaccurate memory of past events
Examples:
- "How many times did you exercise last month?" (people forget)
- "What did you eat for lunch 3 days ago?"
Solution: Ask about recent, specific time periods; verify with records when possible
Other Survey Issues
1. Overcoverage
Sampling frame includes units not in target population
Example: List includes deceased people, duplicates, or out-of-scope units
Solution: Clean and update sampling frame regularly
2. Measurement Error
Inaccurate measurements of response variable
Causes:
- Poor question design
- Respondent misunderstanding
- Recording errors
- Equipment problems
Solution: Pilot test survey, train data collectors, use validated measures
3. Processing Error
Errors in data entry, coding, or analysis
Solution: Double-check data entry, use data validation, verify calculations
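Some of these checks are easy to automate. The sketch below assumes each record is a dict with hypothetical `id`, `answer`, and `age` fields. It flags three problems: duplicate IDs (which also catch overcoverage-style duplicates), answers outside the allowed codes, and implausible ages. The allowed codes and the age range are placeholders to adapt.

```python
def validate_responses(records, allowed_answers, age_range=(18, 99)):
    """Return (respondent_id, problem) pairs for suspicious records.

    Checks: duplicate IDs, answers outside the allowed codes,
    and ages outside a plausible range.
    """
    problems = []
    seen = set()
    low, high = age_range
    for rec in records:
        rid = rec["id"]
        if rid in seen:
            problems.append((rid, "duplicate id"))
        seen.add(rid)
        if rec["answer"] not in allowed_answers:
            problems.append((rid, f"invalid answer {rec['answer']!r}"))
        if not low <= rec["age"] <= high:
            problems.append((rid, f"age out of range: {rec['age']}"))
    return problems

records = [
    {"id": 1, "answer": "Yes", "age": 34},
    {"id": 2, "answer": "yse", "age": 51},    # typo in data entry
    {"id": 2, "answer": "No", "age": 51},     # duplicate respondent
    {"id": 3, "answer": "Unsure", "age": 340},  # likely keying error
]
for problem in validate_responses(records, {"Yes", "No", "Unsure"}):
    print(problem)
```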
Reducing Bias: Best Practices
Sampling:
✓ Use probability sampling (random selection)
✓ Ensure complete, accurate sampling frame
✓ Maximize response rate
✓ Follow up with non-respondents
✓ Compare respondent characteristics to population
Survey Design:
✓ Use clear, neutral question wording
✓ Avoid leading or loaded questions
✓ Offer balanced, complete response options
✓ Consider question order effects
✓ Pilot test before full implementation
Data Collection:
✓ Train interviewers/data collectors
✓ Standardize procedures
✓ Consider anonymity for sensitive topics
✓ Verify data accuracy
✓ Document procedures
Impact of Bias
Key insight: Large sample doesn't fix bias!
- Unbiased small sample > Biased large sample
- Bias is systematic - doesn't average out
- Statistics can't reliably "correct" for bias after the fact (weighting helps only for known, measured differences)
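To see why a larger sample doesn't help, the sketch below simulates the mall example from the Selection Bias section on a made-up population; all shares are hypothetical. As n grows, the random-sample estimate closes in on the true value. The mall-only estimate stays near the shoppers' rate no matter how large n gets.

```python
import random

def make_population(rng, size=100_000):
    """Hypothetical population: 40% shoppers (70% support a policy),
    60% non-shoppers (30% support it)."""
    people = []
    for _ in range(size):
        shopper = rng.random() < 0.40
        supports = rng.random() < (0.70 if shopper else 0.30)
        people.append((shopper, supports))
    return people

def share_supporting(sample):
    """Fraction of (shopper, supports) pairs with supports == True."""
    return sum(supports for _, supports in sample) / len(sample)

rng = random.Random(1)
population = make_population(rng)
true_share = share_supporting(population)
shoppers = [p for p in population if p[0]]  # the "mall" convenience frame

for n in (100, 1_000, 10_000):
    random_est = share_supporting(rng.sample(population, n))
    mall_est = share_supporting(rng.sample(shoppers, n))
    print(f"n={n:>6}: random {random_est:.3f}  mall {mall_est:.3f}  true {true_share:.3f}")
```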
Example: 1936 Literary Digest poll
- Mailed 10 million ballots (huge sample!)
- Predicted Landon would beat Roosevelt
- Roosevelt won in landslide
- Problem: Undercoverage and nonresponse bias (sampled from phone books and car registrations during the Depression; only 24% responded)
Identifying Bias in Studies
When evaluating study, ask:
- How were participants selected? (Random? Convenient?)
- What's the sampling frame? (Complete? Current?)
- What's the response rate? (High? Low?)
- How are questions worded? (Neutral? Leading?)
- Who conducted the survey? (Potential conflicts of interest?)
- How were data collected? (Method may introduce bias)
Quick Reference
Selection Bias: Non-random sampling
Undercoverage: Incomplete sampling frame
Voluntary Response: Self-selection
Nonresponse: Selected people don't respond and differ from those who do
Question Wording: Leading/loaded questions
Social Desirability: Giving "acceptable" answers
Interviewer Bias: Interviewer influences responses
Recall Bias: Inaccurate memory
Key Principle: Use random selection, neutral questions, high response rate, careful measurement
Remember: No amount of sophisticated analysis can fix a biased sample. Preventing bias through good design is essential. When evaluating studies, always look for potential sources of bias before trusting the conclusions!
📚 Practice Problems
Problem 1 (easy)
❓ Question:
Identify the type of bias in each scenario:
a) A phone survey calls only landlines during business hours
b) A survey asks: "Don't you agree that the mayor is doing a terrible job?"
c) People with strong opinions are more likely to respond to an online poll
💡 Solution:
Step 1: Identify bias types
- Undercoverage bias
- Response bias (includes question wording, social desirability)
- Nonresponse bias
- Voluntary response bias
Step 2: Analyze scenario (a)
Phone survey: landlines during business hours
Problem: Systematically excludes certain groups
- Young people (mostly use cell phones)
- Working people (not home during business hours)
- Lower income (may not have landlines)
Type: UNDERCOVERAGE BIAS
- Some groups in population have no chance of selection
- Sample not representative
Step 3: Analyze scenario (b)
Question: "Don't you agree the mayor is doing a terrible job?"
Problems:
- Leading/loaded question
- Suggests a "correct" answer
- Uses negative language ("terrible")
- Pressures respondent
Type: RESPONSE BIAS (Question Wording Bias)
- Question influences how people answer
- Doesn't measure true opinions
Step 4: Analyze scenario (c)
Online poll: strong opinions more likely to respond
Problem:
- People with extreme views participate more
- Moderate people skip it
- Not representative of population opinions
Type: VOLUNTARY RESPONSE BIAS (also called self-selection bias)
- Respondents choose to participate
- Those who respond differ from those who don't
- Overrepresents extreme views
Answer:
a) Undercoverage bias (excludes cell-phone users and working people)
b) Response bias: question wording (leading question)
c) Voluntary response bias (self-selection of people with strong opinions)
Problem 2 (easy)
❓ Question:
A survey finds that 90% of people believe they are better than average drivers. What type of bias might explain this result?
💡 Solution:
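Step 1: Identify the paradox
90% think they're better than average.
If "average" means the median driver, at most 50% can be above it. (If it means the mean, a majority can be above it when a few very bad drivers drag the mean down, but 90% is still suspicious.)
So what's going on?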
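Whether 90% "above average" is even possible depends on which average is meant. Here is a quick check with made-up skill scores, which are purely illustrative. When a few very bad drivers drag the mean down, most drivers sit above the mean. At most half, though, can be strictly above the median.

```python
from statistics import mean, median

# Hypothetical driving-skill scores: nine decent drivers and one very bad
# driver who drags the mean down (a left-skewed distribution).
scores = [80] * 9 + [10]

avg = mean(scores)    # 73
med = median(scores)  # 80.0

share_above_mean = sum(s > avg for s in scores) / len(scores)
share_above_median = sum(s > med for s in scores) / len(scores)
print(share_above_mean, share_above_median)  # 0.9 0.0
```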
Step 2: Type of bias - Social Desirability Bias
Definition: People answer in ways that make them look good
- Want to present themselves positively
- Don't want to admit flaws
- Especially for socially valued traits
Step 3: Why driving ability triggers this bias
Good driving is socially valued:
- Nobody wants to admit being a bad driver
- Being a good driver = responsible, skilled, careful
- Admitting you're below average = admitting you're dangerous
Psychological factors:
- Self-serving bias (we view ourselves positively)
- Selective memory (remember our good driving, forget mistakes)
- Different standards (we judge ourselves by intentions, others by actions)
Step 4: How this manifests in surveys
What people think: "I'm a careful, skilled driver"
What they say: "Better than average"
Even bad drivers think:
- "I'm careful" (even if slow)
- "Others are reckless" (go too fast)
- "I've never had an accident" (been lucky)
Step 5: Other examples of social desirability bias
People overreport:
- Voting ("Did you vote?") - people say yes even if they didn't
- Charity donations - claim to donate more
- Exercise - claim to exercise more
- Healthy eating - claim better diet
- Reading - claim to read more books
People underreport:
- Illegal behavior
- Embarrassing habits
- Socially undesirable opinions
- Income (if seen as bragging)
Step 6: How to reduce social desirability bias
Strategies:
- Anonymous surveys: no judgment possible, so responses are more honest
- Indirect questioning ("How many of your friends..."): less personal threat
- Randomized response technique: a statistical method that ensures privacy, since individual responses can't be identified
- Behavioral measures instead of self-report: observe actual behavior rather than relying on what people say
- Validation against objective data: check survey responses against records (for driving, check actual accident rates)
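The randomized response technique listed above has several variants. The source doesn't specify one, so the sketch below uses a "forced response" design as an illustration. Each respondent privately flips a fair coin. On heads, they answer the sensitive question truthfully; on tails, they simply say "yes." The interviewer can't tell which case produced any single "yes." Because P(yes) = 0.5·π + 0.5, the true rate π can still be estimated as 2·P(yes) − 1. The 20% trait rate and the function names are hypothetical.

```python
import random

def forced_response_survey(true_answers, rng):
    """Simulate a forced-response design with a fair coin.

    Heads: respondent answers truthfully. Tails: respondent says "yes".
    Only the final yes/no is recorded, never the coin result.
    """
    recorded = []
    for truth in true_answers:
        heads = rng.random() < 0.5
        recorded.append(truth if heads else True)
    return recorded

def estimate_true_rate(recorded):
    """Invert P(yes) = 0.5 * pi + 0.5 to estimate pi."""
    p_yes = sum(recorded) / len(recorded)
    return 2 * p_yes - 1

rng = random.Random(7)
# Hypothetical population: about 20% have the sensitive trait
true_answers = [rng.random() < 0.20 for _ in range(50_000)]
recorded = forced_response_survey(true_answers, rng)
print(f"observed yes: {sum(recorded) / len(recorded):.3f}")  # roughly 0.60
print(f"estimated rate: {estimate_true_rate(recorded):.3f}")  # roughly 0.20
```

The privacy has a cost. The estimate is noisier than a direct question would be, so a larger sample is needed for the same precision.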
Step 7: The driving example specifically
Better measures than self-report:
- Actual accident rates
- Traffic violations
- Driving test scores
- Insurance company data
These would give more accurate picture than survey
Answer: Social desirability bias - people answer in ways that make them look good. Nobody wants to admit being a below-average driver, so people systematically overestimate their abilities. This psychological bias leads to an implausible result (at most 50% of drivers can be better than the median driver). Common for socially valued traits like driving skill, voting, charity, healthy behavior.
Problem 3 (medium)
❓ Question:
A college sends an email survey to all 5,000 students about campus dining. Only 200 students respond, and 80% are dissatisfied. Can the college conclude that 80% of all students are dissatisfied? Why or why not?
💡 Solution:
Step 1: Identify the issue
Response rate: 200/5,000 = 4% (very low!)
Result: 80% dissatisfied
Step 2: The problem - Nonresponse Bias
Who responds to surveys?
- People with strong opinions
- People who are dissatisfied (more motivated)
- People who care deeply about the issue
Who doesn't respond?
- People who are satisfied (no complaints)
- People who are indifferent
- Busy people
- People who don't check email
Step 3: Why 80% is likely biased upward
Those who responded (200 students):
- Probably have complaints about dining
- Motivated by dissatisfaction
- Not representative of all 5,000
Those who didn't respond (4,800 students):
- Might be satisfied (no reason to complain)
- Might be neutral
- Don't care enough to respond
Result: Sample overrepresents dissatisfied students
Step 4: Cannot conclude 80% of all students dissatisfied
The 80% reflects:
- 80% of the 200 who chose to respond
- NOT 80% of all 5,000 students
True dissatisfaction rate unknown:
- Could be much lower
- Satisfied students less likely to respond
- Self-selection: students effectively chose whether to respond, as in voluntary response bias
Step 5: Better survey design
To get accurate result:
- Use a random sample of students (a smaller sample makes intensive follow-up feasible)
- Follow up with non-respondents
- Offer incentives for participation
- Make survey easy and quick
- Use multiple contact methods
- Aim for high response rate (>60-70%)
Step 6: Calculate a hypothetical scenario
Possible reality:
- 200 respondents: 160 dissatisfied (80%)
- 4,800 non-respondents: 960 dissatisfied (20%)
- Total: 1,120 / 5,000 = 22.4% actually dissatisfied
The 80% would be very misleading!
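Here is the same arithmetic as a short Python sketch, extended to the extreme cases. The 20% rate among non-respondents is the hypothetical assumption from the scenario. The bounds show how little a 4% response rate pins down. The helper name `overall_rate` is illustrative.

```python
def overall_rate(n_resp, rate_resp, n_nonresp, rate_nonresp):
    """Combine respondents and (assumed) non-respondents into one rate."""
    dissatisfied = n_resp * rate_resp + n_nonresp * rate_nonresp
    return dissatisfied / (n_resp + n_nonresp)

N_RESP, N_NONRESP = 200, 4_800
RATE_RESP = 0.80  # observed: 160 of 200 respondents dissatisfied

# Hypothetical scenario: 20% of non-respondents are dissatisfied
print(f"{overall_rate(N_RESP, RATE_RESP, N_NONRESP, 0.20):.3f}")  # 0.224

# Extremes: no non-respondent vs. every non-respondent dissatisfied
low = overall_rate(N_RESP, RATE_RESP, N_NONRESP, 0.0)
high = overall_rate(N_RESP, RATE_RESP, N_NONRESP, 1.0)
print(f"true rate lies between {low:.3f} and {high:.3f}")  # 0.032 and 0.992
```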
Answer: NO, cannot conclude 80% of all students are dissatisfied. Only 4% responded (200/5,000), creating severe nonresponse bias. Dissatisfied students are more motivated to respond, so the 80% likely overestimates true dissatisfaction. The 80% applies only to those who chose to respond, not to all students.
Problem 4 (medium)
❓ Question:
Compare these two survey questions about tax policy:
Question A: "Should taxes be increased to fund essential public services like schools and hospitals?"
Question B: "Should the government take more of your hard-earned money in taxes?"
How might each question bias responses? What would be a more neutral wording?
💡 Solution:
Step 1: Analyze Question A
"Should taxes be increased to fund essential public services like schools and hospitals?"
Bias: Toward YES (supporting tax increase)
Why it's biased:
- Uses positive framing: "essential public services"
- Mentions sympathetic examples: "schools and hospitals"
- Implies taxes are necessary for good things
- No mention of downsides
How it influences:
- People don't want to oppose schools and hospitals
- Feels wrong to say no to "essential" services
- Guilt/social pressure to agree
Expected result: Overestimates support for tax increase
Step 2: Analyze Question B
"Should the government take more of your hard-earned money in taxes?"
Bias: Toward NO (opposing tax increase)
Why it's biased:
- Uses negative framing: "take" (implies theft)
- Emotional language: "your hard-earned money"
- Suggests government is taking what's yours
- No mention of benefits
How it influences:
- People resist having money "taken"
- "Hard-earned" makes it personal
- Government sounds greedy/unfair
Expected result: Overestimates opposition to tax increase
Step 3: Compare the two
Same policy question, opposite biases:
- Question A: might show, say, 60-70% support (illustrative figures)
- Question B: might show, say, 30-40% support
- Same people, different wording!
This shows power of question wording
Step 4: Neutral wording options
Option 1 (Simple): "Do you support or oppose increasing taxes?"
Pro: Very neutral
Con: Might not give enough context
Option 2 (Balanced): "Do you support or oppose increasing taxes? Revenue would fund public services, but your take-home pay would decrease."
Pro: Mentions both sides
Con: The order in which the two sides appear may itself influence answers
Option 3 (Best): "Do you support increasing taxes, oppose increasing taxes, or are you unsure?"
Pro: Neutral language; includes a middle option that allows a "no opinion" response
Option 4 (Even better - two questions):
Q1: "What is your opinion on the current tax level: too high, about right, or too low?"
Q2: "If taxes changed, which public services would you prioritize/cut?"
Separates questions, avoids loaded language
Step 5: General principles for neutral questions
DO:
✓ Use neutral language
✓ Avoid emotional words
✓ Present both sides if context is needed
✓ Allow an "unsure" option
✓ Keep it simple and clear
DON'T:
✗ Use loaded words ("take," "hard-earned," "essential")
✗ Suggest a correct answer
✗ Use only positive or negative framing
✗ Make assumptions
✗ Use double-barreled questions (asking two things at once)
Answer: Question A biases toward YES (positive framing: "essential services," "schools and hospitals"). Question B biases toward NO (negative framing: "take your hard-earned money"). Both are leading questions that will produce different results. Neutral wording: "Do you support or oppose increasing taxes?" - simple, balanced, no emotional language.
Problem 5 (hard)
❓ Question:
What is undercoverage bias? Give three examples and explain how it affects survey results.
💡 Solution:
Step 1: Define undercoverage bias
Undercoverage: When some groups in the population have NO CHANCE or LOWER CHANCE of being in the sample
Result: Sample is systematically unrepresentative, missing the perspectives of excluded groups
Step 2: Example 1 - Literary Digest Poll 1936
Historical disaster:
- Magazine surveyed people from phone books and car registration lists
- Predicted Landon would beat Roosevelt for president
- Roosevelt won in landslide!
What went wrong:
- 1936: Phone and car owners were disproportionately wealthy
- Survey undercovered poor and middle-class voters
- These groups voted differently than wealthy
- Sample was systematically biased
Impact: Completely wrong prediction
Step 3: Example 2 - Online surveys about internet usage
Survey: "Complete this online survey about your internet habits"
Problem:
- Must have internet access to take survey
- Undercoverage: People without internet excluded
- These people have different internet habits (none!)
Impact on results:
- Overestimates internet usage
- Misses perspectives of disconnected populations
- Can't generalize to whole population
Who's excluded:
- Elderly without computers
- Poor without internet access
- Rural areas with limited connectivity
Step 4: Example 3 - Workplace satisfaction survey (business hours only)
Survey: Call employees during 9am-5pm
Problem:
- Misses night shift workers
- Misses part-time workers
- Misses field workers who aren't at desk
Impact:
- Night shift may have different satisfaction
- Part-timers may have different concerns
- Office-only perspective
Result: Biased view of employee satisfaction
Step 5: How undercoverage affects results
Systematically excludes groups → Biased estimates
Example effects:
- Missing young people → overestimate conservative views
- Missing poor people → underestimate financial struggles
- Missing minorities → miss diverse perspectives
- Missing rural people → urban-biased results
The direction of bias depends on how excluded groups differ
Step 6: Preventing undercoverage
Strategies:
- Use a comprehensive sampling frame: lists that include the whole population, combining multiple lists if needed
- Use multiple contact methods (phone, email, mail, in-person) to reach different groups in different ways
- Stratified sampling: sample from each stratum so all subgroups are included
- Adjust for known undercoverage: weight responses to match the population (a statistical correction, but imperfect)
- Know your sampling frame's limitations: be aware of who's excluded and state the limitations in conclusions
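Weighting to match the population can be sketched as simple post-stratification. The scheme and the urban/rural numbers below are hypothetical. Each group's weight is its population share divided by its sample share, so underrepresented groups count for more. The correction is imperfect for two reasons. First, it assumes respondents within each group resemble the people they stand in for. Second, a group with no one in the sample gets no weight at all.

```python
def poststratification_weights(sample_counts, population_shares):
    """Weight for each group = population share / sample share.

    Groups absent from the sample get no weight at all, which is why
    weighting cannot fix total undercoverage of a group.
    """
    n = sum(sample_counts.values())
    return {g: population_shares[g] / (count / n)
            for g, count in sample_counts.items() if count > 0}

def weighted_mean(group_means, sample_counts, weights):
    """Average of group means, counting each respondent by its weight."""
    num = sum(group_means[g] * sample_counts[g] * weights[g] for g in weights)
    den = sum(sample_counts[g] * weights[g] for g in weights)
    return num / den

# Hypothetical survey: rural residents are undercovered in the sample.
sample_counts = {"urban": 800, "rural": 200}
population_shares = {"urban": 0.6, "rural": 0.4}
support = {"urban": 0.50, "rural": 0.30}  # share supporting a policy

weights = poststratification_weights(sample_counts, population_shares)
unweighted = weighted_mean(support, sample_counts, {g: 1.0 for g in sample_counts})
weighted = weighted_mean(support, sample_counts, weights)
print(f"unweighted {unweighted:.2f}, weighted {weighted:.2f}")  # unweighted 0.46, weighted 0.42
```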
Answer: Undercoverage occurs when some population groups have no/low chance of selection. Examples: (1) 1936 Literary Digest used phone books, excluding poor voters who voted differently; (2) Online surveys exclude those without internet; (3) Daytime-only calls miss night shift workers. Effect: Sample systematically unrepresentative, leading to biased estimates that don't reflect the full population.