Sampling Methods

Simple random, stratified, cluster, and systematic sampling

Why Sample?

Sampling allows us to study a subset of a population to make inferences about the whole population. It's practical, economical, and often the only feasible approach.

Population: All individuals/items of interest
Sample: Subset selected for study
Goal: Use sample statistics to estimate population parameters

Simple Random Sample (SRS)

Definition: Every individual has an equal probability of selection, and every possible group of n individuals has an equal probability of being the sample chosen.

How to obtain:

  1. Assign number to each population member
  2. Use random number generator or table
  3. Select corresponding individuals

Example: Select 50 students from 500 by randomly generating 50 distinct numbers between 1 and 500.
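The selection step can be sketched in Python. This is a minimal illustration, assuming members are already numbered 1 to N; `simple_random_sample` is a hypothetical helper name. It relies on the standard library's `random.sample`, which draws without replacement, so no number repeats and every set of n numbers is equally likely.

```python
import random

def simple_random_sample(population_size, n, seed=None):
    """Hypothetical helper: draw n distinct member numbers from 1..population_size.

    random.sample draws without replacement, so no number repeats and every
    possible set of n numbers is equally likely to be chosen.
    """
    rng = random.Random(seed)  # seed only makes a run reproducible
    return sorted(rng.sample(range(1, population_size + 1), n))

# Select 50 of 500 students; the printed numbers identify who to survey
chosen = simple_random_sample(500, 50, seed=42)
print(chosen)
```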

Advantages: Unbiased, every member equally likely
Disadvantages: Requires complete population list, may not represent subgroups well

Stratified Random Sampling

Method:

  1. Divide population into homogeneous groups (strata)
  2. Take SRS from each stratum
  3. Combine samples

When to use: Want guaranteed representation from each subgroup

Example: School has 40% freshmen, 30% sophomores, 20% juniors, 10% seniors. For sample of 100, randomly select 40 freshmen, 30 sophomores, 20 juniors, 10 seniors.
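A minimal Python sketch of this example, assuming a hypothetical school of 1,000 students whose IDs are simply the integers 0 to 999, grouped by class. The helper name `stratified_sample` is illustrative: it works out each stratum's share of the sample and then takes an SRS within that stratum.

```python
import random

def stratified_sample(strata, sample_size, seed=None):
    """Hypothetical helper: proportional stratified sample, i.e. an SRS from
    every stratum, sized by that stratum's share of the population."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional allocation; rounding can leave the total slightly off
        # when shares do not divide evenly (they do in this example).
        n_h = round(sample_size * len(members) / total)
        sample[name] = rng.sample(members, n_h)  # SRS within the stratum
    return sample

# Hypothetical school: 40% / 30% / 20% / 10% of 1,000 students, IDs 0..999
school = {
    "freshmen": range(0, 400),
    "sophomores": range(400, 700),
    "juniors": range(700, 900),
    "seniors": range(900, 1000),
}
picked = stratified_sample(school, 100, seed=1)
print({name: len(ids) for name, ids in picked.items()})
# {'freshmen': 40, 'sophomores': 30, 'juniors': 20, 'seniors': 10}
```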

Advantages: Ensures all strata are represented, often gives more precise estimates than an SRS of the same size (especially when strata are internally similar), can compare groups
Disadvantages: Requires knowledge of strata, more complex

Cluster Sampling

Method:

  1. Divide population into clusters (heterogeneous groups)
  2. Randomly select some clusters
  3. Survey ALL members in selected clusters

When to use: Population geographically spread, no complete list available

Example: Select 5 random schools, survey all students in those 5 schools.
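The two stages of cluster sampling (pick clusters at random, then keep everyone in them) can be sketched in Python. The district below is hypothetical: 20 schools of 300 students each with made-up IDs, and `cluster_sample` is an illustrative helper name.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Hypothetical helper: one-stage cluster sample. Randomly choose whole
    clusters, then keep every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # SRS of cluster names
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical district: 20 schools, 300 made-up student IDs per school
schools = {
    f"school_{i:02d}": [f"s{i:02d}-{j:03d}" for j in range(1, 301)]
    for i in range(1, 21)
}
picked = cluster_sample(schools, 5, seed=7)
surveyed = [student for roster in picked.values() for student in roster]
print(sorted(picked), len(surveyed))  # 5 school names, 1500 students
```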

Key difference from stratified: In stratified, sample from all groups; in cluster, select whole groups.

Advantages: Practical, economical, reduces travel costs
Disadvantages: Usually less precise than an SRS of the same size; works well only when each cluster is a mini-population that resembles the whole

Systematic Sampling

Method:

  1. Calculate k = N/n (population size / sample size)
  2. Randomly select starting point (1 to k)
  3. Select every kth individual

Example: From 1000 students, want 100, so k = 1000/100 = 10. Randomly choose a starting number from 1 to 10, say 7, then select 7, 17, 27, 37, ..., 997.
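A minimal Python sketch of the three steps, assuming members are numbered 1 to N. The helper `systematic_sample` and its optional `start` argument are illustrative; passing `start=7` reproduces the example above. Rounding k down with integer division when N/n is not a whole number is an assumption of this sketch.

```python
import random

def systematic_sample(population_size, n, start=None, seed=None):
    """Hypothetical helper: systematic sample of member numbers 1..population_size.

    k = N // n is the sampling interval; the start is drawn at random from
    1..k unless given, and then every kth member is taken.
    """
    k = population_size // n  # assumes population_size >= n
    if start is None:
        start = random.Random(seed).randint(1, k)  # randint includes both ends
    return list(range(start, population_size + 1, k))[:n]

picked = systematic_sample(1000, 100, start=7)  # start fixed at 7 to match the example
print(picked[:4], picked[-1], len(picked))  # [7, 17, 27, 37] 997 100
```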

Advantages: Easy to implement, spreads sample across population
Disadvantages: Can be badly biased if the list has a hidden pattern or cycle that lines up with the interval k

Comparing Methods

Use SRS when: Simplest approach, have complete list
Use Stratified when: Subgroups matter, want comparisons
Use Cluster when: Geographic spread, practical constraints
Use Systematic when: Have ordered list, want efficiency

Sampling Bias

Selection Bias: Some individuals more likely to be selected
Voluntary Response: Individuals self-select (those with strong opinions respond)
Undercoverage: Some groups excluded from sampling frame
Nonresponse: Selected individuals don't participate

Avoid bias: Use random selection, ensure complete sampling frame, maximize response rate

Key Principles

Randomization reduces bias
Larger samples generally better (but quality > quantity)
Representative samples crucial for valid inference
Response rate matters (low response rates increase the risk of nonresponse bias)

Remember: Good sampling is the foundation of statistical inference. A biased sample, no matter how large, leads to invalid conclusions!
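A quick simulation makes the "no matter how large" point concrete. The numbers are invented for illustration: 30% of a hypothetical population is missing from the sampling frame (undercoverage) and scores lower on average. As n grows, the SRS error shrinks toward zero, while the sample drawn from the incomplete frame stays off by roughly the same amount. The helper name `estimate_errors` is hypothetical.

```python
import random
import statistics

def estimate_errors(sizes=(100, 1000, 5000), seed=0):
    """Hypothetical population of 10,000: 7,000 reachable through the sampling
    frame, 3,000 left out of it who score lower on average."""
    rng = random.Random(seed)
    in_frame = [rng.gauss(70, 10) for _ in range(7000)]
    left_out = [rng.gauss(50, 10) for _ in range(3000)]
    population = in_frame + left_out
    true_mean = statistics.fmean(population)
    errors = {}
    for n in sizes:
        # SRS from the whole population vs. a random sample from the flawed frame
        srs_error = statistics.fmean(rng.sample(population, n)) - true_mean
        biased_error = statistics.fmean(rng.sample(in_frame, n)) - true_mean
        errors[n] = (srs_error, biased_error)
    return errors

for n, (srs_err, biased_err) in estimate_errors().items():
    print(f"n = {n:>5}: SRS error {srs_err:+.2f}, biased-frame error {biased_err:+.2f}")
```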

📚 Practice Problems

Problem 1 (easy)

Question:

A principal wants to survey 50 students from a high school of 500 students. Describe how to select a simple random sample (SRS).

Solution:

Step 1: Understand Simple Random Sample (SRS)

  • Every student must have an equal probability of being selected
  • Every group of 50 students must have an equal probability of being the sample

Step 2: Assign numbers to all students

  • Number all 500 students from 001 to 500
  • Use student ID numbers or assign numbers sequentially

Step 3: Use a random selection method

Option A: Random number generator

  • Generate 50 random numbers between 1 and 500
  • No repeats allowed
  • Select students with those numbers

Option B: Random number table

  • Pick starting point randomly
  • Read 3-digit numbers
  • Ignore 000, numbers above 500, and repeats
  • Continue until 50 students selected
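Reading a random number table by hand follows a simple rule that can be sketched in Python. The digit string below is made up for illustration and is not taken from any published table; `read_from_table` is a hypothetical helper that reads 3-digit chunks and skips repeats, numbers above 500, and 000 (which labels no student).

```python
def read_from_table(digits, n, population_size=500, width=3):
    """Hypothetical helper: read `digits` in chunks of `width`, keeping labels
    1..population_size and skipping 000, out-of-range values, and repeats,
    until n labels have been chosen (or the digits run out)."""
    chosen, seen = [], set()
    for i in range(0, len(digits) - width + 1, width):
        label = int(digits[i:i + width])
        if 1 <= label <= population_size and label not in seen:
            seen.add(label)
            chosen.append(label)
            if len(chosen) == n:
                break
    return chosen

# Made-up digits, grouped as 193 827 501 045 193 000 389
row = "193827501045193000389"
print(read_from_table(row, 50))  # [193, 45, 389]
```

Here 827 and 501 are too large, the second 193 is a repeat, and 000 is not a valid label. In practice you would keep reading further rows until 50 students are selected.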

Option C: Names in hat (physical)

  • Not practical for 500, but conceptually valid
  • Mix thoroughly, draw 50

Step 4: Verify randomness

  • Each student has probability 50/500 = 1/10 of being selected
  • No systematic pattern in selection
  • No human judgment involved
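The 1/10 figure can be double-checked by counting samples: among all possible groups of 50, the ones containing a particular student are those where the other 49 come from the remaining 499. A short Python check using the standard library's `math.comb` and `fractions.Fraction`:

```python
from fractions import Fraction
from math import comb

# Samples of 50 that include one particular student: pick the other 49 from 499
p_included = Fraction(comb(499, 49), comb(500, 50))
print(p_included)  # 1/10
```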

Answer: Number all 500 students from 001 to 500. Use a random number generator or table to select 50 distinct numbers between 1 and 500. Survey the students corresponding to those numbers.

Problem 2 (easy)

Question:

Explain why this is NOT a simple random sample: "To survey students, the principal stands at the main entrance and surveys the first 50 students who arrive at school."

Solution:

Step 1: Identify the sampling method used

  • This is a CONVENIENCE sample
  • The principal selects students who are easy to reach
  • Selection is based on who arrives first

Step 2: Check SRS requirements

  • For an SRS, every student must have an equal probability of selection
  • For an SRS, selection must be random

Step 3: Identify problems with this method

Problem 1: Unequal probabilities

  • Students who arrive early: HIGH probability of selection
  • Students who arrive late: ZERO probability
  • Not all students have equal chance

Problem 2: Systematic bias

  • Early arrivers may be different from late arrivers
  • Might be more studious, live closer, ride the bus, etc.
  • Different characteristics than general population

Problem 3: Not random

  • Order of arrival determines selection
  • Predictable pattern
  • Could manipulate by arriving early/late

Step 4: Potential biases introduced

Early arrivers might:

  • Be more organized/responsible
  • Have different transportation
  • Live closer to school
  • Have different family situations
  • Be more/less involved in activities

Results won't represent all students

Answer: This is NOT a simple random sample because not all students have equal probability of selection - only early arrivers can be chosen. It's a convenience sample that likely introduces bias, as early-arriving students may differ systematically from the general student population.

Problem 3 (medium)

Question:

A university has 4,000 freshmen, 3,000 sophomores, 2,000 juniors, and 1,000 seniors. Design a stratified random sample of 200 students that maintains class proportions.

Solution:

Step 1: Calculate total population

Total = 4,000 + 3,000 + 2,000 + 1,000 = 10,000 students

Step 2: Find proportion of each class

  • Freshmen: 4,000/10,000 = 0.40 = 40%
  • Sophomores: 3,000/10,000 = 0.30 = 30%
  • Juniors: 2,000/10,000 = 0.20 = 20%
  • Seniors: 1,000/10,000 = 0.10 = 10%

Step 3: Apply proportions to sample size (sample size = 200 students)

  • Freshmen: 200 × 0.40 = 80 students
  • Sophomores: 200 × 0.30 = 60 students
  • Juniors: 200 × 0.20 = 40 students
  • Seniors: 200 × 0.10 = 20 students

Step 4: Verify

  • 80 + 60 + 40 + 20 = 200 ✓
  • 80/200 = 40% ✓
  • 60/200 = 30% ✓
  • 40/200 = 20% ✓
  • 20/200 = 10% ✓

Step 5: How to select within each stratum

From each class, take a simple random sample:

  • Randomly select 80 from 4,000 freshmen
  • Randomly select 60 from 3,000 sophomores
  • Randomly select 40 from 2,000 juniors
  • Randomly select 20 from 1,000 seniors

Answer: Select 80 freshmen, 60 sophomores, 40 juniors, and 20 seniors using simple random sampling within each class. This maintains the 40%-30%-20%-10% class distribution.

Problem 4 (medium)

Question:

A researcher wants to study student satisfaction across a large university with 30 dorms. She randomly selects 5 dorms and surveys ALL students in those 5 dorms. What sampling method is this? What are the advantages and potential problems?

Solution:

Step 1: Identify the sampling method

This is CLUSTER SAMPLING

  • Population divided into groups (clusters = dorms)
  • Randomly select SOME clusters (5 dorms)
  • Survey ALL individuals in selected clusters

Step 2: Advantages of cluster sampling

  1. Cost-effective

    • Only need to visit 5 dorms, not 30
    • Reduced travel time and expense
    • Easier to administer
  2. Practical

    • Complete list of students only needed for selected dorms
    • Don't need list of all students initially
    • Can focus resources on selected areas
  3. Logistically simple

    • Survey whole dorms at once
    • Can hold dorm-wide meetings
    • Easier coordination

Step 3: Potential problems

  1. Clusters may not be representative

    • Each dorm might have unique characteristics
    • Honors dorm, freshman dorm, quiet dorm, party dorm
    • Selected dorms might not represent all 30
  2. Students within dorms are similar

    • Dorm culture affects all residents
    • Same facilities, RAs, rules
    • Similar responses within a dorm mean each extra student adds less new information than in an SRS
  3. Increased sampling error

    • Generally less precise than SRS of same size
    • Need larger sample for same precision
    • Between-cluster variability matters
  4. Risk of unlucky selection

    • Could randomly select 5 unusual dorms
    • With only 5 clusters, high risk
    • Should select more clusters if possible
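To put a rough number on the "unlucky selection" risk, suppose (hypothetically) that 10 of the 30 dorms are unusual in some way, for example party dorms. Because dorms are chosen without replacement, the chance that most selected dorms are unusual is a hypergeometric probability. The Python sketch below, with an illustrative helper `prob_majority_unusual`, compares selecting 5 dorms with selecting 10; under these made-up numbers the risk falls from roughly 19% to roughly 4%.

```python
from fractions import Fraction
from math import comb

def prob_majority_unusual(total_dorms, unusual_dorms, chosen):
    """Hypothetical helper: chance that more than half of the randomly chosen
    dorms are 'unusual' (hypergeometric tail, sampling without replacement)."""
    usual = total_dorms - unusual_dorms
    majority = chosen // 2 + 1
    favourable = sum(comb(unusual_dorms, j) * comb(usual, chosen - j)
                     for j in range(majority, chosen + 1))
    return Fraction(favourable, comb(total_dorms, chosen))

p5 = prob_majority_unusual(30, 10, 5)    # at least 3 of 5 chosen dorms unusual
p10 = prob_majority_unusual(30, 10, 10)  # at least 6 of 10 chosen dorms unusual
print(f"{float(p5):.3f} {float(p10):.3f}")  # 0.191 0.039
```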

Step 4: When cluster sampling is best

Good when:

  • Clusters are heterogeneous (mixed) internally
  • Clusters are similar to each other
  • Cost/logistics are major concerns

Bad when:

  • Clusters are very different from each other
  • Students within cluster are very similar
  • High precision needed

Answer: Cluster sampling. Advantages: cost-effective, practical, easy logistics. Problems: dorms may differ systematically (honors vs. freshman dorm), students within dorms are similar (less variability), potentially higher sampling error than SRS. Best when cost matters more than precision.

Problem 5 (hard)

Question:

Compare stratified random sampling and cluster sampling. When should you use each? Give examples where each would be preferred.

Solution:

STRATIFIED RANDOM SAMPLING:

How it works:

  1. Divide population into homogeneous groups (strata)
  2. Take a random sample from EACH stratum
  3. Combine samples

Key: Sample from ALL groups, but not everyone in each group

Example strata: grade levels, income brackets, regions

CLUSTER SAMPLING:

How it works:

  1. Divide population into groups (clusters)
  2. Randomly select SOME clusters
  3. Survey ALL members of the selected clusters (or, in a two-stage design, a random sample within them)

Key: Use only SOME groups, but everyone in selected groups

Example clusters: schools, city blocks, dorms

COMPARISON TABLE:

Feature                             Stratified                       Cluster
Sample from all groups?             YES (every stratum)              NO (only selected clusters)
Survey everyone in selected group?  NO (random sample)               YES (all members)
Within-group similarity             HIGH (homogeneous strata)        LOW (heterogeneous clusters)
Between-group differences           HIGH (different strata)          LOW (similar clusters)
Precision                           HIGHER (ensures representation)  LOWER (risk of unrepresentative clusters)
Cost                                HIGHER (must visit all strata)   LOWER (visit only selected clusters)
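The Precision row can be illustrated with a small simulation. This is a sketch under made-up assumptions: a population of 20 groups of 50 whose members are very similar within a group but very different across groups. Using those same groups as strata gives very stable estimates of the mean, while using them as clusters gives very unstable ones; with internally diverse clusters, cluster sampling would fare much better, as the similarity rows suggest. All helper names are hypothetical.

```python
import random
import statistics

def build_population(n_groups=20, group_size=50, within_sd=5.0, seed=1):
    """Hypothetical population: group g is centred on 10*g, members stay close to it."""
    rng = random.Random(seed)
    return [[10.0 * g + rng.gauss(0, within_sd) for _ in range(group_size)]
            for g in range(n_groups)]

def srs_mean(groups, n, rng):
    everyone = [x for grp in groups for x in grp]
    return statistics.fmean(rng.sample(everyone, n))

def stratified_mean(groups, n, rng):
    # Equal-sized strata, so proportional allocation takes the same count from each
    per_group = n // len(groups)
    return statistics.fmean([x for grp in groups for x in rng.sample(grp, per_group)])

def cluster_mean(groups, n, rng):
    # Whole groups are taken until the sample has n members (groups are equal-sized)
    chosen = rng.sample(groups, n // len(groups[0]))
    return statistics.fmean([x for grp in chosen for x in grp])

def variance_of_estimates(method, groups, n=100, reps=1000, seed=2):
    """Repeat the sampling many times and measure how much the estimates vary."""
    rng = random.Random(seed)
    return statistics.pvariance([method(groups, n, rng) for _ in range(reps)])

groups = build_population()
results = {name: variance_of_estimates(fn, groups)
           for name, fn in [("SRS", srs_mean),
                            ("stratified", stratified_mean),
                            ("cluster", cluster_mean)]}
for name, var in results.items():
    print(f"{name:>10}: variance of sample means = {var:.2f}")
```

Every method uses a sample of 100 here, so the variances are directly comparable.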

WHEN TO USE STRATIFIED:

  1. Subgroups are important
     Example: Testing a drug on different age groups
     Want to ensure all ages are represented

  2. Groups differ substantially
     Example: Income study in a city with rich and poor areas
     Want proportional representation

  3. Precision is a priority
     Example: Political polling
     Need accurate estimates for each demographic

  4. Have a good frame for all strata
     Example: Employee survey with department lists
     Can access each group easily

WHEN TO USE CLUSTER:

  1. No natural strata
     Example: Households on city blocks
     Blocks are similar, households within a block vary

  2. Cost/logistics are a major concern
     Example: Door-to-door health survey
     Cheaper to survey whole neighborhoods

  3. Complete list unavailable
     Example: All residents in a city
     Can list neighborhoods, but not all people

  4. Groups are internally diverse
     Example: Schools in a district (each has a mix of students)
     Each school represents the population well

REAL EXAMPLES:

Stratified:

  • Poll likely voters by party affiliation (Dem, Rep, Ind)
  • Medical study ensuring males and females both represented
  • University survey with proportional freshmen, soph, junior, senior

Cluster:

  • WHO selecting villages in a developing country for a vaccination study
  • Household survey selecting random city blocks and interviewing every household on each
  • Agricultural study selecting random farms and testing all plots on each

Answer: Use stratified sampling when groups differ and you want precise estimates with every group represented (costs more). Use cluster sampling when groups are similar to each other and cost/logistics matter more (less precise). Stratified sampling takes some members from every group; cluster sampling takes every member from only some groups.