Sampling Distributions
Distribution of sample statistics
What is a Sampling Distribution?
Statistic: A number calculated from a sample (e.g., sample mean x̄, sample proportion p̂)
Sampling Distribution: Distribution of a statistic across all possible samples of size n
Key insight: Statistics vary from sample to sample (sampling variability). Sampling distribution describes this variability.
Example: Sampling Distribution of x̄
Population: All students, μ = 70, σ = 10
Take many samples of n = 25:
- Sample 1: x̄ = 72
- Sample 2: x̄ = 68
- Sample 3: x̄ = 71
- ...
Plot all sample means → Sampling distribution of x̄
Properties of Sampling Distribution of x̄
Center: μₓ̄ = μ
The sample mean is an unbiased estimator of the population mean.
Spread: σₓ̄ = σ/√n
This is called the standard error (SE).
Key: Larger sample → smaller standard error (more precise estimates)
Shape:
- If population normal → sampling distribution exactly normal
- If population not normal → approximately normal if n large enough (CLT)
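A quick simulation can check the center and spread for the student example (μ = 70, σ = 10, n = 25). This is a sketch that assumes the population is normal, since the text gives only μ and σ; the helper name simulate_means is ours, for illustration.

```python
import random
import statistics

def simulate_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` samples of size n from a Normal(mu, sigma) population
    (an assumed shape) and return the list of sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(reps)
    ]

means = simulate_means(mu=70, sigma=10, n=25, reps=10_000)
print(f"mean of x̄ values: {statistics.fmean(means):.2f}  (theory: μ = 70)")
print(f"SD of x̄ values:   {statistics.stdev(means):.2f}  (theory: σ/√n = 2.00)")
```

The simulated center should land very close to 70 and the simulated spread close to 10/√25 = 2.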
Sampling Distribution of Sample Proportion
Population proportion: p
Sample proportion: p̂ = x/n, where x is the number of successes in a sample of size n
Center: μ_p̂ = p
Spread: σ_p̂ = √(p(1-p)/n)
Shape: Approximately normal if np ≥ 10 and n(1-p) ≥ 10
Example: Coin Flips
Fair coin (p = 0.5), n = 100 flips
Center: μ_p̂ = p = 0.5
Spread: σ_p̂ = √(0.5 × 0.5/100) = √0.0025 = 0.05
Shape: np = 50 ≥ 10, n(1-p) = 50 ≥ 10 → approximately normal
Interpretation: About 68% of sample proportions fall within 0.05 (one SE) of the true value 0.5, and about 95% fall within 0.10 (two SE).
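For the coin example, the normal-based interpretation can be compared with exact binomial probabilities. This sketch uses only the standard library; the helper name prob_phat_within is ours, for illustration.

```python
import math

n, p = 100, 0.5
se = math.sqrt(p * (1 - p) / n)  # σ_p̂ = √(p(1-p)/n)

def prob_phat_within(margin):
    """Exact P(|p̂ - p| ≤ margin) for p̂ = X/n with X ~ Binomial(n, p)."""
    total = 0.0
    for k in range(n + 1):
        if abs(k / n - p) <= margin + 1e-12:  # small tolerance for float rounding
            total += math.comb(n, k) * p**k * (1 - p) ** (n - k)
    return total

print(f"SE = {se:.3f}")
print(f"P(within 0.05) = {prob_phat_within(0.05):.3f}")
print(f"P(within 0.10) = {prob_phat_within(0.10):.3f}")
```

The exact values (about 0.73 and 0.96) sit a little above the normal figures 0.68 and 0.95 because p̂ is discrete and the endpoints (for example 0.45 and 0.55) are included.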
Bias vs Variability
Bias: Systematic over- or under-estimation
- Unbiased: center of the sampling distribution = parameter
Variability: Spread of sampling distribution
- Measured by standard error
Ideal: Low bias AND low variability (unbiased with small SE)
Increase n:
- Doesn't reduce bias
- DOES reduce variability (SE decreases)
Standard Error
Standard Error (SE): Standard deviation of sampling distribution
For sample mean: SE = σ/√n
For sample proportion: SE = √(p(1-p)/n)
Key pattern: SE ∝ 1/√n
To cut SE in half, need 4× sample size
Using Sampling Distributions
Find probabilities about statistics:
Example: Population μ = 100, σ = 15. Sample n = 25.
P(x̄ > 105) = ?
Standardize: z = (105 - 100)/(15/√25) = 5/3 ≈ 1.67
P(Z > 1.67) ≈ 0.0475
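The same probability can be computed with Python's statistics.NormalDist instead of a table. This sketch assumes the normal model for x̄ applies (population normal, or n large enough for the CLT).

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 100, 15, 25
se = sigma / sqrt(n)                         # σ/√n = 3
sampling_dist = NormalDist(mu=mu, sigma=se)  # normal model for x̄

p_exact = 1 - sampling_dist.cdf(105)         # uses z = 5/3 without rounding
p_table = 1 - NormalDist().cdf(1.67)         # z rounded to 1.67, as in a table
print(f"P(x̄ > 105) ≈ {p_exact:.4f} (unrounded z), {p_table:.4f} (z = 1.67)")
```

The unrounded z gives about 0.0478; rounding z to 1.67 gives the table value 0.0475.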
Difference Between Two Means
Two independent samples:
Center: μₓ̄₁₋ₓ̄₂ = μ₁ - μ₂
Spread: σₓ̄₁₋ₓ̄₂ = √(σ₁²/n₁ + σ₂²/n₂)
Shape: Approximately normal if both samples meet conditions
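Difference Between Two Proportions
Two independent samples:
Center: μ_(p̂₁-p̂₂) = p₁ - p₂
Spread: σ_(p̂₁-p̂₂) = √(p₁(1-p₁)/n₁ + p₂(1-p₂)/n₂)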
Conditions: Each sample meets np ≥ 10 and n(1-p) ≥ 10
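A small helper can bundle the center, spread, and shape conditions for p̂₁ - p̂₂. The function name and the numbers in the call (p₁ = 0.6, n₁ = 100, p₂ = 0.5, n₂ = 120) are hypothetical, chosen only for illustration.

```python
from math import sqrt

def diff_proportions_model(p1, n1, p2, n2):
    """Center and spread of the sampling distribution of p̂1 - p̂2
    for two independent samples, plus the normal-shape check."""
    center = p1 - p2
    spread = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Normal-shape condition: np ≥ 10 and n(1-p) ≥ 10 in EACH sample
    normal_ok = all(n * p >= 10 and n * (1 - p) >= 10
                    for p, n in ((p1, n1), (p2, n2)))
    return center, spread, normal_ok

# Hypothetical numbers for illustration only
center, spread, normal_ok = diff_proportions_model(0.6, 100, 0.5, 120)
print(f"center = {center:.2f}, spread = {spread:.4f}, normal OK: {normal_ok}")
```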
Simulating Sampling Distributions
Steps:
- Take sample of size n from population
- Calculate statistic
- Repeat many times
- Plot distribution of statistics
Result: Empirical approximation of theoretical sampling distribution
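The four simulation steps map directly onto a loop. In this sketch the function name simulate_sampling_distribution is ours, and the population (each person answers "yes" with probability 0.3) is hypothetical; plotting is left out, but the returned list is what you would pass to a histogram.

```python
import random
import statistics

def simulate_sampling_distribution(draw, statistic, n, reps, seed=0):
    """Repeat `reps` times: draw a sample of size n and compute the statistic.
    Returns the list of statistics (the empirical sampling distribution)."""
    rng = random.Random(seed)
    results = []
    for _ in range(reps):                        # step 3: repeat many times
        sample = [draw(rng) for _ in range(n)]   # step 1: sample of size n
        results.append(statistic(sample))        # step 2: calculate statistic
    return results                               # step 4: plot these values

# Hypothetical population: each person says "yes" with probability 0.3
p_hats = simulate_sampling_distribution(
    draw=lambda rng: rng.random() < 0.3,
    statistic=lambda s: sum(s) / len(s),   # sample proportion p̂
    n=100,
    reps=5_000,
)
print(f"center ≈ {statistics.fmean(p_hats):.3f} (theory 0.300)")
print(f"spread ≈ {statistics.stdev(p_hats):.4f} (theory {((0.3 * 0.7) / 100) ** 0.5:.4f})")
```

Passing the sampling step and the statistic as functions keeps the loop reusable for x̄, p̂, or any other statistic.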
Common Misconceptions
❌ Confusing population distribution with sampling distribution
❌ Thinking larger sample reduces bias (only reduces variability)
❌ Forgetting √n in denominator of SE
❌ Using σ instead of σ/√n for the spread of x̄
Quick Reference
Sampling Distribution of x̄:
- Center: μ
- Spread: σ/√n
- Shape: Normal (if population normal or n large)
Sampling Distribution of p̂:
- Center: p
- Spread: √(p(1-p)/n)
- Shape: Normal (if np ≥ 10 and n(1-p) ≥ 10)
Remember: Statistics vary from sample to sample. Sampling distribution describes this variability!
📚 Practice Problems
Problem 1 (easy)
❓ Question:
What is a sampling distribution? How does it differ from the population distribution and sample distribution?
Step 1: Define the population distribution
Distribution of individual values in the ENTIRE population
Example: Heights of ALL adults in US
- Mean: μ = 68 inches
- SD: σ = 3 inches
- Shape: approximately normal
Step 2: Define the sample distribution
Distribution of values in ONE specific sample
Example: Heights of 50 adults we measured
- Mean: x̄ = 67.5 inches (sample mean)
- SD: s = 2.8 inches (sample SD)
- Shape: approximately normal (like population)
- This is ONE sample
Step 3: Define the sampling distribution
Distribution of a STATISTIC across ALL POSSIBLE samples
Example: Distribution of x̄ (sample mean) from all possible samples of size n = 50
- This is NOT about individual heights
- This is about SAMPLE MEANS
- Each possible sample of 50 gives one x̄
- Sampling distribution = distribution of all those x̄'s
Step 4: Key differences
POPULATION DISTRIBUTION:
- What: Individual values
- Size: N (entire population)
- Parameters: μ, σ
- Usually don't know exactly
SAMPLE DISTRIBUTION:
- What: Individual values in one sample
- Size: n (one sample)
- Statistics: x̄, s
- Estimates population
SAMPLING DISTRIBUTION:
- What: Sample statistics (like x̄) across all samples
- Size: All possible samples
- Parameters: μₓ̄, σₓ̄
- Theoretical distribution
Step 5: Example with dice
Population: All possible rolls of a die
- Values: {1, 2, 3, 4, 5, 6}
- μ = 3.5, σ = 1.71
Sample: One roll → got {4}
- Just one value
Sampling distribution of x̄ for n = 2:
- Take all possible pairs: (1,1), (1,2), ..., (6,6)
- Calculate mean of each pair
- Distribution of those means
- μₓ̄ = 3.5 (same as population)
- σₓ̄ = 1.71/√2 ≈ 1.21 (smaller than population)
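Because a pair of dice has only 36 equally likely outcomes, the whole sampling distribution of x̄ for n = 2 can be enumerated exactly. This sketch treats the two rolls as independent draws (ordered pairs).

```python
from itertools import product
from statistics import fmean, pstdev

faces = [1, 2, 3, 4, 5, 6]
# All 36 equally likely ordered pairs of rolls, and the mean of each pair
pair_means = [(a + b) / 2 for a, b in product(faces, repeat=2)]

print(len(pair_means))                     # 36
print(fmean(pair_means))                   # 3.5
print(round(pstdev(pair_means), 2))        # 1.21
print(round(pstdev(faces) / 2 ** 0.5, 2))  # 1.21 (σ/√2, for comparison)
```

pstdev (population SD) is the right choice here because the list holds every possible sample, not a random subset of them.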
Step 6: Why sampling distributions matter
We take ONE sample and calculate x̄. We want to know: how far is our x̄ from μ?
Sampling distribution tells us:
- Expected value of x̄
- Variability of x̄
- Shape of x̄ distribution
- Allows us to make inferences!
Step 7: Visual representation
Population: Individual heights: 62, 65, 68, 71, 74... (many values); μ = 68, σ = 3
Sample (n = 50): Individual heights in our sample: 66, 67, 69... (50 values); x̄ = 67.5
Sampling distribution: All possible x̄'s from samples of size 50; μₓ̄ = 68, σₓ̄ = 3/√50 ≈ 0.42; shape approximately normal (by CLT)
Answer:
POPULATION DISTRIBUTION: Distribution of individual values in the entire population (μ, σ).
SAMPLE DISTRIBUTION: Distribution of individual values in ONE specific sample (x̄, s).
SAMPLING DISTRIBUTION: Distribution of a sample statistic (like x̄) across ALL POSSIBLE samples of size n. Tells us how the statistic varies from sample to sample.
Key: Sampling distribution lets us understand variability of our sample statistics and make inferences about population parameters.
Problem 2 (easy)
❓ Question:
A population has μ = 50 and σ = 12. If we take samples of size n = 36, what are the mean and standard deviation of the sampling distribution of x̄?
Step 1: Identify given information
Population parameters: μ = 50, σ = 12
Sample size: n = 36
Find: μₓ̄ and σₓ̄ (mean and SD of sampling distribution)
Step 2: Find the mean of the sampling distribution
Formula: μₓ̄ = μ
The mean of the sampling distribution equals the population mean!
μₓ̄ = 50
Step 3: Why μₓ̄ = μ?
The sample mean x̄ is an UNBIASED estimator of μ. On average, x̄ equals μ: sometimes it is above, sometimes below, but the average of all possible x̄'s is exactly μ.
This is true regardless of sample size!
Step 4: Find the standard deviation of the sampling distribution
Formula: σₓ̄ = σ/√n
Also called "standard error of the mean"
σₓ̄ = 12/√36 = 12/6 = 2
Step 5: Interpret σₓ̄
Standard deviation of sampling distribution = 2
This means:
- Individual values vary with SD = 12
- Sample means vary with SD = 2
- Sample means are LESS variable than individuals!
Makes sense: averaging reduces variability
Step 6: Compare individual and sampling distributions
INDIVIDUAL VALUES (population): μ = 50, σ = 12; values spread out
SAMPLE MEANS (sampling distribution): μₓ̄ = 50 (same center), σₓ̄ = 2 (much less spread); means cluster closer to μ
Step 7: Effect of sample size
If we increased to n = 100: σₓ̄ = 12/√100 = 12/10 = 1.2. Even less variability!
If we decreased to n = 9: σₓ̄ = 12/√9 = 12/3 = 4. More variability.
Larger samples → more precise estimates → smaller SE
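The three sample sizes above can be tabulated in a few lines; the helper name standard_error is ours, for illustration.

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard deviation of x̄: σ/√n."""
    return sigma / sqrt(n)

for n in (9, 36, 100):
    print(f"n = {n:>3}: σₓ̄ = {standard_error(12, n):.1f}")

# Quadrupling n (9 → 36) halves the standard error
print(standard_error(12, 9) / standard_error(12, 36))  # 2.0
```

Going from n = 9 to n = 36 is a 4× increase and cuts σₓ̄ from 4 to 2, matching the SE ∝ 1/√n pattern.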
Step 8: Visual comparison
Population (σ = 12): one SD either side of μ spans 38 to 62.
Sampling distribution (n = 36, σₓ̄ = 2): one SE either side of μ spans only 48 to 52.
Sample means cluster much tighter around μ!
Answer: μₓ̄ = 50, σₓ̄ = 2
The sampling distribution of x̄ has the same mean as the population (50) but a much smaller standard deviation (2 vs 12). Sample means are less variable than individual values; they cluster more tightly around the population mean.
Problem 3 (medium)
❓ Question:
What does the Central Limit Theorem (CLT) state? Why is it important?
Step 1: State the Central Limit Theorem
For a random sample of size n from ANY population with mean μ and finite standard deviation σ:
As n increases, the sampling distribution of x̄ approaches a normal distribution with:
- Mean: μₓ̄ = μ
- Standard deviation: σₓ̄ = σ/√n
Regardless of the population's shape!
Step 2: Key components
- Works for ANY population distribution: normal, skewed, uniform, bimodal, anything!
- Larger n → more normal
- Rule of thumb: n ≥ 30 usually sufficient
- If population is normal, works for any n
- If population is very skewed, need larger n
- Gives us the parameters: μₓ̄ = μ, σₓ̄ = σ/√n
Step 3: Why it's remarkable
Population could be:
- Heavily skewed
- Bimodal
- Discrete
- Any weird shape
But sampling distribution of x̄ is approximately NORMAL!
This is counterintuitive but proven mathematically.
Step 4: Example with dice
Population: Uniform on {1, 2, 3, 4, 5, 6}
- Discrete, rectangular shape
- μ = 3.5, σ = 1.71
Sampling distribution of x̄:
- n = 1: looks uniform (rectangular)
- n = 5: starting to look bell-shaped
- n = 30: very close to normal!
- As n → ∞: approaches a normal distribution ever more closely
Step 5: Why CLT is important
Allows us to use normal probabilities!
Even if we don't know population shape:
- Can assume x̄ ~ Normal (if n large enough)
- Can calculate P(x̄ in some range)
- Can create confidence intervals
- Can perform hypothesis tests
All based on normal distribution properties!
Step 6: Practical application
Quality control: Measure sample mean weight
- Individual boxes might be any distribution
- But x̄ for n = 50 boxes is approximately normal
- Can calculate P(x̄ is too far from target)
Medical: Average blood pressure in sample
- Individual blood pressures vary unpredictably
- But x̄ for n = 100 patients is approximately normal
- Can make inferences about population mean
Step 7: Limitations
CLT applies to:
✓ Sample mean x̄
✓ Sample sum Σx (also becomes normal)
✓ Sample proportion p̂ (special case)
Does NOT apply to:
✗ Individual values (keep population shape)
✗ Sample median (different distribution)
✗ Sample maximum/minimum
Step 8: How large is "large enough"?
General rules:
- n ≥ 30: usually sufficient for CLT
- Population normal: CLT works for any n
- Population moderately skewed: n ≥ 15 okay
- Population heavily skewed: need n ≥ 40 or more
- Population has outliers: may need very large n
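A simulation can show how fast x̄ loses the skew of a heavily skewed population. This sketch assumes an exponential population (mean 1, skewness 2) as a stand-in for "heavily skewed" and uses the sample skewness (average cubed z-score, which is 0 for a normal shape) as the gauge; neither choice comes from the text.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: average cubed z-score (0 for a symmetric shape)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

rng = random.Random(1)
reps = 20_000
skew_by_n = {}
for n in (1, 5, 30):
    # Assumed population: exponential with mean 1 (heavily right-skewed)
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
             for _ in range(reps)]
    skew_by_n[n] = skewness(means)
    print(f"n = {n:>2}: skewness of x̄ ≈ {skew_by_n[n]:.2f} "
          f"(theory 2/√n = {2 / n ** 0.5:.2f})")
```

For this population the skewness of x̄ shrinks like 2/√n, so even at n = 30 some skew (about 0.37) remains, which is why heavily skewed populations call for larger samples.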
Answer: The Central Limit Theorem states that the sampling distribution of x̄ approaches a normal distribution with mean μ and standard deviation σ/√n as sample size increases, REGARDLESS of the population's shape.
Importance:
- Lets us use normal probabilities for x̄ even when population isn't normal
- Foundation for confidence intervals and hypothesis tests
- Helps explain why the normal distribution appears so often in nature
- Works for almost any population (very general theorem)
This is perhaps the most important theorem in statistics!
Problem 4 (medium)
❓ Question:
A population is right-skewed with μ = 80 and σ = 15. For samples of size n = 50, find the probability that x̄ is between 78 and 82.
Step 1: Check whether we can use the normal approximation
Population is right-skewed (not normal), but n = 50 ≥ 30.
By the Central Limit Theorem, the sampling distribution of x̄ is approximately normal!
Step 2: Find parameters of the sampling distribution
μₓ̄ = μ = 80
σₓ̄ = σ/√n = 15/√50 = 15/7.07 ≈ 2.12
Step 3: Set up the probability question
Find: P(78 < x̄ < 82)
x̄ ~ Normal(μ = 80, σ = 2.12) approximately
Step 4: Standardize to z-scores
z₁ = (78 - 80)/2.12 = -2/2.12 ≈ -0.94
z₂ = (82 - 80)/2.12 = 2/2.12 ≈ 0.94
Step 5: Find the probability
P(78 < x̄ < 82) = P(-0.94 < Z < 0.94)
Using a standard normal table: P(Z < 0.94) ≈ 0.8264, P(Z < -0.94) ≈ 0.1736
P(-0.94 < Z < 0.94) = 0.8264 - 0.1736 = 0.6528
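The same calculation with statistics.NormalDist, using the CLT normal approximation for x̄ and no rounding of z:

```python
from math import sqrt
from statistics import NormalDist

se = 15 / sqrt(50)                        # σ/√n ≈ 2.12
xbar_model = NormalDist(mu=80, sigma=se)  # CLT normal approximation for x̄
p = xbar_model.cdf(82) - xbar_model.cdf(78)
print(f"P(78 < x̄ < 82) ≈ {p:.4f}")
```

This gives about 0.654, slightly above the table answer because z is about 0.943 rather than the rounded 0.94.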
Step 6: Interpret
About 65.3% of samples of size 50 will have a sample mean between 78 and 82.
Even though population is skewed:
- Individual values spread out (σ = 15)
- Sample means cluster near μ = 80 (σₓ̄ = 2.12)
- Distribution of x̄ is approximately normal
Step 7: Compare to individual values
What if we asked for P(78 < X < 82) for an individual value?
Can't answer! We'd need the population distribution shape. Right-skewed means not symmetric, so normal approximation doesn't work for individuals.
But for x̄ with n = 50, CLT saves us - we CAN use normal!
Step 8: Verify reasonableness
The range 78 to 82 is μ ± 2. In SE units, 2/2.12 ≈ 0.94, so the range is μ ± 0.94 SE, slightly narrower than ±1 SE.
For a normal distribution, P(μ - 1σ < X < μ + 1σ) ≈ 0.68. Our answer 0.6528 is slightly less than 0.68, as expected ✓
Answer: P(78 < x̄ < 82) ≈ 0.653 or 65.3%
Despite the population being right-skewed, the Central Limit Theorem allows us to treat the sampling distribution of x̄ as approximately normal (since n = 50 ≥ 30). About 65% of samples will have means within 2 units of the population mean.
Problem 5 (hard)
❓ Question:
Two independent populations: Population A (μ = 100, σ = 20) and Population B (μ = 90, σ = 15). Take samples of n₁ = 40 from A and n₂ = 50 from B. Find the mean and standard deviation of the sampling distribution of x̄₁ - x̄₂. What is P(x̄₁ - x̄₂ > 15)?
Step 1: Set up the problem
Population A: μ₁ = 100, σ₁ = 20, n₁ = 40
Population B: μ₂ = 90, σ₂ = 15, n₂ = 50
Want distribution of: x̄₁ - x̄₂ (difference of sample means)
Step 2: Find the mean of the difference
For independent samples: μₓ̄₁₋ₓ̄₂ = μ₁ - μ₂ = 100 - 90 = 10
Expected difference is 10.
Step 3: Find the standard deviation of the difference
For independent samples: σₓ̄₁₋ₓ̄₂ = √(σ₁²/n₁ + σ₂²/n₂)
Calculate each term: σ₁²/n₁ = 20²/40 = 400/40 = 10; σ₂²/n₂ = 15²/50 = 225/50 = 4.5
σₓ̄₁₋ₓ̄₂ = √(10 + 4.5) = √14.5 ≈ 3.81
Step 4: Check CLT conditions
n₁ = 40 ≥ 30 ✓, n₂ = 50 ≥ 30 ✓
By the CLT, x̄₁ and x̄₂ are each approximately normal. Because the samples are independent, x̄₁ - x̄₂ is also approximately normal.
x̄₁ - x̄₂ ~ Normal(μ = 10, σ = 3.81)
Step 5: Find P(x̄₁ - x̄₂ > 15)
Standardize: z = (15 - 10)/3.81 = 5/3.81 ≈ 1.31
P(x̄₁ - x̄₂ > 15) = P(Z > 1.31)
Step 6: Look up the probability
From a standard normal table: P(Z < 1.31) ≈ 0.9049
Therefore: P(Z > 1.31) = 1 - 0.9049 = 0.0951
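Steps 2 through 6 condense to a few lines with statistics.NormalDist, assuming (as in Step 4) that x̄₁ - x̄₂ is approximately normal:

```python
from math import sqrt
from statistics import NormalDist

mu_diff = 100 - 90                       # μ₁ - μ₂
sd_diff = sqrt(20**2 / 40 + 15**2 / 50)  # √(σ₁²/n₁ + σ₂²/n₂) = √14.5
diff_model = NormalDist(mu=mu_diff, sigma=sd_diff)
p = 1 - diff_model.cdf(15)
print(f"mean = {mu_diff}, SD ≈ {sd_diff:.2f}, P(x̄₁ - x̄₂ > 15) ≈ {p:.4f}")
```

With z left unrounded the probability is about 0.0946, which still rounds to 0.095.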
Step 7: Interpret About 9.5% chance that sample mean from A exceeds sample mean from B by more than 15.
This makes sense:
- Expected difference is only 10
- 15 is (15-10)/3.81 ≈ 1.31 SE above expected
- Fairly unlikely but not extremely rare
Step 8: Why variances add (not subtract)
Even though we're finding a difference of means, we ADD variances.
Why? Variability adds when combining random variables.
- If x̄₁ varies: contributes to variation in difference
- If x̄₂ varies: also contributes to variation in difference
- Both sources of variation combine
Formula: Var(X - Y) = Var(X) + Var(Y) [for independent X, Y]
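A simulation makes the "variances add" rule concrete: the SD of x̄₁ - x̄₂ comes out larger than the SD of either mean alone, not their difference. This sketch assumes normal population shapes; only μ and σ come from the problem.

```python
import random
import statistics

rng = random.Random(2)
diffs = []
for _ in range(10_000):
    # Assumed normal population shapes; μ and σ are from the problem
    xbar1 = statistics.fmean(rng.gauss(100, 20) for _ in range(40))
    xbar2 = statistics.fmean(rng.gauss(90, 15) for _ in range(50))
    diffs.append(xbar1 - xbar2)

print(f"SD of x̄₁: {20 / 40 ** 0.5:.2f}, SD of x̄₂: {15 / 50 ** 0.5:.2f}")
print(f"simulated SD of x̄₁ - x̄₂ ≈ {statistics.stdev(diffs):.2f} (theory √14.5 ≈ 3.81)")
```

The individual SDs are about 3.16 and 2.12; subtracting them would give about 1.04, while the simulated SD of the difference lands near √(3.16² + 2.12²) ≈ 3.81.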
Step 9: Verify the independence assumption
The two samples must be independent:
✓ The sample from A doesn't affect the sample from B
✓ Different populations
✓ Random samples
If not independent (e.g., paired data), would need different approach!
Step 10: Summary of formulas used
For independent samples:
- μₓ̄₁₋ₓ̄₂ = μ₁ - μ₂
- σₓ̄₁₋ₓ̄₂ = √(σ₁²/n₁ + σ₂²/n₂)
- Distribution: approximately normal (if CLT applies)
Answer: μₓ̄₁₋ₓ̄₂ = 10, σₓ̄₁₋ₓ̄₂ ≈ 3.81, P(x̄₁ - x̄₂ > 15) ≈ 0.095 or 9.5%
The difference in sample means has a mean of 10 and standard deviation of about 3.81. There's about a 9.5% chance that the sample mean from Population A exceeds the sample mean from Population B by more than 15.