Transformations for Linearity

Linearizing nonlinear relationships

Transformations to Achieve Linearity

Why Transform?

Problem: Many relationships are nonlinear

Solution: Transform one or both variables to make relationship linear

Benefits:

Can use linear regression tools
Easier interpretation
Better predictions

When to Transform

Indicators that a transformation is needed:

Scatterplot shows curve (not line)
Residual plot shows pattern (not random)
Low r² despite clear relationship

Don't transform if:

Relationship already linear
Residual plot looks good

Common Transformations

For y:

log(y): Exponential growth/decay
√y: Moderate curve
1/y: Inverse relationship

For x:

log(x): Logarithmic curve
x²: Quadratic relationship
√x: Moderate curve

Both:

log(y) vs log(x): Power relationship

Exponential Model

Original relationship: $y = ab^x$

Curved scatterplot, exponential growth/decay

Transform: Take log of y

Becomes linear: $\log(y) = \log(a) + x\log(b)$

Regression: log(y) on x gives linear relationship

Example: Population growth, compound interest, radioactive decay
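The log-linear fit described above can be sketched in a few lines of Python. This is a minimal illustration, assuming NumPy is available; the data are hypothetical, generated without noise from $y = ab^x$ with $a = 5$ and $b = 1.8$, so the fit should recover those values almost exactly.

```python
import numpy as np

# Hypothetical, noise-free data generated from y = a * b**x with a = 5, b = 1.8.
x = np.arange(0, 8, dtype=float)
y = 5.0 * 1.8 ** x

# Regress log10(y) on x: the slope estimates log10(b), the intercept log10(a).
slope, intercept = np.polyfit(x, np.log10(y), deg=1)

# Undo the log to recover the parameters of the exponential model.
a_hat = 10 ** intercept
b_hat = 10 ** slope
print(f"a ≈ {a_hat:.3f}, b ≈ {b_hat:.3f}")
```

With real, noisy data the recovered values would only approximate the underlying parameters.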

Example 1: Exponential Transformation

Bacteria population over time:

Original data shows exponential growth (curved)

Transform: Calculate log(population) for each time

New scatterplot: log(population) vs time is linear!

Regression: $\log(\hat{y}) = 2 + 0.3x$

Back-transform for predictions:

$\hat{y} = 10^{2 + 0.3x}$

Power Model

Original relationship: $y = ax^p$

Curved relationship

Transform: Take log of both

Becomes linear: $\log(y) = \log(a) + p\log(x)$

Regression: log(y) on log(x) gives linear relationship

Example: Area vs radius, metabolic rate vs body mass
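As a rough sketch of the log-log fit, the area-versus-radius example can be checked numerically. The radii below are arbitrary illustrative values and the areas are computed exactly from $A = \pi r^2$, so the slope should come out near 2 and the back-transformed intercept near $\pi$.

```python
import numpy as np

# Illustrative radii (arbitrary values); areas computed from A = pi * r**2.
r = np.array([0.5, 1.0, 2.0, 3.0, 5.0, 8.0])
area = np.pi * r ** 2

# Regress log10(area) on log10(r): slope estimates the power p,
# intercept estimates log10(a).
p_hat, log_a_hat = np.polyfit(np.log10(r), np.log10(area), deg=1)
a_hat = 10 ** log_a_hat

print(f"power ≈ {p_hat:.3f}, a ≈ {a_hat:.4f}")
```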

Example 2: Power Transformation

Planet orbital period vs distance from sun:

Both variables on logarithmic scale → linear!

Regression: $\log(\text{period}) = a + b\log(\text{distance})$

Slope b ≈ 1.5 (Kepler's third law: $T \propto d^{1.5}$, where $T$ is the orbital period and $d$ is the distance from the sun)
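A short numerical check of this example is sketched below. The distances (in astronomical units) and periods (in Earth years) are rounded, approximate values for the eight planets supplied here for illustration; they are not taken from the original notes.

```python
import numpy as np

# Approximate, rounded values for Mercury through Neptune (illustrative only):
# mean distance from the sun in AU and orbital period in Earth years.
distance_au = np.array([0.387, 0.723, 1.000, 1.524, 5.203, 9.537, 19.19, 30.07])
period_yr = np.array([0.241, 0.615, 1.000, 1.881, 11.86, 29.46, 84.01, 164.8])

# Fit log10(period) = intercept + slope * log10(distance).
slope, intercept = np.polyfit(np.log10(distance_au), np.log10(period_yr), deg=1)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

In these units the slope should land very close to 1.5 and the intercept close to 0, since Earth has distance 1 and period 1.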

Square Root and Squaring

√y transformation:

Moderate upward curve
Spread that increases as y increases

x² transformation:

Quadratic relationship (parabola)
Applies to one branch of a parabola whose vertex is at x = 0 (no separate linear term in x)

Example: Free-fall distance (d) vs time (t)

$d = \frac{1}{2}gt^2$ suggests regress d on t²
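The free-fall case can be sketched as follows. The times are hypothetical and the distances are generated without noise using an assumed $g = 9.8$ m/s², so regressing $d$ on $t^2$ should return a slope of about $g/2$.

```python
import numpy as np

g = 9.8  # m/s^2, assumed value used to generate the illustrative data
t = np.linspace(0.5, 3.0, 6)   # hypothetical measurement times in seconds
d = 0.5 * g * t ** 2           # noise-free distances in metres

# Regress d on t**2 (not on t): the slope estimates g / 2.
slope, intercept = np.polyfit(t ** 2, d, deg=1)
g_hat = 2 * slope
print(f"slope ≈ {slope:.3f}, estimated g ≈ {g_hat:.3f}")
```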

Choosing the Right Transformation

Trial and error approach:

Try transformation
Make scatterplot of transformed data
Check residual plot
Check r²
If not linear, try different transformation

Guided approach:

Exponential pattern → log(y)
Power relationship → log-log
Quadratic → x²
Fan shape in residuals → log(y)
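The trial-and-error loop can be sketched in Python as below. The data are hypothetical, generated from a power law with mild multiplicative noise, and `r_squared` is a helper defined here for illustration. The r² values are only a first screen; the residual plot of each candidate still has to be inspected.

```python
import numpy as np

def r_squared(u, v):
    """r^2 for the least-squares line of v on u (squared correlation)."""
    return np.corrcoef(u, v)[0, 1] ** 2

# Hypothetical curved data: y = 2 * x**1.5 with small multiplicative noise.
rng = np.random.default_rng(0)
x = np.linspace(1, 20, 40)
y = 2.0 * x ** 1.5 * np.exp(rng.normal(0, 0.03, size=x.size))

# Each candidate maps a label to (explanatory, response) on the transformed scale.
candidates = {
    "y vs x": (x, y),
    "log(y) vs x": (x, np.log10(y)),
    "sqrt(y) vs x": (x, np.sqrt(y)),
    "y vs x^2": (x ** 2, y),
    "log(y) vs log(x)": (np.log10(x), np.log10(y)),
}

results = {name: r_squared(u, v) for name, (u, v) in candidates.items()}
for name, r2 in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name:18s} r^2 = {r2:.4f}")
```

Because the data were generated from a power law, the log-log candidate should score very close to 1 here; on real data the ranking is not known in advance.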

Interpreting Transformed Models

Log(y) on x:

Slope interpretation (base-10 logs): "For each unit increase in x, y is multiplied by $10^b$ " (with natural logs, the multiplier is $e^b$)

Example: Slope = 0.301 in log(population) vs time

"Each year, population multiplies by $10^{0.301} \approx 2$ "

(Population doubles each year)

Log(y) on log(x):

Slope interpretation: "A 1% increase in x is associated with approximately b% increase in y"
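A brief derivation shows where the "1% gives about b%" reading comes from, assuming the underlying power model $y = ax^b$. Increasing x by 1% multiplies y by

$$\frac{a(1.01x)^b}{ax^b} = 1.01^b \approx 1 + 0.01b \quad \text{(for moderate values of } b\text{)}$$

For instance, with $b = 1.5$, $1.01^{1.5} \approx 1.015$, so y rises by about 1.5%.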

Back-Transformation

After fitting model on transformed data:

Make predictions on transformed scale, then back-transform

Example: Model is $\log(\hat{y}) = 2 + 0.3x$

For x = 10:

$\log(\hat{y}) = 2 + 0.3(10) = 5$

$\hat{y} = 10^5 = 100,000$

Don't fit the model on the original scale and then transform its predictions afterward! Fit on the transformed scale first, then back-transform.
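The two-step prediction can be sketched as a small function. The name `predict_y` and its parameters are hypothetical; the default coefficients are those of the example model $\log(\hat{y}) = 2 + 0.3x$, with base-10 logs assumed.

```python
def predict_y(x, intercept=2.0, slope=0.3, base=10.0):
    """Predict y from a model fitted as log_base(y) = intercept + slope * x."""
    log_y_hat = intercept + slope * x  # step 1: predict on the transformed scale
    return base ** log_y_hat           # step 2: undo the log to get original units

y_hat = predict_y(10)
print(y_hat)
```

For x = 10 this reproduces the worked value of 100,000.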

Checking the Transformation

Good transformation produces:

Linear scatterplot
Random residual plot
Higher r² (directly comparable only when y itself was not transformed)
Roughly constant spread

Compare before/after:

Original r² vs transformed r² (interpret with care if y was transformed)
Original residual plot vs transformed residual plot

Multiple Transformations

Sometimes try several:

Example: Comparing transformations for curved data

log(y) vs x: r² = 0.85
√y vs x: r² = 0.92
y vs x²: r² = 0.78

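Choose: √y vs x (highest r²), provided its residual plot shows random scatter; r² values for different transformations of y are only a rough guide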

Common Patterns and Transformations

| Pattern | Try |
|---------|-----|
| Exponential growth/decay | log(y) |
| Power relationship | log(y) and log(x) |
| Quadratic (parabola) | x² |
| Moderate upward curve | √y or √x |
| Spread increases with y | log(y) |

Residual Plot After Transformation

Must check! Transformation successful if:

No pattern in residuals
Random scatter around 0
Constant spread

If still see pattern: Try different transformation
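A numerical stand-in for eyeballing the residual plot is sketched below. The data are hypothetical (exponential growth with multiplicative noise), and `curvature_score` is an ad hoc helper invented for this illustration, not a standard diagnostic; plotting the residuals remains the primary check.

```python
import numpy as np

def residuals(u, v):
    """Residuals from the least-squares line of v on u."""
    slope, intercept = np.polyfit(u, v, deg=1)
    return v - (intercept + slope * u)

def curvature_score(u, resid):
    """Absolute correlation between residuals and squared centered u.

    Values near 1 suggest a U-shaped (curved) residual pattern;
    values near 0 suggest no such pattern.
    """
    return abs(np.corrcoef((u - u.mean()) ** 2, resid)[0, 1])

# Hypothetical exponential data with multiplicative noise.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
y = 3.0 * 1.4 ** x * np.exp(rng.normal(0, 0.05, size=x.size))

before = curvature_score(x, residuals(x, y))            # line fitted to y vs x
after = curvature_score(x, residuals(x, np.log10(y)))   # line fitted to log(y) vs x
print(f"curvature before: {before:.2f}, after: {after:.2f}")
```

A large score before and a much smaller one after is consistent with the transformation having removed the curve.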

Linearizable vs Non-linearizable

Linearizable: Can be made linear with transformation

Exponential: y = ab^x
Power: y = ax^p
Quadratic with vertex at x = 0: y = a + cx² (regress y on x²)

Non-linearizable: Cannot be easily linearized

Some periodic functions
Complex curves
May need nonlinear regression

Common Mistakes

❌ Not checking residual plot after transformation
❌ Back-transforming incorrectly
❌ Transforming when already linear
❌ Misinterpreting slope of transformed model
❌ Comparing r² before and after transforming y as if they measured the same thing (different y variable!)

Practical Considerations

Pros of transformation:

Use simple linear methods
Often theoretically motivated
Can improve predictions

Cons of transformation:

Harder to interpret
Must back-transform for predictions
Not all relationships linearizable

Alternative: Modern nonlinear regression (beyond AP Stats)

Example 3: Complete Transformation

Original: y vs x is curved (r² = 0.40, residuals show pattern)

Transform: Use log(y)

New: log(y) vs x is linear (r² = 0.95, random residuals)

Equation: $\log(\hat{y}) = 1.5 + 0.2x$

Interpretation: "Each unit increase in x multiplies y by $10^{0.2} \approx 1.58$ "

For prediction at x = 10:

$\log(\hat{y}) = 1.5 + 0.2(10) = 3.5$

$\hat{y} = 10^{3.5} \approx 3162$

Quick Reference

Exponential (y = ab^x): Use log(y) vs x

Power (y = ax^p): Use log(y) vs log(x)

Quadratic: Use y vs x²

Goal: Linear scatterplot, random residuals, high r²

Check: Always examine residual plot of transformed data

Interpret carefully: Slopes mean different things after transformation

Remember: Transform to fix nonlinearity, but always check if transformation worked! Linear models are powerful when applied to appropriately transformed data.

📚 Practice Problems

Problem 1 (medium)

❓ Question:

A scatterplot of x vs y shows a curved exponential pattern. The residual plot for ŷ = a + bx is curved. Try plotting log(y) vs x. What pattern should you see if this transformation works?

💡 Solution

Step 1: Understand the original problem

Scatterplot shows exponential curve (y = ae^(bx))
Linear model residuals are curved
Need to linearize the relationship

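Step 2: Why try log(y) vs x?

Exponential relationship: y = ae^(bx)

Take the natural log (ln) of both sides: ln(y) = ln(a) + bx

(With base-10 logs the relationship is still linear in x; the slope becomes b·log(e).)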

This is LINEAR in x!

Step 3: What to look for after transformation

If log transformation is appropriate:

✓ Scatterplot of log(y) vs x should be LINEAR
✓ Residual plot should show RANDOM scatter
✓ No curved pattern in residuals

Step 4: How to check

Create new variable: y' = log(y)
Plot y' vs x (should be linear)
Fit regression: ŷ' = b₀ + b₁x
Check residual plot (should be random)

Step 5: Interpretation

After transformation:

Can use linear regression on log(y) vs x
To predict y (natural log used): ŷ = e^(b₀ + b₁x)
Or: ŷ = e^(b₀) × e^(b₁x)

Answer: After log transformation, the plot of log(y) vs x should show a LINEAR pattern, and residuals should be randomly scattered with no curve.

Problem 2 (hard)

❓ Question:

Data shows a power relationship: y = ax^b. What transformation will linearize this relationship?

💡 Solution

Step 1: Identify the relationship

Power model: y = ax^b (Example: area = πr², where b = 2)

Step 2: Apply log transformation to BOTH variables

Take log of both sides:

log(y) = log(a × x^b)
log(y) = log(a) + log(x^b)
log(y) = log(a) + b·log(x)

Step 3: Recognize linear form

Let: Y = log(y), X = log(x), A = log(a)

Then: Y = A + bX

This is LINEAR!

Step 4: How to transform

Create Y = log(y)
Create X = log(x)
Plot Y vs X (should be linear)
Fit regression: Ŷ = b₀ + b₁X

Step 5: Interpret coefficients

After regression:

b₁ = power (exponent b)
b₀ = log(a), so a = e^(b₀) or a = 10^(b₀)

To predict original y:

ŷ = e^(b₀) × x^(b₁) [if using natural log]
ŷ = 10^(b₀) × x^(b₁) [if using log base 10]

Example: If Ŷ = 2 + 1.5X (using log base 10), then y = 10² × x^1.5 = 100x^1.5

Answer: Take log of BOTH variables. Plot log(y) vs log(x), which linearizes power relationships.

Problem 3 (hard)

❓ Question:

After fitting y vs x, the residual plot fans out (variance increases). You try log(y) vs x and get a better residual plot. Why does this help?

💡 Solution

Step 1: Identify the original problem

Fan-shaped residuals mean:

Variance increases with x
Violates constant variance assumption
Often occurs when y grows exponentially

Step 2: Why log(y) helps with variance

When y is exponential or multiplicative:

Larger y values have larger variability
Standard deviation is roughly proportional to the mean
log transformation STABILIZES variance

Mathematical reason: If the variance of y is proportional to the square of its mean μ: Var(y) ∝ μ²

Then: Var(log(y)) ≈ Var(y)/μ² ≈ constant (delta method, a first-order Taylor approximation)
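The delta-method step can be written out explicitly. This is a sketch, writing $\mu$ for the mean of y, using the natural log, and assuming $\operatorname{Var}(y) = k\mu^2$ for some constant $k$:

$$\operatorname{Var}[g(y)] \approx [g'(\mu)]^2 \operatorname{Var}(y), \qquad g(y) = \ln y,\; g'(\mu) = \frac{1}{\mu}$$

$$\operatorname{Var}(\ln y) \approx \frac{1}{\mu^2} \cdot k\mu^2 = k$$

The result no longer depends on $\mu$, which is why the fan shape tends to disappear on the log scale. Using base-10 logs only rescales the constant.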

Step 3: Additional benefit

Log transformation often:

✓ Linearizes exponential relationships
✓ Stabilizes variance (fixes fan shape)
✓ Makes distribution more symmetric
✓ Reduces impact of outliers

Step 4: When to use log transformation

Use log(y) when you see:

Exponential growth pattern
Fan-shaped residuals
Right-skewed distribution
Multiplicative relationships
Variance increases with mean

Step 5: Check after transformation

After using log(y):

Residual plot should show equal spread
No fan shape
Random scatter
Valid for inference

Answer: Log transformation stabilizes variance. When the spread of y grows in proportion to its mean (fan shape), log(y) typically has roughly constant variance, fixing the heteroscedasticity problem.
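A small simulation can illustrate the answer. The data are hypothetical, generated with multiplicative noise so that scatter grows with y, and `spread_ratio` is a helper defined here for illustration. Residuals are measured around the same fitted curve on both scales.

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0, 10, 400)  # sorted, so the first half has the smaller x values
# Multiplicative noise: scatter in y grows in proportion to the size of y.
y = 2.0 * np.exp(0.4 * x) * np.exp(rng.normal(0, 0.2, size=x.size))

# Fit ln(y) on x, then look at residuals on both scales.
slope, intercept = np.polyfit(x, np.log(y), deg=1)
fitted_log = intercept + slope * x
resid_log = np.log(y) - fitted_log   # residuals on the log scale
resid_raw = y - np.exp(fitted_log)   # residuals on the original scale

def spread_ratio(resid):
    """SD of residuals for the larger-x half divided by SD for the smaller-x half."""
    half = resid.size // 2
    return resid[half:].std() / resid[:half].std()

raw_ratio = spread_ratio(resid_raw)  # well above 1 indicates a fan shape
log_ratio = spread_ratio(resid_log)  # near 1 indicates roughly constant spread
print(f"raw scale: {raw_ratio:.2f}, log scale: {log_ratio:.2f}")
```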

Problem 4 (medium)

❓ Question:

You fit log(y) = 2 + 0.5x using natural log. Predict y when x = 10.

💡 Solution

Step 1: Understand the model

Fitted equation: log(y) = 2 + 0.5x
This uses NATURAL LOG (ln)

Step 2: Predict log(y) for x = 10

log(y) = 2 + 0.5(10)
log(y) = 2 + 5
log(y) = 7

Step 3: Back-transform to get y

Since we used natural log (ln): ln(y) = 7

To solve for y, use exponential: y = e^7

Step 4: Calculate

y = e^7 ≈ 1,096.63

Step 5: Interpretation

"When x = 10, y is predicted to be approximately 1,097."

Important notes:

Must back-transform using e^(predicted value)
If using log₁₀, would use 10^(predicted value)
Always specify which log was used!

Alternative form:

Original model: y = e^(2 + 0.5x) = e² × e^(0.5x) ≈ 7.39 × e^(0.5x)

When x = 10: y = 7.39 × e^5 ≈ 1,097

Answer: y = e^7 ≈ 1,097
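The importance of matching the log base can be seen in a short sketch. The variable names are illustrative; the coefficients are those of the problem.

```python
import math

log_y_hat = 2 + 0.5 * 10  # prediction on the log scale

# Correct here, because the model was fitted with the natural log.
y_if_natural = math.exp(log_y_hat)

# What you would get by wrongly assuming the model used log base 10.
y_if_base10 = 10 ** log_y_hat

print(round(y_if_natural, 2))
print(y_if_base10)
```

The two results differ by several orders of magnitude, which is why the base must always be stated alongside the fitted equation.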

Problem 5 (hard)

❓ Question:

A residual plot shows both curvature AND fan shape. What transformations might you try?

💡 Solution

Step 1: Identify TWO problems

Curvature → nonlinear relationship
Fan shape → non-constant variance

Need transformation that fixes BOTH!

Step 2: Try log(y) vs x

Often works for:

Exponential relationships (fixes curve)
Multiplicative error (fixes fan)
Right-skewed data

Check result:

✓ Should be linear
✓ Should have constant variance

Step 3: If log(y) doesn't work completely

Try other transformations:

√y vs x (square root)
1/y vs x (reciprocal)
log(y) vs log(x) (both sides)

Step 4: Systematic approach

Try log(y) vs x first (most common)
Check residual plot
If still curved, try log-log or other
If variance still not constant, try different transformation

Step 5: Decision guide

Pattern → Try transformation:

Exponential curve + fan → log(y) vs x
Power relationship → log(y) vs log(x)
Moderate curve → √y vs x
Strong right skew → log(y)

Step 6: After transformation

Must verify:

✓ Scatterplot is linear
✓ Residuals randomly scattered
✓ Constant variance (no fan)
✓ Approximately normal residuals

Answer: Try log(y) vs x first, as it often fixes both curvature (exponential) and fan shape (non-constant variance). Check residual plot; if issues remain, try other transformations like √y or log-log.
