An Interactive Guide to Linear Regression

1. The Core Idea: Finding the "Best Fit" Line

Linear regression aims to find a straight line that best represents the relationship between two variables. But what does "best" mean? It means finding the line that minimizes the sum of the squared residuals, where a residual is the vertical distance from a data point to the line. Try it yourself! Adjust the sliders to change the line's intercept and slope, and watch how the **Sum of Squared Residuals (SSR)** changes. Your goal is to find the line with the lowest possible SSR.
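
The quantity the sliders control can be written out in a few lines. What follows is a minimal Python sketch, not the code behind the widget: the function name `sum_squared_residuals` and the four data points are invented for illustration.

```python
def sum_squared_residuals(xs, ys, intercept, slope):
    """Return the SSR of the line y = intercept + slope * x on the data."""
    total = 0.0
    for x, y in zip(xs, ys):
        predicted = intercept + slope * x
        residual = y - predicted  # signed vertical distance to the line
        total += residual ** 2
    return total


# Hypothetical data, made up for this example (the points lie exactly on y = 2x).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

ssr_on_line = sum_squared_residuals(xs, ys, intercept=0.0, slope=2.0)
ssr_shifted = sum_squared_residuals(xs, ys, intercept=1.0, slope=2.0)

print(ssr_on_line)  # 0.0, the line passes through every point
print(ssr_shifted)  # 4.0, each of the four residuals is -1
```

Moving a slider corresponds to changing `intercept` or `slope` and recomputing this sum.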

2. Deconstructing the Model Equation

The relationship is formally described by the equation below, in which each term plays a distinct role. The line you fitted above supplies estimates (denoted with a "hat", e.g., $\hat{\beta}_0$) of the true, unknown population parameters $\beta_0$ and $\beta_1$.

$Y = \beta_0 + \beta_1 X + \epsilon$

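In this equation, $Y$ is the dependent (response) variable and $X$ is the independent (predictor) variable. $\beta_0$ is the intercept, the expected value of $Y$ when $X = 0$. $\beta_1$ is the slope, the expected change in $Y$ for a one-unit increase in $X$. $\epsilon$ is the error term, the part of $Y$ that the line does not explain.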
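
For a single predictor, the estimates that minimize the SSR have a well-known closed form: the slope estimate is the sum of cross-deviations of X and Y divided by the sum of squared deviations of X, and the intercept estimate makes the line pass through the point of means. The sketch below applies those formulas in plain Python. The function name `fit_simple_ols` and the data are hypothetical, and this is not necessarily how the guide's own fitting is implemented.

```python
def fit_simple_ols(xs, ys):
    """Return (intercept_hat, slope_hat) for the least-squares line."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope_hat = sxy / sxx
    intercept_hat = y_mean - slope_hat * x_mean
    return intercept_hat, slope_hat


# Hypothetical data, made up for this example.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 2.0, 4.0]

intercept_hat, slope_hat = fit_simple_ols(xs, ys)
print(intercept_hat, slope_hat)  # roughly 1.3 and 0.8
```

The function assumes at least two distinct X values; if every X is identical, `sxx` is zero and the slope is undefined.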

3. Key Assumptions of Linear Regression

For the results of a linear regression to be trustworthy, and in particular for inference such as confidence intervals and hypothesis tests to be valid, a few key assumptions about the data and the error term ($\epsilon$) must be met. These are often remembered by the acronym LINE: Linearity of the relationship, Independence of the errors, Normality of the errors, and Equal variance of the errors.

Linearity: the underlying relationship between the independent variable (X) and the dependent variable (Y) is linear. If the relationship is curved, a straight line will not be a good fit.

[Figure: two example plots, one labelled "✓ Good (Linear)" and one labelled "✗ Bad (Non-linear)".]
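
One common way to spot a violation of linearity is to look at the residuals of the fitted line: if they show a systematic pattern rather than scattering around zero, the straight line is missing structure. The following is a small sketch under invented assumptions. The data are generated from y = x², and the helper `fit_simple_ols` is a hypothetical name for the standard closed-form least-squares fit.

```python
def fit_simple_ols(xs, ys):
    """Return (intercept_hat, slope_hat) for the least-squares line."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope_hat = sxy / sxx
    return y_mean - slope_hat * x_mean, slope_hat


# Hypothetical curved data: y = x squared.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x ** 2 for x in xs]

intercept_hat, slope_hat = fit_simple_ols(xs, ys)
residuals = [y - (intercept_hat + slope_hat * x) for x, y in zip(xs, ys)]

print(residuals)  # [2.0, -1.0, -2.0, -1.0, 2.0]
```

The residuals are positive at both ends and negative in the middle. That U-shaped pattern is the signature of a curved relationship being forced onto a straight line.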

4. Interpretation & Evaluation

Once you have the best-fit line, how do you know if it's any good? We use metrics like **R-squared (R²)**. Click the "Find Best Fit" button in the first section to see the results for this dataset.

Model Fit Summary

R-squared (R²): the proportion of the variance in the dependent variable that is predictable from the independent variable(s).

Adjusted R-squared: a modified R² that penalizes adding predictors that don't improve the model. It is most useful in multiple regression.
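
Both metrics can be computed directly from the residuals. The sketch below uses the standard definitions: R² is one minus the ratio of the sum of squared residuals to the total sum of squares, and adjusted R² rescales the unexplained share by (n - 1) / (n - p - 1), where n is the number of observations and p the number of predictors. The function names, the data, and the hard-coded predictions are hypothetical and do not come from the guide's dataset.

```python
def r_squared(ys, predictions):
    """Return 1 - SS_res / SS_tot for observed values and model predictions."""
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))  # the SSR
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_predictors):
    """Penalize R-squared for the number of predictors in the model."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical observations, with predictions from the line y = 1.3 + 0.8x
# evaluated at x = 0, 1, 2, 3.
ys = [1.0, 3.0, 2.0, 4.0]
predictions = [1.3, 2.1, 2.9, 3.7]

r2 = r_squared(ys, predictions)
adj_r2 = adjusted_r_squared(r2, n_obs=len(ys), n_predictors=1)
print(round(r2, 2), round(adj_r2, 2))  # 0.64 0.46
```

With a single predictor and only four observations the penalty is large. On bigger datasets the two values are usually much closer.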

5. Beyond the Basics

Simple Linear Regression is a powerful starting point. In practice, you will encounter more complex scenarios that require extensions of these core ideas. Here are some key areas for further study.