Linear Regression Visualizer

Click the scatter plot to add points. Watch the best-fit line, R², residuals, and confidence bands update in real time. Switch between OLS, polynomial, and robust regression.

Left-click to add a point. Right-click a point, or use the table below, to remove it. Dragging to pan is deliberately not supported, because every click adds a data point.


Panels: summary statistics (n, R², RMSE, MAE, SSRes); Scatter Plot & Regression Fit; Residual Plot (ŷ vs e); Q-Q Plot (normal quantiles); Data Points table (#, x, y, ŷ fitted, residual e).

Ordinary Least Squares (OLS)

OLS finds the line y = β₀ + β₁x that minimizes the sum of squared residuals:

SSRes = ∑ (y_i - ŷ_i)²

The closed-form solution comes from the normal equations (XᵀX)β = Xᵀy, which give β = (XᵀX)⁻¹Xᵀy whenever XᵀX is invertible. For simple linear regression this reduces to explicit formulas: β₁ = Cov(x,y)/Var(x) and β₀ = ȳ - β₁x̄.
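As a minimal sketch in plain Python (the names `fit_simple_ols` and `ss_res` are hypothetical helpers, not the visualizer's code), the explicit simple-regression formulas can be computed directly from centered sums:

```python
from statistics import fmean

def fit_simple_ols(xs, ys):
    """Return (b0, b1) for y = b0 + b1*x using b1 = Sxy/Sxx and b0 = ybar - b1*xbar."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least 2 paired points")
    xbar, ybar = fmean(xs), fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx  # equals Cov(x, y) / Var(x): the 1/(n-1) factors cancel
    b0 = ybar - b1 * xbar  # the fitted line passes through (xbar, ybar)
    return b0, b1

def ss_res(xs, ys, b0, b1):
    """Sum of squared residuals for the line y = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

xs, ys = [1, 2, 3, 4], [2.0, 4.1, 5.9, 8.2]
b0, b1 = fit_simple_ols(xs, ys)
print(b0, b1, ss_res(xs, ys, b0, b1))  # b0 ≈ -0.05, b1 ≈ 2.04
```

Working with centered sums rather than raw sums of x² and xy avoids some loss of precision when the x values are large.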

R² — Coefficient of Determination

R² measures the proportion of total variance explained by the model:

R² = 1 - SSRes / SSTotal, where SSTotal = ∑ (y_i - ȳ)²

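R² = 1 means the model explains all of the variance; R² = 0 means the model does no better than predicting the mean ȳ. For an OLS fit with an intercept (including polynomial fits), R² on the fitted data always lies between 0 and 1. It can be negative for a fit that does not minimize squared error, such as the robust (Huber) fit, or when it is evaluated on data the model was not fit to, because SSRes can then exceed SSTotal. Caution: R² never decreases when you add terms to an OLS fit, so a higher R² alone does not justify a more complex model; always check the residual plots.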
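A minimal sketch of the definition in plain Python (the helper name `r_squared` is hypothetical). The third call scores a line that was not fit by least squares, which is how R² drops below zero:

```python
from statistics import fmean

def r_squared(ys, yhat):
    """R^2 = 1 - SSRes / SSTotal for observed values ys and fitted values yhat."""
    ybar = fmean(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, yhat))
    ss_tot = sum((y - ybar) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("SSTotal is zero (all y values equal); R^2 is undefined")
    return 1 - ss_res / ss_tot

ys = [1.0, 2.0, 3.0, 4.0]
print(r_squared(ys, [1.0, 2.0, 3.0, 4.0]))  # 1.0: perfect fit
print(r_squared(ys, [2.5, 2.5, 2.5, 2.5]))  # 0.0: predicting the mean
print(r_squared(ys, [4.0, 3.0, 2.0, 1.0]))  # -3.0: a line worse than the mean
```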

Polynomial Regression

A degree-d polynomial regression adds x², ..., x^d as features. The model becomes y = β₀ + β₁x + β₂x² + ... + β_d x^d. This is still linear in the parameters β, so OLS still applies through the expanded design matrix. The risk is overfitting: a degree-n polynomial can pass exactly through n+1 points with distinct x values, driving every residual to zero, and high-degree fits tend to oscillate wildly near the edges of the data (Runge's phenomenon).
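A minimal sketch of fitting through the expanded design matrix, assuming plain Python lists and solving the normal equations by Gaussian elimination. The name `fit_polynomial` is hypothetical, and the visualizer's actual solver is not specified. Forming XᵀX squares the condition number of the problem, so production code usually prefers a QR or SVD solver, especially at higher degrees.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [b0, b1, ..., b_d] for a degree-d polynomial.

    Builds the expanded design matrix X (row i = [1, x_i, ..., x_i^d]) and solves
    the normal equations (X^T X) beta = X^T y by Gaussian elimination.
    """
    p = degree + 1
    X = [[x ** k for k in range(p)] for x in xs]
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * y for row, y in zip(X, ys))]
         for i in range(p)]
    for col in range(p):
        # Partial pivoting: move the largest remaining entry in this column up
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular: need more distinct x values than the degree")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution on the upper-triangular system
    beta = [0.0] * p
    for i in reversed(range(p)):
        beta[i] = (A[i][p] - sum(A[i][j] * beta[j] for j in range(i + 1, p))) / A[i][i]
    return beta

xs = [-2, -1, 0, 1, 2, 3]
ys = [9, 2, 1, 6, 17, 34]  # exactly y = 1 + 2x + 3x^2
print(fit_polynomial(xs, ys, 2))  # approximately [1.0, 2.0, 3.0]
```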

Robust Regression (Huber)

Huber regression replaces the squared loss with one that is quadratic for small residuals (|e| ≤ δ) and linear for large residuals (|e| > δ). This makes it far less sensitive to outliers than OLS, which squares large residuals and so lets them dominate the fit. This visualizer implements iteratively reweighted least squares (IRLS) with the Huber M-estimator, using δ = 1.35 × MAD/0.6745, where MAD is the median absolute deviation of the residuals and MAD/0.6745 is a robust estimate of their standard deviation.
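A minimal sketch of the two losses in plain Python. It assumes the common parameterization with a ½ factor (scaling conventions vary between texts), and `huber_loss` and `squared_loss` are illustrative names:

```python
def squared_loss(e):
    """OLS contribution, written as e^2 / 2 so it matches the Huber quadratic piece."""
    return 0.5 * e * e

def huber_loss(e, delta):
    """Huber loss: e^2/2 for |e| <= delta, delta*(|e| - delta/2) beyond.

    The two pieces agree in value and slope at |e| = delta, so the loss is smooth
    there but grows only linearly for large residuals.
    """
    a = abs(e)
    return 0.5 * e * e if a <= delta else delta * (a - 0.5 * delta)

for e in (0.5, 1.0, 3.0, 10.0):
    print(e, squared_loss(e), huber_loss(e, delta=1.0))
```

With δ = 1, a residual of 10 costs 50 under the squared loss but only 9.5 under the Huber loss, which is why a single outlier cannot dominate the Huber fit the way it dominates OLS.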

Reading the Residual Plot

A good fit shows residuals scattered randomly around zero with no pattern. A clear curve or trend in the residual plot means the model is missing non-linearity. A funnel shape indicates heteroscedasticity: the spread of the residuals changes with the fitted value, often growing as ŷ increases. Large isolated residuals flag potential outliers or data-entry errors. The Q-Q plot compares residual quantiles to normal quantiles; points lying close to the diagonal are consistent with the normality assumption that classical inference, including the t-based confidence bands, relies on.
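The coordinates behind a normal Q-Q plot can be sketched as follows, assuming standardized residuals and the plotting positions (i - 0.5)/n. The visualizer may use a different convention, and `qq_points` is a hypothetical name; `statistics.NormalDist().inv_cdf` supplies the standard normal quantiles.

```python
from statistics import NormalDist, fmean, stdev

def qq_points(residuals):
    """Return (theoretical quantile, standardized sample quantile) pairs for a normal Q-Q plot."""
    n = len(residuals)
    if n < 2:
        raise ValueError("need at least 2 residuals")
    mu, sd = fmean(residuals), stdev(residuals)
    if sd == 0:
        raise ValueError("residuals have zero spread")
    ordered = sorted((r - mu) / sd for r in residuals)  # standardized, ascending
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting position (i - 0.5)/n for the i-th smallest value; Blom's (i - 3/8)/(n + 1/4) is another choice
    return [(std_normal.inv_cdf((i - 0.5) / n), q) for i, q in enumerate(ordered, start=1)]

for theo, sample in qq_points([0.3, -1.2, 0.8, 0.1, -0.4]):
    print(f"{theo:+.3f}  {sample:+.3f}")
```

Plotting the pairs and comparing them to the line y = x gives the diagonal check described above.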

Confidence Bands

The confidence bands (shaded region) show the 95% confidence interval for the expected value E[y|x]. They are narrowest at x = x̄ and widen toward the edges of the data range. For the straight-line fit, the half-width at a given x is t_{0.025, n-2} · s · √(1/n + (x - x̄)²/S_xx), where t_{0.025, n-2} is the upper 2.5% critical value of Student's t distribution with n - 2 degrees of freedom, s² = SSRes/(n-2), and S_xx = ∑ (x_i - x̄)². Note: these are confidence bands for the mean prediction, not prediction intervals for individual observations.
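A minimal sketch of the half-width formula for the straight-line fit, in plain Python. The standard library has no Student's t distribution, so the critical value is passed in as `t_crit` (for n = 5 points, t_{0.025,3} ≈ 3.182 from a t table); `band_halfwidth` is a hypothetical name.

```python
from math import sqrt
from statistics import fmean

def band_halfwidth(xs, ys, x0, t_crit):
    """Half-width of the confidence band for E[y|x0] around a straight-line OLS fit.

    t_crit: upper 2.5% critical value of Student's t with n - 2 degrees of freedom.
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points so that n - 2 > 0")
    xbar, ybar = fmean(xs), fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b0 = ybar - b1 * xbar
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = sqrt(ss_res / (n - 2))  # residual standard error, s^2 = SSRes/(n-2)
    return t_crit * s * sqrt(1 / n + (x0 - xbar) ** 2 / sxx)

xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
for x0 in (1, 3, 5, 8):
    print(x0, round(band_halfwidth(xs, ys, x0, t_crit=3.182), 3))
```

Here x̄ = 3, so the band is narrowest at x0 = 3, equally wide at x0 = 1 and x0 = 5, and widest at x0 = 8, outside the data range.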

Frequently Asked Questions

What is the difference between R² and adjusted R²?

R² never decreases when you add a predictor to an OLS model, even if that predictor has no real relationship with y. Adjusted R² penalizes the number of predictors: R²_adj = 1 - (1 - R²)(n-1)/(n-p-1), where p is the number of predictors, not counting the intercept (a degree-d polynomial has p = d). Adjusted R² decreases when a new predictor improves the fit by less than it costs in degrees of freedom. For simple linear regression (p = 1) the difference is small unless n is small; it matters more as p grows relative to n, as in multiple regression or high-degree polynomial fits.
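A small sketch of the formula in plain Python (`adjusted_r_squared` is a hypothetical helper). The R² values below are made up solely to illustrate the trade-off, not taken from any real fit:

```python
def adjusted_r_squared(r2, n, p):
    """R^2_adj = 1 - (1 - R^2)(n - 1)/(n - p - 1), with p predictors (excluding the intercept)."""
    if n - p - 1 <= 0:
        raise ValueError("adjusted R^2 needs n > p + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical numbers: going from a line (p = 1) to a cubic (p = 3) on n = 10 points
print(adjusted_r_squared(0.90, n=10, p=1))  # ≈ 0.8875
print(adjusted_r_squared(0.91, n=10, p=3))  # ≈ 0.865: higher R^2, lower adjusted R^2
```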

Why do outliers have such a large effect on OLS?

OLS minimizes the sum of squared residuals. Squaring the error means a single point with residual 10 contributes 100 to the loss, as much as 100 points with residual 1 combined, so the fit bends sharply to shrink that one residual. A point's influence on the regression line depends on both its leverage h_i (how unusual its x value is) and its residual e_i. Cook's distance combines the two: D_i = e_i²/(p·s²) · h_i/(1 - h_i)², where p is the number of fitted coefficients and s² = SSRes/(n - p). Robust regression methods like Huber or RANSAC limit the influence of high-residual points.
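A minimal sketch of Cook's distance for the straight-line fit (p = 2 coefficients), using the closed-form leverage h_i = 1/n + (x_i - x̄)²/S_xx that holds for simple linear regression. The helper `cooks_distance` is hypothetical and the example data are invented:

```python
from statistics import fmean

def cooks_distance(xs, ys):
    """Cook's distance D_i = e_i^2 / (p s^2) * h_i / (1 - h_i)^2 for a straight-line OLS fit."""
    n, p = len(xs), 2  # p = number of coefficients (intercept and slope)
    if n <= p:
        raise ValueError("need more points than coefficients")
    xbar, ybar = fmean(xs), fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b0 = ybar - b1 * xbar
    resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    s2 = sum(e * e for e in resid) / (n - p)  # s^2 = SSRes / (n - p)
    if s2 == 0:
        raise ValueError("perfect fit: s^2 = 0, so Cook's distance is undefined")
    distances = []
    for x, e in zip(xs, resid):
        h = 1 / n + (x - xbar) ** 2 / sxx  # leverage: grows with distance from xbar
        # h == 1 means the point alone pins the fit there; D is undefined in that case
        distances.append(float("nan") if h >= 1 else e * e / (p * s2) * h / (1 - h) ** 2)
    return distances

xs, ys = [1, 2, 3, 4, 10], [1, 2, 3, 4, 0]
for x, d in zip(xs, cooks_distance(xs, ys)):
    print(x, round(d, 3))
```

In this example the point at x = 10 has only a modest residual (-0.8) but leverage 0.92, giving D ≈ 17.25, far above every other point: it has pulled the line toward itself, which is why its own residual looks small.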

When should I use polynomial vs. linear regression?

Use linear regression when the scatter plot suggests a straight-line relationship and residuals are random. Use polynomial regression when there is a clear curve (U-shape, S-shape) that a line cannot capture. As a rule of thumb, degree-2 handles simple curves and degree-3 handles inflection points; higher degrees risk overfitting. Always prefer the simpler model that explains the data adequately. Cross-validation can quantify whether the added complexity actually improves out-of-sample predictions.

What do the confidence bands represent?

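The confidence bands show the 95% confidence interval for the mean response E[y|x] at each x. They do not bound individual predictions. The prediction interval, which is wider, bounds where a new individual observation y is expected to fall; for simple linear regression its half-width adds a 1 under the square root, t_{0.025, n-2} · s · √(1 + 1/n + (x - x̄)²/S_xx), to account for the scatter of individual points around the mean. The bands are narrowest at x = x̄ because the least-squares line always passes through the centroid (x̄, ȳ), so only the uncertainty in ȳ matters there; they widen toward the extremes because uncertainty in the slope adds an error that grows with the distance |x - x̄|.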

How does IRLS (robust regression) work?

Iteratively Reweighted Least Squares starts with an OLS estimate, computes residuals, and assigns each observation a weight that decreases with residual magnitude using the Huber function: w_i = min(1, δ/|e_i|). It then solves a weighted least squares problem with those weights, recomputes residuals, updates weights, and repeats until convergence (typically 20-50 iterations). Outliers receive small weights and have little influence on the final estimate.
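A minimal sketch of the loop for a straight line, in plain Python. The hypothetical `huber_irls` follows the steps described above: unit weights on the first pass reproduce OLS, then w_i = min(1, δ/|e_i|). Recomputing δ = 1.35 × MAD/0.6745 from the current residuals on every iteration, taking the MAD about the median, and the stopping rule are assumptions, not details confirmed for the visualizer.

```python
from statistics import median

def huber_irls(xs, ys, k=1.35, max_iter=50, tol=1e-10):
    """Huber M-estimate of y = b0 + b1*x via iteratively reweighted least squares."""
    w = [1.0] * len(xs)  # unit weights: the first pass is ordinary least squares
    b0 = b1 = None
    for _ in range(max_iter):
        # Weighted least squares for a line: weighted means, then weighted slope
        sw = sum(w)
        xw = sum(wi * x for wi, x in zip(w, xs)) / sw
        yw = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - xw) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - xw) * (y - yw) for wi, x, y in zip(w, xs, ys))
        new_b1 = sxy / sxx
        new_b0 = yw - new_b1 * xw
        done = b0 is not None and abs(new_b0 - b0) < tol and abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if done:
            break
        # Reweight from the new residuals
        resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
        m = median(resid)
        delta = k * median(abs(e - m) for e in resid) / 0.6745  # robust scale estimate
        if delta == 0:
            break  # at least half the residuals sit at the median; no usable scale
        w = [1.0 if abs(e) <= delta else delta / abs(e) for e in resid]  # min(1, delta/|e|)
    return b0, b1

xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [1, 3, 5, 7, 9, 11, 13, 40]  # y = 1 + 2x, except the last point (true value 15)
print("OLS:  ", huber_irls(xs, ys, max_iter=1))
print("Huber:", huber_irls(xs, ys))
```

Passing `max_iter=1` stops after the unweighted pass, which gives the OLS line for comparison. The outlier drags the OLS slope to about 4.08, while the Huber slope ends up much closer to the true value of 2.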

Built by Michael Lip.