Residual Analysis Calculator

Compute residuals, standardized residuals, leverage, Cook's distance, Durbin-Watson, skewness, and kurtosis, and flag outliers for regression diagnostics.

About the Residual Analysis Calculator

Fitting a regression line is only the first step. Residual analysis checks whether the model is actually behaving like a usable regression rather than simply producing a high R².

This calculator reports raw residuals, standardized residuals, leverage, Cook's distance, and summary diagnostics such as Durbin-Watson, skewness, and kurtosis. Together those outputs help you look for common failure modes: curvature, changing variance, autocorrelation, and influential points.

The goal is not just to identify a line, but to see whether the assumptions behind that line are holding up once you inspect the errors directly.

Why Use This Residual Analysis Calculator?

Residual diagnostics matter because a model that fits poorly can still produce an impressive summary statistic such as a high R². Looking at residual shape, influence, and correlation is often what tells you whether to transform variables, add curvature, or question a few points before you trust the fit.

How to Use This Calculator

  1. Enter X values and corresponding Y values (comma-separated).
  2. Or click a preset to load diagnostic scenarios.
  3. Set the outlier threshold for standardized residuals (default 2).
  4. Review the diagnostic output cards (RMSE, Durbin-Watson, etc.).
  5. Examine the residual table for outliers and influential points.
  6. Check Cook's distance — values > 1.0 indicate highly influential observations.
  7. Use the diagnostic reference table to interpret each metric.

Formula

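Residual: eᵢ = yᵢ − ŷᵢ
Standardized residual: eᵢ* = eᵢ / (s√(1−hᵢᵢ))
Leverage: hᵢᵢ = 1/n + (xᵢ−x̄)²/Sxx
Cook's distance: Dᵢ = eᵢ*²·hᵢᵢ / (p(1−hᵢᵢ))
Durbin-Watson: d = Σ(eᵢ−eᵢ₋₁)² / Σeᵢ², with the numerator summed over i = 2, …, n

Here n is the number of observations, Sxx = Σ(xᵢ−x̄)², p is the number of fitted coefficients (2 for a straight line: intercept and slope), and s is the residual standard error, s = √(Σeᵢ²/(n−p)).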
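The formulas above can be sketched in a few lines of Python for a straight-line fit. This is a minimal illustration rather than the calculator's own code: the function name `residual_diagnostics`, the choice p = 2, the residual standard error s = √(SSE/(n−p)), and the sample data are all assumptions made for the example.

```python
import math

def residual_diagnostics(x, y):
    """Diagnostics for a straight-line fit y = b0 + b1*x (assumes p = 2)."""
    n = len(x)
    p = 2  # fitted coefficients: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Raw residuals and the residual standard error (assumed: SSE / (n - p)).
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    s = math.sqrt(sse / (n - p))

    # Leverage, standardized residuals, and Cook's distance per observation.
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    standardized = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]
    cooks_d = [r ** 2 * h / (p * (1 - h)) for r, h in zip(standardized, leverage)]

    # Durbin-Watson: squared successive differences over the sum of squares.
    dw = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, n)) / sse

    return {
        "slope": slope,
        "intercept": intercept,
        "residuals": residuals,
        "leverage": leverage,
        "standardized": standardized,
        "cooks_d": cooks_d,
        "durbin_watson": dw,
    }

# Illustrative data, not taken from the calculator's presets.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
result = residual_diagnostics(x, y)
for name in ("residuals", "leverage", "standardized", "cooks_d"):
    print(name, [round(v, 3) for v in result[name]])
print("durbin_watson", round(result["durbin_watson"], 3))
```

Two quick sanity checks on any output of this kind: the residuals of a fit with an intercept sum to zero, and the leverage values sum to p.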

Example Calculation

Result: R² = 0.9997, RMSE = 0.117, Durbin-Watson = 2.14, all |std. residuals| < 2.0, max Cook's D = 0.32

Residuals show no pattern, Durbin-Watson is near 2.0 (no sign of first-order autocorrelation), and no points are flagged as outliers or as influential. These diagnostics are consistent with the regression assumptions holding.

Tips & Best Practices

Regression Assumptions and Residuals

OLS regression assumes: (1) Linearity — the true relationship is linear. (2) Independence — residuals are uncorrelated. (3) Homoscedasticity — residual variance is constant. (4) Normality — residuals are normally distributed. Each assumption maps to specific diagnostic tests.

Linearity: Plot residuals vs. predicted values. Random scatter = good. Curves = consider polynomial terms. Independence: Durbin-Watson tests first-order serial correlation. Homoscedasticity: Look for fan shapes in residual plots. Normality: Check skewness and kurtosis.

Influential Points vs. Outliers

An outlier has a large residual — the model predicts poorly for that point. A high-leverage point has an extreme X value. An influential point changes the regression substantially when removed. A point can be high-leverage without being influential (if it falls on the trend), or an outlier without being influential (if leverage is low). Cook's distance captures the combined effect.
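The three cases can be made concrete with a small experiment. The sketch below adds one extra point to a clean, hypothetical data set in three different ways and reports that point's leverage, standardized residual, and Cook's distance. The data, the helper name `point_diagnostics`, and the straight-line assumption (p = 2) are illustrative choices, not part of the calculator.

```python
import math

def point_diagnostics(x, y, idx):
    """Leverage, standardized residual, and Cook's D for observation idx
    in a straight-line fit (assumes p = 2)."""
    n, p = len(x), 2
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - p))
    h = 1 / n + (x[idx] - x_bar) ** 2 / sxx
    r = residuals[idx] / (s * math.sqrt(1 - h))
    d = r ** 2 * h / (p * (1 - h))
    return h, r, d

# Hypothetical base data: y = 2x + 1 plus small deterministic noise.
base_x = list(range(1, 11))
noise = [0.2, -0.1, -0.2, 0.1, 0.0, 0.0, 0.1, -0.2, -0.1, 0.2]
base_y = [2 * xi + 1 + ni for xi, ni in zip(base_x, noise)]

# One extra point per scenario: (x, y).
scenarios = {
    "high leverage, on trend": (20, 41),    # extreme x, lies on y = 2x + 1
    "outlier, low leverage": (5.5, 16),     # x at the mean, y shifted up by 4
    "high leverage, off trend": (20, 25),   # extreme x, far below the trend
}

results = {}
for label, (x_new, y_new) in scenarios.items():
    xs = base_x + [x_new]
    ys = base_y + [y_new]
    results[label] = point_diagnostics(xs, ys, idx=len(xs) - 1)
    h, r, d = results[label]
    print(f"{label}: leverage={h:.3f}, std. residual={r:.2f}, Cook's D={d:.2f}")
```

In this constructed example only the third point exceeds the Cook's D > 1 rule of thumb: the first has high leverage but almost no residual, and the second has a large residual but too little leverage to move the line much.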

What To Do When Diagnostics Fail

Non-linearity: Add polynomial terms or transform variables. Heteroscedasticity: Use weighted least squares or robust standard errors. Autocorrelation: Use generalized least squares or add lag terms. Non-normality: Transform Y (log, sqrt) or use robust regression. Outliers: Investigate data quality, use robust methods (LAD, Huber), or report with and without.

Frequently Asked Questions

What's the difference between raw and standardized residuals?

Raw residuals (eᵢ = yᵢ − ŷᵢ) retain the units of Y. Standardized residuals divide each raw residual by its estimated standard deviation, which accounts for leverage. The result is a unit-free scale on which values beyond ±2 indicate potential outliers.

What does the Durbin-Watson statistic mean?

DW tests for first-order autocorrelation in residuals. The statistic ranges from 0 to 4. DW ≈ 2 means no first-order autocorrelation. DW well below 2 suggests positive autocorrelation (consecutive residuals are similar). DW well above 2 suggests negative autocorrelation (consecutive residuals alternate sign).
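As a rough illustration of how the statistic responds to residual order, the sketch below evaluates it on two made-up residual sequences. The function name `durbin_watson` and the sequences are assumptions for the example.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Residuals that stay on one side for a while: positive autocorrelation.
runs = [1, 1, 1, -1, -1, -1]
# Residuals that flip sign every step: negative autocorrelation.
alternating = [1, -1, 1, -1, 1, -1]

print(round(durbin_watson(runs), 3))         # 0.667
print(round(durbin_watson(alternating), 3))  # 3.333
```

The same six values give very different results depending only on their order, which is why the statistic is meaningful only when the observations have a natural sequence such as time.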

When is Cook's distance concerning?

The traditional rule of thumb treats Cook's D > 1 as influential. A stricter rule uses D > 4/n. Investigate high-Cook's-D points before removing anything: they may be data errors, outliers, or genuinely different observations that shouldn't be modeled together.
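Both rules of thumb are easy to apply side by side. The sketch below takes a list of Cook's distances and returns the indices flagged by each rule. The function name `flag_influential` and the input values are hypothetical.

```python
def flag_influential(cooks_d):
    """Return indices exceeding D > 1 and the stricter D > 4/n cutoff."""
    n = len(cooks_d)
    above_one = [i for i, d in enumerate(cooks_d) if d > 1]
    above_4_over_n = [i for i, d in enumerate(cooks_d) if d > 4 / n]
    return above_one, above_4_over_n

# Hypothetical Cook's distances for n = 8 observations (4/n = 0.5).
cooks_d = [0.02, 0.6, 1.3, 0.1, 0.05, 0.03, 0.01, 0.2]
above_one, above_4_over_n = flag_influential(cooks_d)
print("D > 1:", above_one)            # [2]
print("D > 4/n:", above_4_over_n)     # [1, 2]
```

Because 4/n shrinks as the sample grows, the stricter rule flags more points in large data sets, so it works better as a prompt for inspection than as a removal criterion.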

What does high leverage mean?

Leverage measures how far xᵢ is from x̄. Extreme X values have high leverage: they have outsized potential to pull the regression line. High leverage isn't always bad — compare Cook's D to see if the point actually affects the regression.

What if residuals aren't normally distributed?

Non-normal residuals don't bias the coefficient estimates, but they can make confidence intervals and p-values unreliable, especially in small samples. Check skewness (should be near 0) and excess kurtosis (should also be near 0). With n > 30, the Central Limit Theorem provides some protection.
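For reference, skewness and excess kurtosis can be computed from the central moments of the residuals. The sketch below uses the plain moment-based definitions; whether the calculator applies a small-sample bias correction is not stated here, so treat the exact formulas and the name `skew_and_excess_kurtosis` as assumptions.

```python
def skew_and_excess_kurtosis(values):
    """Moment-based skewness (m3 / m2^1.5) and excess kurtosis (m4 / m2^2 - 3)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

# Symmetric residuals: skewness is zero.
print(skew_and_excess_kurtosis([-2, -1, 0, 1, 2]))
# One large positive residual: skewness is clearly positive.
print(skew_and_excess_kurtosis([0, 0, 0, 0, 10]))
```

With only a handful of residuals both statistics are noisy, so small departures from zero are not strong evidence of non-normality.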

How do I detect heteroscedasticity?

Look for a fan or funnel shape in the residual plot, with residuals getting larger (or smaller) as X increases. Our visual bars show this pattern clearly. Formal tests include Breusch-Pagan and White's test.
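To give a sense of what a formal test does, the sketch below computes Koenker's studentized variant of the Breusch-Pagan statistic for a single predictor: regress the squared residuals on X and take n·R² from that auxiliary regression. Under constant variance the statistic is approximately chi-square with 1 degree of freedom, so values above about 3.841 are significant at the 5% level. This is not something the calculator is stated to compute. The function names and both data sets are made up for the illustration.

```python
def ols_line(x, y):
    """Least-squares straight line; returns (intercept, slope, residuals)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return intercept, slope, residuals

def breusch_pagan_lm(x, y):
    """Koenker's studentized Breusch-Pagan statistic: n * R^2 from
    regressing the squared residuals on x."""
    _, _, e = ols_line(x, y)
    e2 = [ei ** 2 for ei in e]
    _, _, aux_resid = ols_line(x, e2)
    mean_e2 = sum(e2) / len(e2)
    sst = sum((v - mean_e2) ** 2 for v in e2)
    if sst == 0:
        return 0.0  # squared residuals are constant: no evidence of changing variance
    sse = sum(r ** 2 for r in aux_resid)
    return len(x) * (1 - sse / sst)

CHI2_1DF_5PCT = 3.841  # 5% critical value, chi-square with 1 degree of freedom

x = list(range(1, 21))
signs = [1 if xi % 2 == 0 else -1 for xi in x]
# Error size grows with x (fan shape) versus constant error size.
y_fan = [2 + 3 * xi + sg * 0.1 * xi for xi, sg in zip(x, signs)]
y_flat = [2 + 3 * xi + sg * 1.0 for xi, sg in zip(x, signs)]

lm_fan = breusch_pagan_lm(x, y_fan)
lm_flat = breusch_pagan_lm(x, y_flat)
print("fan:", round(lm_fan, 2), "reject constant variance:", lm_fan > CHI2_1DF_5PCT)
print("flat:", round(lm_flat, 2), "reject constant variance:", lm_flat > CHI2_1DF_5PCT)
```

The alternating-sign errors are deterministic so the example is reproducible. Real residuals are noisier, and the chi-square approximation is better with larger samples.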
