Coefficient of Determination (R²) Calculator
Calculate R², adjusted R², SS decomposition (SST/SSR/SSE), F-statistic, standard error, and residual analysis. Full ANOVA decomposition with interpretation guide.
Compute residuals, standardized residuals, leverage, Cook's distance, Durbin-Watson, skewness, kurtosis, and outlier detection for regression diagnostics.
| X | Y | Ŷ | Residual | Std. Res. | Leverage | Cook's D |
|---|---|---|---|---|---|---|
| 1.00 | 2.1000 | 2.0382 | 0.0618 | 0.539 | 0.3455 | 0.0767 |
| 2.00 | 3.9000 | 4.0364 | -0.1364 | -1.110 | 0.2485 | 0.2036 |
| 3.00 | 6.2000 | 6.0345 | 0.1655 | 1.286 | 0.1758 | 0.1763 |
| 4.00 | 8.0000 | 8.0327 | -0.0327 | -0.247 | 0.1273 | 0.0045 |
| 5.00 | 10.1000 | 10.0309 | 0.0691 | 0.515 | 0.1030 | 0.0152 |
| 6.00 | 11.8000 | 12.0291 | -0.2291 | -1.707 | 0.1030 | 0.1673 |
| 7.00 | 14.1000 | 14.0273 | 0.0727 | 0.549 | 0.1273 | 0.0220 |
| 8.00 | 15.9000 | 16.0255 | -0.1255 | -0.975 | 0.1758 | 0.1013 |
| 9.00 | 18.2000 | 18.0236 | 0.1764 | 1.435 | 0.2485 | 0.3406 |
| 10.00 | 20.0000 | 20.0218 | -0.0218 | -0.190 | 0.3455 | 0.0096 |
| Diagnostic | Good | Concerning | Indicates |
|---|---|---|---|
| Durbin-Watson | 1.5–2.5 | < 1.5 or > 2.5 | Autocorrelation in residuals |
| Std. residual (absolute value) | < 2 | > 2 (potential outlier) | Unusual observations |
| Leverage | < 0.400 | > 0.400 (2p/n, with p = 2 and n = 10) | Extreme X position (potential influence) |
| Cook's D | < 0.5 | > 1.0 | Overall influence on regression |
| Skewness | between −0.5 and 0.5 | beyond ±1.0 | Asymmetric (non-normal) residuals |
| Excess kurtosis | between −1.0 and 1.0 | beyond ±2.0 | Heavy- or light-tailed residuals |
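The bands in the table above can be encoded directly. The sketch below is a minimal illustration. `BANDS` and `classify` are hypothetical names, and the sketch assumes the skewness and kurtosis rows refer to absolute values of skewness and excess kurtosis. For some rows the Good and Concerning ranges leave a gap (Cook's D between 0.5 and 1.0, for example), so values in a gap are labeled "borderline".

```python
# Hypothetical encoding of the diagnostics table's bands.
# Each entry maps a diagnostic to (is_good, is_concerning) predicates.
BANDS = {
    "durbin_watson": (lambda v: 1.5 <= v <= 2.5, lambda v: v < 1.5 or v > 2.5),
    "std_residual": (lambda v: abs(v) < 2.0, lambda v: abs(v) > 2.0),
    "cooks_d": (lambda v: v < 0.5, lambda v: v > 1.0),
    "skewness": (lambda v: abs(v) < 0.5, lambda v: abs(v) > 1.0),
    "excess_kurtosis": (lambda v: abs(v) < 1.0, lambda v: abs(v) > 2.0),
}


def classify(name: str, value: float, p: int = 2, n: int = 10) -> str:
    """Return "good", "concerning", or "borderline" for one diagnostic value."""
    if name == "leverage":
        cutoff = 2 * p / n  # 2p/n rule; 0.400 for p = 2, n = 10
        is_good, is_concerning = (lambda v: v < cutoff), (lambda v: v > cutoff)
    else:
        is_good, is_concerning = BANDS[name]
    if is_good(value):
        return "good"
    if is_concerning(value):
        return "concerning"
    return "borderline"  # value falls in the gap between the two bands


print(classify("durbin_watson", 2.14))  # good
print(classify("cooks_d", 0.75))        # borderline
print(classify("leverage", 0.3455))     # good
```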
Fitting a regression line is only the first step. Residual analysis checks whether the model is actually behaving like a usable regression rather than simply producing a high R².
This calculator reports raw residuals, standardized residuals, leverage, Cook's distance, and summary diagnostics such as Durbin-Watson, skewness, and kurtosis. Together those outputs help you look for common failure modes: curvature, changing variance, autocorrelation, and influential points.
The goal is not just to identify a line, but to see whether the assumptions behind that line are holding up once you inspect the errors directly.
Residual diagnostics matter because a model whose residuals show obvious structure can still produce an impressive R². Looking at residual shape, influence, and correlation is often what tells you whether to transform variables, add curvature, or question a few points before you trust the fit.
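Residual, standardized residual, leverage, Cook's distance, and Durbin-Watson statistic:

$$e_i = y_i - \hat{y}_i$$

$$e_i^* = \frac{e_i}{s\sqrt{1 - h_{ii}}}, \qquad s = \sqrt{\frac{1}{n - p}\sum_{i=1}^{n} e_i^2}$$

$$h_{ii} = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{xx}}, \qquad S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$$

$$D_i = \frac{e_i^{*2}\, h_{ii}}{p\,(1 - h_{ii})}$$

$$d = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$$

Here p is the number of fitted parameters (p = 2 for a straight line), and s is the residual standard error.

Result: R² = 0.9997, RMSE = 0.117, Durbin-Watson = 2.14, all |std. residuals| < 2.0, max Cook's D = 0.32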
Residuals show no pattern, Durbin-Watson is near 2.0 (no evidence of first-order autocorrelation), and there are no outliers or influential points. The diagnostics show no sign that any regression assumption is violated.
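As a cross-check on the formulas, the sketch below recomputes the per-point columns of the residual table for a straight-line OLS fit: Ŷ, residual, standardized residual, leverage, and Cook's D. It assumes p = 2 and uses only the Python standard library. `simple_regression_diagnostics` is a hypothetical helper, not the calculator's own code.

```python
import math


def simple_regression_diagnostics(x, y):
    """Per-point residual diagnostics for a straight-line OLS fit (p = 2)."""
    n, p = len(x), 2
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    fitted = [intercept + slope * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    # Residual standard error: s = sqrt(SSE / (n - p))
    s = math.sqrt(sum(e * e for e in resid) / (n - p))

    rows = []
    for xi, yi, fi, e in zip(x, y, fitted, resid):
        h = 1 / n + (xi - x_bar) ** 2 / sxx   # leverage h_ii
        r = e / (s * math.sqrt(1 - h))        # standardized residual e_i*
        d = r * r * h / (p * (1 - h))         # Cook's distance D_i
        rows.append((xi, yi, fi, e, r, h, d))
    return rows


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 8.0, 10.1, 11.8, 14.1, 15.9, 18.2, 20.0]
for xi, yi, fi, e, r, h, d in simple_regression_diagnostics(x, y):
    print(f"{xi:5.2f} {yi:8.4f} {fi:8.4f} {e:8.4f} {r:7.3f} {h:7.4f} {d:7.4f}")
```

Run against the ten X/Y pairs in the residual table, the printed rows should agree with the table to the displayed precision.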
OLS regression assumes: (1) Linearity — the true relationship is linear. (2) Independence — residuals are uncorrelated. (3) Homoscedasticity — residual variance is constant. (4) Normality — residuals are normally distributed. Each assumption maps to specific diagnostic tests.
Linearity: Plot residuals vs. predicted values. Random scatter = good. Curves = consider polynomial terms. Independence: Durbin-Watson tests first-order serial correlation. Homoscedasticity: Look for fan shapes in residual plots. Normality: Check skewness and kurtosis.
An outlier has a large residual — the model predicts poorly for that point. A high-leverage point has an extreme X value. An influential point changes the regression substantially when removed. A point can be high-leverage without being influential (if it falls on the trend), or an outlier without being influential (if leverage is low). Cook's distance captures the combined effect.
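To see the difference between leverage and influence numerically, the sketch below adds one hypothetical point at x = 20 to the ten table points. The first version puts it essentially on the fitted trend (y = 40.0); the second puts it well off the trend (y = 30.0). Both versions have the same leverage, because leverage depends only on X. Only the off-trend version gets a large Cook's distance. The helper name `leverage_and_cooks` and the extra points are illustrative assumptions.

```python
def leverage_and_cooks(x, y, p=2):
    """Return (leverage, Cook's D) for every point of a straight-line OLS fit."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance s^2
    out = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - x_bar) ** 2 / sxx
        # Cook's D = e*^2 h / (p (1 - h)) with e*^2 = e^2 / (s^2 (1 - h))
        d = e * e * h / (p * s2 * (1 - h) ** 2)
        out.append((h, d))
    return out


x = list(range(1, 11))
y = [2.1, 3.9, 6.2, 8.0, 10.1, 11.8, 14.1, 15.9, 18.2, 20.0]

# Hypothetical extra point at x = 20: once on the fitted trend, once far below it.
h_on, d_on = leverage_and_cooks(x + [20], y + [40.0])[-1]
h_off, d_off = leverage_and_cooks(x + [20], y + [30.0])[-1]
print(f"on trend:  leverage={h_on:.3f}, Cook's D={d_on:.4f}")
print(f"off trend: leverage={h_off:.3f}, Cook's D={d_off:.2f}")
```

With n = 11, the 2p/n leverage cutoff is about 0.364. The x = 20 point has leverage near 0.73 in both versions. Its Cook's D is close to 0 on the trend and well above 1 off it.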
Non-linearity: Add polynomial terms or transform variables. Heteroscedasticity: Use weighted least squares or robust standard errors. Autocorrelation: Use generalized least squares or add lag terms. Non-normality: Transform Y (log, sqrt) or use robust regression. Outliers: Investigate data quality, use robust methods (LAD, Huber), or report with and without.
Raw residuals (eᵢ = yᵢ − ŷᵢ) are in the units of Y. Standardized residuals divide each residual by its estimated standard deviation, s√(1−hᵢᵢ), which accounts for leverage. The result is unit-free, and values beyond ±2 flag potential outliers.
DW tests for first-order autocorrelation in residuals. DW ≈ 2 means no autocorrelation. DW << 2 suggests positive autocorrelation (consecutive residuals similar). DW >> 2 suggests negative autocorrelation (consecutive residuals alternate sign).
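A minimal sketch of the statistic follows. It assumes the residuals are supplied in observation order, and `durbin_watson` is a hypothetical function name. The sketch applies it to the Residual column of the residual table.

```python
def durbin_watson(residuals):
    """d = sum_{i>=2} (e_i - e_{i-1})^2 / sum_i e_i^2; always between 0 and 4."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e * e for e in residuals)
    return num / den


# Residual column from the residual table (rounded to 4 decimals), in X order.
table_residuals = [0.0618, -0.1364, 0.1655, -0.0327, 0.0691,
                   -0.2291, 0.0727, -0.1255, 0.1764, -0.0218]
print(round(durbin_watson(table_residuals), 2))
```

For these residuals, d comes out at roughly 3.3, above the 2.5 upper band in the diagnostics table. The residual signs alternate from one X value to the next, which is the negative-autocorrelation pattern described above.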
The traditional rule flags Cook's D > 1 as influential; a stricter rule uses D > 4/n. Investigate high-Cook's-D points before removing them: they may be data errors, outliers, or genuinely different observations that shouldn't be modeled together.
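Applying both cutoffs to the Cook's D column of the residual table takes only a couple of lines. Here n = 10, so 4/n = 0.4. The variable names are illustrative.

```python
# Cook's D column from the residual table (rows are X = 1..10).
cooks_d = [0.0767, 0.2036, 0.1763, 0.0045, 0.0152,
           0.1673, 0.0220, 0.1013, 0.3406, 0.0096]
n = len(cooks_d)

traditional = [x for x, d in enumerate(cooks_d, start=1) if d > 1.0]  # D > 1
stricter = [x for x, d in enumerate(cooks_d, start=1) if d > 4 / n]   # D > 4/n
print(traditional, stricter)  # [] []
```

Neither rule flags a point here. The largest value, 0.3406 at X = 9, sits just under the stricter 0.4 cutoff.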
Leverage measures how far xᵢ is from x̄. Extreme X values have high leverage: they have outsized potential to pull the regression line. High leverage isn't always bad — compare Cook's D to see if the point actually affects the regression.
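The 2p/n leverage cutoff in the diagnostics table has a simple justification. The leverages of an OLS fit always sum to p, the number of fitted parameters, so p/n is the average leverage and 2p/n flags points with more than twice the average. The short check below assumes a straight-line fit (p = 2) and uses the table's X values.

```python
# Leverage for a straight-line fit: h_ii = 1/n + (x_i - x_bar)^2 / Sxx.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
n, p = len(x), 2
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

print(round(sum(leverage), 10))      # 2.0, i.e. p
print(round(sum(leverage) / n, 10))  # 0.2, the average leverage p/n
print(max(leverage) < 2 * p / n)     # True: no point exceeds 2p/n = 0.4
```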
Non-normal residuals don't bias the OLS coefficient estimates, but they do undermine confidence intervals and p-values, especially in small samples. Check skewness (should be near 0) and excess kurtosis (kurtosis minus 3, also near 0 for normal data). With n > 30, the Central Limit Theorem provides some protection for inference on the coefficients.
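The page does not say which skewness and kurtosis estimators the calculator uses. The sketch below assumes the simple moment-based versions: g₁ = m₃ / m₂^(3/2) for skewness, and g₂ = m₄ / m₂² − 3 for excess kurtosis, where m_k is the k-th central moment. Bias-corrected variants give somewhat different values for small n. The function name is hypothetical.

```python
def skewness_and_excess_kurtosis(values):
    """Moment estimators: g1 = m3 / m2**1.5 and g2 = m4 / m2**2 - 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # central moments
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


# Residual column from the residual table.
table_residuals = [0.0618, -0.1364, 0.1655, -0.0327, 0.0691,
                   -0.2291, 0.0727, -0.1255, 0.1764, -0.0218]
skew, kurt = skewness_and_excess_kurtosis(table_residuals)
print(f"skewness = {skew:.3f}, excess kurtosis = {kurt:.3f}")
```

Compare the two printed values with the Skewness and Excess kurtosis bands in the diagnostics table.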
Look for a fan or funnel shape in a plot of residuals against X: residuals spreading out (or narrowing) as X increases. Formal tests include the Breusch-Pagan test and White's test.
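For a formal check, one common option is Koenker's studentized form of the Breusch-Pagan test. You regress the squared residuals on X and compute LM = n·R². Under constant variance, LM is asymptotically chi-square with 1 degree of freedom. The sketch below is a minimal version for a single predictor, not the calculator's implementation. The function name is hypothetical, and 3.841 is the 5% chi-square(1) critical value; with only ten points the asymptotic approximation is rough.

```python
def koenker_breusch_pagan(x, residuals):
    """Koenker's LM statistic: n * R^2 from regressing squared residuals on x."""
    n = len(x)
    u = [e * e for e in residuals]  # squared residuals
    x_bar, u_bar = sum(x) / n, sum(u) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    suu = sum((ui - u_bar) ** 2 for ui in u)
    sxu = sum((xi - x_bar) * (ui - u_bar) for xi, ui in zip(x, u))
    r_squared = sxu * sxu / (sxx * suu)  # R^2 of the auxiliary simple regression
    return n * r_squared


CHI2_1_CRIT_5PCT = 3.841  # chi-square(1) critical value at the 5% level

x = list(range(1, 11))
table_residuals = [0.0618, -0.1364, 0.1655, -0.0327, 0.0691,
                   -0.2291, 0.0727, -0.1255, 0.1764, -0.0218]
lm = koenker_breusch_pagan(x, table_residuals)
print(f"LM = {lm:.3f}; constant variance rejected at 5%: {lm > CHI2_1_CRIT_5PCT}")
```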