Calculate covariance, Pearson correlation, R², and regression line for paired data. Includes scatter plot, cross-product table, and correlation gauge.
The covariance calculator measures how two variables move together. Positive covariance means they tend to rise together, negative covariance means one tends to fall as the other rises, and a value near zero suggests little linear relationship.
This tool also computes Pearson correlation, R², and a simple regression line, so you can move from the raw covariance value to a standardized interpretation of strength and direction. The cross-product table and scatter plot make the calculation easier to inspect instead of treating it like a black-box result.
It is useful for introductory statistics, finance, data analysis, and any situation where you need to check whether two measured quantities are tracking each other in a meaningful way.
Covariance is usually the first numerical check for whether two variables move together, but by itself it is hard to compare across different units. Pairing it with correlation, R², and the regression line gives you both the raw relationship and the standardized one in the same view.
That combination is useful when you want to decide whether a relationship is merely directional, strong enough to matter, or stable enough to support a predictive line.
Cov(X,Y) = Σ(xᵢ − x̄)(yᵢ − ȳ) / (n−1). Pearson r = Cov(X,Y) / (sₓ × sᵧ). R² = r². Regression: y = a + bx where b = Cov(X,Y) / sₓ² and a = ȳ − b × x̄.
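The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation: the function name `paired_summary` and the sample data are hypothetical, and the code assumes both variables have non-zero variance.

```python
from math import sqrt

def paired_summary(xs, ys):
    """Sample covariance, Pearson r, R^2, and least-squares line for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and squared deviations (the cross-product table).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    cov = sxy / (n - 1)                  # Cov(X,Y) with the n-1 denominator
    s_x = sqrt(sxx / (n - 1))            # sample standard deviations
    s_y = sqrt(syy / (n - 1))
    r = cov / (s_x * s_y)                # Pearson r
    slope = cov / s_x ** 2               # b = Cov(X,Y) / s_x^2
    intercept = mean_y - slope * mean_x  # a = y_bar - b * x_bar
    return {"cov": cov, "r": r, "r2": r ** 2,
            "slope": slope, "intercept": intercept}

# Hypothetical data, chosen only to keep the arithmetic easy to follow.
result = paired_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(result)
```

For this made-up data the covariance is 1.5, R² is 0.6, and the fitted line is y = 2.2 + 0.6x.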
Result: Covariance = 122.14, r = 0.997
Height (X) and weight (Y) have a very strong positive correlation (r = 0.997). The covariance of 122.14 indicates they increase together, with R² = 0.994 meaning 99.4% of weight variance is explained by height in this sample. Regression line: y = −165.8 + 1.31x.
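Harry Markowitz's Modern Portfolio Theory uses covariance matrices to quantify diversification. Combining assets whose returns are less than perfectly correlated brings portfolio volatility below the weighted average of the individual volatilities, and the reduction is largest when the covariance is negative. The optimal portfolio minimizes variance for a given expected return, and that variance is determined entirely by the covariance structure of the assets.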
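As a rough illustration of the diversification effect, the two-asset case can be written as Var(p) = w_a² Var(A) + w_b² Var(B) + 2 w_a w_b Cov(A,B). The sketch below assumes a two-asset portfolio with weights that sum to 1; the return series and function names are hypothetical, not market data.

```python
from statistics import mean, variance

def sample_cov(xs, ys):
    """Sample covariance with the n-1 denominator."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def portfolio_variance(w_a, returns_a, returns_b):
    """Variance of a two-asset portfolio with weight w_a in A and 1 - w_a in B."""
    w_b = 1 - w_a
    return (w_a ** 2 * variance(returns_a)
            + w_b ** 2 * variance(returns_b)
            + 2 * w_a * w_b * sample_cov(returns_a, returns_b))

# Hypothetical period returns for two assets that tend to move in opposite directions.
returns_a = [0.02, -0.01, 0.03, 0.00]
returns_b = [-0.01, 0.02, -0.01, 0.00]

cov_ab = sample_cov(returns_a, returns_b)
var_mix = portfolio_variance(0.5, returns_a, returns_b)
print(cov_ab, variance(returns_a), variance(returns_b), var_mix)
```

With these made-up numbers the covariance is negative, and the 50/50 mix has a lower variance than either asset held alone.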
Principal Component Analysis (PCA) begins by computing the covariance matrix of all variables, then finds its eigenvectors (the principal components), ordered by how much variance each one captures. The first principal component points in the direction of maximum variance in the data. This technique powers dimensionality reduction in machine learning.
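For two variables the eigen-decomposition has a closed form, so the idea can be sketched without a linear algebra library. This is an illustration limited to the 2×2 case; real PCA on many variables would normally use a numerical eigen-solver. The function names and data are hypothetical.

```python
from math import atan2, cos, sin, sqrt

def cov_matrix_2d(xs, ys):
    """Return (var_x, cov_xy, var_y), the entries of the 2x2 sample covariance matrix."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mx) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - my) ** 2 for y in ys) / (n - 1)
    cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return var_x, cov_xy, var_y

def pca_2d(xs, ys):
    """Eigenvalues and unit eigenvectors of the 2x2 covariance matrix, largest first."""
    a, b, c = cov_matrix_2d(xs, ys)
    mid = (a + c) / 2
    half_gap = sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = mid + half_gap, mid - half_gap
    # Angle of the major axis of a symmetric 2x2 matrix.
    theta = 0.5 * atan2(2 * b, a - c)
    pc1 = (cos(theta), sin(theta))    # direction of maximum variance
    pc2 = (-sin(theta), cos(theta))   # orthogonal direction
    return (lam1, pc1), (lam2, pc2)

# Hypothetical data for illustration.
(lam1, pc1), (lam2, pc2) = pca_2d([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(lam1, pc1)
print(lam2, pc2)
```

The two eigenvalues sum to the total variance (Var(X) + Var(Y)), and the larger one is the variance of the data projected onto the first component.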
Sample covariance and Pearson correlation are sensitive to outliers. Robust alternatives include Spearman rank correlation (based on ranks, not values), Kendall tau (based on concordant/discordant pairs), and the Minimum Covariance Determinant estimator, a robust estimate of the covariance matrix. For non-linear relationships, consider mutual information or distance correlation.
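To see the outlier sensitivity, compare Pearson correlation with Spearman rank correlation on data containing one extreme value. The sketch below computes Spearman as Pearson correlation applied to ranks, with tied values given their average rank; the helper names and the data are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical data: perfectly increasing, but the last y value is an outlier.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 3, 4, 5, 6, 100]
print(pearson(xs, ys), spearman(xs, ys))
```

Because the relationship is still perfectly monotonic, Spearman stays at 1, while the single outlier pulls Pearson well below 1.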
Covariance measures the direction (positive/negative) and magnitude of the linear relationship, but its value depends on the scales of X and Y. Correlation standardizes covariance by dividing by the product of standard deviations, giving a dimensionless value between −1 and +1. Use correlation to compare relationships across different datasets.
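A quick way to see the scale dependence is to change the units of one variable. In the sketch below, converting heights from centimetres to metres divides the covariance by 100 but leaves the correlation unchanged. The data and the helper `cov_and_r` are hypothetical.

```python
from math import sqrt

def cov_and_r(xs, ys):
    """Return (sample covariance, Pearson r) for paired data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (n - 1), sxy / sqrt(sxx * syy)

# Hypothetical measurements, not the calculator's example data.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 55, 66, 70, 79]
heights_m = [h / 100 for h in heights_cm]

cov_cm, r_cm = cov_and_r(heights_cm, weights_kg)
cov_m, r_m = cov_and_r(heights_m, weights_kg)
print(cov_cm, cov_m)  # covariance shrinks by a factor of 100
print(r_cm, r_m)      # correlation is the same up to rounding
```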
R² (coefficient of determination) is the proportion of variance in Y explained by the linear relationship with X. An R² of 0.80 means 80% of the variation in Y can be predicted from X. The remaining 20% is unexplained variance (noise, other factors, or non-linear effects).
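The "proportion of variance explained" reading can be checked directly: fit the line, compute the residual sum of squares, and compare 1 − SS_res / SS_tot with r². The sketch below assumes simple linear regression with an intercept (the case where the two agree); the names and data are hypothetical.

```python
def r_squared_from_residuals(xs, ys):
    """R^2 computed as 1 - SS_res / SS_tot for the least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    # Unexplained variation: squared distances from the fitted line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation: squared distances from the mean of Y.
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def r_squared_from_r(xs, ys):
    """R^2 computed as the square of Pearson r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

# Hypothetical data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(r_squared_from_residuals(xs, ys), r_squared_from_r(xs, ys))
```

For this made-up data both routes give 0.6, meaning 60% of the variation in Y is accounted for by the line and 40% is left in the residuals.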
Yes — zero covariance means no linear relationship between X and Y. However, there could still be a non-linear relationship (like a U-shape or circle). Always check the scatter plot. Note that independent variables always have zero covariance, but zero covariance does not guarantee independence.
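A small made-up example shows zero covariance alongside complete dependence: take x values symmetric around zero and set y = x². The function name below is hypothetical.

```python
def sample_cov(xs, ys):
    """Sample covariance with the n-1 denominator."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

# y is fully determined by x, but the relationship is a U-shape, not a line.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

cov_xy = sample_cov(xs, ys)
print(cov_xy)
```

The positive and negative cross-products cancel exactly, so the covariance is zero even though knowing x tells you y precisely.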
Use sample covariance (n−1 denominator) in almost all cases — whenever your data is a subset of a larger population. Use population covariance (n denominator) only when you have data for the entire population. The difference matters most for small sample sizes.
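The two versions differ only in the denominator, as the sketch below shows. The `sample` flag and the data are hypothetical choices for illustration.

```python
def covariance(xs, ys, sample=True):
    """Covariance with an n-1 (sample) or n (population) denominator."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cross_products = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cross_products / (n - 1 if sample else n)

# Hypothetical data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

cov_sample = covariance(xs, ys, sample=True)
cov_population = covariance(xs, ys, sample=False)
print(cov_sample, cov_population)
```

With five points the sample value is 25% larger than the population value (a ratio of n / (n − 1) = 5/4); with 1,000 points the gap would be about 0.1%.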
In finance, covariance between asset returns determines portfolio diversification benefits (Markowitz theory). In machine learning, the covariance matrix drives PCA (principal component analysis). In science, it quantifies how two measurements relate. In quality control, it helps identify related process variables.
The regression line y = a + bx is the best-fit straight line through the scatter plot (minimizing squared vertical distances). The slope (b) tells you: for each 1-unit increase in X, Y changes by b units. The intercept (a) is the predicted Y when X = 0, which may have no practical meaning when X = 0 lies far outside the observed data.
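The "best-fit" property can be checked numerically: any nudge to the fitted slope or intercept should increase the sum of squared vertical distances. The sketch below uses hypothetical helper names and made-up data.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return a, b

def sse(a, b, xs, ys):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
best = sse(a, b, xs, ys)
print(a, b, best)
print(sse(a, b + 0.1, xs, ys))  # steeper line: larger error
print(sse(a + 0.1, b, xs, ys))  # shifted line: larger error

# Slope interpretation: moving X up by 1 changes the prediction by b.
print((a + b * 4) - (a + b * 3))
```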