Calculate MCC from confusion matrix with full metrics: accuracy, F1, precision, recall, specificity, Cohen's kappa, informedness, markedness, and visual confusion matrix.
The Matthews Correlation Coefficient (MCC) is a balanced way to score binary classifiers, especially when the classes are imbalanced and accuracy alone can be misleading.
This calculator takes a confusion matrix and reports MCC alongside the other metrics people commonly compare against it: accuracy, balanced accuracy, precision, recall, specificity, F1, Cohen's kappa, and related rates. That makes it easier to see why two models with similar accuracy can still have very different practical quality.
MCC ranges from −1 to +1, where +1 is perfect agreement, 0 is no useful discrimination, and −1 means the predictions are effectively inverted.
MCC is useful because it stays informative when the class balance is skewed and when a model can game accuracy by overpredicting the majority class. Comparing it directly against accuracy, F1, and balanced accuracy helps show whether the classifier is genuinely useful or only superficially impressive.
MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). Range: [−1, +1], where MCC = 0 corresponds to random guessing and MCC = +1 to perfect prediction.
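The formula translates directly into code. A minimal sketch (the function name and the example counts are illustrative, not from the calculator itself):

```python
import math

def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews Correlation Coefficient from the four confusion-matrix cells."""
    num = tp * tn - fp * fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: if any marginal is zero the denominator vanishes; return 0.
    return num / denom if denom else 0.0

print(round(mcc(tp=90, fp=10, tn=80, fn=20), 4))  # 0.7035
```

Note the degenerate case: whenever a whole row or column of the confusion matrix is empty (e.g. the classifier never predicts positive), the denominator is zero and MCC is defined as 0.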
Result: MCC = 0.7528, Accuracy = 87.5%, F1 = 0.8718, Precision = 89.47%, Recall = 85.0%
An MCC of 0.75 indicates strong agreement between predictions and reality. Despite a 12.5% error rate, the classifier correctly handles both positives and negatives, with slightly better specificity than sensitivity.
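The rounded percentages above are consistent with counts in the ratio TP=17, FP=2, FN=3, TN=18; the original counts aren't stated, so this matrix is an assumption used only to show how the secondary metrics are derived:

```python
tp, fp, fn, tn = 17, 2, 3, 18  # assumed counts, consistent with the rounded metrics above

accuracy    = (tp + tn) / (tp + fp + fn + tn)
precision   = tp / (tp + fp)
recall      = tp / (tp + fn)           # sensitivity
specificity = tn / (tn + fp)
f1          = 2 * tp / (2 * tp + fp + fn)

print(f"accuracy={accuracy:.3f} precision={precision:.4f} "
      f"recall={recall:.2f} specificity={specificity:.2f} f1={f1:.4f}")
# accuracy=0.875 precision=0.8947 recall=0.85 specificity=0.90 f1=0.8718
```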
Consider a disease affecting 1% of patients. A classifier that always predicts "healthy" achieves 99% accuracy — impressive-looking but useless. Its MCC is exactly 0, correctly indicating no discriminative ability. This example illustrates why MCC is increasingly recommended as a primary metric in machine learning competitions and medical AI.
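The "always healthy" scenario can be checked directly. A sketch with 1,000 hypothetical patients at 1% prevalence:

```python
import math

tp, fp, fn, tn = 0, 0, 10, 990   # predict "healthy" for all 1000 patients, 10 diseased

accuracy = (tp + tn) / (tp + fp + fn + tn)
denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # zero denominator: MCC = 0 by convention

print(accuracy, mcc)  # 0.99 0.0
```

Because the classifier never predicts positive, TP + FP = 0, the denominator collapses to zero, and MCC reports 0 regardless of the flattering accuracy.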
MCC is the Pearson correlation between actual and predicted binary labels (encoded 0/1). This means all properties of Pearson correlation apply: MCC = +1 implies perfect prediction, −1 implies perfect inverse prediction, and 0 implies independence. The formula (TP·TN − FP·FN)/√((TP+FP)(TP+FN)(TN+FP)(TN+FN)) is equivalent to Pearson r on binary variables.
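The equivalence with Pearson r is easy to verify numerically. A sketch using NumPy on synthetic 0/1 labels (the label generation is arbitrary, just to produce a non-trivial classifier):

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
y_pred = np.where(rng.random(200) < 0.8, y_true, 1 - y_true)  # ~80% agreement

# Pearson correlation of the two 0/1 vectors
r = np.corrcoef(y_true, y_pred)[0, 1]

# MCC from the confusion-matrix formula
tp = int(np.sum((y_true == 1) & (y_pred == 1)))
tn = int(np.sum((y_true == 0) & (y_pred == 0)))
fp = int(np.sum((y_true == 0) & (y_pred == 1)))
fn = int(np.sum((y_true == 1) & (y_pred == 0)))
mcc = (tp * tn - fp * fn) / np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

assert abs(r - mcc) < 1e-9  # identical up to floating-point error
```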
MCC gives one number but hides the precision-recall tradeoff. When false positives and false negatives carry very different costs (in cancer screening, a missed case is far more serious than a false alarm), you need precision and recall separately. ROC-AUC shows performance across all thresholds. Use MCC as the primary evaluation metric, then drill into precision, recall, and domain-specific costs for deployment decisions.
If 95% of samples are negative, predicting "always negative" gives 95% accuracy but MCC = 0 (no useful discrimination). MCC requires good performance on both classes, making it impossible to game with trivial strategies.
MCC = √(Informedness × Markedness) when both are positive. It's also the geometric mean of the regression coefficients of the problem and its dual. MCC incorporates all four confusion matrix cells equally.
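The identity can be checked on any matrix with positive informedness and markedness. A sketch with assumed example counts:

```python
import math

tp, fp, fn, tn = 90, 10, 20, 80  # illustrative counts (assumed)

informedness = tp / (tp + fn) + tn / (tn + fp) - 1   # TPR + TNR − 1 (Youden's J)
markedness   = tp / (tp + fp) + tn / (tn + fn) - 1   # PPV + NPV − 1

mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

assert math.isclose(mcc, math.sqrt(informedness * markedness))
```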
Yes. MCC generalizes to multi-class problems using the full confusion matrix rather than only the four binary cells. The formula is more involved, but the same idea holds: it measures agreement between predicted and true labels while accounting for class imbalance.
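One common generalization (the multi-class R_K statistic) works from the row, column, and diagonal sums of the K×K confusion matrix. A sketch, with an assumed 3-class matrix:

```python
import numpy as np

def multiclass_mcc(C) -> float:
    """Generalized MCC (R_K statistic) from a K×K confusion matrix.

    Rows are true classes, columns are predicted classes."""
    C = np.asarray(C, dtype=float)
    t = C.sum(axis=1)      # true-class totals (row sums)
    p = C.sum(axis=0)      # predicted-class totals (column sums)
    s = C.sum()            # total samples
    c = np.trace(C)        # correctly classified samples
    num = c * s - t @ p
    denom = np.sqrt(s**2 - p @ p) * np.sqrt(s**2 - t @ t)
    return float(num / denom) if denom else 0.0

C = [[5, 1, 0],
     [1, 4, 1],
     [0, 1, 5]]
print(round(multiclass_mcc(C), 4))  # 0.6667
```

For K = 2 this reduces to the binary formula above, and scikit-learn's `matthews_corrcoef` implements the same statistic.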
MCC = 0 when TP·TN = FP·FN. This occurs for random guessing, "always positive" (TN=FN=0), or "always negative" (TP=FP=0) strategies. Any strategy that ignores the true labels gives MCC = 0.
Both measure binary agreement beyond chance, but MCC is symmetric and treats both classes equally. Kappa can be biased by prevalence and marginal distributions. MCC is generally preferred in machine learning; kappa in inter-rater reliability studies.
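The divergence shows up when the predicted and true marginals differ. A sketch with assumed counts chosen so the classifier over-predicts the positive class:

```python
import math

tp, fp, fn, tn = 8, 12, 2, 78   # assumed counts with skewed marginals
n = tp + fp + fn + tn

# Cohen's kappa: observed agreement corrected by chance agreement
# expected from the marginal distributions.
po = (tp + tn) / n
pe = ((tp + fn) / n) * ((tp + fp) / n) + ((tn + fp) / n) * ((tn + fn) / n)
kappa = (po - pe) / (1 - pe)

mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

print(round(kappa, 4), round(mcc, 4))  # 0.4615 0.5
```

When the row and column totals coincide, the two statistics agree exactly; the more the marginals diverge, the more kappa is pulled away from MCC.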
No universal threshold exists, but MCC > 0.3 suggests the classifier adds value beyond random, MCC > 0.5 indicates moderate utility, and MCC > 0.7 is considered strong. The required threshold depends on the cost of false positives vs. false negatives in your domain.