Calculates accuracy, precision, recall, F1, and MCC from a confusion matrix. Also computes measurement error metrics (MAE, RMSE, MAPE) and simple proportion accuracy with confidence intervals.
Accuracy measures closeness to the truth, but the exact meaning depends on the task. In classification, accuracy is the share of correct predictions. In measurement, it describes how close observed values are to the true or reference values. This calculator covers both contexts, plus a simple correct-versus-total proportion mode.
For classification, you can enter a confusion matrix and get accuracy, precision, recall, specificity, F1 score, Matthews correlation coefficient, and related diagnostics. For measurement problems, you can enter paired actual and measured values to compute MAE, RMSE, MAPE, R², and other error summaries.
That makes the page useful for model evaluation, diagnostic testing, quality-control checks, and any workflow where one overall accuracy number is not enough on its own.
A single accuracy percentage can hide important failure modes. In imbalanced classification, a model can score highly while missing the class you actually care about. In measurement work, a low average error can still conceal occasional large misses.
This calculator puts the supporting metrics next to the headline number so you can see whether the result reflects balanced performance, bias, spread, or some mix of the three.
Classification:
Accuracy = (TP + TN) / (TP + FP + FN + TN)
Precision = TP / (TP + FP)
Recall (Sensitivity) = TP / (TP + FN)
Specificity = TN / (TN + FP)
F1 = 2 × Precision × Recall / (Precision + Recall)
MCC = (TP×TN − FP×FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN))

Measurement:
MAE = (1/n) Σ|yᵢ − ŷᵢ|
RMSE = √((1/n) Σ(yᵢ − ŷᵢ)²)
MAPE = (100/n) Σ|yᵢ − ŷᵢ| / |yᵢ|
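The classification formulas above can be sketched as a small Python function; the function name and dict layout are illustrative, not part of the calculator itself. The example call uses the confusion-matrix counts from the worked example on this page.

```python
import math

def classification_metrics(tp, fp, fn, tn):
    """Standard confusion-matrix metrics; NaN where a ratio is undefined."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else float("nan"))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else float("nan")
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1, "mcc": mcc}

m = classification_metrics(tp=90, fp=10, fn=5, tn=895)
print(f"accuracy={m['accuracy']:.4f}  f1={m['f1']:.3f}")  # accuracy=0.9850  f1=0.923
```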
Result: Accuracy = 98.50%, F1 = 0.923
With 90 true positives, 10 false positives, 5 false negatives, and 895 true negatives out of 1,000 observations, accuracy is 98.5%. Precision is 90% (90/100), recall is 94.7% (90/95), and F1 score is 0.923. MCC is 0.915, indicating excellent classification performance.
Classification accuracy counts discrete correct/incorrect predictions. Measurement accuracy quantifies how close continuous predictions are to true values. The metrics differ fundamentally: classification uses counts (TP, FP, FN, TN) while measurement uses deviations (errors). Both are called "accuracy" but require different evaluation frameworks.
In a dataset where 99% of samples are negative, a classifier that always predicts "negative" achieves 99% accuracy. This is the accuracy paradox — high accuracy despite useless predictions. Balanced accuracy, F1, and MCC all address this by accounting for both positive and negative class performance.
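The accuracy paradox is easy to demonstrate directly. The sketch below (illustrative counts) scores an always-negative classifier on a 99%-negative dataset: accuracy looks excellent while balanced accuracy sits at chance level. MCC is undefined here (its denominator is zero because no positives are predicted) and is conventionally reported as 0.

```python
# 1,000 samples, 990 negative; a classifier that always predicts "negative".
tp, fp, fn, tn = 0, 0, 10, 990

accuracy = (tp + tn) / (tp + fp + fn + tn)      # 0.99 - looks excellent
recall = tp / (tp + fn)                         # 0.0  - misses every positive
specificity = tn / (tn + fp)                    # 1.0  - trivially perfect
balanced_accuracy = (recall + specificity) / 2  # 0.5  - chance level
print(f"accuracy={accuracy:.2f}  balanced={balanced_accuracy:.2f}")
```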
Always report multiple metrics. No single number captures all aspects of performance. Pair accuracy with precision/recall for classification, or MAE with MAPE for measurement. Use confusion matrix visualization to identify specific error patterns. Consider the costs of different error types in your domain.
Accuracy is the overall proportion of correct predictions (both positive and negative). Precision is the proportion of positive predictions that are actually positive. With a rare disease, a test that always says "negative" has high accuracy but undefined precision, since it never makes a positive prediction.
Use F1 when classes are imbalanced or when false positives and false negatives have different costs. Use accuracy when classes are balanced and both types of errors are equally important.
Matthews Correlation Coefficient considers all four confusion matrix cells and produces a balanced measure even with imbalanced datasets. Unlike accuracy or F1, it's high only if the classifier does well on both positive and negative classes.
MAE gives equal weight to all errors. RMSE squares errors first, penalizing large errors disproportionately. If large errors are particularly undesirable, use RMSE. If all errors matter equally, use MAE.
R² = 1 means perfect prediction (all measured values exactly match actual). R² = 0 means the model is no better than predicting the mean. R² can be negative if predictions are worse than the mean.
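The measurement metrics discussed above (MAE, RMSE, MAPE, R²) can be sketched as follows; the function name and sample values are illustrative. Note how RMSE exceeds MAE whenever the errors are unequal, reflecting its heavier penalty on large misses.

```python
import math

def measurement_metrics(actual, measured):
    """MAE, RMSE, MAPE (%), and R^2 for paired actual/measured values."""
    n = len(actual)
    errors = [a - m for a, m in zip(actual, measured)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mape = (100 / n) * sum(abs(e) / abs(a) for e, a in zip(errors, actual))
    # R^2: 1 minus residual sum of squares over total sum of squares.
    mean_a = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    return mae, rmse, mape, r2

mae, rmse, mape, r2 = measurement_metrics([10, 20, 30, 40], [11, 19, 33, 38])
print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  MAPE={mape:.1f}%  R2={r2:.2f}")
# MAE=1.75  RMSE=1.94  MAPE=7.5%  R2=0.97
```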
The confidence interval half-width is z × √(accuracy × (1 − accuracy) / n). Solving for n at a 95% CI within ±2% gives n = accuracy × (1 − accuracy) × (1.96/0.02)² ≈ 2,400 for 50% accuracy, and fewer for accuracies further from 50%.
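The sample-size calculation can be sketched as a one-liner based on the Wald interval; the function name is illustrative, and this formula assumes a rough prior guess for the accuracy (use p = 0.5 for the most conservative estimate).

```python
def sample_size(p, half_width, z=1.96):
    """Samples needed for a Wald CI of the given half-width on proportion p."""
    return p * (1 - p) * (z / half_width) ** 2

print(round(sample_size(0.5, 0.02)))  # 2401 - worst case, p = 0.5
print(round(sample_size(0.9, 0.02)))  # 864  - fewer needed at 90% accuracy
```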