Five-Star Rating Calculator

Analyze star ratings with simple average, Bayesian average, Wilson confidence, distribution visualization, polarity detection, and entropy-based consensus metrics.

Simple Average: 4.05 / 5 (100 total ratings)
Bayesian Average: 3.53 / 5 (prior: 100 reviews at 3)
Wilson Lower Bound: 65.7% (95% CI lower bound for % positive, 4-5★)
95% CI for Mean: 3.83 – 4.27 (SEM = 0.112)
Standard Deviation: 1.12 (moderate spread)
Net Sentiment: 40.0% (% 5★ minus % 1★)

Rating Distribution

5★: 45 (45.0%)
4★: 30 (30.0%)
3★: 15 (15.0%)
2★: 5 (5.0%)
1★: 5 (5.0%)

Detailed Analysis

Metric             | Value      | Interpretation
Simple Average     | 4.050      | Excellent
Bayesian Average   | 3.525      | Shrunk toward 3 with 100 prior
Median             | 4 ★        | Middle rating
Mode               | 5 ★        | Most common (45 ratings)
Std Deviation      | 1.117      | Moderate agreement
Entropy            | 1.882 bits | 81.1% of max spread
Polarity           | 50.0%      | Somewhat polarized
Net Sentiment      | 40.0%      | Strong positive
% Positive (4-5★)  | 75.0%      |
% Negative (1-2★)  | 10.0%      |
Wilson Lower       | 65.70%     | Ranking score (confidence-adjusted)
Planning notes, formulas, and examples

About the Five-Star Rating Calculator

Five-star rating systems power decisions on Amazon, Yelp, Google, App Store, and countless other platforms — but a simple average can be deeply misleading. An item with one 5-star review isn't better than one with 1,000 reviews averaging 4.7 stars. This calculator goes far beyond the crude average to provide statistically rigorous rating analysis.

Three ranking methods are computed: the simple weighted average, the Bayesian average (IMDb-style, which pulls ratings toward a prior when review counts are low), and the Wilson lower bound (which gives a confidence-adjusted "worst reasonable case" score for ranking). Beyond numerical scores, the calculator measures rating consensus through standard deviation and entropy, detects polarized distributions, and computes a net sentiment score.

Whether you're evaluating products, ranking search results, comparing restaurants, or designing your own rating system, this calculator shows you what the star distribution actually reveals — and what a simple "4.2 out of 5" hides.

When This Page Helps

Every ecommerce platform, review site, and marketplace needs to rank items by ratings, and simple averages fail in predictable ways. This calculator shows the simple average side by side with two industry-standard corrections (Bayesian and Wilson), so platform designers can choose the right method and users can understand why ratings feel "off" sometimes.

The distribution visualization, polarity detection, and entropy metrics provide insights that no single number can capture. A "3.5-star" product could be mediocre (most ratings 3-4), controversial (split between 1 and 5), or barely-reviewed (one 3 and one 4). This calculator tells you which.

How to Use the Inputs

  1. Enter the number of reviews for each star level (1-star through 5-star).
  2. Use presets for common patterns: good restaurant, mixed product, polarized app.
  3. Adjust the Bayesian prior weight to control how much small-count items are penalized.
  4. Review three different average methods and their differences.
  5. Examine the visual distribution bars to see the shape of ratings.
  6. Check the detailed analysis table for consensus, polarity, and confidence metrics.
Formula used
Simple Average: Σ(star × count) / Σ(count)

Bayesian Average: (m × C + Σ(star × count)) / (m + Σ(count)), where m = prior review count and C = prior mean (typically 3.0)

Wilson Lower Bound (for % positive): (p̂ + z²/(2n) − z√(p̂(1−p̂)/n + z²/(4n²))) / (1 + z²/n), where p̂ = proportion of 4-5★ ratings, n = total number of ratings, and z = 1.96 for a 95% CI
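The three formulas above can be sketched in Python as follows. This is a minimal illustration, not the calculator's actual source: the function names and the `counts` dictionary (star level mapped to number of ratings) are assumptions made for the example, and the sample data is the default distribution shown in the results above.

```python
import math

def simple_average(counts):
    """Mean star value; counts maps star level (1-5) to number of ratings."""
    total = sum(counts.values())
    return sum(star * n for star, n in counts.items()) / total

def bayesian_average(counts, prior_weight=100, prior_mean=3.0):
    """Blend the observed ratings with prior_weight pseudo-reviews at prior_mean."""
    total = sum(counts.values())
    star_sum = sum(star * n for star, n in counts.items())
    return (prior_weight * prior_mean + star_sum) / (prior_weight + total)

def wilson_lower_bound(positive, total, z=1.96):
    """Lower end of the Wilson score interval for the share of positive ratings."""
    if total == 0:
        return 0.0  # assumption: treat "no ratings" as a score of zero
    p = positive / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / (1 + z * z / total)

# Default distribution from the results above: 45, 30, 15, 5, 5 ratings.
counts = {5: 45, 4: 30, 3: 15, 2: 5, 1: 5}
total = sum(counts.values())
positive = counts[5] + counts[4]  # 4-5 star ratings count as positive

print(f"Simple:   {simple_average(counts):.3f}")
print(f"Bayesian: {bayesian_average(counts):.3f}")
print(f"Wilson:   {wilson_lower_bound(positive, total):.1%}")
```

With these inputs the sketch should reproduce the displayed values of 4.05, 3.525, and roughly 65.7%.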

Example Calculation

Result: Simple: 4.05/5, Bayesian: 3.53/5, Wilson: 65.7%, SD: 1.12, Net: +40%

With 100 total ratings weighted toward 5 and 4 stars (45, 30, 15, 5, and 5 ratings for 5★ down to 1★), the simple average is 4.05. The Bayesian average (with a 100-review prior at 3.0) pulls this down to (100 × 3.0 + 405) / 200 = 3.53, because 100 reviews carry only as much weight as the prior. The Wilson lower bound of 65.7% is the lower end of the 95% confidence interval for the true share of positive (4-5★) ratings, given that 75% of the observed ratings are positive. An SD of 1.12 indicates moderate consensus.
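The page does not state how the standard deviation, SEM, and 95% CI for the mean are computed. The sketch below assumes the population standard deviation and a normal approximation (mean ± 1.96 × SEM); under those assumptions it matches the displayed results for the default distribution (SD 1.117, SEM 0.112, CI 3.83 – 4.27). The function names are illustrative only.

```python
import math

def mean_and_sd(counts):
    """Return (mean, population SD) for a star-count distribution."""
    total = sum(counts.values())
    mean = sum(star * n for star, n in counts.items()) / total
    # Assumption: population variance (divide by n, not n - 1).
    variance = sum(n * (star - mean) ** 2 for star, n in counts.items()) / total
    return mean, math.sqrt(variance)

def mean_confidence_interval(counts, z=1.96):
    """Normal-approximation CI for the mean: mean +/- z * SEM."""
    total = sum(counts.values())
    mean, sd = mean_and_sd(counts)
    sem = sd / math.sqrt(total)  # standard error of the mean
    return mean - z * sem, mean + z * sem, sem

counts = {5: 45, 4: 30, 3: 15, 2: 5, 1: 5}
mean, sd = mean_and_sd(counts)
low, high, sem = mean_confidence_interval(counts)

print(f"SD = {sd:.3f}, SEM = {sem:.3f}")
print(f"95% CI for mean: {low:.2f} to {high:.2f}")
```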

Tips & Best Practices

  • Simple average can be misleading for items with fewer than 50 reviews; use the Bayesian average instead.
  • Wilson lower bound is best for ranking: it rewards both high ratings AND high volume.
  • High polarity (>40%) suggests the product is divisive — check if different segments react differently.
  • Entropy near the maximum (log₂ 5 ≈ 2.32 bits) means ratings are spread almost evenly across all five star levels, so no consensus exists.
  • Net sentiment >30% indicates strong positive reception; below −30% indicates serious problems.
  • Always look at the distribution shape, not just the number: a 4.0 average can come from very different distributions.
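The entropy and net sentiment figures mentioned in the tips can be computed as in the sketch below. It assumes Shannon entropy in bits over the five star levels and the "% 5★ minus % 1★" definition of net sentiment given in the results; the function names are hypothetical.

```python
import math

def entropy_bits(counts):
    """Shannon entropy (base 2) of the rating distribution."""
    total = sum(counts.values())
    probs = [n / total for n in counts.values() if n > 0]  # skip empty levels
    return -sum(p * math.log2(p) for p in probs)

def net_sentiment(counts):
    """Share of 5-star ratings minus share of 1-star ratings."""
    total = sum(counts.values())
    return (counts.get(5, 0) - counts.get(1, 0)) / total

counts = {5: 45, 4: 30, 3: 15, 2: 5, 1: 5}
max_entropy = math.log2(5)  # about 2.32 bits for five equally likely levels

h = entropy_bits(counts)
print(f"Entropy: {h:.3f} bits ({h / max_entropy:.1%} of max)")
print(f"Net sentiment: {net_sentiment(counts):+.1%}")
```

For the default distribution this gives about 1.882 bits (81.1% of the maximum) and a net sentiment of +40%, matching the detailed analysis table.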

How Major Platforms Rank

IMDb uses a Bayesian average ("weighted rating") for its Top 250 list: WR = (v/(v+m)) × R + (m/(v+m)) × C, where v = votes, m ≈ 25,000, R = mean rating, C = mean across all films (~7.0). This is the same Bayesian average formula used by this calculator, rearranged: with v = Σ(count) and R = simple average, WR = (m × C + v × R) / (m + v). Amazon uses a proprietary system that factors in recency, verified purchases, and helpfulness votes alongside star counts. Reddit's "Best" comment sort uses Wilson confidence intervals, as described by Evan Miller's influential blog post.
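As a quick check, the weighted-rating form quoted above and the Bayesian average formula from the "Formula used" section give the same number. The sketch below is illustrative only: the function names are made up for this example, and the inputs are the calculator's default data, not real IMDb figures.

```python
import math

def weighted_rating(v, m, R, C):
    """Weighted-rating form: a mix of the item mean R and the prior mean C."""
    return (v / (v + m)) * R + (m / (v + m)) * C

def bayesian_average(v, m, R, C):
    """Same quantity written as (m*C + v*R) / (m + v)."""
    return (m * C + v * R) / (m + v)

# Calculator defaults: 100 ratings averaging 4.05, prior of 100 reviews at 3.0.
wr = weighted_rating(v=100, m=100, R=4.05, C=3.0)
ba = bayesian_average(v=100, m=100, R=4.05, C=3.0)

print(f"{wr:.3f} {ba:.3f}")
print(math.isclose(wr, ba))
```

When v equals m, as here, the result sits exactly halfway between the item mean and the prior mean.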

The J-Curve Problem

Online ratings typically follow a J-shaped distribution: many 5-star ratings, gradually fewer 4, 3, 2, and then a bump at 1 star. This happens because satisfied customers leave reviews voluntarily (5★), dissatisfied customers complain (1★), but average-experience customers rarely bother. Any rating system must account for this selection bias.

Designing Fair Rating Systems

When designing a rating system, consider: (1) Bayesian averaging to handle cold starts, (2) recency weighting to reflect improving or declining quality, (3) credibility signals that weight verified purchasers higher, (4) distribution bars displayed alongside the number, and (5) a minimum review volume before ratings are shown publicly. Each design choice affects how users interpret and trust the system.

Frequently Asked Questions

  • The Bayesian average blends your data with a prior assumption (default: 3.0 stars with 100 reviews' weight). For items with few reviews, the result is pulled toward the prior. As reviews accumulate, the data overwhelms the prior and the Bayesian average converges to the simple average. This prevents a single 5-star review from ranking above a well-reviewed 4.5-star item.
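The convergence described in this answer can be illustrated with a short sketch. It assumes the default prior (100 reviews' weight at 3.0 stars) and a hypothetical item whose reviews always average 4.5 stars; the helper name and its parameters are made up for the example.

```python
def bayesian_average(star_sum, n_reviews, prior_weight=100, prior_mean=3.0):
    """Bayesian average from the sum of star values and the review count."""
    return (prior_weight * prior_mean + star_sum) / (prior_weight + n_reviews)

# One 5-star review versus 1,000 reviews averaging 4.5 stars.
single = bayesian_average(star_sum=5, n_reviews=1)
established = bayesian_average(star_sum=4.5 * 1000, n_reviews=1000)
print(f"single 5-star review: {single:.2f}")
print(f"1,000 reviews at 4.5: {established:.2f}")

# Hypothetical item averaging 4.5 stars as its review count grows.
for n in (1, 10, 100, 1000, 10000):
    print(n, round(bayesian_average(4.5 * n, n), 3))
```

The single 5-star review scores about 3.02, well below the established item at about 4.36, and the loop shows the score climbing from near 3.0 toward 4.5 as the data outweighs the prior.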