Why is the Bayesian average lower than the simple average?

The Bayesian average blends your data with a prior assumption (default: 3.0 stars with the weight of 100 reviews). For items with few reviews, the result is pulled toward the prior, so it comes out lower than the simple average whenever the simple average is above the prior mean. As reviews accumulate, the data overwhelms the prior and the Bayesian average converges to the simple average. This prevents a single 5-star review from ranking above a well-reviewed 4.5-star item.
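As a rough illustration of that pull toward the prior, here is a minimal Python sketch. The function name `bayesian_average` and the two sample items are made up for this example; the defaults (3.0 stars, 100 reviews) are the ones quoted above.

```python
def bayesian_average(counts, prior_mean=3.0, prior_weight=100):
    """counts maps a star level (1-5) to the number of ratings at that level."""
    total = sum(counts.values())
    star_sum = sum(star * n for star, n in counts.items())
    # The prior acts like `prior_weight` extra reviews, all at `prior_mean`.
    return (prior_weight * prior_mean + star_sum) / (prior_weight + total)

# Hypothetical items, for illustration only.
new_item = {5: 1}                        # a single 5-star review
established = {5: 600, 4: 300, 3: 100}   # 1,000 reviews, simple average 4.5

print(round(bayesian_average(new_item), 2))     # 3.02, barely above the prior
print(round(bayesian_average(established), 2))  # 4.36, close to its simple average
```

The lone 5-star review ends up near 3.0, while the item with 1,000 reviews keeps most of its 4.5.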

What is the Wilson lower bound used for?

The Wilson lower bound is well suited to ranking items by approval rate. It answers: "Given this sample size, what's the lowest percentage of positive ratings we can be 95% confident about?" A product with 10/10 positive ratings gets a lower Wilson score than one with 95/100, because the second has far more evidence behind it. Reddit uses a variant of this for comment ranking.
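The 10/10 versus 95/100 comparison can be checked with a short Python sketch. The function name `wilson_lower_bound` is an assumption for this example; the formula is the standard Wilson score interval with z = 1.96.

```python
import math

def wilson_lower_bound(positive, total, z=1.96):
    """Lower end of the Wilson score interval for a proportion."""
    if total == 0:
        return 0.0
    p = positive / total
    z2 = z * z
    centre = p + z2 / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total))
    return (centre - margin) / (1 + z2 / total)

print(f"{wilson_lower_bound(10, 10):.1%}")   # 72.2%
print(f"{wilson_lower_bound(95, 100):.1%}")  # 88.8%
```

Even though 10/10 is a perfect record, ten ratings leave so much uncertainty that its lower bound sits well below the 95/100 item.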

How does entropy relate to rating quality?

Entropy measures the spread of ratings across star levels. Low entropy means ratings cluster at one level (strong consensus — good or bad). High entropy means ratings are spread evenly (no consensus, controversial item). Maximum entropy occurs when each star level has exactly 20% of ratings.
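A small Python sketch of the entropy calculation, assuming Shannon entropy in bits over the five star levels. The function name `rating_entropy` and the sample distributions are illustrative.

```python
import math

def rating_entropy(counts):
    """Shannon entropy (bits) of a list of per-star rating counts."""
    total = sum(counts)
    probs = [n / total for n in counts if n > 0]  # empty levels contribute nothing
    return sum(-p * math.log2(p) for p in probs)

max_entropy = math.log2(5)         # five equally likely star levels

uniform = [20, 20, 20, 20, 20]     # no consensus at all
consensus = [0, 0, 0, 5, 95]       # nearly everyone gives 5 stars

print(round(rating_entropy(uniform), 3))    # 2.322, the maximum
print(round(rating_entropy(consensus), 3))  # 0.286
```

Dividing the result by `max_entropy` gives the "percent of max spread" style of figure.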

What is polarity and why does it matter?

Polarity measures how bimodal the distribution is: the share of ratings at the extremes (1★ and 5★) versus the middle (2-4★). A highly polarized product has fans who love it and critics who hate it. The average might be 3 stars, but the experience is nothing like "average"; it depends on who you are.
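To see why the average alone hides this, the sketch below compares two made-up distributions that both average exactly 3 stars. It assumes polarity is simply the fraction of ratings at 1★ or 5★, which matches the description above; the calculator's exact definition may differ.

```python
def simple_average(counts):
    total = sum(counts.values())
    return sum(star * n for star, n in counts.items()) / total

def polarity(counts):
    """Assumed definition: share of all ratings that are 1-star or 5-star."""
    total = sum(counts.values())
    return (counts.get(1, 0) + counts.get(5, 0)) / total

# Hypothetical distributions, 100 ratings each.
divisive = {1: 45, 2: 5, 3: 0, 4: 5, 5: 45}
lukewarm = {1: 5, 2: 20, 3: 50, 4: 20, 5: 5}

for name, counts in [("divisive", divisive), ("lukewarm", lukewarm)]:
    print(name, simple_average(counts), f"{polarity(counts):.0%}")
# divisive 3.0 90%
# lukewarm 3.0 10%
```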

How should I set the Bayesian prior weight?

Set it to the typical number of reviews for items in your category. If most items have ~200 reviews, use 200 as the prior weight. A new item then needs a comparable number of reviews before its own average outweighs the prior, so a handful of early 5-star ratings cannot inflate its score. IMDB uses approximately 25,000 as the prior weight for its Top 250 list.
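One possible way to pick the weight is to take the median review count in the category. This is an interpretation of "typical", not something the page prescribes, and the review counts below are invented for the example.

```python
from statistics import median

# Hypothetical review counts for items in one category.
review_counts = [12, 85, 150, 200, 220, 240, 900]
prior_weight = median(review_counts)   # 200

def bayesian_average(star_sum, n, prior_weight, prior_mean=3.0):
    return (prior_weight * prior_mean + star_sum) / (prior_weight + n)

# A new item with five reviews, all 5-star (star_sum = 25).
print(round(bayesian_average(25, 5, prior_weight), 2))  # 3.05 with the category-sized prior
print(round(bayesian_average(25, 5, 10), 2))            # 3.67 with a weak prior of 10
```

With a prior weight far below the typical review count, five early ratings already move the score substantially.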

Why might the median and mean disagree?

If ratings are skewed (e.g., mostly 5-star with some 1-star), the mean is pulled down by the low ratings while the median stays at 5. The median represents the "typical" review; the mean represents the overall balance. Large disagreements indicate a skewed distribution.
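A quick Python check of that effect, using an invented skewed distribution (mostly 5-star with a cluster of 1-star ratings):

```python
from statistics import mean, median

# Hypothetical skewed distribution: star level -> number of ratings.
counts = {5: 70, 4: 5, 3: 5, 2: 5, 1: 15}

# Expand the counts into one entry per individual rating.
ratings = [star for star, n in counts.items() for _ in range(n)]

print(mean(ratings))    # 4.1
print(median(ratings))  # 5.0
```

The fifteen 1-star ratings drag the mean down by almost a full star, while the median does not move.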

Five-Star Rating Calculator

Analyze star ratings with simple average, Bayesian average, Wilson confidence, distribution visualization, polarity detection, and entropy-based consensus metrics.

Inputs: rating counts for ★★★★★ (5-star), ★★★★ (4-star), ★★★ (3-star), ★★ (2-star), and ★ (1-star), plus the Bayesian Prior Weight (reviews).

Metric	Value	Note
Simple Average	4.05 / 5	100 total ratings
Bayesian Average	3.53 / 5	Prior: 100 reviews at 3
Wilson Lower Bound	65.7%	95% CI lower bound for % positive (4-5★)
95% CI for Mean	3.83 – 4.27	SEM = 0.112
Standard Deviation	1.12	Moderate spread
Net Sentiment	40.0%	% 5★ minus % 1★

Rating Distribution

Stars	Count	Share
5 ★	45	45.0%
4 ★	30	30.0%
3 ★	15	15.0%
2 ★	5	5.0%
1 ★	5	5.0%

Detailed Analysis

Metric	Value	Interpretation
Simple Average	4.050	Excellent
Bayesian Average	3.525	Shrunk toward 3 with 100 prior
Median	4 ★	Middle rating
Mode	5 ★	Most common (45 ratings)
Std Deviation	1.117	Moderate agreement
Entropy	1.882 bits	81.1% of max spread
Polarity	50.0%	Somewhat polarized
Net Sentiment	40.0%	Strong positive
% Positive (4-5★)	75.0%
% Negative (1-2★)	10.0%
Wilson Lower	65.70%	Ranking score (confidence-adjusted)
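
The figures in the table above can be reproduced from the five counts with a short Python sketch. Several choices here are inferred from the displayed numbers rather than confirmed by the page: the standard deviation divides by n (population form), polarity is the share of 1★ plus 5★ ratings, and the prior is 100 reviews at 3.0.

```python
import math
from statistics import median

counts = {5: 45, 4: 30, 3: 15, 2: 5, 1: 5}
n = sum(counts.values())
z = 1.96  # 95% confidence

mean = sum(s * c for s, c in counts.items()) / n
bayes = (100 * 3.0 + mean * n) / (100 + n)

ratings = [s for s, c in counts.items() for _ in range(c)]
med = median(ratings)
mode = max(counts, key=counts.get)

# Population standard deviation (divides by n), inferred from the 1.117 shown.
sd = math.sqrt(sum(c * (s - mean) ** 2 for s, c in counts.items()) / n)
sem = sd / math.sqrt(n)
ci = (mean - z * sem, mean + z * sem)

entropy = -sum((c / n) * math.log2(c / n) for c in counts.values() if c)
polarity = (counts[1] + counts[5]) / n
net = (counts[5] - counts[1]) / n

p_pos = (counts[4] + counts[5]) / n
p_neg = (counts[1] + counts[2]) / n
wilson = (p_pos + z * z / (2 * n)
          - z * math.sqrt(p_pos * (1 - p_pos) / n + z * z / (4 * n * n))) / (1 + z * z / n)

print(f"mean={mean:.3f} bayes={bayes:.3f} median={med} mode={mode}")
print(f"sd={sd:.3f} sem={sem:.3f} ci=({ci[0]:.2f}, {ci[1]:.2f})")
print(f"entropy={entropy:.3f} bits ({entropy / math.log2(5):.1%} of max)")
print(f"polarity={polarity:.1%} net={net:.1%} pos={p_pos:.1%} neg={p_neg:.1%}")
print(f"wilson={wilson:.2%}")
```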

Planning notes, formulas, and examples

About the Five-Star Rating Calculator

Five-star rating systems power decisions on Amazon, Yelp, Google, App Store, and countless other platforms — but a simple average can be deeply misleading. An item with one 5-star review isn't better than one with 1,000 reviews averaging 4.7 stars. This calculator goes far beyond the crude average to provide statistically rigorous rating analysis.

Three ranking methods are computed: the simple weighted average, the Bayesian average (IMDB-style, which pulls ratings toward a prior when review counts are low), and the Wilson lower bound (which gives a confidence-adjusted "worst reasonable case" score for ranking). Beyond numerical scores, the calculator measures rating consensus through standard deviation and entropy, detects polarized distributions, and computes a net sentiment score.

Whether you're evaluating products, ranking search results, comparing restaurants, or designing your own rating system, this calculator shows you what the star distribution actually reveals — and what a simple "4.2 out of 5" hides.

When This Page Helps

Every ecommerce platform, review site, and marketplace needs to rank items by ratings — and simple averages fail in predictable ways. This calculator demonstrates three industry-standard solutions (simple, Bayesian, Wilson) side by side, so platform designers can choose the right method and users can understand why ratings feel "off" sometimes.

The distribution visualization, polarity detection, and entropy metrics provide insights that no single number can capture. A "3.5-star" product could be mediocre (most ratings 3-4), controversial (split between 1 and 5), or barely-reviewed (one 3 and one 4). This calculator tells you which.

How to Use the Inputs

Enter the number of reviews for each star level (1-star through 5-star).
Use presets for common patterns: good restaurant, mixed product, polarized app.
Adjust the Bayesian prior weight to control how much small-count items are penalized.
Review three different average methods and their differences.
Examine the visual distribution bars to see the shape of ratings.
Check the detailed analysis table for consensus, polarity, and confidence metrics.

Formula used

Simple Average: Σ(star × count) / Σ(count)

Bayesian Average: (m × C + Σ(star × count)) / (m + Σ(count))
where m = prior review count, C = prior mean (typically 3.0)

Wilson Lower Bound (for % positive):
(p̂ + z²/(2n) − z√(p̂(1−p̂)/n + z²/(4n²))) / (1 + z²/n)
where p̂ = proportion of 4-5★, n = total number of ratings, z = 1.96 for 95% CI

Example Calculation

Result: Simple: 4.05/5, Bayesian: 3.53/5, Wilson: 65.7%, SD: 1.12, Net: +40%

With 100 total ratings weighted toward 5 and 4 stars (45, 30, 15, 5, and 5 ratings from 5★ down to 1★), the simple average is 4.05. The Bayesian average with a 100-review prior at 3.0 is (100 × 3.0 + 405) / (100 + 100) = 3.53, halfway between the prior and the data because the prior carries the same weight as the 100 real reviews. The Wilson lower bound of 65.7% means that, at 95% confidence, the underlying share of positive (4-5★) ratings is at least 65.7%, against an observed 75%. An SD of 1.12 indicates moderate consensus.

Tips & Best Practices

Simple average is misleading for items with fewer than 50 reviews — use Bayesian instead.
Wilson lower bound is best for ranking: it rewards both high ratings AND high volume.
High polarity (>40%) suggests the product is divisive — check if different segments react differently.
Entropy near maximum (2.32 bits) means ratings carry little information — no consensus exists.
Net sentiment >30% indicates strong positive reception; below −30% indicates serious problems.
Always look at the distribution shape, not just the number — 4.0 can look very different.

How Major Platforms Rank

IMDB uses a Bayesian average ("weighted rating") for its Top 250 list: WR = (v/(v+m)) × R + (m/(v+m)) × C, where v = votes, m ≈ 25,000, R = mean rating, C = mean across all films (~7.0). Amazon uses a proprietary system that factors in recency, verified purchases, and helpfulness votes alongside star counts. Reddit's "Best" comment sort uses Wilson confidence intervals, as described by Evan Miller's influential blog post.
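
The weighted-rating formula can be sketched in Python as follows. The values m = 25,000 and C = 7.0 are the approximations quoted above, and the two films are invented for illustration; this is not IMDB's actual code.

```python
def weighted_rating(R, v, m=25_000, C=7.0):
    """WR = (v/(v+m)) * R + (m/(v+m)) * C, with symbols as defined above."""
    return (v / (v + m)) * R + (m / (v + m)) * C

# Hypothetical films.
obscure = weighted_rating(R=9.0, v=1_000)     # high mean, few votes
popular = weighted_rating(R=8.5, v=500_000)   # slightly lower mean, many votes

print(round(obscure, 2))  # 7.08
print(round(popular, 2))  # 8.43

# Same value as the Bayesian average form (m*C + v*R) / (m + v).
assert abs(obscure - (25_000 * 7.0 + 1_000 * 9.0) / 26_000) < 1e-9
```

The film with only 1,000 votes stays close to the site-wide mean, while the heavily voted film keeps nearly all of its own rating.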

The J-Curve Problem

Online ratings typically follow a J-shaped distribution: many 5-star ratings, gradually fewer 4, 3, 2, and then a bump at 1 star. This happens because satisfied customers leave reviews voluntarily (5★), dissatisfied customers complain (1★), but average-experience customers rarely bother. Any rating system must account for this selection bias.

Designing Fair Rating Systems

When designing a rating system, consider: (1) Bayesian averaging to handle cold starts, (2) recency weighting to reflect improving or declining quality, (3) credibility signals that weight verified purchasers higher, (4) distribution bars displayed alongside the number, and (5) a minimum review volume before ratings are shown publicly. Each design choice affects how users interpret and trust the system.

Sources & Methodology

Last updated: March 8, 2026
