What is the token bucket algorithm?

A virtual bucket holds tokens, and each request consumes one. Tokens refill at the configured rate limit, up to the bucket size. When the bucket is empty, requests are rejected. The bucket size determines the maximum burst. For 10 rps with a 50-token bucket, a client can burst 50 requests at once and then sustain 10 rps.
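A minimal sketch of the bucket described above, assuming a single process and no locking. The class name, the `allow` method, and the injectable `clock` parameter are illustrative choices, not part of any particular library.

```python
import time


class TokenBucket:
    """Token bucket: refills `rate` tokens per second, stores at most `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate                # sustained refill rate, tokens per second
        self.capacity = capacity        # bucket size, i.e. the maximum burst
        self.clock = clock              # injectable so the bucket can be tested
        self.tokens = float(capacity)   # assumption: the bucket starts full
        self.updated = clock()

    def allow(self, cost=1):
        """Consume `cost` tokens and return True if available, else return False."""
        now = self.clock()
        # Credit tokens earned since the last call, capped at the bucket size.
        earned = (now - self.updated) * self.rate
        self.tokens = min(self.capacity, self.tokens + earned)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With `rate=10` and `capacity=50`, a fresh bucket accepts 50 back-to-back requests, rejects further ones, and accepts 10 more after one second has passed.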

What rate limit should I set?

Base it on: (1) backend capacity per consumer, computed as (total_capacity / expected_consumers) × safety_margin with a safety_margin below 1, (2) typical client usage patterns (measure P95 request rates), (3) business requirements (premium tiers get higher limits). Start conservative and increase.
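The capacity-based starting point in item (1) can be sketched as a one-line calculation. The function name and the default `safety_margin` of 0.7 are assumptions for illustration; pick a margin that fits your own risk tolerance.

```python
def per_consumer_limit(total_capacity_rps, expected_consumers, safety_margin=0.7):
    """Starting per-consumer limit in rps, before tuning against measured usage."""
    if expected_consumers <= 0:
        raise ValueError("expected_consumers must be positive")
    # Split capacity evenly, then keep a fraction in reserve (assumed margin).
    return total_capacity_rps / expected_consumers * safety_margin
```

For a 20,000 rps backend shared by 200 consumers, this gives roughly 70 rps per consumer with the assumed 0.7 margin.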

How do I handle rate limit errors?

Return HTTP 429 with a Retry-After header indicating when to retry. Include rate limit headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset. Client-side, implement exponential backoff with jitter.
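A rough sketch of both sides, under stated assumptions: `rate_limit_headers` builds the headers named above (treating X-RateLimit-Reset as epoch seconds, which is one common convention but not universal), and `backoff_delay` computes a client wait using exponential backoff with "full jitter". The function names and the `base`, `cap`, and `rng` parameters are hypothetical.

```python
import random


def rate_limit_headers(limit, remaining, reset_epoch, retry_after):
    """Headers to attach to a 429 response."""
    return {
        "Retry-After": str(int(retry_after)),        # delay in whole seconds
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(remaining),
        "X-RateLimit-Reset": str(int(reset_epoch)),  # assumption: epoch seconds
    }


def backoff_delay(attempt, retry_after=None, base=0.5, cap=30.0, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based)."""
    # Exponential ceiling: base, 2*base, 4*base, ... capped at `cap`.
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: pick uniformly between 0 and the ceiling.
    jitter = rng() * ceiling
    # Wait at least the server's Retry-After hint, then add the jitter on top
    # so that clients rejected together do not all retry at the same instant.
    return (float(retry_after) if retry_after is not None else 0.0) + jitter
```

Adding jitter on top of Retry-After, rather than taking the larger of the two, is a design choice here: it spreads out retries from clients that were all rejected in the same window.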

Should I rate limit by user or IP?

Use both as separate layers. Per-user limits ensure fair access among authenticated clients. Per-IP limits protect against unauthenticated abuse and DDoS. Add a global limit as a circuit breaker for overall service protection.
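One way the three layers might be combined is sketched below. To stay short it uses simple fixed-window counters rather than token buckets, and the class and method names are illustrative only. A real deployment would typically keep these counters in a shared store.

```python
from collections import defaultdict


class LayeredLimiter:
    """Fixed-window counters for the global, per-IP, and per-user layers."""

    def __init__(self, global_limit, ip_limit, user_limit):
        self.limits = {"global": global_limit, "ip": ip_limit, "user": user_limit}
        self.counts = defaultdict(int)  # (layer, key) -> requests in this window

    def allow(self, ip, user=None):
        """Return (True, None) if allowed, else (False, name_of_blocking_layer)."""
        keys = [("global", "*"), ("ip", ip)]
        if user is not None:            # unauthenticated requests skip the user layer
            keys.append(("user", user))
        # Check every layer first so a rejection does not consume quota elsewhere.
        for layer, key in keys:
            if self.counts[(layer, key)] >= self.limits[layer]:
                return False, layer
        for key in keys:
            self.counts[key] += 1
        return True, None

    def reset_window(self):
        """Call once per window (for example from a timer) to start fresh."""
        self.counts.clear()
```

Returning the name of the layer that rejected the request makes it easier to see in monitoring whether rejections come from one noisy user, one IP, or overall load.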

What is the difference between rate limiting and throttling?

Rate limiting rejects excess requests immediately (429 response). Throttling slows them down by queuing or delaying responses. Throttling is better for user experience but harder to implement. Many systems use rate limiting with retry guidance.
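The distinction can be illustrated with a small decision function, assuming a token-bucket style limiter where `tokens_available` may be fractional. The function name, the `mode` strings, and the return shape are hypothetical.

```python
def decide(tokens_available, refill_rate, mode):
    """Return (action, seconds) for one incoming request.

    mode "rate_limit": reject immediately; seconds can feed Retry-After.
    mode "throttle":   hold the request; seconds is how long to delay it.
    """
    if tokens_available >= 1:
        return ("accept", 0.0)
    # Time until the bucket has refilled enough to hold one whole token.
    wait = (1 - tokens_available) / refill_rate
    if mode == "rate_limit":
        return ("reject", wait)
    return ("delay", wait)
```

Both modes compute the same wait. The difference is who does the waiting: the client after a 429, or the server while it holds the request open.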

How do rate limits work with API gateways?

API gateways (Kong, AWS API Gateway, Apigee) implement rate limiting at the edge, before requests reach your backend. This is more efficient and provides consistent enforcement across all API routes. Configure limits in the gateway, not in application code.

API Rate Limit Calculator

Calculate API rate limit budgets, burst allowances, and throttling thresholds for effective API traffic management.

Algorithm

Rate Limit per Consumer (rps)

Burst Bucket Size (max tokens / concurrent burst)

API Consumers

Backend Capacity (rps)

Rate Window (sec)

Retry-After Header (sec)

Total Sustained Load

50,000.00 rps

500 consumers x 100 rps each

Max Burst Load

100,000.00 req

Exceeds backend capacity!

Backend Utilization

250.00%

50,000.00 / 20,000.00 rps

Capacity Headroom

0.00%

OVERLOADED - add capacity

Bucket Refill Time

2.00 sec

Time to fully replenish 200 tokens

Max Safe Consumers

200

At 100 rps per consumer

Throttle Probability

0.80%

1,333.00 req/s throttled

Retry Amplification

267.00 rps

Retry storm from 5s Retry-After

Consumer Tier Breakdown

Tier	Consumers	Share	Rate Limit	Total RPS	% of Backend
Free	300	60%	10 rps	3,000 rps	15.00%
Basic	125	25%	50 rps	6,250 rps	31.25%
Pro	50	10%	100 rps	5,000 rps	25.00%
Enterprise	25	5%	300 rps	7,500 rps	37.50%
Total (Tiered)	500	100%	-	21,750 rps	108.75%
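The breakdown can be recomputed from the consumer counts and per-tier limits. This sketch assumes the 20,000 rps backend capacity shown in the utilization readout; the constant and function names are illustrative.

```python
BACKEND_CAPACITY_RPS = 20_000  # assumption: taken from the utilization readout

# (tier name, number of consumers, per-consumer rate limit in rps)
TIERS = [("Free", 300, 10), ("Basic", 125, 50), ("Pro", 50, 100), ("Enterprise", 25, 300)]


def tier_breakdown(tiers, backend_rps):
    """Per-tier share of consumers, total rps, and share of backend capacity."""
    total_consumers = sum(consumers for _, consumers, _ in tiers)
    rows = []
    for name, consumers, limit in tiers:
        total_rps = consumers * limit
        rows.append({
            "tier": name,
            "share_pct": 100 * consumers / total_consumers,
            "total_rps": total_rps,
            "backend_pct": 100 * total_rps / backend_rps,
        })
    return rows


if __name__ == "__main__":
    for row in tier_breakdown(TIERS, BACKEND_CAPACITY_RPS):
        print(f'{row["tier"]:<11}{row["share_pct"]:>6.1f}%'
              f'{row["total_rps"]:>8,} rps{row["backend_pct"]:>8.2f}%')
```

Summed across tiers, the worst-case sustained load is 21,750 rps, which is above the assumed 20,000 rps capacity if every consumer runs at its limit at the same time.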

Algorithm Comparison Reference

Algorithm	Burst Friendly	Memory	Accuracy	Best For
Token Bucket	Yes	O(1)	High	General APIs
Sliding Window	Moderate	O(n)	Very High	Strict precision
Fixed Window	Edge burst	O(1)	Moderate	Simple rate limits
Leaky Bucket	No	O(1)	High	Smooth output rate

Planning notes, formulas, and examples

About the API Rate Limit Calculator

API rate limiting controls how many requests a client can make within a time window. It protects backend services from overload, ensures fair usage across clients, and prevents abuse. Proper rate limit design balances API usability (allowing legitimate bursts) with protection (preventing resource exhaustion).

This calculator helps API designers determine appropriate rate limits based on expected usage patterns, burst requirements, and infrastructure capacity. It models the token bucket algorithm — the most common rate limiting approach — which allows bursts up to a bucket size while enforcing a sustained request rate.

Getting rate limits right is critical: too restrictive and you frustrate legitimate users; too permissive and you risk overloading your service during traffic spikes or abuse scenarios.

When This Page Helps

Use this calculator to check proposed per-consumer limits and burst sizes against your backend capacity and expected number of consumers before you roll them out.

How to Use the Inputs

Enter the sustained request rate limit (requests per second).
Enter the burst bucket size (max concurrent burst requests).
Enter the number of API consumers.
Enter your backend's maximum request capacity.
Review the total sustained load and burst capacity analysis.

Formula used

Total Sustained Load (rps) = consumers × rate_limit_per_consumer
Max Burst (requests) = consumers × burst_bucket_size
Headroom (%) = (backend_capacity − total_sustained_load) / backend_capacity × 100
Bucket Refill Time (sec) = burst_bucket_size / rate_limit_per_consumer
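The four formulas above translate directly into a small function. The function and key names are illustrative. Headroom is left unclamped here, so a negative value means the sustained load already exceeds backend capacity.

```python
def capacity_summary(consumers, rate_limit, burst_bucket, backend_capacity):
    """Apply the formulas above; rates are in rps, bucket size in tokens."""
    total_sustained = consumers * rate_limit
    max_burst = consumers * burst_bucket
    headroom_pct = (backend_capacity - total_sustained) / backend_capacity * 100
    refill_seconds = burst_bucket / rate_limit
    return {
        "total_sustained_rps": total_sustained,
        "max_burst_requests": max_burst,
        "headroom_pct": headroom_pct,      # negative means overloaded
        "refill_seconds": refill_seconds,
    }
```

For 100 consumers at 10 rps with a 50-token bucket and a 2,000 rps backend, this returns 1,000 rps sustained, a 5,000-request maximum burst, 50% headroom, and a 5-second refill time.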

Example Calculation

Result: 1,000 sustained rps, 5,000 max burst, 50% headroom

Sustained: 100 consumers × 10 rps = 1,000 rps. Max burst: 100 × 50 = 5,000 requests if every consumer spends a full bucket at once. Backend capacity: 2,000 rps. Headroom: (2,000 − 1,000) / 2,000 = 50%. A simultaneous burst of 5,000 requests would exceed the 2,000 rps capacity, so consider reducing the burst bucket or adding queuing.

Tips & Best Practices

Use the token bucket algorithm for rate limiting — it handles bursts naturally.
Set burst bucket to 5–10x the per-second rate for API usability.
Always return 429 Too Many Requests with Retry-After header.
Monitor rate limit hits — if >5% of requests are throttled, limits may be too tight.
Implement per-user, per-IP, and global rate limits as separate layers.
Document rate limits clearly in API documentation with examples.

Designing Rate Limits

Effective rate limit design starts with capacity planning: determine your backend's maximum request rate, divide by expected consumers (with a safety margin), and set per-consumer limits accordingly. Add a burst allowance (5–10x the sustained rate) for usability, then reduce it if the total burst across all consumers exceeds backend capacity.

Tiered Rate Limits

Many APIs offer tiered rate limits: free tier (100 rps), standard (1,000 rps), enterprise (10,000 rps). Tiering aligns rate limits with business value and encourages upgrades. Implement using API keys mapped to tier-specific token buckets.

Monitoring and Tuning

Monitor: (1) rate limit hit rate (% requests throttled), (2) P99 request rate per consumer, (3) backend utilization. If throttle rate exceeds 5%, limits may be too restrictive. If backend utilization exceeds 70% during normal traffic, limits may be too permissive.
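The two thresholds above can be turned into a simple check over monitoring counters. The function name, parameter names, and hint strings are hypothetical; the 5% and 70% cutoffs are the ones stated in the text.

```python
def tuning_hints(throttled, total_requests, backend_rps, backend_capacity_rps):
    """Return a list of hints based on throttle rate and backend utilization."""
    hints = []
    throttle_pct = 100 * throttled / total_requests if total_requests else 0.0
    utilization_pct = 100 * backend_rps / backend_capacity_rps
    if throttle_pct > 5:
        hints.append("limits may be too restrictive")
    if utilization_pct > 70:
        hints.append("limits may be too permissive")
    return hints
```

Both hints can fire at once, for example when a few heavy consumers are throttled constantly while overall load is still close to capacity. That usually points to a capacity or tiering problem rather than a single mis-set limit.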

Sources & Methodology

Last updated: February 8, 2026
