Pod Autoscaling Calculator

Calculate Kubernetes HPA pod counts based on CPU/memory thresholds, current utilization, and scaling targets.

Desired Pods
5
Scale Up: +2 pods
Scale Factor
1.60×
ceil(3 × 1.60) = 5
New Utilization
48.00%
52.00% headroom remaining
Monthly Cost (Before)
$153.30
3 pods × $0.07/hr × 730 hrs
Monthly Cost (After)
$255.50
5 pods × $0.07/hr × 730 hrs
Cost Delta
$102.20
Additional monthly cost
Total CPU (After)
2,500m
5 pods × 500m each
Total Memory (After)
2,560 Mi
5 pods × 512 Mi each

Capacity Visualization

Before: 3 pods at 80% utilization
After: 5 pods at 48% utilization

Utilization Scenarios

Current Util   Desired Pods   Clamped Pods   New Util   Monthly Cost
30%            2              2              45.00%     $102.20
50%            3              3              50.00%     $153.30
70%            5              5              42.00%     $255.50
80%            5              5              48.00%     $255.50
90%            6              6              45.00%     $306.60
95%            6              6              47.50%     $306.60
100%           6              6              50.00%     $306.60
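The scenario rows can be regenerated with a short loop. This sketch assumes the inputs implied by the figures above: 3 current pods, a 50% target, min 2 and max 20 replicas, $0.07 per pod-hour, and 730 hours per month. It also assumes total load stays constant when pods are added or removed. The constant and function names are hypothetical.

```python
import math

# Assumed inputs, taken from the example figures (not from a real cluster).
CURRENT_PODS = 3
TARGET_UTIL = 50.0
MIN_REPLICAS, MAX_REPLICAS = 2, 20
PRICE_PER_POD_HOUR = 0.07
HOURS_PER_MONTH = 730


def scenario(util: float) -> tuple[int, int, float, float]:
    """Return (desired, clamped, new utilization %, monthly cost $) for one load level."""
    desired = math.ceil(CURRENT_PODS * util / TARGET_UTIL)
    clamped = max(MIN_REPLICAS, min(desired, MAX_REPLICAS))
    # Assumption: the same total load is spread over the new pod count.
    new_util = CURRENT_PODS * util / clamped
    cost = clamped * PRICE_PER_POD_HOUR * HOURS_PER_MONTH
    return desired, clamped, new_util, cost


for util in (30, 50, 70, 80, 90, 95, 100):
    desired, clamped, new_util, cost = scenario(util)
    print(f"{util}%  {desired}  {clamped}  {new_util:.2f}%  ${cost:,.2f}")
```

With these inputs the printed rows match the table above.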

HPA Best Practices

Parameter                  Recommended            Reason
Target CPU                 50–70%                 Leaves headroom for burst traffic
Min replicas               ≥ 2                    High availability (survives a pod failure)
Max replicas               Budget-based           Prevents runaway scaling costs
Scale-down stabilization   300s                   Prevents flapping under variable load
Scale-up stabilization     0–60s                  Reacts quickly to traffic spikes
Metric type                CPU + memory/custom    CPU alone misses memory-bound workloads
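These recommendations map onto fields of the `autoscaling/v2` HorizontalPodAutoscaler. The sketch below builds the manifest as a Python dict and prints it as JSON, which `kubectl apply -f` accepts as well as YAML. The Deployment name `web`, the 60% CPU and 70% memory targets, and the max of 20 replicas are illustrative assumptions, not values from any real workload.

```python
import json


def hpa_manifest(deployment: str, min_replicas: int = 2, max_replicas: int = 20,
                 cpu_target: int = 60, memory_target: int = 70) -> dict:
    """Build an autoscaling/v2 HPA following the table's recommendations.

    The default targets and replica bounds are illustrative assumptions.
    """
    def utilization_metric(resource: str, target: int) -> dict:
        # Utilization targets are a percentage of the pods' resource requests.
        return {"type": "Resource",
                "resource": {"name": resource,
                             "target": {"type": "Utilization", "averageUtilization": target}}}

    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": f"{deployment}-hpa"},
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": deployment},
            "minReplicas": min_replicas,  # >= 2 so a single pod failure is survivable
            "maxReplicas": max_replicas,  # budget-based ceiling
            "metrics": [utilization_metric("cpu", cpu_target),
                        utilization_metric("memory", memory_target)],
            "behavior": {
                "scaleDown": {"stabilizationWindowSeconds": 300},  # damp flapping
                "scaleUp": {"stabilizationWindowSeconds": 0},      # react to spikes at once
            },
        },
    }


print(json.dumps(hpa_manifest("web"), indent=2))
```

When several metrics are listed, the controller computes a replica count for each one and uses the highest.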

Common Pod Sizes

Workload         CPU (m)      Memory (Mi)   Approx. $/hr per pod
Microservice     100–250      128–256       $0.02–0.04
Web server       250–500      256–512       $0.04–0.07
API gateway      500–1000     512–1024      $0.07–0.14
Worker / batch   1000–2000    1024–4096     $0.14–0.35
ML inference     2000–4000    4096–8192     $0.35–0.70
Planning notes, formulas, and examples

About the Pod Autoscaling Calculator

The Kubernetes Horizontal Pod Autoscaler (HPA) automatically adjusts the number of pod replicas based on observed CPU utilization, memory usage, or custom metrics. Understanding how HPA calculates the desired replica count is essential for reliable autoscaling.

The HPA formula is: desiredReplicas = ceil(currentReplicas × (currentMetricValue / desiredMetricValue)). For example, if 3 pods are running at 80% CPU with a target of 50%, HPA computes ceil(3 × 80/50) = ceil(4.8) = 5 pods.
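A minimal Python sketch of this ratio, assuming utilization is expressed as a percentage of each pod's resource request (which is how HPA measures Resource metrics with a Utilization target). The function name `desired_replicas` is hypothetical; the real controller also applies a tolerance band and special handling for unready pods, which this sketch omits.

```python
import math


def desired_replicas(current_replicas: int, current_util: float, target_util: float) -> int:
    """Core HPA ratio: ceil(current_replicas * current_util / target_util).

    Multiplying before dividing avoids float artifacts such as 3 * 1.4 = 4.199...
    """
    if target_util <= 0:
        raise ValueError("target_util must be positive")
    return math.ceil(current_replicas * current_util / target_util)


print(desired_replicas(3, 80, 50))  # 3 * 80 / 50 = 4.8, ceil -> 5
```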

This calculator helps you predict HPA behavior under different load scenarios, set appropriate min/max replica bounds, and choose target utilization thresholds that balance responsiveness with cost efficiency.

When This Page Helps

A misconfigured HPA either under-scales, risking latency spikes or outages, or over-scales, wasting resources. Predicting pod counts at different utilization levels helps you choose target thresholds and replica bounds before they reach production.

How to Use the Inputs

  1. Enter the current number of running pods.
  2. Enter the current average CPU or memory utilization.
  3. Enter the HPA target utilization percentage.
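  4. Enter the min and max replica bounds.
  5. Optionally enter the per-pod CPU request (millicores; 1000m = 1 vCPU), memory request (Mi), and hourly cost per pod to estimate total resources and monthly cost.
  6. Review the desired pod count and scaling behavior.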
Formula used
Desired Replicas = ceil(current_replicas × (current_utilization / target_utilization))
Clamped = clamp(desired, min_replicas, max_replicas)
Scale Factor = current_utilization / target_utilization
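The "New Utilization" and monthly cost results are not part of the HPA formula itself. Assuming total load stays constant while the replica count changes (a simplification, since real traffic keeps moving while new pods start), they follow from spreading the same total utilization over the clamped pod count, and from billing each pod for an average 730-hour month (8,760 hours / 12):

```latex
U_{\text{new}} = \frac{N_{\text{current}} \times U_{\text{current}}}{N_{\text{clamped}}},
\qquad \text{e.g.} \quad \frac{3 \times 80\%}{5} = 48\%

\text{Monthly cost} = N \times p_{\text{hr}} \times 730,
\qquad \text{e.g.} \quad 5 \times \$0.07 \times 730 = \$255.50
```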

Example Calculation

Result: 5 desired pods (scale up by 2)

Scale factor: 80% / 50% = 1.6. Desired: ceil(3 ร— 1.6) = ceil(4.8) = 5 pods. Clamped between min 2 and max 20: 5 pods. HPA will scale from 3 to 5 pods to bring average utilization back toward 50%.
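The example calculation can be reproduced end to end. This sketch implements the listed formulas (ratio, ceiling, clamp) plus a constant-load assumption for the resulting utilization; `plan_scaling` and `ScalingResult` are hypothetical names, and the controller's tolerance band is not modeled.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingResult:
    desired: int
    clamped: int
    scale_factor: float
    new_util: float


def plan_scaling(current_replicas: int, current_util: float, target_util: float,
                 min_replicas: int, max_replicas: int) -> ScalingResult:
    """Apply the HPA ratio, then clamp to [min_replicas, max_replicas]."""
    scale_factor = current_util / target_util
    desired = math.ceil(current_replicas * current_util / target_util)
    clamped = max(min_replicas, min(desired, max_replicas))
    # Assumption: total load is unchanged and spreads evenly over the clamped count.
    new_util = current_replicas * current_util / clamped
    return ScalingResult(desired, clamped, scale_factor, new_util)


print(plan_scaling(3, 80, 50, min_replicas=2, max_replicas=20))
# ScalingResult(desired=5, clamped=5, scale_factor=1.6, new_util=48.0)
```

Clamping happens after the ceiling, so a very low load still keeps min_replicas pods and a surge never exceeds max_replicas.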

Tips & Best Practices

  • Set target utilization at 50–70% to allow headroom for traffic spikes.
  • Always set min replicas ≥ 2 for high-availability workloads.
  • Set max replicas based on your budget and cluster capacity.
  • Use stabilization windows to prevent rapid scale-down flapping.
  • Consider container startup time when setting scaling parameters.
  • Combine CPU and memory metrics for more accurate scaling decisions.
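Scale-down stabilization works by remembering recent recommendations: when load drops, the controller uses the highest recommendation seen within the window rather than the newest one. The class below is a hypothetical, simplified model of that behavior with a scale-up window of 0 (so increases apply immediately); it is not the controller's actual code.

```python
from collections import deque


class ScaleDownStabilizer:
    """Simplified model: return the highest recommendation within the window.

    A new, higher recommendation wins immediately (scale-up window of 0);
    a lower one only takes effect once older, higher values age out.
    """

    def __init__(self, window_seconds: float = 300):
        self.window = window_seconds
        self.history: deque[tuple[float, int]] = deque()

    def recommend(self, now: float, desired: int) -> int:
        self.history.append((now, desired))
        # Drop recommendations older than the stabilization window.
        while self.history and self.history[0][0] < now - self.window:
            self.history.popleft()
        return max(replicas for _, replicas in self.history)


s = ScaleDownStabilizer(300)
print(s.recommend(0, 6))    # 6
print(s.recommend(60, 3))   # 6: the t=0 recommendation is still inside the window
print(s.recommend(301, 3))  # 3: the t=0 recommendation has aged out
```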

The HPA Algorithm

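HPA runs a control loop every 15 seconds by default (configurable via the kube-controller-manager flag --horizontal-pod-autoscaler-sync-period). Each iteration queries the metrics API for current utilization, computes the desired replica count, and applies the change, subject to stabilization windows. If the ratio of current to target utilization is within a tolerance of 1.0 (0.1 by default), the controller leaves the replica count unchanged. The algorithm is simple, but its interactions with pod lifecycle, resource requests (utilization is measured against each pod's request, not node capacity), and custom metrics create complexity.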
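One loop iteration can be modeled as below. The real controller skips changes when the current/target ratio is within a tolerance of 1.0 (0.1 by default, set by `--horizontal-pod-autoscaler-tolerance`). The function name `reconcile` is hypothetical, and metric collection, readiness checks, and stabilization are left out of this sketch.

```python
import math

DEFAULT_TOLERANCE = 0.1  # default of --horizontal-pod-autoscaler-tolerance


def reconcile(current_replicas: int, current_util: float, target_util: float,
              tolerance: float = DEFAULT_TOLERANCE) -> int:
    """Simplified single HPA iteration with the tolerance band applied."""
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no scaling
    # Multiply before dividing to avoid float artifacts in the ceiling.
    return math.ceil(current_replicas * current_util / target_util)


print(reconcile(3, 53, 50))  # ratio 1.06 is inside the band -> 3
print(reconcile(3, 80, 50))  # ratio 1.6 -> ceil(4.8) = 5
```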

Choosing Min and Max Replicas

Min replicas should match your availability requirements: 2 for basic HA, 3 for zone redundancy, more for high-traffic services. Max replicas should reflect your budget ceiling and cluster capacity. Setting max too high risks exhausting cluster resources.
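Max replicas can be derived from whichever limit binds first. This hypothetical helper takes a monthly budget and the cluster's spare allocatable CPU and memory, all supplied by you, and returns the smallest of the three caps. It ignores node packing and per-node overhead, so treat the result as an upper bound.

```python
import math

HOURS_PER_MONTH = 730


def max_replicas_cap(monthly_budget: float, price_per_pod_hour: float,
                     allocatable_cpu_m: int, cpu_request_m: int,
                     allocatable_mem_mi: int, mem_request_mi: int) -> int:
    """Largest replica count that fits the budget and the spare CPU/memory requests."""
    by_budget = math.floor(monthly_budget / (price_per_pod_hour * HOURS_PER_MONTH))
    by_cpu = allocatable_cpu_m // cpu_request_m      # millicores
    by_mem = allocatable_mem_mi // mem_request_mi    # Mi
    return min(by_budget, by_cpu, by_mem)


# Hypothetical inputs: $1,000/month, $0.07 per pod-hour,
# 16 vCPU and 32 GiB spare, pods requesting 500m / 512 Mi.
print(max_replicas_cap(1000, 0.07, 16_000, 500, 32_768, 512))  # 19 (budget binds)
```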

Custom Metrics for Smarter Scaling

CPU is a lagging indicator. By the time CPU spikes, requests may already be queuing. Custom metrics like requests-per-second, queue depth, or in-flight connections are leading indicators that enable proactive scaling before performance degrades.
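The same ratio applies to custom metrics with a per-pod average target, such as requests per second. This sketch assumes per-pod values are already available (in a real cluster they arrive through a custom metrics adapter, which is not shown); the function name and the 100 rps target are hypothetical.

```python
import math


def replicas_for_rps(per_pod_rps: list[float], target_rps_per_pod: float) -> int:
    """HPA ratio applied to an average-value custom metric (e.g. requests per second)."""
    current_replicas = len(per_pod_rps)
    average = sum(per_pod_rps) / current_replicas
    return math.ceil(current_replicas * average / target_rps_per_pod)


# Four pods at 150 rps each against a hypothetical 100 rps-per-pod target.
print(replicas_for_rps([150, 150, 150, 150], 100))  # 6
```

Because the trigger is request volume rather than CPU, the scale-up can start before CPU saturates.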


Frequently Asked Questions

  • What target utilization should I use? A target of 50–60% is common for latency-sensitive services because it leaves headroom for spikes; CPU-bound batch processing can use 70–80%. Lower targets scale more aggressively (more pods, higher cost), while higher targets risk under-provisioning during spikes.
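The trade-off is easy to see by holding load constant and varying the target. This sketch reuses the example's load (3 pods at 80%) and applies the core HPA ratio for several hypothetical targets; it ignores min/max clamping and the tolerance band.

```python
import math


def replicas_by_target(current_replicas: int, current_util: float,
                       targets: list[int]) -> dict[int, int]:
    """Desired replica count for the same load under several target utilizations."""
    return {t: math.ceil(current_replicas * current_util / t) for t in targets}


print(replicas_by_target(3, 80, [50, 60, 70, 80]))  # {50: 5, 60: 4, 70: 4, 80: 3}
```

For this load, a 50% target asks for 5 pods while an 80% target asks for only 3: the extra pods are the price of the headroom.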