What target utilization should I use?

A target of 50–60% is common for latency-sensitive services, leaving headroom for spikes. CPU-bound batch processing can use 70–80%. Lower targets scale more aggressively (more pods, higher cost). Higher targets risk under-provisioning during spikes.
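
As a rough illustration of the headroom trade-off, the sketch below estimates how much load growth the existing pods can absorb before reaching 100% utilization. It assumes utilization grows linearly with load, which is a simplification, and the function name `spike_headroom` is hypothetical.

```python
def spike_headroom(target_util_pct: float) -> float:
    """Load multiplier the current pods can absorb before hitting 100% utilization.

    Assumes utilization scales linearly with load (a simplification).
    """
    return 100.0 / target_util_pct


for target in (50, 60, 70, 80):
    # A 50% target leaves room for load to double before pods saturate.
    print(f"target {target}% -> absorbs {spike_headroom(target):.2f}x load")
```

In practice new pods also take time to start, so the usable headroom is the part of this margin that covers the spike until scale-up completes.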

How does HPA handle scale-down?

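HPA applies a default scale-down stabilization window of 5 minutes (configurable through behavior.scaleDown.stabilizationWindowSeconds). Within that window it acts on the highest replica recommendation it has computed, so it won't scale down while any recent calculation called for more replicas. This prevents flapping. Scale-up has no stabilization window by default, so HPA responds to rising load immediately.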
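
The scale-down window can be sketched as a small history of recommendations where the highest recent value wins. This is a simplified model, not the controller's actual code; the class name `ScaleDownStabilizer` and its method are hypothetical.

```python
from collections import deque


class ScaleDownStabilizer:
    """Simplified model of a scale-down stabilization window."""

    def __init__(self, window_seconds: float = 300.0):
        self.window_seconds = window_seconds
        self._history = deque()  # (timestamp, recommended_replicas)

    def recommend(self, now: float, desired: int, current: int) -> int:
        self._history.append((now, desired))
        # Forget recommendations that have aged out of the window.
        while self._history and self._history[0][0] < now - self.window_seconds:
            self._history.popleft()
        if desired > current:
            return desired  # scale-up is not delayed in this sketch
        # Scale-down is limited to the highest recommendation still in the window.
        highest_recent = max(replicas for _, replicas in self._history)
        return min(current, highest_recent)


stabilizer = ScaleDownStabilizer(window_seconds=300)
print(stabilizer.recommend(now=0, desired=5, current=5))    # 5
print(stabilizer.recommend(now=60, desired=3, current=5))   # 5 (held by the window)
print(stabilizer.recommend(now=301, desired=3, current=5))  # 3 (old value expired)
```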

Can HPA scale to zero?

Standard HPA cannot scale to zero by default (minReplicas must be at least 1). KEDA (Kubernetes Event-driven Autoscaling) supports scale-to-zero based on queue length, HTTP requests, or custom metrics. This is useful for batch workloads and cost optimization.

What if my app uses more memory than CPU?

Configure HPA to scale on memory utilization instead of, or in addition to, CPU. List several entries under metrics in the HPA spec (autoscaling/v2) to define multiple scaling criteria. HPA computes a replica count for each metric and uses the highest one.
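
A minimal sketch of the multi-metric rule: compute a replica count per metric, then take the maximum. The function names and the example utilization figures are illustrative assumptions.

```python
import math


def desired_for_metric(current_replicas: int, current_value: float, target_value: float) -> int:
    """Replica count suggested by a single metric."""
    return math.ceil(current_replicas * current_value / target_value)


def desired_replicas_multi(current_replicas: int, metrics: dict) -> tuple:
    """metrics maps a name to (current_utilization_pct, target_utilization_pct)."""
    per_metric = {
        name: desired_for_metric(current_replicas, current, target)
        for name, (current, target) in metrics.items()
    }
    # The metric asking for the most replicas decides the outcome.
    return max(per_metric.values()), per_metric


overall, detail = desired_replicas_multi(4, {"cpu": (40, 50), "memory": (90, 70)})
print(overall, detail)  # 6 {'cpu': 4, 'memory': 6}
```

Here CPU alone would keep 4 replicas, but memory at 90% against a 70% target pushes the result to 6.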

How do resource requests affect HPA?

HPA calculates utilization relative to resource requests. If a pod requests 100m CPU and uses 80m, that's 80% utilization. Setting requests too high makes utilization appear low, so HPA under-scales. Setting them too low makes utilization appear high, so HPA over-scales.
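
To see how the request size changes the outcome, the sketch below holds actual usage at 80m and varies only the request. The 3 replicas and 50% target are example values, and `cpu_utilization_pct` is a hypothetical helper.

```python
import math


def cpu_utilization_pct(usage_millicores: float, request_millicores: float) -> float:
    """Utilization as HPA sees it: usage relative to the pod's request."""
    return 100.0 * usage_millicores / request_millicores


CURRENT_REPLICAS = 3  # example value
TARGET_PCT = 50       # example value

for request in (50, 100, 200):
    util = cpu_utilization_pct(80, request)
    desired = math.ceil(CURRENT_REPLICAS * util / TARGET_PCT)
    print(f"request {request}m -> {util:.0f}% utilization -> {desired} desired pods")
```

The same 80m of real usage yields 10, 5, or 3 desired pods depending on whether the request is 50m, 100m, or 200m.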

What is the difference between HPA and VPA?

HPA scales horizontally (more pods). VPA scales vertically (more resources per pod). HPA is better for stateless workloads. VPA is better for single-instance stateful workloads. They can be used together with care, but don't let both act on the same resource metric (CPU or memory).

Pod Autoscaling Calculator

Calculate Kubernetes HPA pod counts based on CPU/memory thresholds, current utilization, and scaling targets.

Current Replicas

Current Utilization

Target Utilization

Min Replicas

Max Replicas

Scaling Metric

CPU per Pod (millicores; 1000m = 1 vCPU)

Memory per Pod

Cost per Pod-Hour

Desired Pods

5.00

Scale Up: +2 pods

Scale Factor

1.60×

ceil(3 × 1.60) = 5

New Utilization

48.00%

52.00% headroom remaining

Monthly Cost (Before)

$153.30

3 pods × $0.07/hr × 730 hrs

Monthly Cost (After)

$255.50

5 pods × $0.07/hr × 730 hrs

Cost Delta

$102.20

Additional monthly cost

Total CPU (After)

2,500.00m

5.00 pods × 500.00m each

Total Memory (After)

2,560.00 Mi

5.00 pods × 512.00 Mi each

Capacity Visualization

Before (3 pods — 80% util)

After (5 pods — 48% util)

Utilization Scenarios

Utilization	Desired Pods	Clamped	New Util	Monthly Cost
30%	2.00	2.00	45.00%	$102.20
50%	3.00	3.00	50.00%	$153.30
70%	5.00	5.00	42.00%	$255.50
80%	5.00	5.00	48.00%	$255.50
90%	6.00	6.00	45.00%	$306.60
95%	6.00	6.00	47.50%	$306.60
100%	6.00	6.00	50.00%	$306.60
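
The rows above can be reproduced with a short script. This sketch assumes the example inputs used on this page (3 current replicas, 50% target, min 2, max 20, $0.07 per pod-hour, 730 hours per month); the function name `scenario_row` is hypothetical.

```python
import math

HOURS_PER_MONTH = 730


def scenario_row(current_util, current_replicas=3, target_util=50,
                 min_replicas=2, max_replicas=20, cost_per_pod_hour=0.07):
    """Return (desired, clamped, new_util_pct, monthly_cost) for one utilization level."""
    desired = math.ceil(current_replicas * current_util / target_util)
    clamped = max(min_replicas, min(desired, max_replicas))
    # The same total load spread over the new pod count.
    new_util = current_replicas * current_util / clamped
    monthly_cost = clamped * cost_per_pod_hour * HOURS_PER_MONTH
    return desired, clamped, round(new_util, 2), round(monthly_cost, 2)


for util in (30, 50, 70, 80, 90, 95, 100):
    desired, clamped, new_util, cost = scenario_row(util)
    print(f"{util}%\t{desired}\t{clamped}\t{new_util:.2f}%\t${cost:.2f}")
```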

HPA Best Practices

Parameter	Recommended	Reason
Target CPU	50–70%	Leaves headroom for burst traffic
Min Replicas	≥ 2	High availability (survives pod failure)
Max Replicas	Budget-based	Prevents runaway scaling costs
Scale-down stabilization	300s	Prevents flapping on variable load
Scale-up stabilization	0–60s	React quickly to traffic spikes
Metric type	CPU + custom	CPU alone misses memory-bound workloads

Common Pod Sizes

Workload	CPU (m)	Memory (Mi)	~$/hr
Microservice	100–250	128–256	$0.02–0.04
Web server	250–500	256–512	$0.04–0.07
API gateway	500–1000	512–1024	$0.07–0.14
Worker / batch	1000–2000	1024–4096	$0.14–0.35
ML inference	2000–4000	4096–8192	$0.35–0.70

Planning notes, formulas, and examples

About the Pod Autoscaling Calculator

The Kubernetes Horizontal Pod Autoscaler (HPA) automatically adjusts the number of pod replicas based on observed CPU utilization, memory usage, or custom metrics. Understanding how HPA calculates desired replica count is essential for reliable autoscaling.

The HPA formula is: desiredReplicas = ceil(currentReplicas × (currentMetricValue / desiredMetricValue)). For example, if 3 pods are running at 80% CPU with a target of 50%, HPA desires ceil(3 × 80/50) = ceil(4.8) = 5 pods.

This calculator helps you predict HPA behavior under different load scenarios, set appropriate min/max replica bounds, and choose target utilization thresholds that balance responsiveness with cost efficiency.

When This Page Helps

Misconfigured HPA leads to either under-scaling (outages) or over-scaling (wasted resources). This calculator helps predict pod counts at different utilization levels for optimal HPA configuration.

How to Use the Inputs

Enter the current number of running pods.
Enter the current average CPU or memory utilization.
Enter the HPA target utilization percentage.
Enter the min and max replica bounds.
Review the desired pod count and scaling behavior.

Formula used

Desired Replicas = ceil(current_replicas × (current_utilization / target_utilization))
Clamped = clamp(desired, min_replicas, max_replicas)
Scale Factor = current_utilization / target_utilization
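
The three formulas translate directly into a small function. This is a sketch of the calculator's arithmetic rather than the controller's implementation: the real HPA also skips scaling when the scale factor is within a tolerance of 1.0 (10% by default), which is ignored here. The function name is hypothetical.

```python
import math


def desired_replicas(current_replicas, current_util, target_util,
                     min_replicas, max_replicas):
    """Return (scale_factor, desired, clamped) using the formulas above."""
    scale_factor = current_util / target_util
    # Multiply before dividing to limit floating-point error ahead of ceil().
    desired = math.ceil(current_replicas * current_util / target_util)
    clamped = max(min_replicas, min(desired, max_replicas))
    return scale_factor, desired, clamped


factor, desired, clamped = desired_replicas(3, 80, 50, min_replicas=2, max_replicas=20)
print(f"scale factor {factor:.2f}, desired {desired}, clamped {clamped}")
# scale factor 1.60, desired 5, clamped 5
```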

Example Calculation

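Inputs: 3 current replicas at 80% utilization, 50% target, min 2 replicas, max 20 replicas.

Result: 5 desired pods (scale up by 2)

Scale factor: 80% / 50% = 1.6. Desired: ceil(3 × 1.6) = ceil(4.8) = 5 pods. Clamped between min 2 and max 20: 5 pods. HPA will scale from 3 to 5 pods, bringing average utilization to about 48% at the same load, just under the 50% target.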

Tips & Best Practices

Set target utilization at 50–70% to allow headroom for traffic spikes.
Always set min replicas ≥ 2 for high-availability workloads.
Set max replicas based on your budget and cluster capacity.
Use stabilization windows to prevent rapid scale-down flapping.
Consider container startup time when setting scaling parameters.
Combine CPU and memory metrics for more accurate scaling decisions.

The HPA Algorithm

HPA runs a control loop every 15 seconds by default (configurable). It queries the metrics API for current utilization, computes the desired replica count, and applies the change (subject to stabilization windows). The algorithm is simple, but its interactions with pod lifecycle, resource requests, and custom metrics create complexity.

Choosing Min and Max Replicas

Min replicas should match your availability requirements: 2 for basic HA, 3 for zone redundancy, more for high-traffic services. Max replicas should reflect your budget ceiling and cluster capacity. Setting max too high risks exhausting cluster resources.
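
One way to turn a budget ceiling and cluster capacity into a max replica value is to take the smaller of the two limits. This is a sketch under simple assumptions: CPU is treated as the only capacity constraint, a month is 730 hours, and all names and figures are illustrative.

```python
import math


def max_replicas_ceiling(monthly_budget, cost_per_pod_hour,
                         cluster_cpu_millicores, cpu_request_millicores,
                         hours_per_month=730):
    """Largest replica count that fits both the budget and the cluster's CPU."""
    by_budget = math.floor(monthly_budget / (cost_per_pod_hour * hours_per_month))
    by_capacity = cluster_cpu_millicores // cpu_request_millicores
    return min(by_budget, by_capacity)


# $500/month at $0.07 per pod-hour, 16 vCPU of spare capacity, 500m per pod.
print(max_replicas_ceiling(500, 0.07, 16000, 500))  # 9 (budget is the tighter limit)
```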

Custom Metrics for Smarter Scaling

CPU is a lagging indicator. By the time CPU spikes, requests may already be queuing. Custom metrics like requests-per-second, queue depth, or in-flight connections are leading indicators that enable proactive scaling before performance degrades.
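
For a per-pod average target such as requests per second, the same HPA formula reduces to total load divided by the per-pod target, because current replicas multiplied by the per-pod average equals the total. The sketch below assumes that form; the function name and traffic figures are hypothetical.

```python
import math


def desired_replicas_for_rps(total_rps, target_rps_per_pod, min_replicas, max_replicas):
    """Replica count for a requests-per-second target expressed per pod."""
    desired = math.ceil(total_rps / target_rps_per_pod)
    return max(min_replicas, min(desired, max_replicas))


print(desired_replicas_for_rps(1250, 100, min_replicas=2, max_replicas=20))  # 13
```

Because request rate rises before CPU does, a target like this lets the replica count follow traffic directly instead of waiting for CPU to climb.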

Sources & Methodology

Last updated: February 8, 2026
