Container Startup Time Calculator
Estimate container startup time from image pull, layer extraction, runtime initialization, and application boot time.
Calculate Kubernetes HPA pod counts based on CPU/memory thresholds, current utilization, and scaling targets.
| Current Utilization | Desired Pods | Clamped Pods | New Utilization | Monthly Cost |
|---|---|---|---|---|
| 30% | 2 | 2 | 45.00% | $102.20 |
| 50% | 3 | 3 | 50.00% | $153.30 |
| 70% | 5 | 5 | 42.00% | $255.50 |
| 80% | 5 | 5 | 48.00% | $255.50 |
| 90% | 6 | 6 | 45.00% | $306.60 |
| 95% | 6 | 6 | 47.50% | $306.60 |
| 100% | 6 | 6 | 50.00% | $306.60 |
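The rows above are consistent with 3 current replicas, a 50% target, bounds of min 2 / max 20, and $0.07 per pod-hour over a 730-hour month. Those inputs are inferred from the numbers, not stated on the page. A minimal sketch that regenerates the table under those assumptions, using integer math to avoid float rounding in `ceil`:

```python
# Assumed inputs, inferred from the table rows (not stated on the page).
CURRENT_REPLICAS = 3
TARGET_PCT = 50
MIN_REPLICAS, MAX_REPLICAS = 2, 20
PRICE_CENTS_PER_POD_HOUR = 7  # $0.07 per pod-hour (assumed)
HOURS_PER_MONTH = 730         # average month length (assumed)


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for positive ints."""
    return -(-a // b)


def row(util_pct: int) -> tuple[int, int, float, int]:
    """Return (desired, clamped, new utilization %, monthly cost in cents)."""
    desired = ceil_div(CURRENT_REPLICAS * util_pct, TARGET_PCT)
    clamped = max(MIN_REPLICAS, min(MAX_REPLICAS, desired))
    # The same total load spread over the new pod count.
    new_util = CURRENT_REPLICAS * util_pct / clamped
    cost_cents = clamped * PRICE_CENTS_PER_POD_HOUR * HOURS_PER_MONTH
    return desired, clamped, new_util, cost_cents


for util in (30, 50, 70, 80, 90, 95, 100):
    desired, clamped, new_util, cost = row(util)
    print(f"| {util}% | {desired} | {clamped} | {new_util:.2f}% | ${cost / 100:,.2f} |")
```

Each printed line matches the corresponding table row, which is a quick way to check whether a different set of inputs was used.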
| Parameter | Recommended | Reason |
|---|---|---|
| Target CPU | 50–70% | Leaves headroom for burst traffic |
| Min Replicas | ≥ 2 | High availability (survives pod failure) |
| Max Replicas | Budget-based | Prevents runaway scaling costs |
| Scale-down stabilization | 300s | Prevents flapping on variable load |
| Scale-up stabilization | 0–60s | React quickly to traffic spikes |
| Metric type | CPU + custom | CPU alone misses memory-bound workloads |
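As a sketch of how these recommendations map onto an `autoscaling/v2` HorizontalPodAutoscaler, the Python dict below builds the manifest and prints it as JSON. The Deployment name `web`, the `maxReplicas` value, and the `http_requests_per_second` Pods metric are hypothetical; a custom Pods metric only works if the cluster runs a custom-metrics adapter that serves it.

```python
import json

# Hypothetical names: the "web" Deployment and the "http_requests_per_second"
# Pods metric (which requires a custom-metrics adapter in the cluster).
hpa = {
    "apiVersion": "autoscaling/v2",
    "kind": "HorizontalPodAutoscaler",
    "metadata": {"name": "web"},
    "spec": {
        "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": "web"},
        "minReplicas": 2,   # HA: survives a single pod failure
        "maxReplicas": 20,  # budget-based ceiling (example value)
        "metrics": [
            {   # CPU target inside the recommended 50-70% band
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {"type": "Utilization", "averageUtilization": 60},
                },
            },
            {   # custom per-pod metric, averaged across pods
                "type": "Pods",
                "pods": {
                    "metric": {"name": "http_requests_per_second"},
                    "target": {"type": "AverageValue", "averageValue": "100"},
                },
            },
        ],
        "behavior": {
            "scaleDown": {"stabilizationWindowSeconds": 300},
            "scaleUp": {"stabilizationWindowSeconds": 0},
        },
    },
}

print(json.dumps(hpa, indent=2))
```

kubectl accepts JSON as well as YAML, so the output can be saved to a file and applied with `kubectl apply -f hpa.json`.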
| Workload | CPU (m) | Memory (Mi) | ~$/hr |
|---|---|---|---|
| Microservice | 100–250 | 128–256 | $0.02–0.04 |
| Web server | 250–500 | 256–512 | $0.04–0.07 |
| API gateway | 500–1000 | 512–1024 | $0.07–0.14 |
| Worker / batch | 1000–2000 | 1024–4096 | $0.14–0.35 |
| ML inference | 2000–4000 | 4096–8192 | $0.35–0.70 |
The Kubernetes Horizontal Pod Autoscaler (HPA) automatically adjusts the number of pod replicas based on observed CPU utilization, memory usage, or custom metrics. Understanding how HPA calculates desired replica count is essential for reliable autoscaling.
The HPA formula is: desiredReplicas = ceil(currentReplicas × (currentMetricValue / desiredMetricValue)). For example, if 3 pods are running at 80% CPU with a target of 50%, HPA desires ceil(3 × 80/50) = ceil(4.8) = 5 pods.
This calculator helps you predict HPA behavior under different load scenarios, set appropriate min/max replica bounds, and choose target utilization thresholds that balance responsiveness with cost efficiency.
Misconfigured HPA leads to either under-scaling (outages) or over-scaling (wasted resources). This calculator helps predict pod counts at different utilization levels for optimal HPA configuration.
Desired Replicas = ceil(current_replicas × (current_utilization / target_utilization))
Clamped = clamp(desired, min_replicas, max_replicas)
Scale Factor = current_utilization / target_utilization
Result: 5 desired pods (scale up by 2)
Scale factor: 80% / 50% = 1.6. Desired: ceil(3 × 1.6) = ceil(4.8) = 5 pods. Clamped between min 2 and max 20: 5 pods. HPA will scale from 3 to 5 pods to bring average utilization back toward 50%.
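A minimal Python sketch of the desired-replica and clamp formulas, assuming utilization is given as integer percentages so the ceiling can be computed exactly. The function name `desired_replicas` is illustrative. It ignores the controller's tolerance band (0.1 by default) and stabilization windows.

```python
def desired_replicas(current_replicas: int, current_util: int, target_util: int,
                     min_replicas: int, max_replicas: int) -> tuple[int, int]:
    """Return (raw desired, clamped) replica counts."""
    # ceil(current_replicas * current_util / target_util) using integer math
    desired = -(-(current_replicas * current_util) // target_util)
    clamped = max(min_replicas, min(max_replicas, desired))
    return desired, clamped


scale_factor = 80 / 50  # 1.6
desired, clamped = desired_replicas(3, 80, 50, min_replicas=2, max_replicas=20)
print(f"scale factor {scale_factor}, desired {desired}, clamped {clamped}")
# prints: scale factor 1.6, desired 5, clamped 5
```

The clamp matters at the extremes: 10 pods at 100% with a 40% target would want 25 pods but stop at the max of 20, and 3 pods at 10% would want 1 but stay at the min of 2.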
HPA runs a control loop every 15 seconds by default (configurable with the kube-controller-manager flag `--horizontal-pod-autoscaler-sync-period`). On each pass it queries the metrics API for current utilization, computes the desired replica count, and applies the change, subject to stabilization windows. The algorithm is simple, but its interactions with pod lifecycle, resource requests, and custom metrics add complexity.
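One pass of that loop can be sketched as a function over per-pod samples. This assumes every pod has the same CPU request, so average utilization is a plain mean, and it uses the min 2 / max 20 bounds from the worked example. The per-pod samples are hypothetical, and the sketch skips stabilization windows and tolerance.

```python
TARGET = 50                          # target average CPU utilization, percent (assumed)
MIN_REPLICAS, MAX_REPLICAS = 2, 20   # bounds from the worked example


def reconcile(per_pod_utilization: list[int]) -> int:
    """One sync pass: average the samples, then compute and clamp desired replicas."""
    # current * average == sum of samples, so ceil(current * avg / target)
    # reduces to ceil(sum / target).
    total = sum(per_pod_utilization)
    desired = -(-total // TARGET)
    return max(MIN_REPLICAS, min(MAX_REPLICAS, desired))


# Hypothetical per-pod samples at three consecutive 15 s syncs.
samples = [
    [70, 85, 85],          # 3 pods averaging 80%: scale to 5
    [48, 48, 48, 48, 48],  # 5 pods at 48%: ceil(4.8) = 5, hold
    [20, 20, 20, 20, 20],  # load drops: ceil(2.0) = 2
]
for tick, sample in enumerate(samples):
    print(f"t={tick * 15:>3}s  pods={len(sample)}  desired={reconcile(sample)}")
```

In a real cluster the third step would be delayed by the scale-down stabilization window rather than applied on the next sync.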
Min replicas should match your availability requirements: 2 for basic HA, 3 for zone redundancy, more for high-traffic services. Max replicas should reflect your budget ceiling and cluster capacity. Setting max too high risks exhausting cluster resources.
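A sketch of deriving `maxReplicas` from both limits, taking whichever is tighter. It assumes a flat per-pod hourly price, a 730-hour month, and a known amount of allocatable CPU for this workload; memory capacity is ignored. The function name and example figures are hypothetical.

```python
HOURS_PER_MONTH = 730  # assumed average month length


def max_replicas(budget_usd_month: float, usd_per_pod_hour: float,
                 allocatable_cpu_m: int, cpu_request_m: int,
                 min_replicas: int = 2) -> int:
    """Upper bound = the tighter of the budget ceiling and the CPU capacity ceiling."""
    by_budget = int(budget_usd_month // (usd_per_pod_hour * HOURS_PER_MONTH))
    by_capacity = allocatable_cpu_m // cpu_request_m
    cap = min(by_budget, by_capacity)
    if cap < min_replicas:
        raise ValueError(
            f"budget/capacity allow only {cap} pods, below min_replicas={min_replicas}")
    return cap


# $1000/month at $0.07/pod-hour buys 19 pods; 16 cores at 500m each fit 32.
print(max_replicas(1000, 0.07, 16000, 500))  # 19 (budget-bound)
print(max_replicas(5000, 0.07, 8000, 500))   # 16 (capacity-bound)
```

Raising an error when the cap falls below `min_replicas` surfaces a configuration that cannot satisfy both availability and budget.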
CPU is a lagging indicator. By the time CPU spikes, requests may already be queuing. Custom metrics like requests-per-second, queue depth, or in-flight connections are leading indicators that enable proactive scaling before performance degrades.
A target of 50–60% is common for latency-sensitive services, leaving headroom for spikes. CPU-bound batch processing can use 70–80%. Lower targets scale more aggressively (more pods, higher cost). Higher targets risk under-provisioning during spikes.
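The cost side of that trade-off can be sketched by holding total load fixed and sweeping the target. The demand and request figures are hypothetical. "Headroom" here means how much the load can grow before average usage reaches the CPU request, ignoring the extra slack from rounding up the pod count.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for positive ints."""
    return -(-a // b)


def pods_needed(demand_m: int, request_m: int, target_pct: int) -> int:
    """Pods needed so average utilization (usage / request) sits at the target."""
    return ceil_div(demand_m * 100, request_m * target_pct)


DEMAND_M = 1200  # hypothetical total CPU usage across the deployment, millicores
REQUEST_M = 250  # hypothetical per-pod CPU request, millicores

for target in (50, 60, 70, 80):
    pods = pods_needed(DEMAND_M, REQUEST_M, target)
    headroom = 100 / target  # load multiple before average usage hits the request
    print(f"target {target}%: {pods} pods, ~{headroom:.2f}x headroom")
```

Going from a 50% to an 80% target cuts the pod count from 10 to 6 for this load, at the cost of shrinking headroom from 2x to 1.25x.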
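HPA applies a default stabilization window of 5 minutes (300 s) to scale-down, configurable via `behavior.scaleDown.stabilizationWindowSeconds`. When scaling down, it uses the highest replica count recommended during that window, so it won't drop below any count suggested in the last 5 minutes. This prevents flapping. Scale-up has no stabilization window by default (0 s), so it responds quickly.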
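A simplified sketch of the scale-down window: each sync records its recommendation, scale-ups apply immediately, and scale-downs go only as low as the highest recommendation still inside the last 300 s. The class name is hypothetical, and it models only these defaults, not scaling policies or tolerance.

```python
from collections import deque


class ScaleDownStabilizer:
    """Keeps recent recommendations; scale-down uses the window's maximum."""

    def __init__(self, window_s: int = 300):
        self.window_s = window_s
        self.history = deque()  # (timestamp_s, recommended_replicas) pairs

    def step(self, now_s: int, current: int, recommended: int) -> int:
        self.history.append((now_s, recommended))
        # Forget recommendations older than the window (the newest one always stays).
        while self.history[0][0] < now_s - self.window_s:
            self.history.popleft()
        if recommended >= current:
            return recommended  # no scale-up window by default: act immediately
        # Scale down only as far as the highest recommendation still in the window.
        return min(current, max(r for _, r in self.history))


stabilizer = ScaleDownStabilizer(window_s=300)
replicas, trace = 3, {}
for t in range(0, 331, 15):           # one sync every 15 s
    recommended = 6 if t == 0 else 2  # brief spike, then low load
    replicas = stabilizer.step(t, replicas, recommended)
    trace[t] = replicas
print(trace[0], trace[300], trace[315])  # 6 6 2
```

The deployment holds at 6 pods for the full window after the spike and only drops to 2 once the spike's recommendation ages out.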
By default, HPA cannot scale to zero (minReplicas must be at least 1). KEDA (Kubernetes Event-driven Autoscaling) supports scale-to-zero based on queue length, HTTP requests, or custom metrics. This is useful for batch workloads and cost optimization.
Configure HPA to scale on memory utilization instead of, or in addition to, CPU. In the autoscaling/v2 API, list several entries under `spec.metrics` to define multiple scaling criteria. HPA computes a replica count for each metric and uses the highest.
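The "highest proposal wins" rule can be sketched as follows, assuming every metric is expressed as a utilization percentage with its own target. The function name and the readings are hypothetical.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for positive ints."""
    return -(-a // b)


def desired_for_metrics(current_replicas: int, metrics: dict[str, tuple[int, int]],
                        min_replicas: int, max_replicas: int) -> tuple[str, int]:
    """metrics maps name -> (current %, target %); returns (winning metric, clamped replicas)."""
    proposals = {name: ceil_div(current_replicas * cur, tgt)
                 for name, (cur, tgt) in metrics.items()}
    winner = max(proposals, key=proposals.get)
    return winner, max(min_replicas, min(max_replicas, proposals[winner]))


# Hypothetical readings: CPU is comfortable (proposes 3), memory is not (proposes 6).
print(desired_for_metrics(4, {"cpu": (40, 60), "memory": (90, 70)}, 2, 20))
# prints: ('memory', 6)
```

Taking the maximum means a metric that is well below target can never hold back a scale-up driven by another metric.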
HPA calculates utilization relative to resource requests: if a pod requests 100m CPU and uses 80m, that's 80% utilization. Setting requests too high makes utilization appear low (under-scaling); setting them too low makes it appear high (over-scaling).
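A small sketch of that effect: the same real load (80m per pod across 3 pods, 50% target) produces very different scaling decisions depending only on the request. Utilization above 100% is possible when a pod's CPU limit is higher than its request. Function names are illustrative.

```python
import math


def utilization_pct(usage_m: int, request_m: int) -> float:
    """HPA-style utilization: usage as a percentage of the pod's CPU request."""
    return 100 * usage_m / request_m


def desired(current_replicas: int, util_pct: float, target_pct: float) -> int:
    """Desired replicas from the HPA ratio formula (no clamping)."""
    return math.ceil(current_replicas * util_pct / target_pct)


# Same real load (80m per pod, 3 pods), three different request settings.
for request_m in (100, 200, 50):
    util = utilization_pct(80, request_m)
    print(f"request {request_m}m -> {util:.0f}% utilization -> {desired(3, util, 50)} pods")
```

With a 100m request HPA asks for 5 pods; doubling the request to 200m makes it settle at 3, and halving it to 50m pushes it to 10.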
HPA scales horizontally (more pods); VPA scales vertically (more resources per pod). HPA suits stateless workloads, while VPA suits single-instance stateful workloads. The two can be combined with care, but don't let both scale on the same resource metric, such as CPU or memory.