
Horizontal Pod Autoscaler (HPA)

The Horizontal Pod Autoscaler (HPA) is a Kubernetes mechanism that automatically scales the number of Pod replicas based on observed resource utilization.^[400-devops__06-Kubernetes__k8s-ithelp__Day25__README.md] It operates at the Pod level, adjusting the replica count up or down to handle load fluctuations, ensuring that the system remains responsive without manual intervention.^[400-devops__06-Kubernetes__k8s-ithelp__Day25__README.md]

Prerequisites

HPA functionality relies on the Metrics Server, which must be installed in the Kubernetes cluster to collect the resource metrics (such as CPU and memory usage) that serve as the basis for autoscaling decisions.^[400-devops__06-Kubernetes__k8s-ithelp__Day25__README.md, 400-devops__06-Kubernetes__k8s-ithelp__Day26__README.md] Additionally, target resources (like Deployments) must have explicit resource requests defined for the metrics used in scaling calculations.^[400-devops__06-Kubernetes__k8s-ithelp__Day26__README.md]
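The requests requirement can be sketched as follows. This is a minimal, illustrative Deployment fragment (the name `web` and image `nginx:1.25` are assumptions, not from the source); the `resources.requests` block is what percentage-based HPA targets are measured against:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                 # illustrative name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: app
          image: nginx:1.25     # illustrative image
          resources:
            requests:
              cpu: 100m         # basis for CPU Utilization (%)
              memory: 128Mi     # basis for memory Utilization (%)
```

Without these requests, the HPA controller cannot compute a utilization percentage and reports the metric as unknown.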

Scaling Behavior

HPA manages the number of replicas by periodically checking metrics against defined thresholds.

Scale-up

When the HPA controller detects that resource usage exceeds the defined target, it increases the number of replicas in the Deployment.^[400-devops__06-Kubernetes__k8s-ithelp__Day25__README.md] The desired number of replicas is calculated using the formula: desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)].^[400-devops__06-Kubernetes__k8s-ithelp__Day25__README.md] For example, with 2 current replicas, 80% current CPU utilization, and a 50% target, the result is ceil[2 * (80 / 50)] = ceil[3.2] = 4 replicas.

Scale-down

Conversely, when resource usage falls below the target, the controller reduces the replica count to minimize resource waste.^[400-devops__06-Kubernetes__k8s-ithelp__Day25__README.md] To prevent "flapping" (rapid oscillation in replica counts), the system typically waits for a stabilization window (default 3–5 minutes) after a scaling event before scaling down again.^[400-devops__06-Kubernetes__k8s-ithelp__Day25__README.md, 400-devops__06-Kubernetes__k8s-ithelp__Day26__README.md]

Replica Management

If a Deployment already has a specific replicas setting configured, the HPA will override this value to enforce its calculated scaling requirements.^[400-devops__06-Kubernetes__k8s-ithelp__Day25__README.md] The number of replicas is always constrained within the defined minReplicas and maxReplicas bounds (where minReplicas cannot be 0).^[400-devops__06-Kubernetes__k8s-ithelp__Day26__README.md]

Configuration and API Versions

The HPA configuration is defined in a HorizontalPodAutoscaler resource.

API Version Changes

The HPA API has evolved significantly. While early versions (v1) only supported CPU utilization, later versions (v2beta2 and v2) introduced support for memory and custom metrics.^[400-devops__06-Kubernetes__k8s-ithelp__Day25__README.md, 400-devops__06-Kubernetes__k8s-ithelp__Day26__README.md] It is recommended to use autoscaling/v2 to access the latest features, such as memory-based autoscaling and behavior configuration.^[400-devops__06-Kubernetes__k8s-ithelp__Day26__README.md]
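A minimal autoscaling/v2 manifest can sketch the recommended shape. The target Deployment name `web` and the thresholds here are illustrative assumptions, not values from the source:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web          # illustrative target Deployment
  minReplicas: 2       # lower bound; cannot be 0
  maxReplicas: 10      # upper bound on scale-out
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # scale when avg CPU exceeds 50% of requests
```

The same manifest under autoscaling/v1 would instead use the flat `targetCPUUtilizationPercentage` field and could not express memory or custom metrics.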

Metrics Types

HPA supports several types of metrics to trigger scaling:

  • Resource: Standard Kubernetes resources such as CPU and memory. Targets can be defined as a percentage of requested resources (Utilization) or as a specific quantity (AverageValue).^[400-devops__06-Kubernetes__k8s-ithelp__Day26__README.md]
  • Pods: Metrics describing each Pod (e.g., packets-per-second), averaged across all Pods before comparison to the target.^[400-devops__06-Kubernetes__k8s-ithelp__Day26__README.md]
  • Object: Metrics describing a single Kubernetes object, such as an Ingress (e.g., requests-per-second).^[400-devops__06-Kubernetes__k8s-ithelp__Day26__README.md]
  • External: Metrics from outside Kubernetes, allowing integration with external monitoring systems (e.g., message queue length).^[400-devops__06-Kubernetes__k8s-ithelp__Day26__README.md]
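The four types above can be sketched as entries under `spec.metrics` in an autoscaling/v2 HPA. All metric names, quantities, and the Ingress name below are illustrative assumptions; Pods, Object, and External metrics additionally require a metrics adapter (e.g., a Prometheus adapter) to be served:

```yaml
metrics:
  - type: Resource                        # built-in, via Metrics Server
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 200Mi
  - type: Pods                            # per-Pod custom metric, averaged
    pods:
      metric:
        name: packets_per_second          # assumed metric name
      target:
        type: AverageValue
        averageValue: "1k"
  - type: Object                          # metric on one Kubernetes object
    object:
      metric:
        name: requests_per_second         # assumed metric name
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: main-route                  # illustrative Ingress
      target:
        type: Value
        value: "2k"
  - type: External                        # metric from outside the cluster
    external:
      metric:
        name: queue_messages_ready        # assumed metric name
      target:
        type: AverageValue
        averageValue: "30"
```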

Behavior Configuration

The behavior field allows fine-tuning of the scaling logic to prevent rapid fluctuations:

  • Stabilization Window: Defines a time window (e.g., 300 seconds) used to calculate the desired state, preventing scaling down too quickly.^[400-devops__06-Kubernetes__k8s-ithelp__Day26__README.md]
  • Policies: Constraints on the rate of scale-up or scale-down (e.g., limiting changes to a specific percentage or number of Pods per period).^[400-devops__06-Kubernetes__k8s-ithelp__Day26__README.md]
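A sketch of a `spec.behavior` fragment combining both mechanisms; the specific windows and rates are illustrative assumptions, not recommended defaults:

```yaml
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300   # hold scale-down decisions for 5 min
    selectPolicy: Min                 # apply the most conservative policy
    policies:
      - type: Pods
        value: 1                      # remove at most 1 Pod per minute
        periodSeconds: 60
      - type: Percent
        value: 10                     # or at most 10% of Pods per minute
        periodSeconds: 60
  scaleUp:
    stabilizationWindowSeconds: 0     # react immediately to load spikes
    policies:
      - type: Percent
        value: 100                    # at most double the Pod count per 15 s
        periodSeconds: 15
```

With `selectPolicy: Min` on scale-down, the controller picks whichever policy removes fewer Pods, making downscaling deliberately gradual while upscaling stays aggressive.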

Limitations and Considerations

While HPA provides essential autoscaling capabilities, the default scaling speed may not be sufficient for extreme, sudden traffic spikes (e.g., scaling to thousands of machines within minutes), as sometimes required during major news events.^[400-devops__06-Kubernetes__k8s-ithelp__Day26__README.md] In such scenarios, integrating with more advanced monitoring stacks (like Prometheus and Grafana) for custom metrics is often necessary.^[400-devops__06-Kubernetes__k8s-ithelp__Day26__README.md]

Furthermore, HPA is currently incompatible with the Vertical Pod Autoscaler (VPA) for standard resource metrics (CPU/memory), because the two controllers attempt to adjust conflicting resource attributes.^[400-devops__06-Kubernetes__k8s-ithelp__Day25__README.md]

Sources

  • 400-devops__06-Kubernetes__k8s-ithelp__Day25__README.md
  • 400-devops__06-Kubernetes__k8s-ithelp__Day26__README.md