HPA validation with load testing

HPA validation with load testing is the process of verifying that a Kubernetes HPA configuration correctly scales workloads in response to resource demands. By simulating user traffic, engineers can observe whether the cluster creates new [[Pods]] or terminates existing ones according to defined metrics such as CPU or memory utilization.

Prerequisites

Before validating HPA, the cluster must have the Metrics Server installed and ready. The Metrics Server collects resource metrics (such as CPU and memory usage) from the [[Pods]], which serve as the data source for the autoscaling decisions^[400-devops__06-Kubernetes__k8s-ithelp__Day26__README.md].
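If the Metrics Server is not yet present, it is commonly installed from the upstream release manifest (this sketch assumes the standard kubernetes-sigs release layout and a cluster reachable via your current kubeconfig context):

```shell
# Install the Metrics Server from the official release manifest
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Confirm it is running in kube-system before relying on HPA metrics
kubectl get deployment metrics-server -n kube-system
```

On some local clusters (e.g., self-signed kubelet certificates), the Metrics Server may additionally need the `--kubelet-insecure-tls` flag to become ready.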

You can verify that the Metrics Server is operational by running:

```shell
kubectl top node
```

A successful output displays the current CPU and memory usage of each node^[400-devops__06-Kubernetes__k8s-ithelp__Day26__README.md].

HPA Configuration

A valid HPA resource requires specific configuration to function during a test. The spec section defines the scaling boundaries and the target metrics.

  • Scaling Boundaries: minReplicas and maxReplicas define the minimum and maximum number of replicas. The minimum cannot be set to 0^[400-devops__06-Kubernetes__k8s-ithelp__Day26__README.md].
  • Target Metrics: The metrics section defines what triggers scaling. Common types include:
    • Resource: Metrics known to Kubernetes, such as CPU or memory. You can define averageUtilization (e.g., scale up if CPU exceeds 50% of the requested value)^[400-devops__06-Kubernetes__k8s-ithelp__Day26__README.md].
    • Pods: Metrics specific to the pods (e.g., packets-per-second)^[400-devops__06-Kubernetes__k8s-ithelp__Day26__README.md].
    • Object: Metrics from a specific Kubernetes object, such as an Ingress^[400-devops__06-Kubernetes__k8s-ithelp__Day26__README.md].
    • External: Metrics from outside Kubernetes, allowing autoscaling based on events like queue length^[400-devops__06-Kubernetes__k8s-ithelp__Day26__README.md].
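Putting these fields together, a minimal HPA manifest might look like the following sketch (the `php-apache` name and the 50% target are illustrative, matching the load-test example later in this note):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache          # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache        # the Deployment this HPA scales
  minReplicas: 1            # minimum cannot be 0
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # scale out when average CPU exceeds 50% of the request
```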

Note: Starting with the autoscaling/v2beta2 API, memory reported by the Metrics Server can also be used as a scaling metric^[400-devops__06-Kubernetes__k8s-ithelp__Day26__README.md].

Validation Methodology

The most common method for validation is to deploy a sample application, expose it via a [[Service]], and then run a load generator to create traffic.

1. Deployment Setup

Create a [[Deployment]] with resource requests and limits defined. For example, a container might request 500m CPU and limit usage to 500m^[400-devops__06-Kubernetes__k8s-ithelp__Day26__README.md]. The HPA is then configured to target this deployment.
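As a sketch, such a [[Deployment]]'s container spec might declare its requests and limits like this (names are illustrative; `registry.k8s.io/hpa-example` is the image used in the upstream HPA walkthrough):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  replicas: 1
  selector:
    matchLabels:
      app: php-apache
  template:
    metadata:
      labels:
        app: php-apache
    spec:
      containers:
        - name: php-apache
          image: registry.k8s.io/hpa-example
          resources:
            requests:
              cpu: 500m   # HPA utilization percentages are computed against this request
            limits:
              cpu: 500m
```

The `averageUtilization` target is evaluated relative to the `requests` value, so a Pod consuming 250m CPU here reports 50% utilization.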

2. Executing the Load Test

To validate the scaling behavior, you need to generate a load that exceeds the HPA's target threshold. This is typically done with a temporary Pod running a lightweight image such as busybox, issuing requests in a loop^[400-devops__06-Kubernetes__k8s-ithelp__Day26__README.md].

For example, to generate a continuous HTTP load:

```shell
kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never \
  -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
```

This command creates a Pod that sends requests to the service in a tight loop^[400-devops__06-Kubernetes__k8s-ithelp__Day26__README.md].

3. Observing Results

While the load generator is running, monitor the HPA status in a separate terminal to see the scaling in action:

```shell
kubectl get hpa --watch
```

The output displays the TARGETS (current vs. target utilization) and REPLICAS count. As the load increases and CPU utilization crosses the target (e.g., 50%), you should see the replica count increase automatically to distribute the load^[400-devops__06-Kubernetes__k8s-ithelp__Day26__README.md].

Scaling Behavior and Stabilization

The HPA includes specific behaviors to prevent "flapping" (rapid scaling up and down).

  • Stabilization Window: The behavior.scaleDown.stabilizationWindowSeconds parameter defaults to 300 seconds (5 minutes). This ensures that when the load drops, the autoscaler looks at the highest desired state within the past 5 minutes before scaling down, preventing excessive replica count jitter^[400-devops__06-Kubernetes__k8s-ithelp__Day26__README.md].
  • Scaling Policies: You can define strict policies for scaling up or down using policies. For instance, you might limit scaling to a specific percentage or a fixed number of pods per periodSeconds^[400-devops__06-Kubernetes__k8s-ithelp__Day26__README.md].
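A hedged sketch of a behavior block combining both ideas (the specific values here are illustrative, not from the source):

```yaml
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300   # look back 5 minutes before scaling down
    policies:
      - type: Pods
        value: 2                      # remove at most 2 Pods...
        periodSeconds: 60             # ...per 60-second window
  scaleUp:
    policies:
      - type: Percent
        value: 100                    # at most double the replica count
        periodSeconds: 60             # per 60-second window
```

During a load test, a long scale-down window means replicas will linger for several minutes after you stop the load generator; this is expected behavior, not a failed validation.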

Considerations

While Kubernetes HPA provides basic container scaling, the default elasticity may not be sufficient for extreme scenarios, such as "hot news" events requiring scaling to thousands of machines within minutes^[400-devops__06-Kubernetes__k8s-ithelp__Day26__README.md]. In such cases, integrating with external monitoring systems like Prometheus is often necessary.

Sources