
Kubernetes pod priority and preemption

Kubernetes pod priority and preemption is a scheduling mechanism (Beta as of Kubernetes 1.11) that keeps critical workloads from being delayed indefinitely by a lack of resources and lets the cluster scheduler manage resource contention more effectively.^[400-devops__06-Kubernetes__k8s-paas__原理及源码解析__Kubernetes调度机制.md]

Priority

The priority of a Pod is defined by a Kubernetes object called a PriorityClass.^[400-devops__06-Kubernetes__k8s-paas__原理及源码解析__Kubernetes调度机制.md]

Configuration

A PriorityClass is a non-namespaced object that specifies a priority value.^[400-devops__06-Kubernetes__k8s-paas__原理及源码解析__Kubernetes调度机制.md] Key fields include:

  • value: An integer defining the priority. A larger value indicates a higher priority.^[400-devops__06-Kubernetes__k8s-paas__原理及源码解析__Kubernetes调度机制.md] User-defined classes may use values up to 1,000,000,000 (one billion); larger values are reserved for built-in system classes.
  • globalDefault: A boolean field. If set to true, the class's value becomes the default priority for Pods that do not specify a priorityClassName.^[400-devops__06-Kubernetes__k8s-paas__原理及源码解析__Kubernetes调度机制.md] If no PriorityClass sets globalDefault to true, such Pods default to a priority of 0.
  • description: A text field explaining the intended use of the priority class.

To assign a priority to a [[pods|Pod]], the priorityClassName field in the Pod's specification is set to the name of the chosen PriorityClass.^[400-devops__06-Kubernetes__k8s-paas__原理及源码解析__Kubernetes调度机制.md] Upon submission, the PriorityAdmissionController automatically sets the Pod's spec.priority field to the corresponding value.^[400-devops__06-Kubernetes__k8s-paas__原理及源码解析__Kubernetes调度机制.md]
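The resolution rules above can be sketched in a few lines of Python. This is a simplified model for illustration, not the admission controller's actual code: the `PriorityClass` dataclass, the `resolve_priority` function, and the class names `high-priority` and `standard` are all hypothetical, and rejecting an unknown class name with an exception is a simplification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PriorityClass:
    """Hypothetical in-memory stand-in for a PriorityClass object."""
    name: str
    value: int
    global_default: bool = False
    description: str = ""


def resolve_priority(priority_class_name: Optional[str],
                     classes: List[PriorityClass]) -> int:
    """Return the integer a Pod's spec.priority would be set to (simplified)."""
    by_name = {pc.name: pc for pc in classes}
    if priority_class_name is not None:
        # A Pod that names a class gets that class's value.
        if priority_class_name not in by_name:
            raise ValueError(f"no PriorityClass named {priority_class_name!r}")
        return by_name[priority_class_name].value
    # No class named: fall back to the globalDefault class, if one exists.
    for pc in classes:
        if pc.global_default:
            return pc.value
    # No globalDefault class at all: priority is 0.
    return 0


classes = [
    PriorityClass("high-priority", 1_000_000, description="critical services"),
    PriorityClass("standard", 1_000, global_default=True),
]

print(resolve_priority("high-priority", classes))  # Pod names a class
print(resolve_priority(None, classes))             # Pod relies on globalDefault
print(resolve_priority(None, classes[:1]))         # no globalDefault class exists
```

The three calls cover the three cases described in the text: an explicit priorityClassName, a fallback to the globalDefault class, and a fallback to 0.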

Functionality

Scheduling Queue

The scheduler maintains a queue of Pods waiting to be scheduled.^[400-devops__06-Kubernetes__k8s-paas__原理及源码解析__Kubernetes调度机制.md] The priority value determines the order in which Pods exit this queue; higher priority Pods are dequeued earlier than lower priority ones, allowing them to undergo the scheduling process sooner.^[400-devops__06-Kubernetes__k8s-paas__原理及源码解析__Kubernetes调度机制.md]
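As a rough illustration of this ordering, the queue can be modeled with a heap. The `SchedulingQueue` class and the Pod names below are hypothetical, and breaking ties by arrival order is an assumption of this sketch rather than something stated in the source.

```python
import heapq
import itertools


class SchedulingQueue:
    """Toy model of a scheduling queue ordered by Pod priority."""

    def __init__(self):
        self._heap = []
        # Monotonic counter: among equal priorities, earlier arrivals pop first.
        self._counter = itertools.count()

    def add(self, pod_name: str, priority: int) -> None:
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._counter), pod_name))

    def pop(self) -> str:
        _, _, pod_name = heapq.heappop(self._heap)
        return pod_name

    def __len__(self) -> int:
        return len(self._heap)


queue = SchedulingQueue()
queue.add("batch-job", 0)
queue.add("web-frontend", 1000)
queue.add("payment-api", 1_000_000)
queue.add("web-cache", 1000)

order = [queue.pop() for _ in range(len(queue))]
print(order)
# ['payment-api', 'web-frontend', 'web-cache', 'batch-job']
```

Although `batch-job` was submitted first, it leaves the queue last because every other Pod carries a higher priority.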

Preemption

When a high-priority Pod fails to schedule (e.g., because no available node has sufficient resources), the scheduler attempts to trigger preemption instead of simply leaving the Pod pending.^[400-devops__06-Kubernetes__k8s-paas__原理及源码解析__Kubernetes调度机制.md]

During this process, the scheduler searches for a node where the termination of one or more low-priority Pods would free up enough resources to accommodate the pending high-priority Pod.^[400-devops__06-Kubernetes__k8s-paas__原理及源码解析__Kubernetes调度机制.md]

If a suitable node and set of victims are found, the low-priority Pods are removed (preempted) to make room for the high-priority Pod.^[400-devops__06-Kubernetes__k8s-paas__原理及源码解析__Kubernetes调度机制.md]

This prevents critical services from being "shelved" indefinitely (waiting for a manual update or cluster state change) and allows them to "bump" less important workloads.^[400-devops__06-Kubernetes__k8s-paas__原理及源码解析__Kubernetes调度机制.md]
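The search for a node and a set of victims can be sketched as follows. This is a deliberately simplified model under several assumptions: it tracks a single resource (CPU in millicores), evicts lower-priority Pods greedily from lowest priority upward, and prefers the node whose most important victim has the lowest priority, then the one with fewer victims. The `Pod`, `Node`, `victims_on`, and `pick_preemption` names are hypothetical, and the real scheduler weighs additional factors (such as PodDisruptionBudgets and graceful termination) that are omitted here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Pod:
    name: str
    priority: int
    cpu: int  # requested CPU in millicores


@dataclass
class Node:
    name: str
    capacity: int  # allocatable CPU in millicores
    pods: List[Pod] = field(default_factory=list)

    def free(self) -> int:
        return self.capacity - sum(p.cpu for p in self.pods)


def victims_on(node: Node, pending: Pod) -> Optional[List[Pod]]:
    """Return the Pods to evict so `pending` fits on `node`, or None if impossible."""
    free = node.free()
    victims: List[Pod] = []
    # Only Pods with strictly lower priority may be preempted.
    candidates = sorted(
        (p for p in node.pods if p.priority < pending.priority),
        key=lambda p: p.priority,
    )
    for pod in candidates:
        if free >= pending.cpu:
            break
        victims.append(pod)
        free += pod.cpu
    return victims if free >= pending.cpu else None


def pick_preemption(nodes: List[Node],
                    pending: Pod) -> Optional[Tuple[Node, List[Pod]]]:
    """Choose the node where preemption is least disruptive (simplified)."""
    best = None
    best_key = None
    for node in nodes:
        victims = victims_on(node, pending)
        if victims is None:
            continue
        key = (max((v.priority for v in victims), default=float("-inf")),
               len(victims))
        if best_key is None or key < best_key:
            best, best_key = (node, victims), key
    return best


nodes = [
    Node("node-a", 4000, [Pod("logs", 0, 1000), Pod("web", 1000, 2500)]),
    Node("node-b", 4000, [Pod("db", 1_000_000, 3000), Pod("batch", 0, 500)]),
    Node("node-c", 4000, [Pod("batch-2", 0, 2000), Pod("cache", 500, 1500)]),
]
pending = Pod("payment-api", 1_000_000, 2000)

result = pick_preemption(nodes, pending)
if result is not None:
    node, victims = result
    print(node.name, [v.name for v in victims])
    # node-c ['batch-2']
```

In this example node-b is ruled out because its only lower-priority Pod frees too little CPU, and node-c wins over node-a because evicting a single priority-0 Pod is less disruptive than evicting two Pods, one of them at priority 1000.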

Sources