Node selector migration pattern

The node selector migration pattern is a workflow for upgrading Kubernetes node pools in a Google Kubernetes Engine (GKE) environment without downtime^[GCP升级.md]. It provisions a temporary node pool, shifts workloads onto it, upgrades the original node pool, and then shifts the workloads back^[GCP升级.md].

Process Overview

This pattern follows a specific sequence of infrastructure changes and application updates, primarily orchestrated via Terraform and Kubernetes configuration adjustments^[GCP升级.md].

  1. Temporary Pool Creation: A temporary node pool (e.g., named temp) is created alongside the existing pool (e.g., named app)^[GCP升级.md].
  2. Migration (To Temporary): The application's configuration is updated to target the temporary pool using a nodeSelector^[GCP升级.md].
  3. Upgrade: The original node pool is then upgraded to the desired GKE version^[GCP升级.md].
  4. Restoration (To Original): The application's nodeSelector is reverted back to the original (now upgraded) node pool^[GCP升级.md].

Operational Details

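The migration relies on modifying the nodeSelector field in Kubernetes manifests^[GCP升级.md]. A nodeSelector does not route traffic; it constrains scheduling, so a pod runs only on nodes whose labels match the selector. Updating the selector in a workload's pod template therefore causes its pods to be recreated on the matching pool.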

For example, the configuration is changed from targeting the existing pool:

      nodeSelector:
        pool: app

to targeting the temporary pool:

      nodeSelector:
        pool: temp
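The same change can also be expressed as a patch instead of a manual manifest edit. The following is a minimal sketch, not the procedure from the source: it assumes the workload is patched with `kubectl patch --type merge`, and the helper names (`node_selector_patch`, `kubectl_patch_command`) and the example workload name `my-app` are hypothetical. Only the `pool` label key and the `app`/`temp` values come from the text above.

```python
import json
from typing import Dict, List


def node_selector_patch(pool: str, label_key: str = "pool") -> Dict:
    """Build a JSON merge patch that sets the pod template's nodeSelector.

    The `pool` label key mirrors the snippets above; the nodes in each pool
    are assumed to carry a matching label (e.g. pool=app or pool=temp).
    """
    return {"spec": {"template": {"spec": {"nodeSelector": {label_key: pool}}}}}


def kubectl_patch_command(kind: str, name: str, pool: str) -> List[str]:
    """Return the argv for a `kubectl patch` call that applies the patch.

    `kind` and `name` are placeholders for the workload being migrated.
    """
    patch = json.dumps(node_selector_patch(pool), separators=(",", ":"))
    return ["kubectl", "patch", kind, name, "--type", "merge", "-p", patch]


# Hypothetical usage: move a Deployment called "my-app" to the temporary pool.
command = kubectl_patch_command("deployment", "my-app", "temp")
print(" ".join(command))
```

Building the command as a list rather than a shell string avoids quoting problems with the JSON payload if it is later passed to `subprocess.run`. Reverting to the original pool is the same call with `"app"` as the target.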

In scenarios involving stateful workloads like Kafka, the migration is performed iteratively to manage the transition safely^[GCP升级.md]. This involves rotating the nodeSelector for individual components (e.g., kafka1, kafka2, kafka3) one by one, with a suggested interval of approximately three minutes between each update^[GCP升级.md].
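The one-by-one rotation described above can be sketched as a small loop. This is an illustration under stated assumptions, not the source's tooling: it assumes each Kafka component is a StatefulSet that can be patched with `kubectl patch --type merge`, and the function name `rotate_pool` and its parameters are hypothetical. The component names and the roughly three-minute interval are taken from the text. By default the sketch only prints the commands it would issue.

```python
import json
import time
from typing import Callable, List, Sequence


def rotate_pool(
    components: Sequence[str],
    pool: str,
    interval_seconds: float = 180.0,  # ~3 minutes, as suggested in the text
    kind: str = "statefulset",  # assumption: the source does not state the kind
    execute: Callable[[List[str]], None] = lambda cmd: print(" ".join(cmd)),
    sleep: Callable[[float], None] = time.sleep,
) -> List[List[str]]:
    """Patch each component's nodeSelector in turn, pausing between updates.

    Returns the list of commands that were handed to `execute`.
    """
    patch = json.dumps(
        {"spec": {"template": {"spec": {"nodeSelector": {"pool": pool}}}}}
    )
    issued: List[List[str]] = []
    for index, name in enumerate(components):
        cmd = ["kubectl", "patch", kind, name, "--type", "merge", "-p", patch]
        execute(cmd)
        issued.append(cmd)
        # Pause between components, but not after the last one.
        if index < len(components) - 1:
            sleep(interval_seconds)
    return issued
```

Calling `rotate_pool(["kafka1", "kafka2", "kafka3"], "temp")` prints three patch commands with a pause between them. To actually apply the changes, pass something like `execute=lambda cmd: subprocess.run(cmd, check=True)`. A fixed pause is the simplest reading of the source; a more cautious variant might wait for `kubectl rollout status` to report success on each component before moving to the next.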

Infrastructure as Code (Terraform)

The infrastructure changes are managed through Terraform, typically by creating a dedicated file (e.g., 11-app.tf) that defines the temporary resources^[GCP升级.md]. Once the migration and upgrade are complete, that file is typically kept as a backup by renaming it with a .back suffix^[GCP升级.md]. Because Terraform only loads files ending in .tf or .tf.json, the renamed definitions are no longer part of the active configuration.

Sources

  • GCP升级.md