GKE upgrade workflow
The GKE upgrade workflow is a manual process designed to upgrade the Google Kubernetes Engine (GKE) version with minimal downtime. The strategy involves creating a temporary node pool to handle traffic during the upgrade, ensuring service continuity while the primary node pool is updated^[400-devops__05-Cloud-Provider__GCP升级.md].
Prerequisites and Version Selection
Before starting the upgrade, authentication and initialization are required.^[400-devops__05-Cloud-Provider__GCP升级.md]
- Authentication: Run `gcloud auth login` to access the GCP project.^[400-devops__05-Cloud-Provider__GCP升级.md]
- State Refresh: Execute `terraform init` followed by `terraform refresh` to confirm that the Terraform state matches the current GCP infrastructure.^[400-devops__05-Cloud-Provider__GCP升级.md]
- Target Version: Identify the specific GKE version to upgrade to from the release notes (specifically, the `No channel` version).^[400-devops__05-Cloud-Provider__GCP升级.md]
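The prerequisite commands can be chained in a small wrapper. This is a minimal sketch, assuming it is run from the Terraform working directory; the `run` helper and the `DRY_RUN` switch are illustrative additions, not part of the source workflow. With the default `DRY_RUN=1` the commands are only printed.

```shell
#!/bin/sh
# Sketch of the prerequisite steps. run() and DRY_RUN are hypothetical helpers
# added for illustration; by default the commands are printed, not executed.
set -eu

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

prerequisites() {
  run gcloud auth login    # authenticate against the GCP project
  run terraform init       # initialise providers and backend
  run terraform refresh    # reconcile Terraform state with the live infrastructure
}

prerequisites
```

Setting `DRY_RUN=0` in the environment makes the wrapper execute the commands instead of printing them.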
Workflow Steps
The upgrade process follows a specific sequence to prepare the environment, shift workloads, upgrade the infrastructure, and then shift workloads back.
1. Environment Preparation
First, the Terraform configuration for the new node pool must be enabled and the target version defined.
- Restore Configuration: Rename the backup file `11-app.tf.back` to `11-app.tf`.^[400-devops__05-Cloud-Provider__GCP升级.md]
- Set Version: Update the variable `gke_version` in `win-env-project\dev\gcloud\01-variables.tf` and `win-env-project\dev\gcloud\11-app.tf` to the desired version (e.g., `"1.21.13-gke.900"`).^[400-devops__05-Cloud-Provider__GCP升级.md]
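The two preparation steps can be illustrated on a scratch copy. The sketch below is an assumption-laden example: the layout of the `gke_version` variable block and the previous version string `1.20.0-gke.100` are invented stand-ins, since the real file contents are not shown in the source. It only touches files in a temporary directory and covers the variables file, not the second occurrence in `11-app.tf`.

```shell
#!/bin/sh
# Sketch only: operates on stand-in files in a scratch directory.
set -eu

workdir="$(mktemp -d)"
cd "$workdir"

# Stand-in files (hypothetical contents); the real ones live under
# win-env-project\dev\gcloud.
printf 'variable "gke_version" {\n  default = "1.20.0-gke.100"\n}\n' > 01-variables.tf
: > 11-app.tf.back

new_version="1.21.13-gke.900"

# Re-enable the node pool definition by restoring the backup file name.
mv 11-app.tf.back 11-app.tf

# Replace the default value inside the gke_version variable block only.
sed "/variable \"gke_version\"/,/}/ s/default *= *\"[^\"]*\"/default = \"${new_version}\"/" \
  01-variables.tf > 01-variables.tf.new
mv 01-variables.tf.new 01-variables.tf

cat 01-variables.tf
```

Restricting the substitution to the `gke_version` block avoids touching other variables that also carry a `default` value.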
2. Temporary Node Pool Creation
A temporary node pool is provisioned to serve as a staging area for pods during the main upgrade.
- Modify App Config: In `win-env-project\dev\11-app.tf`, change the resource name from `app` to `temp`. This configuration defines a `google_container_node_pool` for the temporary environment.^[400-devops__05-Cloud-Provider__GCP升级.md]
- Provision: Run `terraform plan` and `terraform apply` to create the temporary node pool named `node_pool_1` (`temp`).^[400-devops__05-Cloud-Provider__GCP升级.md]
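One way to run the provisioning step is to save the plan and apply exactly that plan, then confirm the new nodes are visible. This is a sketch under assumptions: the plan file name `temp-pool.tfplan`, the `run` helper, and the `DRY_RUN` switch are illustrative, and the `pool=temp` node label is inferred from the `nodeSelector` used later in the workflow. By default the commands are only printed.

```shell
#!/bin/sh
# Sketch only. run() and DRY_RUN are hypothetical helpers; the default prints
# the commands instead of executing them.
set -eu

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

create_temp_pool() {
  run terraform plan -out=temp-pool.tfplan   # review what will be created
  run terraform apply temp-pool.tfplan       # apply exactly the reviewed plan
  run kubectl get nodes -l pool=temp         # assumes nodes carry the label pool=temp
}

create_temp_pool
```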
3. Migrate Workloads to Temporary Pool
With the temporary pool active, Kubernetes workloads are shifted away from the node pool that will be upgraded.
- Update Node Selectors: In `win-env-project\dev\kube`, modify the `nodeSelector` from `pool: app` to `pool: temp`.^[400-devops__05-Cloud-Provider__GCP升级.md]
- Apply Configuration: Use `03-apply-kube.sh` to apply these changes.^[400-devops__05-Cloud-Provider__GCP升级.md]
- Rolling Update: For specific components like Kafka, switch them one by one (e.g., kafka1, kafka2, kafka3) to the `temp` pool with approximately 3-minute intervals to manage load.^[400-devops__05-Cloud-Provider__GCP升级.md]
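The selector change and the staggered rollout can be sketched as follows. Everything here is illustrative: the manifest file name `kafka1.yaml`, its contents, the `switch_pool` helper, and the `PAUSE_SECONDS` knob are assumptions, and the script works on a stand-in manifest in a scratch directory rather than the real `kube` folder.

```shell
#!/bin/sh
# Sketch only: rewrites a stand-in manifest in a scratch directory.
set -eu

workdir="$(mktemp -d)"
manifest="$workdir/kafka1.yaml"   # hypothetical file name

cat > "$manifest" <<'EOF'
spec:
  template:
    spec:
      nodeSelector:
        pool: app
EOF

# switch_pool FILE FROM TO: repoint the nodeSelector from one pool to another.
switch_pool() {
  sed "s/^\([[:space:]]*pool:[[:space:]]*\)${2}[[:space:]]*\$/\1${3}/" "$1" > "$1.new"
  mv "$1.new" "$1"
}

switch_pool "$manifest" app temp
cat "$manifest"

# Staggered rollout: one broker at a time. PAUSE_SECONDS defaults to 0 so the
# sketch finishes immediately; a real run would use about 180 (3 minutes).
PAUSE_SECONDS="${PAUSE_SECONDS:-0}"
for broker in kafka1 kafka2 kafka3; do
  echo "switching $broker to pool temp"
  # A real run would edit that broker's manifest and call 03-apply-kube.sh here.
  sleep "$PAUSE_SECONDS"
done
```

Calling the same helper with the arguments swapped (`switch_pool "$manifest" temp app`) covers the return migration after the upgrade.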
4. Perform GKE Upgrade
Once the primary node pool (`app`) has been drained of active workloads, the upgrade can be performed at the infrastructure level.
- Upgrade Module: Navigate to `win-env-project\dev\gcloud\modules\site\03-node-pool.tf`.^[400-devops__05-Cloud-Provider__GCP升级.md]
- Update Version: Modify the `google_container_node_pool` `"node_pool"` resource to upgrade the GKE version.^[400-devops__05-Cloud-Provider__GCP升级.md]
5. Migrate Workloads Back to App Pool
After the upgrade is complete, workloads must be shifted back to the upgraded primary pool.
- Restore Node Selectors: Change the `nodeSelector` in `win-env-project\dev\kube` back from `pool: temp` to `pool: app`.^[400-devops__05-Cloud-Provider__GCP升级.md]
- Apply Configuration: Run `03-apply-kube.sh` again.^[400-devops__05-Cloud-Provider__GCP升级.md]
- Rolling Update: Switch components (e.g., kafka1, kafka2, kafka3) back to the `app` pool one by one with ~3-minute intervals.^[400-devops__05-Cloud-Provider__GCP升级.md]
6. Cleanup and Post-Upgrade
Finalize the process by removing the temporary configuration and verifying network settings.
- Backup Config: Rename `11-app.tf` back to `11-app.tf.back` to disable the temporary pool resource.^[400-devops__05-Cloud-Provider__GCP升级.md]
- Verify Network: Manually check and, if necessary, adjust the VPC network settings or Ops IPs, for instance to ensure Jenkins connectivity (referenced via `kube.16888dev.com:30100`).^[400-devops__05-Cloud-Provider__GCP升级.md]
Sources
^[400-devops__05-Cloud-Provider__GCP升级.md]
Related
- Terraform
- Kubernetes
- [[GKE]]
- [[Node pool]]
- Documentation Workflow