Plan an In-Place OS Upgrade for ACP Nodes
This guide helps cluster administrators plan an in-place operating system upgrade for nodes in self-built clusters. It focuses on the checks that must be completed before and after the operating system changes.
This guide is a reference checklist for maintenance windows. It does not replace the operating system vendor's upgrade documentation, and it does not guarantee that every operating system upgrade path is supported. Confirm the target operating system and kernel against the support matrix before you upgrade any node.
TOC
- Scope and Limitations
- Before You Upgrade
- Additional Checks for Control Plane Nodes
- Perform the Operating System Upgrade
- After You Upgrade
- Known High-Risk Scenarios
- Operation Checklist

Scope and Limitations
Use this guide only when all of the following conditions are met:
- The cluster is an on-premises or self-built cluster managed by .
- You can log in to the target nodes and manage the node operating system.
- You have platform administrator permissions, kubectl administrator permissions, and SSH or console access to each target node.
- You can drain workload Pods from the target node before the operating system upgrade.
This guide does not apply to the following cluster types:
- Clusters that use Immutable Infrastructure. Apply operating system changes by replacing nodes with new images.
- Managed cloud Kubernetes clusters where the cloud provider manages the node or control plane operating system.
- Imported clusters where you cannot log in to nodes or control plane nodes.
Before You Upgrade
Complete the following checks before changing the operating system on any node.
Step 1: Confirm the target operating system and kernel
Confirm that the target operating system version, kernel version, kernel source, and CPU architecture are within the supported range. For the current support matrix and known restrictions, see Supported OS and Kernel Versions.
Apply these rules before the maintenance window:
- The operating system major and minor versions must match the supported matrix.
- The core kernel version must match the supported matrix. Only the build suffix can differ.
- The kernel must be the official kernel shipped by the operating system vendor.
- Ubuntu HWE kernels and third-party or custom-compiled kernels are not supported.
- If the target version is not in the supported matrix, contact technical support for compatibility confirmation before the upgrade.
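The current operating system, kernel, and architecture can be collected with standard commands before comparing against the support matrix; this is a minimal sketch using common Linux utilities:

```shell
# Record the values to compare against the supported matrix.
cat /etc/os-release    # OS name, major and minor version (VERSION_ID)
uname -r               # kernel version, including the build suffix
uname -m               # CPU architecture (e.g. x86_64, aarch64)
```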
Step 2: Check for conflicting packages
During an in-place operating system upgrade, the operating system package manager might install or overwrite container runtime, Kubernetes, or container network binaries. Before the upgrade, check and resolve packages that conflict with components. For the package lists and commands, see Remove Conflicting Packages.
If conflicting packages are found, prepare an application migration plan and back up the affected data before uninstalling them.
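A quick scan of installed packages can surface likely conflicts; the pattern below is only an example, and the authoritative package list is the one in Remove Conflicting Packages:

```shell
# Search installed packages for names that commonly conflict with
# platform-managed components. PATTERN is an illustrative example.
PATTERN='containerd|docker|runc|kubernetes|cni'
if command -v rpm >/dev/null 2>&1; then
  rpm -qa | grep -Ei "$PATTERN" || echo "no matching packages"
elif command -v dpkg >/dev/null 2>&1; then
  dpkg -l | grep -Ei "$PATTERN" || echo "no matching packages"
fi
```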
Step 3: Record the current runtime and node component versions
Record the versions before the operating system upgrade. Use the records for comparison after the node is upgraded.
Also record critical node configuration files that your operating system upgrade process might update, such as /etc/resolv.conf, /etc/fstab, systemd configuration files, and container runtime configuration files.
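One way to capture these records is a small script run on the node before the maintenance window; the backup directory and configuration file paths below are typical defaults and may differ on your nodes:

```shell
# Record component versions and back up key config files for
# post-upgrade comparison. Adjust paths to your environment.
BACKUP_DIR="/root/pre-upgrade-$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"
{
  containerd --version
  runc --version
  kubelet --version
} > "$BACKUP_DIR/versions.txt" 2>&1
cp -a /etc/resolv.conf /etc/fstab "$BACKUP_DIR/"
cp -a /etc/containerd/config.toml "$BACKUP_DIR/" 2>/dev/null || true
```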
Step 4: Verify cluster capacity and drain the node
Confirm that the remaining nodes have enough capacity to run the Pods evicted from the target node. Then drain the node before the operating system upgrade.
You can use the console to evict Pods from the node. For the console operation, see Manage Nodes.
If you use kubectl, confirm the command options with your operations team before running it. For example:
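A typical drain invocation is sketched below; `<node-name>` is a placeholder, and the options must match your eviction policy:

```shell
# Mark the node unschedulable, then evict Pods from it.
kubectl cordon <node-name>
kubectl drain <node-name> \
  --ignore-daemonsets \
  --delete-emptydir-data \
  --timeout=300s
```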
Pods managed by DaemonSets are not evicted by the drain operation. Workloads that use local storage might lose local data after eviction. Confirm the workload impact before you proceed.
Additional Checks for Control Plane Nodes
Control plane nodes run components such as kube-apiserver, etcd, kube-scheduler, and kube-controller-manager. Upgrade control plane nodes one at a time, and verify cluster health after each node is upgraded.
Before upgrading a control plane node:
- Back up etcd data. For the supported backup mechanism and restore considerations, see etcd Backup and Restore.
- Confirm that the etcd cluster is healthy and that quorum can be maintained while one control plane node is unavailable.
- Confirm that the cluster has at least three control plane nodes when you are performing rolling maintenance on control plane nodes.
- If possible, validate the procedure on compute nodes first, and then proceed with control plane nodes.
You can use the following commands as references when checking control plane health:
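For example, assuming kubectl access with administrator permissions:

```shell
# Node and control plane component status.
kubectl get nodes -o wide
kubectl get pods -n kube-system -o wide
# Aggregated API server readiness checks.
kubectl get --raw='/readyz?verbose'
```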
If you use etcdctl on a control plane node, use the certificate paths from your environment:
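The certificate paths below are common defaults and are assumptions; substitute the paths used in your environment:

```shell
# Check etcd member health over the local endpoint.
export ETCDCTL_API=3
etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint health
```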
Perform the Operating System Upgrade
Follow the operating system vendor's supported upgrade procedure for the target version. The exact package commands, repository configuration, and reboot requirements are determined by the operating system vendor and your organization's operating system maintenance policy.
During the operating system upgrade:
- Do not install container runtime, Kubernetes, or container network packages that conflict with components.
- Preserve node network configuration, DNS configuration, time synchronization configuration, and systemd service configuration unless the vendor procedure requires a change.
- Reboot the node when the vendor procedure requires it.
- Keep the node unschedulable until all post-upgrade checks pass.
After You Upgrade
Run the following checks on each node after the operating system upgrade and reboot are complete.
Step 1: Confirm the operating system and kernel
Verify that the node is running the expected operating system and kernel versions.
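A minimal verification using standard Linux utilities:

```shell
# Confirm the running OS release and kernel after the reboot.
grep -E '^(NAME|VERSION_ID)=' /etc/os-release
uname -r
```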
Compare the output with the supported matrix and your approved maintenance plan.
Step 2: Verify base node configuration
Verify that the operating system upgrade did not revert required node settings.
If swap is enabled after the operating system upgrade, disable it and remove the swap entry from /etc/fstab according to your operating system policy.
If SELinux or AppArmor is enabled after the operating system upgrade, disable it according to your operating system policy and the node requirements.
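These settings can be checked with the following commands; which of them exist depends on the OS family, so absent tools are skipped:

```shell
swapon --show                            # no output means swap is off
getenforce 2>/dev/null || true           # SELinux mode (RHEL family)
aa-status 2>/dev/null | head -n1 || true # AppArmor state (Debian/Ubuntu family)
```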
Step 3: Verify runtime and node component versions
Compare the runtime and node component versions with the pre-upgrade records.
Then verify that containerd is running:
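Assuming containerd is managed as a systemd service:

```shell
# Confirm the service is active and the runtime responds.
systemctl is-active containerd
crictl info >/dev/null && echo "runtime responding"
```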
If any binary was overwritten by an operating system package, stop the maintenance and contact technical support before you continue with other nodes.
Step 4: Verify time synchronization and DNS
Verify that time synchronization and DNS configuration were not changed by the operating system upgrade.
The time skew between nodes must be no more than 10 seconds. If the skew is larger than 10 seconds, synchronize time before restarting workloads on the node.
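A sketch of the check, assuming chrony is the NTP client on the node (use the client actually deployed, such as systemd-timesyncd or ntpd):

```shell
# Time synchronization status, including the current offset.
chronyc tracking
# DNS configuration; compare with the pre-upgrade record.
cat /etc/resolv.conf
```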
Step 5: Restart kubelet and verify node recovery
Restart kubelet after the operating system upgrade is complete and the base node checks pass.
Wait for the node to return to the Ready state.
When the node is healthy, resume scheduling.
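The sequence above can be sketched as follows; `<node-name>` is a placeholder:

```shell
# Restart kubelet, wait for the node to report Ready, then resume scheduling.
systemctl restart kubelet
kubectl wait --for=condition=Ready "node/<node-name>" --timeout=300s
kubectl uncordon <node-name>
```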
You can also resume scheduling from the console. For the console operation, see Manage Nodes.
Step 6: Validate workload recovery
Verify the node status and workload placement after scheduling is resumed.
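For example, using field selectors to inspect placement; `<node-name>` is a placeholder:

```shell
# Pods scheduled on the upgraded node.
kubectl get pods -A -o wide --field-selector spec.nodeName=<node-name>
# Pods anywhere in the cluster that are not Running or Succeeded.
kubectl get pods -A --field-selector 'status.phase!=Running,status.phase!=Succeeded'
```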
Then validate the business services that were affected by the node drain. Proceed to the next node only after the current node and related services are healthy.
Known High-Risk Scenarios
Operation Checklist
Use this checklist during the maintenance window and keep the completed record after the upgrade.