Alibaba Cloud Container Service for Kubernetes (ACK) regularly releases new OS image versions that provide new features, performance optimizations, and bug fixes. You should upgrade the OS images of your node pools in a timely manner. You can also switch OS types as needed. For example, you can replace an operating system that has reached its end of life (EOL) with a supported one.
For more information about the OS types and latest OS image versions supported by ACK and the limits on specific operating systems, see OS image release notes.
Notes
This operation replaces the system disks of nodes in batches to update the operating system. Do not store important data on the system disk. If you do, back up the data in advance. Data disks are not affected during the upgrade. We recommend that you perform this operation during off-peak hours.
When ACK upgrades a node by replacing its system disk, ACK drains the node. This process evicts pods from the node to other available nodes while respecting the Pod Disruption Budget (PDB). To ensure high availability, deploy your workloads with multiple replicas across different nodes. Also, configure a PDB for critical services to control the number of pods that can be disrupted at the same time.
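As an illustration of the PDB recommendation above, the following minimal sketch builds a `policy/v1` PodDisruptionBudget manifest as a Python dictionary and estimates how many pods could be evicted at once. The names `web-pdb`, `default`, and `app=web` are hypothetical, and the helper handles only an integer `minAvailable`, not a percentage.

```python
import json


def build_pdb(name, namespace, match_labels, min_available):
    """Return a policy/v1 PodDisruptionBudget manifest as a plain dict."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Minimum number of matching pods that must stay available.
            "minAvailable": min_available,
            "selector": {"matchLabels": dict(match_labels)},
        },
    }


def allowed_disruptions(healthy_pods, min_available):
    """Pods that may be evicted at once without dropping below min_available."""
    return max(0, healthy_pods - min_available)


if __name__ == "__main__":
    # Hypothetical workload: 3 replicas labeled app=web, keep at least 2 running.
    pdb = build_pdb("web-pdb", "default", {"app": "web"}, 2)
    print(json.dumps(pdb, indent=2))
    print(allowed_disruptions(healthy_pods=3, min_available=2))
```

Because JSON is accepted wherever a YAML manifest is, the printed output could be saved to a file and applied with `kubectl apply -f`. With a single replica and `minAvailable` set to 1, no eviction is allowed, so the drain cannot finish. This is one reason to run multiple replicas.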
The default timeout period for draining a node is 30 minutes. If pod migration is not complete within the timeout period, ACK stops the upgrade to ensure service stability.
When ACK upgrades a node by replacing its system disk, ACK re-initializes the node based on the current node pool configuration. This includes settings such as the logon method, labels, taints, OS image, and runtime version. To update the node pool configuration, you must edit the node pool. If you modify a node using other methods, the changes are overwritten during the upgrade.
If a pod on a node references a HostPath that points to the system disk, the data in the HostPath directory is lost after the system disk is replaced.
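To find affected pods before the upgrade, you could scan pod specifications for `hostPath` volumes. The sketch below works on pod objects that are already loaded as dictionaries, for example from `kubectl get pods -o json`. The function name and the `data_disk_mounts` parameter are hypothetical. It assumes that you know where data disks are mounted on your nodes and treats every other path as being on the system disk.

```python
def find_system_disk_hostpaths(pod, data_disk_mounts):
    """Return (volume name, path) pairs for hostPath volumes that are not
    located under any of the given data disk mount points."""
    mounts = [m.rstrip("/") for m in data_disk_mounts]
    risky = []
    for volume in pod.get("spec", {}).get("volumes", []):
        host_path = volume.get("hostPath")
        if not host_path:
            continue  # emptyDir, configMap, PVC, and so on are not hostPath
        path = host_path.get("path", "")
        normalized = path.rstrip("/") or "/"
        on_data_disk = any(
            normalized == m or normalized.startswith(m + "/") for m in mounts
        )
        if not on_data_disk:
            risky.append((volume.get("name"), path))
    return risky


if __name__ == "__main__":
    # Hypothetical pod with one hostPath on the system disk and one on a data disk.
    pod = {
        "spec": {
            "volumes": [
                {"name": "logs", "hostPath": {"path": "/var/log/app"}},
                {"name": "cache", "hostPath": {"path": "/mnt/data/cache"}},
                {"name": "tmp", "emptyDir": {}},
            ]
        }
    }
    print(find_system_disk_hostpaths(pod, ["/mnt/data"]))
```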
If your cluster uses other custom configurations, such as swap partitions, kubelet configurations modified by using the CLI, or runtime configurations, the update may fail or the custom configurations may be overwritten during the update.
By default, some ACK OS images use cgroup v2. For more information about cgroup v2, see cgroup versions.
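To check which cgroup version a node uses, one common approach is to look for the `cgroup.controllers` file, which exists at the root of a cgroup v2 unified hierarchy. The following is a minimal sketch to run on the node itself. The function name is hypothetical, and the function reports 1 for every other layout, including hybrid setups.

```python
from pathlib import Path


def cgroup_version(root="/sys/fs/cgroup"):
    """Return 2 if root looks like a cgroup v2 unified hierarchy, else 1."""
    # cgroup v2 exposes cgroup.controllers at the top of the hierarchy.
    return 2 if (Path(root) / "cgroup.controllers").is_file() else 1


if __name__ == "__main__":
    print(f"cgroup v{cgroup_version()}")
```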
If you have worker nodes that are not managed by a node pool, also known as standalone nodes, you must first migrate them to a node pool. For more information, see Migrate standalone nodes to a node pool.
Starting with ContainerOS 3.4.0, the system disk is set to read-only mode. To ensure that the system can start, you must attach at least one data disk. Therefore, when you upgrade from ContainerOS 3.3 to 3.4 or later, follow the procedure below. This does not affect other versions.
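The version condition above can be expressed as a small check. This sketch treats any upgrade from a version earlier than 3.4.0 to 3.4.0 or later as crossing the read-only boundary, which generalizes the 3.3 to 3.4 case described above. The function names and the `data_disk_count` parameter are hypothetical and are not part of any ACK API.

```python
READ_ONLY_SINCE = (3, 4, 0)  # ContainerOS version that made the system disk read-only


def parse_version(text):
    """Turn '3.4.0' or '3.3' into a tuple of ints padded to three parts."""
    parts = [int(p) for p in text.strip().split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def crosses_read_only_boundary(current, target):
    """True if the upgrade goes from below 3.4.0 to 3.4.0 or later."""
    return parse_version(current) < READ_ONLY_SINCE <= parse_version(target)


def has_required_data_disk(current, target, data_disk_count):
    """A node that crosses the boundary needs at least one data disk."""
    if crosses_read_only_boundary(current, target):
        return data_disk_count >= 1
    return True


if __name__ == "__main__":
    print(crosses_read_only_boundary("3.3", "3.4.0"))
    print(has_required_data_disk("3.3", "3.4.0", data_disk_count=0))
```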
If you customize the GPU driver version for a node pool by specifying a version number or using an OSS URL, the OS image upgrade may cause an incompatibility between the OS and the driver. Select the latest driver from the List of NVIDIA driver versions supported by ACK.
Procedure
Follow these steps to update the OS image to the latest version or change the OS type. To avoid compatibility risks, run a precheck scan first.
Log on to the ACK console. In the left navigation pane, click Clusters.
On the Clusters page, find the cluster to manage and click its name. In the left navigation pane, go to the Node Pools page.
On the Node Pools page, find the target node pool and, in its Actions column, open the menu and click Change Operating System.
Click Precheck to scan for potential risks and view the results.
Normal: The precheck is successful. You can proceed to the next step.
Abnormal: The precheck failed. This does not affect the running state of your cluster. Resolve the issues based on the recommended solutions.
After the precheck is successful, configure the following parameters and click Start Replacement.
| Parameter | Description |
| --- | --- |
| Target Version | Select the target OS image and version. |
| Current Version | The current OS version. |
| Nodes To Update | Specify the nodes whose operating systems you want to replace. You can select all nodes or specific nodes. |
| Ignore Warnings | Specifies whether to proceed with the upgrade if the precheck reports node pool-level warnings. An example of a warning is that a pod in the node pool uses a HostPath that points to the system disk. |
| Batch Replacement Policy: Maximum Concurrent Nodes Per Batch | The system updates nodes in batches based on the maximum number of concurrent nodes that you specify. |
| Batch Replacement Policy: Automatic Pause Policy | The pause policy for the OS replacement process. |
| Batch Replacement Policy: Interval Between Batches | If you do not use the automatic pause policy, you can specify an interval between update batches. The interval can be 5 to 120 minutes. |
| Automatic Snapshot | This upgrade method replaces the system disk. If the system disk contains important business data, create a snapshot for the node before you update the OS. This lets you back up and restore data. You are charged for using snapshots. If a snapshot is no longer needed after the upgrade, delete the snapshot promptly. |
Important: To avoid incompatibility risks when you change the OS, review the OS image release notes.
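To reason about how the batch settings interact, the sketch below splits a list of nodes into batches of a given maximum size and computes the total waiting time that a fixed interval between batches adds. It only illustrates the arithmetic. It is not ACK's scheduling logic, and the function names and node names are hypothetical.

```python
def plan_batches(nodes, max_concurrent):
    """Split nodes into consecutive batches of at most max_concurrent nodes."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    return [
        nodes[i:i + max_concurrent]
        for i in range(0, len(nodes), max_concurrent)
    ]


def validate_interval(minutes):
    """The interval between batches can be 5 to 120 minutes."""
    if not 5 <= minutes <= 120:
        raise ValueError("interval must be between 5 and 120 minutes")
    return minutes


def total_interval_minutes(batch_count, interval_minutes):
    """Waiting time added by the intervals alone, excluding upgrade time."""
    return max(0, batch_count - 1) * validate_interval(interval_minutes)


if __name__ == "__main__":
    nodes = ["node-1", "node-2", "node-3", "node-4", "node-5"]
    batches = plan_batches(nodes, max_concurrent=2)
    print(batches)
    print(total_interval_minutes(len(batches), interval_minutes=10))
```

A smaller batch size limits how many pods are evicted at the same time but increases the number of batches, and therefore the total time spent waiting between them.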
References
For more information about how to upgrade the kubelet and container runtime versions of a node pool, see Upgrade a node pool.
For more information about the process and logic of upgrades by replacing system disks, see Reference: In-place upgrades and upgrades by replacing system disks.