ACK node pools allow you to configure data disks for new nodes and initialize the disks from an existing data disk snapshot. The snapshot can contain all the necessary cached data, such as pre-downloaded container images, pre-trained large-scale machine learning models, required system images, and other critical static resources. When a new node joins the node pool, it automatically recovers the cached data from the snapshot. This significantly reduces the data loading time for the first run. This topic describes two scenarios that show how to use data disk snapshots to accelerate workload deployment and node initialization.
Scenario 1: Accelerate application startup
This scenario uses the Qwen-7B large model application workload as an example. It shows how to use an ECS instance to pre-load the ac2/qwen container image onto a data disk, cache the image, and create a snapshot. You can then use this snapshot as the data disk for new nodes in a node pool. When you schedule the workload to these new nodes, the application starts faster.
Accelerate application startup using a data disk snapshot
Step 1: Create an ECS instance with a data disk
Create an ECS instance that includes a data disk to create the data disk snapshot. For more information, see Create an instance using the wizard.
Determine the required data disk size to ensure that enough space is reserved.
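As a rough sizing sketch, you can sum the sizes of the artifacts you plan to cache and add headroom before choosing a disk size. The image and model sizes below are placeholders, not measured values; substitute the real sizes of your images and models.

```shell
#!/bin/bash
# Hypothetical sizing helper: sum the artifacts to cache (in MiB),
# add 20% headroom, and round up to whole GiB.
IMAGE_MIB=820        # placeholder: size of the container image to cache
MODEL_MIB=15000      # placeholder: size of pre-downloaded model weights
TOTAL_MIB=$((IMAGE_MIB + MODEL_MIB))
WITH_HEADROOM_MIB=$((TOTAL_MIB * 12 / 10))          # +20% headroom
DISK_GIB=$(( (WITH_HEADROOM_MIB + 1023) / 1024 ))   # round up to GiB
echo "Recommended data disk size: at least ${DISK_GIB} GiB"
```

Reserving headroom matters because an undersized data disk cannot hold the unpacked image layers in addition to the compressed download.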
After the instance is created, log on to the ECS instance and run the following command to view the disk information.
fdisk -l
Expected output:
Disk /dev/vda: 40 GiB, 42949672960 bytes, 83886080 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: F51132A7-67B1-4650-806D-FD0DExxxxxxx

Device        Start      End  Sectors  Size Type
/dev/vda1      2048     6143     4096    2M BIOS boot
/dev/vda2      6144   415743   409600  200M EFI System
/dev/vda3    415744 83886046 83470303 39.8G Linux filesystem

Disk /dev/vdb: 40 GiB, 42949672960 bytes, 83886080 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
The output shows that the block device /dev/vdb is attached to the ECS instance but is not mounted to any host directory.
Run the following commands to initialize the file system on the data disk and mount the data disk to a host directory. This topic uses the /mnt/example directory as an example.
mkdir -p /mnt/example
mkfs.ext4 /dev/vdb
mount /dev/vdb /mnt/example
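Note that a mount issued this way does not persist across reboots. As a hedged sketch (assuming the data disk is /dev/vdb and blkid is available to read its UUID), the helper below only builds a UUID-based /etc/fstab entry so you can review it before appending; the UUID shown is a placeholder.

```shell
#!/bin/bash
# Build an /etc/fstab entry for the data disk from its UUID.
# On a real instance, obtain the UUID with: blkid -s UUID -o value /dev/vdb
make_fstab_entry() {
    local uuid="$1" mountpoint="$2"
    printf 'UUID=%s %s ext4 defaults 0 0\n' "$uuid" "$mountpoint"
}

# Placeholder UUID for illustration only.
ENTRY=$(make_fstab_entry "0b3a6721-1111-2222-3333-000000000000" /mnt/example)
echo "$ENTRY"
# To persist the mount, you would append the entry: echo "$ENTRY" >> /etc/fstab
```

Using the UUID rather than /dev/vdb guards against device names changing between boots.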
Step 2: Install the runtime
Check the required runtime version. The version must be the same as the runtime version of the cluster nodes that will use the snapshot.
Log on to the ACK console. In the navigation pane on the left, click Clusters.
On the Clusters page, find the cluster to manage and click its name. In the left-side navigation pane, choose Nodes > Node Pools.
Click the target node pool. On the Basic Information tab, view the Container Runtime. This topic uses containerd 1.6.36 as an example.
On the ECS instance, run the following commands to bind and mount a subdirectory.
mkdir -p /var/lib/containerd
mkdir -p /mnt/example/containerd/
mount --bind /mnt/example/containerd/ /var/lib/containerd
Install the containerd 1.6.36 runtime. For more information, see Getting started with containerd.
Step 3: Download the container image and create a snapshot
On the ECS instance, run the following command to pull the required image.
ctr -n k8s.io images pull ac2-registry.cn-hangzhou.cr.aliyuncs.com/ac2/qwen:7b-pytorch2.2.0.1-alinux3.2304
Expected output:
ac2-registry.cn-hangzhou.cr.aliyuncs.com/ac2/qwen:7b-pytorch2.2.0.1-alinux3.2304: resolved |++++++++++++++++++++++++++++++++++++++|
index-sha256:26f7ec425ca145b75edea364a51aa295587ddd5d65ac204e4e6da0e51bddb357: done |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:f6d5fb3791e6b6a3213b44ede5bec8e4a3b7fbd4ff4ba22ace00a10b83a4982a: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:bf2e4c7c66fc8341e90fb1a2f5f19c3b76b692054d54c42095d5fb9b18c1fac8: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:fec856349732affc95608fc6e36d9c9cb50247901696df2046a781a3969e5360: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:ed96823de6cfad0ef1b9de97e427f41160817c98e9bebf90ba8cc37992eabc96: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:9df511db38b7e057b13fa70a4f9e4a7e65e7efec867b2668908634181cac38a9: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:1a4373bb9d93a3b5ada44eabc5fc42a548124703fcba53e8085e3392c85649b1: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:5725de981ff2b5977566f75731fb218716625462d0bc5ee519b9923965b5a352: done |++++++++++++++++++++++++++++++++++++++|
config-sha256:5d25192c5dd2dfe79ead390db8668baaf10667e5ae789420cda953291f91559f: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:af616670f1a58fbbccff2e7053aa93994bdcc539ab388b668afd8ddf8437ee3d: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:da5d933bec7b35e1ca604c002d06a18e3b3b5e52420edaa8e85e56e8853641b7: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:cbfd4016879986762938d503ee88832c0f32989c07a0e41966f0fdbdb4d82d9f: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:0c2891fbd11945ec68c8a936413f844957ece8196c595f64f683531649293f5b: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:ef52560218e4372dd5bc40de38a2032d66fad82129b39e622b020038220327fc: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:a5d4fb6719df2b4228e2ac875522cae82159c42fd5f60d4de4bf53ece0862368: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:30b9d466ccc9dfbe42ddfb86f57e526f095827cbff25aaa0ff98d861ad791f45: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:6df931e10a58a685f5cfb9bfc29e9127e1145cc3aa05209316edc820a6ebc00b: done |++++++++++++++++++++++++++++++++++++++|
elapsed: 66.3s  total: 780.8 (11.8 MiB/s)
unpacking linux/amd64 sha256:26f7ec425ca145b75edea364a51aa295587ddd5d65ac204e4e6da0e51bddb357...
done: 17.972544176s
Create a snapshot of the data disk of the ECS instance. For more information, see Create a snapshot.
Step 4: Select the data disk snapshot in an ACK node pool and add nodes
Create a node pool. During the creation process, add a data disk. Use the snapshot that you created in the previous step to create the disk. Set the expected number of nodes to 1.
Important: The data disk that uses the snapshot must be the last data disk. For more information about how acceleration works, see How data disk cache acceleration works.
Log on to a node in the node pool and run the following command to view the container images.
sudo crictl images
Expected output:
IMAGE                                               TAG                              IMAGE ID        SIZE
ac2-registry.cn-hangzhou.cr.aliyuncs.com/ac2/qwen   7b-pytorch2.2.0.1-alinux3.2304   5d25192c5dd2d   820MB
...
The output shows that the ac2-registry.cn-hangzhou.cr.aliyuncs.com/ac2/qwen container image already exists.
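If you want to script this verification (for example, as a post-scale-out sanity check), a small helper can match the expected repository in the first column of the crictl output. The sketch below parses sample output passed as text, so the live crictl invocation is only indicated in a comment.

```shell
#!/bin/bash
# Check whether an image repository appears in `crictl images` output.
# On a real node you would pipe live output:
#   crictl images | image_cached <repository>
image_cached() {
    local repo="$1"
    # Skip the header row; succeed only if the first column matches.
    awk -v repo="$repo" 'NR>1 && $1 == repo {found=1} END {exit !found}'
}

# Sample output for illustration (same shape as the expected output above).
SAMPLE='IMAGE TAG IMAGE-ID SIZE
ac2-registry.cn-hangzhou.cr.aliyuncs.com/ac2/qwen 7b-pytorch2.2.0.1-alinux3.2304 5d25192c5dd2d 820MB'

if echo "$SAMPLE" | image_cached "ac2-registry.cn-hangzhou.cr.aliyuncs.com/ac2/qwen"; then
    echo "image cached"
fi
```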
Because the image is cached on the node, the Qwen-7B application workload quickly enters the running state after it is scheduled to the node.
Deploy the application without using a data disk snapshot for acceleration
You can create a Deployment by following the instructions in Create a stateless workload Deployment. During the creation process, select the ac2/qwen container image from Artifacts.
Scenario 2: Accelerate node initialization
This scenario describes how to accelerate node initialization. Node initialization is the process where a node changes from the Created state to the Ready state. The procedure is as follows:
Use an ECS instance to pre-load the kubelet and container images that are required by ACK onto a data disk.
Cache the kubelet and container images, and then create a snapshot.
Use this snapshot as the data disk for new nodes in a node pool to accelerate the node initialization process.
The node initialization times provided in this scenario are theoretical. The actual time depends on your operating environment.
Cache the kubelet required by ACK using a data disk snapshot
On the ECS instance, run the following commands to download the kubelet package for your cluster version and cache it in the data disk directory.
mkdir -p /mnt/example/ack
export KUBE_VERSION="1.32.1-aliyun.1"
export REGION="cn-hangzhou"
wget http://aliacs-k8s-$REGION.oss-$REGION-internal.aliyuncs.com/public/pkg/kubernetes/kubernetes-$KUBE_VERSION-linux-amd64.tar.gz
tar -xvf kubernetes-$KUBE_VERSION-linux-amd64.tar.gz
mv pkg/kubernetes /mnt/example/ack/
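Before you create the snapshot, it is worth confirming that the expected files actually landed in the cache directory. The sketch below is self-contained: it uses a temporary demo directory in place of /mnt/example/ack/kubernetes, which is what you would check on the real ECS instance.

```shell
#!/bin/bash
# Verify that expected files exist under a cache directory before
# snapshotting. CACHE_DIR is a demo directory here; on the ECS instance
# it would be /mnt/example/ack/kubernetes.
CACHE_DIR=$(mktemp -d)
touch "$CACHE_DIR/kubelet"   # simulate the cached kubelet binary

verify_cached() {
    local dir="$1"; shift
    for f in "$@"; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $dir/$f"
            return 1
        fi
    done
    echo "cache OK: $dir"
}

verify_cached "$CACHE_DIR" kubelet
```

A check like this catches a silently failed download or a mistyped mv destination before the error is baked into the snapshot.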
Cache the images required by ACK using a data disk snapshot
System component versions may change during upgrades. This can cause a mismatch between the container image versions in the data disk cache and the versions of the deployed images. If this happens, create a new data disk snapshot promptly. For more information about component changes, see Release notes of ACK components.
Step 1: Export the required system container images
In the cluster where you want to use the data disk snapshot, create a node and connect to its ECS instance.
On the cluster node, run the following script to export the required system images. After the script is executed, the image_list.txt file is generated in the current directory.
#!/bin/bash
# Set the output file path.
OUTPUT_FILE="./image_list.txt"
# Truncate or create the output file.
> "$OUTPUT_FILE"
# Get the names and tags of all container images.
images=$(crictl images --no-trunc | awk 'NR>1 {print $1 ":" $2}')
# Write each image to the output file.
for image in $images; do
    echo "$image" >> "$OUTPUT_FILE"
done
echo "All images have been exported to $OUTPUT_FILE"
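Because component versions drift over time, it can be useful to diff a freshly exported list against the list that was baked into an existing snapshot, to decide whether the snapshot needs rebuilding. This is a minimal sketch; the registries and tags below are placeholder data, and in practice the two files would be the cached and the newly exported image_list.txt.

```shell
#!/bin/bash
# Diff two image lists to spot drift between the snapshot cache and a
# freshly exported list. The contents here are placeholder sample data.
OLD=$(mktemp)
NEW=$(mktemp)
printf 'registry.example.com/app:v1\nregistry.example.com/base:v2\n' > "$OLD"
printf 'registry.example.com/app:v2\nregistry.example.com/base:v2\n' > "$NEW"

# comm requires sorted input; -13 keeps only lines unique to the new list.
sort "$OLD" -o "$OLD"
sort "$NEW" -o "$NEW"
DRIFT=$(comm -13 "$OLD" "$NEW")
if [ -n "$DRIFT" ]; then
    echo "images missing from the cached snapshot:"
    echo "$DRIFT"
fi
```

Any line printed is an image the snapshot does not contain, which is a signal to create a new snapshot.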
Step 2: Create the snapshot and scale out cluster nodes
Create an ECS instance and install the container runtime. For more information, see Step 1 and Step 2 in Scenario 1.
Upload the image_list.txt file that you generated in Step 1 to the ECS instance.
On the ECS instance, run the following script to automatically pull all required system component container images. If the crictl command is missing, see Install crictl to install it.
#!/bin/bash
# Set the input file path.
INPUT_FILE="./image_list.txt"
# Check whether the file exists.
if [ ! -f "$INPUT_FILE" ]; then
    echo "Error: File $INPUT_FILE does not exist!"
    exit 1
fi
# Iterate over each line (each image) in the input file.
while IFS= read -r image; do
    # Make sure the line is not empty.
    if [ -n "$image" ]; then
        echo "Pulling image $image..."
        # Use crictl pull to pull the image.
        if crictl pull "$image"; then
            echo "Image $image pulled successfully!"
        else
            echo "Failed to pull image $image!"
        fi
    fi
done < "$INPUT_FILE"
Stop the ECS instance and create a snapshot of the data disk. For more information, see Create a snapshot.
Create a node pool that meets the following requirements to configure the data disk snapshot. For more information, see Create a node pool.
Data Disk: Add a data disk and use the snapshot that you created to create the disk.
Important: The data disk that uses the snapshot must be the last data disk. For more information about how acceleration works, see How data disk cache acceleration works.
If you select Enable Image Accelerator when you create the node pool, the efficiency of data disk snapshot creation may be affected.
Custom Data: Add the command touch /var/.skip-yum to skip the yum source update when the node joins the node pool.
Expected Number Of Nodes: Set this to 1.
Check the time it takes for a node that is accelerated by a data disk snapshot to change from the Created state to the Ready state. For more information, see View node events.
Cache ACK system component images without using a data disk snapshot
You can use a ContainerOS image to create cluster nodes. Then, you can observe the time it takes for a node that is not accelerated by a data disk snapshot to change from the Created state to the Ready state. For more information, see View node events.
How data disk cache acceleration works
When you create a node pool, you can add a data disk. If the data disk is applied to the containerd directory and uses a data disk snapshot, the image cache from the snapshot is loaded into the containerd directory. When you create a workload, you can directly use the cached images. This reduces the application startup time.
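To confirm on a node that the containerd directory really is backed by the data disk, you can inspect the mount table. The sketch below parses sample /proc/mounts content passed as text so it is self-contained; on a real node you would feed it the actual /proc/mounts (or use findmnt directly), and the device names shown are assumptions.

```shell
#!/bin/bash
# Report the block device backing a directory, given /proc/mounts-style
# input on stdin. On a real node:
#   backing_device /var/lib/containerd < /proc/mounts
backing_device() {
    local target="$1"
    # Field 1 is the device, field 2 is the mount point.
    awk -v t="$target" '$2 == t {dev=$1} END {print dev}'
}

# Sample mount table for illustration.
SAMPLE='/dev/vda3 / ext4 rw,relatime 0 0
/dev/vdb /var/lib/containerd ext4 rw,relatime 0 0'

DEV=$(echo "$SAMPLE" | backing_device /var/lib/containerd)
echo "containerd is backed by: $DEV"
```

If the reported device is the snapshot-backed data disk rather than the system disk, the cached images are in place and new workloads can start from them directly.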