Container Service for Kubernetes: Enable scheduling features

Last Updated: Aug 14, 2025

When you deploy GPU computing jobs in an ACK managed cluster Pro, you can assign scheduling property labels to GPU nodes. These labels enable features such as exclusive scheduling, shared scheduling, topology-aware scheduling, and card model scheduling, which help optimize resource utilization and schedule applications precisely.

Scheduling labels

GPU scheduling labels identify GPU models and resource allocation policies. This enables fine-grained resource management and efficient scheduling.

Exclusive scheduling (Default)

  • Label value: ack.node.gpu.schedule: default

  • Scenarios: High-performance jobs that require exclusive use of an entire GPU card, such as model training and HPC.

Shared scheduling

  • Label values: ack.node.gpu.schedule: cgpu, core_mem, share, or mps

  • Scenarios: Improves GPU utilization. Suitable for scenarios where multiple lightweight jobs run concurrently, such as multitenancy or inference workloads.

    • cgpu: Shares computing power and isolates video memory. Based on Alibaba Cloud cGPU technology.

    • core_mem: Isolates both computing power and video memory.

    • share: Shares both computing power and video memory with no isolation.

    • mps: Shares computing power and isolates video memory. Based on NVIDIA Multi-Process Service (MPS) isolation combined with Alibaba Cloud cGPU technology.

  • Label values: ack.node.gpu.placement: binpack or spread

  • Scenarios: Optimizes the resource allocation policy across multiple GPU cards on a single node after cgpu, core_mem, share, or mps shared scheduling is enabled.

    • binpack: (Default) Schedules pods compactly across cards. Fills one GPU with pods before assigning pods to the next. This reduces resource fragmentation and is ideal for scenarios that prioritize resource utilization or energy savings.

    • spread: Distributes pods across different GPUs. This reduces the impact of a single card failure and is suitable for high availability (HA) jobs.

Topology-aware scheduling

  • Label value: ack.node.gpu.schedule: topology

  • Scenarios: Automatically assigns pods to the GPU combination with the optimal communication bandwidth based on the physical GPU topology within a single node. Suitable for jobs that are sensitive to inter-GPU communication latency.

Card model scheduling

  • Label values:

    aliyun.accelerator/nvidia_name: <GPU_card_name>
    aliyun.accelerator/nvidia_mem: <video_memory_per_card>
    aliyun.accelerator/nvidia_count: <total_number_of_GPU_cards>

  • Scenarios: Schedules jobs to nodes with specified GPU models, or keeps jobs away from nodes with specified models. Use nvidia_mem and nvidia_count together with nvidia_name to specify the video memory capacity per card and the total number of GPU cards for a GPU job.

Enable scheduling features

Exclusive scheduling

If a node has no GPU scheduling labels, exclusive scheduling is enabled by default. In this mode, the node allocates GPU resources to pods in units of a single GPU.
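
In exclusive mode, pods consume GPUs through the standard nvidia.com/gpu resource, as in the Job examples later in this topic. The following is a minimal sketch of such a request; the pod name and image are placeholders rather than values from this topic.

    apiVersion: v1
    kind: Pod
    metadata:
      name: gpu-exclusive-demo                        # placeholder name
    spec:
      restartPolicy: Never
      containers:
      - name: cuda-app
        image: registry.example.com/cuda-app:latest   # placeholder image
        resources:
          limits:
            nvidia.com/gpu: 1                         # requests one entire GPU card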

If other GPU scheduling features are enabled, removing the labels does not restore exclusive scheduling. You must manually change the label value to ack.node.gpu.schedule: default to restore the exclusive scheduling feature.
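
For example, the following command restores exclusive scheduling on a node:

    kubectl label node <NODE_NAME> ack.node.gpu.schedule=default --overwrite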

Shared scheduling

Shared scheduling is supported only in ACK managed cluster Pro. For more information, see Limits.

  1. Install the ack-ai-installer shared scheduling component.

    1. Log on to the ACK console. In the navigation pane on the left, click Clusters.

    2. On the Clusters page, find the cluster you want and click its name. In the left-side navigation pane, choose Applications > Cloud-native AI Suite.

    3. On the Cloud-native AI Suite page, click Deploy. On the Deploy Cloud-native AI Suite page, select Scheduling Policy Extension (Batch Scheduling, GPU Sharing, GPU Topology Awareness).

      For more information about how to set the computing power scheduling policy for cGPU, see Install and use the cGPU component.
    4. On the Cloud-native AI Suite page, click Deploy Cloud-native AI Suite.

      On the Cloud-native AI Suite page, find the installed shared GPU component ack-ai-installer in the component list.

  2. Enable the shared scheduling feature.

    1. On the Clusters page, click the name of the target cluster. In the navigation pane on the left, choose Node Management > Node Pools.

    2. On the Node Pools page, click Create Node Pool, configure the node labels, and then click Confirm.

      You can keep the default settings for other configuration items. For more information about the scenarios for node labels, see Scheduling labels.
      • Configure basic shared scheduling.

        Click the Node Labels icon, set Key to ack.node.gpu.schedule, and set the label value to one of the following: cgpu, core_mem, share, or mps (mps requires the MPS Control Daemon component to be installed).

      • Configure multi-card shared scheduling.

        If a node has multiple GPUs, you can configure multi-card shared scheduling to optimize resource allocation.

        Click the Node Labels icon, set Key to ack.node.gpu.placement, and set the label value to binpack or spread.

  3. Verify that shared scheduling is enabled.

    cgpu/share/mps

    Replace <NODE_NAME> with the name of your target node and run the following command to verify that cgpu, share, or mps shared scheduling is enabled for the node.

    kubectl get nodes <NODE_NAME> -o yaml | grep "aliyun.com/gpu-mem"

    Expected output:

    aliyun.com/gpu-mem: "60"

    If the value of the aliyun.com/gpu-mem field is not 0, cgpu, share, or mps shared scheduling is enabled. For an example of how a pod requests this shared resource, see the sketch after this procedure.

    core_mem

    Replace <NODE_NAME> with the name of your target node and run the following command to verify that core_mem shared scheduling is enabled for the node.

    kubectl get nodes <NODE_NAME> -o yaml | grep -E 'aliyun\.com/gpu-core\.percentage|aliyun\.com/gpu-mem'

    Expected output:

    aliyun.com/gpu-core.percentage: "80"
    aliyun.com/gpu-mem: "6"

    If the values of the aliyun.com/gpu-core.percentage and aliyun.com/gpu-mem fields are not 0, core_mem shared scheduling is enabled.

    binpack

    Use the GPU resource query tool for shared GPU scheduling and run the following command to query the GPU resource allocation of the node:

    kubectl inspect cgpu

    Expected output:

    NAME                     IPADDRESS    GPU0(Allocated/Total)  GPU1(Allocated/Total)  GPU2(Allocated/Total)  GPU3(Allocated/Total)  GPU Memory(GiB)
    cn-shanghai.192.0.2.109  192.0.2.109  15/15                  9/15                   0/15                   0/15                   24/60
    --------------------------------------------------------------------------------------
    Allocated/Total GPU Memory In Cluster:
    24/60 (40%)

    The output shows that GPU0 is fully allocated (15/15) and GPU1 is partially allocated (9/15). This matches the strategy of filling one GPU before allocating resources to the next, which confirms that the binpack policy is in effect.

    spread

    Use the GPU resource query tool for shared GPU scheduling and run the following command to query the GPU resource allocation of the node:

    kubectl inspect cgpu

    Expected output:

    NAME                     IPADDRESS    GPU0(Allocated/Total)  GPU1(Allocated/Total)  GPU2(Allocated/Total)  GPU3(Allocated/Total)  GPU Memory(GiB)
    cn-shanghai.192.0.2.109  192.0.2.109  4/15                   4/15                   0/15                   4/15                   12/60
    --------------------------------------------------------------------------------------
    Allocated/Total GPU Memory In Cluster:
    12/60 (20%)

    The output shows that 4/15 of the resources are allocated to GPU0, 4/15 to GPU1, and 4/15 to GPU3. This confirms that the spread policy is in effect because the pods are distributed across different GPUs.
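
After shared scheduling is enabled, pods request GPU resources by the shared resource names shown in the verification steps above instead of nvidia.com/gpu. The following is a minimal sketch of such a request; the pod name, image, and requested amounts are placeholders.

    apiVersion: v1
    kind: Pod
    metadata:
      name: gpu-share-demo                            # placeholder name
    spec:
      restartPolicy: Never
      containers:
      - name: cuda-app
        image: registry.example.com/cuda-app:latest   # placeholder image
        resources:
          limits:
            aliyun.com/gpu-mem: 4                     # GiB of video memory to allocate
            # When core_mem is enabled, computing power can also be limited, for example:
            # aliyun.com/gpu-core.percentage: 30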

Topology-aware scheduling

Topology-aware scheduling is supported only in ACK managed cluster Pro. For more information, see System component version requirements.

  1. Install the ack-ai-installer component. For the installation steps, see Shared scheduling in this topic.

  2. Enable topology-aware scheduling.

    Replace <NODE_NAME> with the name of your target node and run the following command to add a label to the node. This activates the topology-aware scheduling feature for the node.

    kubectl label node <NODE_NAME> ack.node.gpu.schedule=topology

    After you activate topology-aware scheduling for a node, the node no longer supports scheduling of non-topology-aware GPU resources. To restore exclusive scheduling, run the kubectl label node <NODE_NAME> ack.node.gpu.schedule=default --overwrite command to change the label.
  3. Verify that topology-aware scheduling is enabled.

    Replace <NODE_NAME> with the name of your target node and run the following command to verify that topology-aware scheduling is enabled for the node.

    kubectl get nodes <NODE_NAME> -o yaml | grep aliyun.com/gpu

    Expected output:

    aliyun.com/gpu: "2"

    If the value of the aliyun.com/gpu field is not 0, topology-aware scheduling is enabled.
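
A pod consumes topology-aware GPUs through the aliyun.com/gpu resource shown in the verification output. The following is a minimal sketch of such a request; the pod name, image, and requested amount are placeholders.

    apiVersion: v1
    kind: Pod
    metadata:
      name: gpu-topology-demo                         # placeholder name
    spec:
      restartPolicy: Never
      containers:
      - name: cuda-app
        image: registry.example.com/cuda-app:latest   # placeholder image
        resources:
          limits:
            aliyun.com/gpu: 2                         # number of topology-aware GPU cards to allocate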

Card model scheduling

You can schedule a Job to a node with a specified GPU model or avoid a specific model.

  1. Check the GPU model of the node.

    Run the following command to query the GPU models of the nodes in the cluster. The GPU model name is shown in the NVIDIA_NAME column.

    kubectl get nodes -L aliyun.accelerator/nvidia_name

    The expected output is similar to the following:

    NAME                        STATUS   ROLES    AGE   VERSION            NVIDIA_NAME
    cn-shanghai.192.XX.XX.176   Ready    <none>   17d   v1.26.3-aliyun.1   Tesla-V100-SXM2-32GB
    cn-shanghai.192.XX.XX.177   Ready    <none>   17d   v1.26.3-aliyun.1   Tesla-V100-SXM2-32GB

    More ways to check the GPU model:

    On the Clusters page, click the name of the target cluster. In the navigation pane on the left, choose Workloads > Pods. In the row of the pod that you created (for example, tensorflow-mnist-multigpu-***), click Terminal in the Actions column, select the container that you want to log on to from the drop-down list, and run the following commands.

    • Query the GPU model: nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 | sed -e 's/ /-/g'

    • Query the video memory capacity of each GPU: nvidia-smi --id=0 --query-gpu=memory.total --format=csv,noheader | sed -e 's/ //g'

    • Query the total number of GPUs on the node: nvidia-smi -L | wc -l


  2. Enable card model scheduling.

    1. On the Clusters page, find the cluster you want and click its name. In the left-side pane, choose Workloads > Jobs.

    2. On the Jobs page, click Create From YAML. Use the following examples to create an application and enable the card model scheduling feature.


      Specify a particular card model

      Use the GPU model label to run your application on nodes with a specific GPU model.

      In the YAML file, replace Tesla-V100-SXM2-32GB in aliyun.accelerator/nvidia_name: "Tesla-V100-SXM2-32GB" with the actual GPU model of your node.


      apiVersion: batch/v1
      kind: Job
      metadata:
        name: tensorflow-mnist
      spec:
        parallelism: 1
        template:
          metadata:
            labels:
              app: tensorflow-mnist
          spec:
            nodeSelector:
              aliyun.accelerator/nvidia_name: "Tesla-V100-SXM2-32GB" # Runs the application on a Tesla V100-SXM2-32GB GPU.
            containers:
            - name: tensorflow-mnist
              image: registry.cn-beijing.aliyuncs.com/acs/tensorflow-mnist-sample:v1.5
              command:
              - python
              - tensorflow-sample-code/tfjob/docker/mnist/main.py
              - --max_steps=1000
              - --data_dir=tensorflow-sample-code/data
              resources:
                limits:
                  nvidia.com/gpu: 1
              workingDir: /root
            restartPolicy: Never

      After the Job is created, you can choose Workloads > Pods in the navigation pane on the left. In the pod list, you can see that an example pod is successfully scheduled to a matching node, which demonstrates flexible scheduling based on the GPU model label.

      Exclude a particular card model

      Use the GPU model label with node affinity and anti-affinity to prevent your application from running on certain GPU models.

      In the YAML file, replace Tesla-V100-SXM2-32GB under values with the actual GPU model of your node.


      apiVersion: batch/v1
      kind: Job
      metadata:
        name: tensorflow-mnist
      spec:
        parallelism: 1
        template:
          metadata:
            labels:
              app: tensorflow-mnist
          spec:
            affinity:
              nodeAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  nodeSelectorTerms:
                  - matchExpressions:
                    - key: aliyun.accelerator/nvidia_name  # Card model scheduling label
                      operator: NotIn
                      values:
                      - "Tesla-V100-SXM2-32GB"            # Prevents the pod from being scheduled to a node with a Tesla-V100-SXM2-32GB card.
            containers:
            - name: tensorflow-mnist
              image: registry.cn-beijing.aliyuncs.com/acs/tensorflow-mnist-sample:v1.5
              command:
              - python
              - tensorflow-sample-code/tfjob/docker/mnist/main.py
              - --max_steps=1000
              - --data_dir=tensorflow-sample-code/data
              resources:
                limits:
                  nvidia.com/gpu: 1
              workingDir: /root
            restartPolicy: Never

      After the Job is created, the application is not scheduled to nodes that have the label key aliyun.accelerator/nvidia_name and the value Tesla-V100-SXM2-32GB. However, it can be scheduled to GPU nodes with other GPU models.
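
      In either case, you can check which node the pods were scheduled to. For example, the following command uses the app label from the YAML files above:

      kubectl get pods -l app=tensorflow-mnist -o wide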