When you install an NVIDIA driver for a node, ensure that the driver version is supported by ACK. This topic describes the NVIDIA driver versions that ACK supports.
Introduction to CUDA
CUDA is a parallel computing platform and programming model that NVIDIA introduced in 2007. CUDA uses a graphics processing unit (GPU) to significantly improve computing performance.
The following figure shows the CUDA architecture. The CUDA software stack includes a driver API and a runtime API. The differences are as follows.
CUDA Driver API: A low-level API that exposes more features but is more complex to use.
CUDA Runtime API: A higher-level API that wraps parts of the Driver API. It is easier to use because it hides some driver initialization operations.
The CUDA Driver API is provided by the NVIDIA Driver package. The CUDA Library and CUDA Runtime are provided by the CUDA Toolkit package.
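Both APIs expose a version query (`cuDriverGetVersion` in the Driver API and `cudaRuntimeGetVersion` in the Runtime API), and both report the version as a single integer encoded as 1000 × major + 10 × minor. The following minimal sketch decodes such a value. The helper names are illustrative and are not part of any NVIDIA or ACK tooling.

```python
from typing import Tuple


def decode_cuda_version(raw: int) -> Tuple[int, int]:
    """Split an encoded CUDA version integer into (major, minor).

    CUDA encodes versions as 1000 * major + 10 * minor,
    so 12060 stands for CUDA 12.6.
    """
    major = raw // 1000
    minor = (raw % 1000) // 10
    return major, minor


def format_cuda_version(raw: int) -> str:
    """Render an encoded CUDA version integer as 'major.minor'."""
    major, minor = decode_cuda_version(raw)
    return f"{major}.{minor}"
```

For example, `format_cuda_version(12060)` yields the same "12.6" style string that `nvidia-smi` prints as the CUDA version.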
Driver and cluster version compatibility
The following table lists the NVIDIA GPU driver versions that are supported by different ACK cluster versions.
ACK Lingjun clusters and Node Lingjun in ACK Pro managed clusters include built-in GPU drivers in their operating system images. Therefore, you cannot use node labels to install a specific GPU driver version. This also applies to Edge node pools in ACK Edge clusters.
Drivers of version 510 and later may occasionally cause XID 119 or XID 120 faults. If you encounter these issues, see What do I do if a GPU card is dropped due to an XID 119 or XID 120 fault?
Driver version 550 fixes frequent issues in some applications, including XID 119, XID 120, and XID 31 faults, and kernel panics. Upgrade the GPU drivers on your existing GPU nodes to version 550.
ACK periodically updates the default driver versions for different cluster versions. As a result, newly created GPU nodes in your cluster may use different driver versions. To prevent this, specify a driver version for the node pool. For more information, see Customize the GPU driver version for a node by specifying a version number.
When you create a node pool, if the driver version you specify is not in the compatibility list, ACK automatically installs the default driver version. If you specify a driver version that is incompatible with the latest operating system, the node may fail to be added. In this case, select the latest supported driver version.
If your node pool uses custom GPU drivers configured by specifying driver versions or adding OSS URLs, OS image upgrades may trigger OS-driver incompatibility issues. Always select the latest validated driver version from the List of NVIDIA driver versions supported by ACK.
Cluster version | Default driver version | Custom driver version support | Supported NVIDIA driver versions |
1.28 and later | 535.161.07 | Yes | The following driver versions are incompatible with the latest operating system. |
1.26 | 535.161.07 | Yes | |
1.24 | 535.161.07 | Yes | |
1.22 | 535.161.07 | Yes | |
1.20 | 535.161.07 | Yes | |
1.18.8 | 418.181.07 | Yes | |
1.16.9 | 418.181.07 | Yes | |
1.16.6 | 418.87.01 | No | |
1.14.8 | 418.181.07 | Yes | |
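The fallback rule in the notes above (a requested version that is not in the compatibility list is replaced by the cluster's default) can be sketched as follows. This only illustrates the documented behavior and is not how ACK implements it. The function name and the `SUPPORTED` set are hypothetical, and the set is a small sample for demonstration rather than the real compatibility list.

```python
from typing import Iterable, Optional

# Hypothetical sample of supported versions, for illustration only.
SUPPORTED = {"550.144.03", "550.90.07", "535.161.07"}

# Default driver version for clusters of version 1.20 and later, per the table above.
DEFAULT_VERSION = "535.161.07"


def resolve_driver_version(
    requested: Optional[str],
    supported: Iterable[str] = SUPPORTED,
    default: str = DEFAULT_VERSION,
) -> str:
    """Return the driver version that would be installed.

    If no version is requested, or the requested version is not in the
    compatibility list, the default driver version is used instead.
    """
    if requested is not None and requested in set(supported):
        return requested
    return default
```

A check like this before you create a node pool makes a silent fallback to the default version visible early.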
Driver and OS kernel version compatibility
For more information about the mapping between kernel versions and OS image IDs, see the mapping table of kernel versions and image IDs.
Driver version | Alibaba Cloud Linux 2 | Alibaba Cloud Linux 3 | CentOS | Ubuntu |
550.163.01 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, ∞) |
550.144.03 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, ∞) |
550.90.07 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, ∞) |
550.54.15 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, ∞) |
550.54.14 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, ∞) |
535.247.01 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, ∞) |
535.230.02 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, ∞) |
535.161.07 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, ∞) |
535.129.03 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-17.3.al8.x86_64] Unsupported range: [5.10.134-18.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, 5.15.0-101-generic] Unsupported range: [5.15.0-106-generic, ∞) |
535.98 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-17.3.al8.x86_64] Unsupported range: [5.10.134-18.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, 5.15.0-101-generic] Unsupported range: [5.15.0-106-generic, ∞) |
535.54.03 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-17.3.al8.x86_64] Unsupported range: [5.10.134-18.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, 5.15.0-101-generic] Unsupported range: [5.15.0-106-generic, ∞) |
525.147.05 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-17.3.al8.x86_64] Unsupported range: [5.10.134-18.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, 5.15.0-101-generic] Unsupported range: [5.15.0-106-generic, ∞) |
525.105.17 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-17.3.al8.x86_64] Unsupported range: [5.10.134-18.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, 5.15.0-101-generic] Unsupported range: [5.15.0-106-generic, ∞) |
515.105.01 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-17.3.al8.x86_64] Unsupported range: [5.10.134-18.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, 5.15.0-101-generic] Unsupported range: [5.15.0-106-generic, ∞) |
515.86.01 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-17.3.al8.x86_64] Unsupported range: [5.10.134-18.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, 5.15.0-101-generic] Unsupported range: [5.15.0-106-generic, ∞) |
510.108.03 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-17.3.al8.x86_64] Unsupported range: [5.10.134-18.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, 5.15.0-101-generic] Unsupported range: [5.15.0-106-generic, ∞) |
510.54 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-14.al8.x86_64] Unsupported range: [5.10.134-15.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, 5.15.0-101-generic] Unsupported range: [5.15.0-106-generic, ∞) |
510.47.03 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-14.al8.x86_64] Unsupported range: [5.10.134-15.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, 5.15.0-101-generic] Unsupported range: [5.15.0-106-generic, ∞) |
470.256.02 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, ∞) |
470.161.03 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-17.3.al8.x86_64] Unsupported range: [5.10.134-18.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, 5.15.0-101-generic] Unsupported range: [5.15.0-106-generic, ∞) |
470.103.01 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-14.al8.x86_64] Unsupported range: [5.10.134-15.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, 5.15.0-101-generic] Unsupported range: [5.15.0-106-generic, ∞) |
470.82.01 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-14.al8.x86_64] Unsupported range: [5.10.134-15.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, 5.15.0-101-generic] Unsupported range: [5.15.0-106-generic, ∞) |
470.57.02 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-14.al8.x86_64] Unsupported range: [5.10.134-15.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Unsupported range: [5.15.0-40-generic, ∞) |
460.106.00 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-14.al8.x86_64] Unsupported range: [5.10.134-15.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, 5.15.0-101-generic] Unsupported range: [5.15.0-106-generic, ∞) |
460.91.03 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-14.al8.x86_64] Unsupported range: [5.10.134-15.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Unsupported range: [5.15.0-40-generic, ∞) |
460.73.01 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-14.al8.x86_64] Unsupported range: [5.10.134-15.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Unsupported range: [5.15.0-40-generic, ∞) |
460.32.03 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-14.al8.x86_64] Unsupported range: [5.10.134-15.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Unsupported range: [5.15.0-40-generic, ∞) |
450.119.04 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-14.al8.x86_64] Unsupported range: [5.10.134-15.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Unsupported range: [5.15.0-40-generic, ∞) |
450.102.04 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-14.al8.x86_64] Unsupported range: [5.10.134-15.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Unsupported range: [5.15.0-40-generic, ∞) |
450.80.02 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Unsupported range: [5.10.23-5.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Unsupported range: [5.15.0-40-generic, ∞) |
440.33.01 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Unsupported range: [5.10.23-5.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Unsupported range: [5.15.0-40-generic, ∞) |
418.181.07 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Unsupported range: [5.10.23-5.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Unsupported range: [5.15.0-40-generic, ∞) |
418.113 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Unsupported range: [5.10.23-5.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Unsupported range: [5.15.0-40-generic, ∞) |
418.87.01 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Unsupported range: [5.10.23-5.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Unsupported range: [5.15.0-40-generic, ∞) |
410.93 | Supported range: [4.19.81-17.1.al7.x86_64, 4.19.91-18.al7.x86_64] Unsupported range: [4.19.91-19.1.al7.x86_64, ∞) | Unsupported range: [5.10.23-5.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, 3.10.0-957.21.3.el7.x86_64] Unsupported range: [3.10.0-1062.9.1.el7.x86_64, ∞) | Unsupported range: [5.15.0-40-generic, ∞) |
410.79 | Supported range: [4.19.81-17.1.al7.x86_64, 4.19.91-18.al7.x86_64] Unsupported range: [4.19.91-19.1.al7.x86_64, ∞) | Unsupported range: [5.10.23-5.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, 3.10.0-957.21.3.el7.x86_64] Unsupported range: [3.10.0-1062.9.1.el7.x86_64, ∞) | Unsupported range: [5.15.0-40-generic, ∞) |
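To check a node's kernel against one of the ranges in the table, you can compare kernel release strings numerically. The sketch below assumes that a release string starts with a dotted version, a hyphen, and a dotted numeric build (as in every entry above) and ignores the distribution suffix such as `.al8.x86_64` or `-generic`. The function names are illustrative.

```python
import re
from typing import Optional, Tuple

# Leading "<version>-<numeric build>" part of a kernel release string.
_KERNEL_RE = re.compile(r"^(\d+(?:\.\d+)*)-(\d+(?:\.\d+)*)")


def kernel_key(release: str) -> Tuple[int, ...]:
    """Turn a kernel release such as '5.10.134-17.3.al8.x86_64' into (5, 10, 134, 17, 3)."""
    match = _KERNEL_RE.match(release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    version, build = match.groups()
    parts = version.split(".") + build.split(".")
    return tuple(int(p) for p in parts)


def in_supported_range(kernel: str, lower: str, upper: Optional[str] = None) -> bool:
    """Return True if kernel lies in [lower, upper]; upper=None means no upper bound."""
    key = kernel_key(kernel)
    if key < kernel_key(lower):
        return False
    return upper is None or key <= kernel_key(upper)
```

For driver 535.129.03 on Alibaba Cloud Linux 3, `in_supported_range(kernel, "5.10.23-5.al8.x86_64", "5.10.134-17.3.al8.x86_64")` mirrors the supported range in the table. On a node, `uname -r` prints the kernel release to pass in.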
Driver and CUDA Toolkit compatibility
You can select a suitable NVIDIA driver version based on the CUDA Toolkit version that your application uses. For the compatibility between CUDA Toolkit and NVIDIA driver versions, see CUDA Toolkit Release Notes.
Get the Driver API version
If an NVIDIA driver package is installed on a node, you can run the nvidia-smi command to view the driver version and the CUDA Driver API version. In the following example, the driver version is 550.144.03 and the CUDA Driver API version is 12.6. This indicates that the driver supports a maximum CUDA Runtime API version of 12.6.
Mon Mar 24 08:51:55 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.03             Driver Version: 550.144.03     CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla P4                       On  |   00000000:00:07.0 Off |                    0 |
| N/A   33C    P8              7W /   75W |       0MiB /   7680MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
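If you want to read these two values in a script instead of by eye, one option is to match the header line of the `nvidia-smi` output with a regular expression. The following is a minimal sketch. The function name is illustrative, and the pattern assumes the header layout shown above, which may differ in other driver releases.

```python
import re
from typing import Tuple

# Matches the "Driver Version: X  CUDA Version: Y" part of the header line.
_HEADER_RE = re.compile(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)")


def parse_nvidia_smi_header(output: str) -> Tuple[str, str]:
    """Return (driver_version, cuda_driver_api_version) from nvidia-smi text output."""
    match = _HEADER_RE.search(output)
    if match is None:
        raise ValueError("no 'Driver Version / CUDA Version' header found")
    return match.group(1), match.group(2)


# Header line taken from the example output above.
SAMPLE = "| NVIDIA-SMI 550.144.03 Driver Version: 550.144.03 CUDA Version: 12.6 |"
driver_version, cuda_version = parse_nvidia_smi_header(SAMPLE)
```

On a GPU node, you could pass in the text returned by `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout` instead of `SAMPLE`.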
Get the runtime API version
When you build a container image that requires CUDA Toolkit, we recommend that you use the official CUDA base images from NVIDIA. These images have CUDA Toolkit pre-installed and are available in different versions. You can build your application container images on top of these base images.
When you use GPUs in containers, the CUDA Runtime API version available to your application is determined by the CUDA base image that you use. For example, if you build your application's Docker image from the nvidia/cuda:12.2.0-base-ubuntu20.04 base image, the application uses CUDA Runtime API version 12.2.0.
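Putting the two sections together, an image is expected to work on a node when its CUDA Runtime API version does not exceed the CUDA Driver API version reported by the driver. The sketch below applies that rule. It assumes that the image tag starts with the CUDA version, as the official CUDA base image tags do, and the function names are illustrative.

```python
import re
from typing import Tuple


def runtime_version_from_tag(image: str) -> Tuple[int, int]:
    """Extract (major, minor) of the CUDA Runtime from an image reference.

    Assumes the tag starts with the CUDA version, for example
    'nvidia/cuda:12.2.0-base-ubuntu20.04' -> (12, 2).
    """
    if ":" not in image:
        raise ValueError(f"image reference has no tag: {image!r}")
    tag = image.rsplit(":", 1)[1]
    match = re.match(r"(\d+)\.(\d+)", tag)
    if match is None:
        raise ValueError(f"tag does not start with a CUDA version: {tag!r}")
    return int(match.group(1)), int(match.group(2))


def driver_supports_runtime(driver_cuda_version: str, image: str) -> bool:
    """Return True if the runtime version in the image tag is <= the driver's CUDA version."""
    major, minor = (int(part) for part in driver_cuda_version.split(".")[:2])
    return runtime_version_from_tag(image) <= (major, minor)
```

With the example node above, which reports CUDA version 12.6, `driver_supports_runtime("12.6", "nvidia/cuda:12.2.0-base-ubuntu20.04")` returns `True`, whereas a driver that reports 11.4 would not satisfy the same image.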