The ack-node-problem-detector component is an optimized and enhanced version of the open source Node Problem Detector (NPD) from the Kubernetes community. It monitors nodes, integrates third-party monitoring plug-ins, and detects node anomalies in Container Service for Kubernetes (ACK) clusters. The component works with the event center and lets you add custom monitoring plug-ins to detect more types of node anomalies. This topic describes ack-node-problem-detector and provides its usage notes and release notes.
Introduction
The ack-node-problem-detector component is a node diagnostic tool for ACK clusters that monitors and reports node anomalies. The component consists of the following three modules:
kube-event-init: When you install ack-node-problem-detector, kube-event-init initializes the resources in the event center of Simple Log Service. This allows ack-node-problem-detector-daemonset and kube-eventer to use these resources to store and analyze event data.
ack-node-problem-detector-daemonset: This module runs a pod on each node that meets specified conditions to monitor the node's health status and report cluster status and events. In the release notes tables, the image address of ack-node-problem-detector refers to the image address of ack-node-problem-detector-daemonset.
kube-eventer: By default, kube-eventer reports all events in the cluster to the event center of Simple Log Service. The event center retains event data for 90 days and provides features such as dashboards, alerts, and event search and analysis. You can configure kube-eventer to report cluster events to other systems, such as DingTalk and EventBridge, for data integration. For more information, see kube-eventer.
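The following sketch shows how a DingTalk destination might be expressed as a kube-eventer command-line flag. It assumes the sink syntax described in the upstream kube-eventer README (--sink=dingtalk:<webhook URL>&label=<label>&level=<Normal or Warning>). The function name dingtalk_sink_flag and the example token and label are hypothetical, so check the parameter names against the kube-eventer version that your cluster runs.

```python
from urllib.parse import urlencode


def dingtalk_sink_flag(webhook_url, label, level="Warning"):
    """Build a kube-eventer --sink flag for a DingTalk robot.

    Assumption: the sink syntax follows the upstream kube-eventer README.
    """
    if level not in ("Normal", "Warning"):
        raise ValueError("level must be 'Normal' or 'Warning'")
    # A DingTalk webhook URL already carries ?access_token=..., so the
    # extra parameters are appended with '&' in that case.
    sep = "&" if "?" in webhook_url else "?"
    params = urlencode({"label": label, "level": level})
    return f"--sink=dingtalk:{webhook_url}{sep}{params}"


if __name__ == "__main__":
    print(dingtalk_sink_flag(
        "https://oapi.dingtalk.com/robot/send?access_token=YOUR_TOKEN",
        label="my-cluster",
    ))
```

Building the flag in one place keeps the label and level values URL-encoded, which matters if a cluster label contains spaces or other reserved characters.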
Usage notes
For more information about how to install ack-node-problem-detector, and about its usage notes and new features, see Event monitoring.
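After installation, the problems that a node problem detector finds are typically surfaced as node conditions and events. The following sketch takes a node object as returned by kubectl get node <node-name> -o json and lists the conditions that indicate a problem. It assumes that Ready is the only condition that is healthy when it is True, which holds for the built-in pressure conditions and for upstream NPD problem conditions such as KernelDeadlock. The function name abnormal_conditions is hypothetical.

```python
def abnormal_conditions(node):
    """Return (type, reason, message) for each node condition that signals a problem.

    Assumption: Ready is healthy when "True"; every other condition
    (MemoryPressure, DiskPressure, KernelDeadlock, ...) signals a problem
    when "True".
    """
    problems = []
    for cond in node.get("status", {}).get("conditions", []):
        ctype = cond.get("type")
        status = cond.get("status")
        if ctype == "Ready":
            is_problem = status != "True"  # "False" or "Unknown"
        else:
            is_problem = status == "True"
        if is_problem:
            problems.append((ctype, cond.get("reason", ""), cond.get("message", "")))
    return problems
```

For example, save the output of kubectl get node <node-name> -o json to node.json and pass json.load(open("node.json")) to abnormal_conditions.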
Change log
July 2025
Version number | Image address | Release date | Description |
1.2.27 | | 2025-07-24 | Note: This version is in a canary release. To use this version, submit a ticket. Hardened the security of kube-eventer and kube-event-init. ACK dedicated clusters support security hardening through the enhanced mode for accessing ECS instance metadata: during authentication, the system accesses ECS instance metadata in enhanced mode to improve cluster security. For more information, see Enforce the enhanced mode to access ECS instance metadata. |
June 2025
Version number | Image address | Release date | Description |
1.2.26 | ack-node-problem-detector: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/node-problem-detector:v0.8.16-8d2193b-aliyun; npd-gpu: registry-__ACK_REGION_ID__-vpc.ack.aliyuncs.com/acs/npd-gpu-plugin:v0.4.1-7359b830-aliyun; kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.12-c7c1896-aliyun; kube-event-init: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-init:v1.8-e43647f-aliyun | 2025-06-11 | Note: This version is in a canary release. To use this version, submit a ticket. |
1.2.25 | ack-node-problem-detector: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/node-problem-detector:v0.8.16-8ed7053-aliyun; npd-gpu: registry-__ACK_REGION_ID__-vpc.ack.aliyuncs.com/acs/npd-gpu-plugin:v0.4.0-e434dc36-aliyun; kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.12-c7c1896-aliyun; kube-event-init: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-init:v1.8-e43647f-aliyun | 2025-06-06 | Note: This version is in a canary release. To use this version, submit a ticket. Added the npd-gpu container for GPU fault detection. Specified GPUs can be isolated when GPU faults are detected. Supported check items include NvidiaXID44Error, NvidiaXID61Error, NvidiaXID62Error, and NvidiaXID69Error. For more information, see GPU fault detection and automatic isolation. You can use ack-node-problem-detector-config to choose which GPU check items are enabled. Reduced the size of the ack-node-problem-detector image. |
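The release notes do not describe how npd-gpu detects the listed errors. As an illustration only, the following sketch assumes that detection is based on the "NVRM: Xid" messages that the NVIDIA driver writes to the kernel log, and maps Xid code N to the check item name NvidiaXIDNError, following the naming pattern of the check items above. The enabled set mirrors the four check items listed for 1.2.25. The format of ack-node-problem-detector-config is not shown because it is not documented here, and detect_xid_errors is a hypothetical name.

```python
import re

# Xid codes for the check items listed in the 1.2.25 release notes.
ENABLED_XIDS = {44, 61, 62, 69}

# Matches kernel log lines such as:
#   NVRM: Xid (PCI:0000:3b:00): 62, pid=1234, ...
XID_RE = re.compile(r"NVRM: Xid \(([^)]*)\): (\d+)")


def detect_xid_errors(lines, enabled=ENABLED_XIDS):
    """Yield (check_item, device, line) for kernel log lines with an enabled Xid code."""
    for line in lines:
        m = XID_RE.search(line)
        if not m:
            continue
        code = int(m.group(2))
        if code in enabled:
            yield f"NvidiaXID{code}Error", m.group(1), line.rstrip("\n")
```

Keeping the enabled codes in a set mirrors the idea of turning individual check items on or off without changing the parsing logic.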
August 2024
Version number | Image address | Release date | Description |
1.2.20 | ack-node-problem-detector: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/node-problem-detector:v0.8.14-3c6002c-aliyun; kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.11-0620284-aliyun; kube-event-init: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-init:v1.8-e43647f-aliyun | 2024-08-20 | Added support for GPU fault inspection on ECS nodes. Upgraded kube-eventer to remove a performance bottleneck in clusters that report a large number of events. Upgraded kube-eventer to support the V4 signature algorithm for data transmission to Simple Log Service. Added a component parameter that lets you enable a local port, 20256 or 20257, on the ack-node-problem-detector DaemonSet pods. The port is disabled by default. |
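Upstream Node Problem Detector binds its health-check server to port 20256 and its Prometheus metrics endpoint to port 20257 by default, and exports problem_counter and problem_gauge metrics. Assuming that the ACK build keeps these upstream defaults, the following sketch scrapes /metrics on port 20257 after you enable it and extracts those metrics. The parser handles only simple text-format samples and is not a full Prometheus client; the function names are hypothetical.

```python
import re
from urllib.request import urlopen

# Simple Prometheus text-format sample: name{labels} value [timestamp]
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_problem_metrics(text, names=("problem_counter", "problem_gauge")):
    """Return [(metric, labels, value)] for samples whose metric name is in names."""
    out = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = SAMPLE_RE.match(line)
        if m and m.group(1) in names:
            labels = dict(LABEL_RE.findall(m.group(2) or ""))
            out.append((m.group(1), labels, float(m.group(3))))
    return out


def scrape(host="127.0.0.1", port=20257):
    """Fetch and parse /metrics (assumes the Prometheus port is enabled on the node)."""
    with urlopen(f"http://{host}:{port}/metrics", timeout=5) as resp:
        return parse_problem_metrics(resp.read().decode())
```

Because the port is disabled by default, scrape only succeeds after the component parameter is set; parse_problem_metrics can be tested on its own with saved output.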
December 2023
Version number | Image address | Release date | Description |
v1.2.18 | ack-node-problem-detector: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/node-problem-detector:v0.8.13-003ac31-aliyun; kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.8-27a468a-aliyun; kube-event-init: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-init:v1.7-48a2acc-aliyun | 2023-12-18 | Fixed a bug in which cached historical kernel logs caused false-positive reports when a PodOOMKilling anomaly was detected. Custom component parameters are now inherited when you upgrade from an earlier version of ack-node-problem-detector. |
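The release note does not say how the false-positive fix was implemented. One common way to keep a kernel log monitor from re-reporting cached historical entries is to remember the highest /dev/kmsg sequence number that has already been processed, as in the following sketch. Records in /dev/kmsg have the form "priority,sequence,timestamp_usec,flags;message". The class name OOMKillFilter is hypothetical, and a real monitor must also reset its state after a reboot, because kmsg sequence numbers restart at boot.

```python
def parse_kmsg(record):
    """Split a /dev/kmsg record 'prio,seq,usec,flags;message' into (seq, usec, message)."""
    header, _, message = record.partition(";")
    _prio, seq, usec = header.split(",")[:3]
    return int(seq), int(usec), message.rstrip("\n")


class OOMKillFilter:
    """Report each OOM-kill kernel record at most once (hypothetical helper)."""

    def __init__(self, last_seq=-1):
        self.last_seq = last_seq  # highest sequence number already handled

    def new_oom_kills(self, records):
        """Return OOM-kill messages from records that were not processed before."""
        hits = []
        for rec in records:
            seq, _usec, msg = parse_kmsg(rec)
            if seq <= self.last_seq:
                continue  # historical record that was already handled
            self.last_seq = seq
            if "Killed process" in msg:
                hits.append(msg)
        return hits
```

Feeding the same buffered records twice returns nothing the second time, which is the behavior that prevents stale OOM entries from being reported as new anomalies.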
August 2023
Version number | Image address | Release date | Description |
v1.2.17 | ack-node-problem-detector: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/node-problem-detector:v0.8.12-bf8aff8-aliyun; kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.8-27a468a-aliyun; kube-event-init: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-init:v1.7-48a2acc-aliyun | 2023-08-24 | You can modify the component parameters on the component management page in the ACK console to update the Simple Log Service project and Logstore configurations. You can attach additional tags, such as the cluster name, to the log data sent to Simple Log Service. These tags are displayed by default with the Simple Log Service data in the ACK Event Center. |
June 2023
Version number | Image address | Release date | Description |
v1.2.16 | ack-node-problem-detector: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/node-problem-detector:v0.8.12-bf8aff8-aliyun; kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.8-019546c-aliyun; kube-event-init: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-init:v1.7-48a2acc-aliyun | 2023-06-27 | You can configure the resource specification parameters of the component on the component management page in the ACK console. |
v1.2.15 | ack-node-problem-detector: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/node-problem-detector:v0.8.12-bf8aff8-aliyun; kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.8-019546c-aliyun; kube-event-init: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-init:v1.7-48a2acc-aliyun | 2023-06-06 | Reduced the load that ack-node-problem-detector places on the API server and etcd when PodOOMKilling events occur frequently in large clusters. |
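The release note does not describe how the load was reduced. A generic way to cut API server writes when the same event repeats is to coalesce occurrences that share a key within a time window and send one aggregated record with a count, similar in spirit to the event aggregation that the Kubernetes client-go event correlator performs. The following sketch applies that idea to PodOOMKilling occurrences; the function name, the tuple layout, and the 60-second window are assumptions.

```python
def coalesce(events, window_s=60.0):
    """Merge events with the same (namespace, pod, reason) that occur within window_s seconds.

    events: iterable of (timestamp_s, namespace, pod, reason), sorted by time.
    Returns one dict per group with first/last timestamps and a count, so each
    group needs one API write instead of one write per occurrence.
    """
    open_groups = {}  # key -> most recent group for that key
    out = []
    for ts, ns, pod, reason in events:
        key = (ns, pod, reason)
        group = open_groups.get(key)
        if group is not None and ts - group["first"] <= window_s:
            group["count"] += 1
            group["last"] = ts
        else:
            group = {"namespace": ns, "pod": pod, "reason": reason,
                     "first": ts, "last": ts, "count": 1}
            open_groups[key] = group
            out.append(group)
    return out
```

Measuring the window from the first occurrence bounds how stale an aggregated record can become; measuring it from the last occurrence would merge more aggressively but could delay reporting indefinitely for a pod that keeps getting OOM-killed.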
February 2023
Version number | Image address | Release date | Description |
v1.2.14 | ack-node-problem-detector: registry.aliyuncs.com/acs/node-problem-detector:v0.8.11-edc7907-aliyun; kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.6-bbf76f7-aliyun; kube-event-init: registry.{ .Values.controller.regionId }.aliyuncs.com/acs/kube-eventer-init:v1.7-48a2acc-aliyun | 2023-02-03 | |
September 2022
Version number | Image address | Release date | Description |
v1.2.11 | ack-node-problem-detector: registry.aliyuncs.com/acs/node-problem-detector:v0.8.11-edc7907-aliyun; kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.6-bbf76f7-aliyun; kube-event-init: registry.{ .Values.controller.regionId }.aliyuncs.com/acs/kube-eventer-init:v1.7-48a2acc-aliyun | 2022-09-30 | |
February 2022
Version number | Image address | Release date | Description |
v1.2.9 | ack-node-problem-detector: registry.aliyuncs.com/acs/node-problem-detector:v0.8.10-e0ff7d2; kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-amd64:v1.2.6-f0efecf-aliyun; kube-event-init: registry.{ .Values.controller.regionId }.aliyuncs.com/acs/kube-eventer-init:v1.6-a92aba6-aliyun | 2022-02-22 | |
January 2022
Version number | Image address | Release date | Description |
v1.2.8 | ack-node-problem-detector: registry.aliyuncs.com/acs/node-problem-detector:v0.8.10-e0ff7d2; kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-amd64:v1.2.5-cc7ec54-aliyun; kube-event-init: registry.{ .Values.controller.regionId }.aliyuncs.com/acs/kube-eventer-init:v1.6-a92aba6-aliyun | 2022-01-20 | |
November 2021
Version number | Image address | Release date | Description |
v1.2.7 | ack-node-problem-detector: registry.aliyuncs.com/acs/node-problem-detector:v0.8.10-e0ff7d2; kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-amd64:v1.2.5-cc7ec54-aliyun; kube-event-init: registry.{ .Values.controller.regionId }.aliyuncs.com/acs/kube-eventer-init:v1.6-a92aba6-aliyun | 2021-11-25 | |
April 2021
Version number | Image address | Release date | Description |
v1.2.5 | ack-node-problem-detector: registry.aliyuncs.com/acs/node-problem-detector:v0.6.3-28-160499f; kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-amd64:v1.2.4-0f5aaee-aliyun; kube-event-init: registry.{ .Values.controller.regionId }.aliyuncs.com/acs/kube-eventer-init:1.5-5e0e7c1-aliyun | 2021-04-25 | Fixed an issue in which kube-event-init in the kube-system namespace returned a "414 Request Too Large" error when the event center was enabled. Optimized the list-watch mechanism of kube-eventer to prevent excessive request traffic to etcd. For more information, see eventer list-watch. Fixed an issue in which kube-eventer parsed the timestamps of some system events incorrectly. For more information, see fix FailedScheduling event write to sls with wrong timestamp. |
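The referenced kube-eventer fix concerns event timestamps. Events created through the events.k8s.io API, such as the FailedScheduling events emitted by recent kube-scheduler versions, set eventTime and can leave firstTimestamp and lastTimestamp empty, so a consumer that reads only lastTimestamp records the wrong time. The following sketch shows one fallback order for a core/v1 Event read as JSON; the order and the function name event_time are assumptions, not necessarily the exact logic of the upstream patch.

```python
from datetime import datetime


def _parse(ts):
    # Kubernetes serializes times as RFC 3339 in UTC with a trailing "Z";
    # fromisoformat before Python 3.11 does not accept "Z", so replace it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def event_time(event):
    """Pick the most meaningful timestamp from a core/v1 Event given as a dict."""
    for field in ("lastTimestamp", "eventTime", "firstTimestamp"):
        value = event.get(field)
        if value:  # skip missing fields and JSON null
            return _parse(value)
    # Last resort: when the Event object itself was created.
    return _parse(event["metadata"]["creationTimestamp"])
```

Checking lastTimestamp first keeps the existing behavior for classic core/v1 events, while the eventTime fallback covers events that only set the newer field.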
July 2020
Version number | Image address | Release date | Description |
v0.6.3-28-160499f | registry.aliyuncs.com/acs/node-problem-detector:v0.6.3-28-160499f | 2020-07-27 | Optimized OOM Killing event messages to include the pod name, namespace, and UID. Improved the execution efficiency of the check_fd plug-in. Optimized event notifications for the node PID watermark. Upgraded the network issue detection plug-in. Added a plug-in that monitors the inode watermark of the node's system disk and raises alerts. |
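An inode watermark check of this kind can be approximated as a custom plug-in. The following sketch follows the upstream Node Problem Detector custom plug-in convention, in which the exit code reports the result (0 for OK, 1 for NonOK, 2 for Unknown) and standard output carries the message. It is not the ACK plug-in; the 90% threshold, the checked path, and the function names are hypothetical.

```python
import os

# Exit codes of the upstream NPD custom plug-in protocol.
OK, NONOK, UNKNOWN = 0, 1, 2


def inode_usage_percent(path="/"):
    """Return the percentage of inodes in use on the file system that holds path."""
    st = os.statvfs(path)
    if st.f_files == 0:
        return None  # some file systems report no fixed inode count
    return 100.0 * (st.f_files - st.f_ffree) / st.f_files


def check(path="/", threshold=90.0):
    """Return (exit_code, message) for the inode watermark of path."""
    try:
        used = inode_usage_percent(path)
    except OSError as exc:
        return UNKNOWN, f"statvfs({path}) failed: {exc}"
    if used is None:
        return OK, f"{path} reports no fixed inode count"
    if used >= threshold:
        return NONOK, f"inode usage on {path} is {used:.1f}% (threshold {threshold}%)"
    return OK, f"inode usage on {path} is {used:.1f}%"
```

As a plug-in entry point, the script would call check(), print the message, and exit with the returned code so that the plug-in monitor can turn a NonOK result into a node condition or event.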