
Container Service for Kubernetes: ack-node-problem-detector

Last Updated: Jul 29, 2025

The ack-node-problem-detector component is an optimized and enhanced version of the open source Node Problem Detector (NPD) from the Kubernetes community. It monitors nodes and detects node anomalies in a Container Service for Kubernetes (ACK) cluster, supports the event center feature, and lets you integrate third-party or custom monitoring plug-ins to detect more types of node anomalies. This topic describes ack-node-problem-detector and provides its usage notes and release notes.
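The open source NPD runs a custom plug-in as an executable, reads its exit status (0 for OK, 1 for NonOK, 2 for Unknown), and uses its standard output as the message, so a custom plug-in can be a short script. The following is a minimal sketch that assumes ack-node-problem-detector keeps this open source plug-in protocol. The file-handle check, the 80% threshold, and the function names are illustrative assumptions, not part of the component.

```python
#!/usr/bin/env python3
"""Hypothetical NPD-style custom plug-in: report high file-handle usage."""
import sys

# Exit status values read by the open source NPD custom plug-in monitor.
OK, NONOK, UNKNOWN = 0, 1, 2

# Assumed threshold for illustration: report when 80% of the handle limit is in use.
THRESHOLD = 0.8


def check_file_handles(file_nr: str, threshold: float = THRESHOLD) -> tuple:
    """Evaluate the contents of /proc/sys/fs/file-nr.

    The file holds three integers: allocated handles, allocated but unused
    handles, and the system-wide maximum.
    """
    try:
        allocated, unused, maximum = (int(field) for field in file_nr.split())
    except ValueError:
        return UNKNOWN, f"cannot parse file-nr: {file_nr!r}"
    if maximum <= 0:
        return UNKNOWN, f"invalid file handle limit: {maximum}"
    in_use = allocated - unused
    ratio = in_use / maximum
    message = f"file handles in use: {in_use}/{maximum} ({ratio:.0%})"
    return (NONOK if ratio >= threshold else OK), message


def main() -> int:
    try:
        with open("/proc/sys/fs/file-nr") as f:
            status, message = check_file_handles(f.read())
    except OSError as err:
        status, message = UNKNOWN, f"cannot read file-nr: {err}"
    print(message)  # NPD uses standard output as the condition message
    return status


if __name__ == "__main__":
    sys.exit(main())
```

When run as a plug-in, the script prints one line and exits with 0, 1, or 2, which is all the plug-in monitor needs to set a node condition or emit an event.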

Introduction

The ack-node-problem-detector component is a node diagnostic tool for ACK clusters that monitors and reports node anomalies. The component consists of the following three modules:

  • kube-event-init: When you install ack-node-problem-detector, kube-event-init initializes the resources in the event center of Simple Log Service. This allows ack-node-problem-detector-daemonset and kube-eventer to use these resources to store and analyze event data.

  • ack-node-problem-detector-daemonset: This module runs a pod on each node that meets specified conditions to monitor the node's health status and report cluster status and events. In the change log, the ack-node-problem-detector image address refers to the image address of ack-node-problem-detector-daemonset.

    Note

    For more information about the open source node-problem-detector, see node-problem-detector.

  • kube-eventer: By default, kube-eventer reports all events in the cluster to the event center of Simple Log Service. The event center retains event data for 90 days and provides features such as dashboards, alerts, and event search and analysis. You can configure kube-eventer to report cluster events to other systems, such as DingTalk and EventBridge, for data integration. For more information, see kube-eventer.
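Most image addresses in the change log below contain the placeholder __ACK_REGION_ID__, which is replaced with the region ID of the cluster, such as cn-hangzhou. The following sketch fills in the placeholder. The helper name and the region-ID format check are assumptions for illustration, not an ACK tool.

```python
"""Fill in the region placeholder of an image address from the change log.

Illustrative helper, not an ACK tool. Assumes __ACK_REGION_ID__ is replaced
with a region ID such as cn-hangzhou.
"""
import re

PLACEHOLDER = "__ACK_REGION_ID__"
# Region IDs are lowercase words and digits joined by hyphens, e.g. ap-southeast-1.
REGION_ID = re.compile(r"[a-z]+(?:-[a-z0-9]+)+")


def resolve_image(template: str, region_id: str) -> str:
    """Return the image address for region_id.

    Addresses without the placeholder are returned unchanged.
    """
    if not REGION_ID.fullmatch(region_id):
        raise ValueError(f"unexpected region ID: {region_id!r}")
    return template.replace(PLACEHOLDER, region_id)


if __name__ == "__main__":
    # prints registry-vpc.cn-hangzhou.aliyuncs.com/acs/kube-eventer:v1.2.13-b4a3960-aliyun
    print(resolve_image(
        "registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.13-b4a3960-aliyun",
        "cn-hangzhou",
    ))
```

The same substitution works for the npd-gpu addresses, where the placeholder sits in the middle of the host name (registry-__ACK_REGION_ID__-vpc...). Older entries that use a different placeholder syntax are returned unchanged by this helper and need to be resolved separately.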

Usage notes

For information about how to install and use ack-node-problem-detector, and about its new features, see Event monitoring.

Change log

July 2025

Version: 1.2.27

Release date: 2025-07-24

Image addresses:

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.13-b4a3960-aliyun

  • kube-event-init: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-init:v1.9-2b115d6-aliyun

Description:

Note

This version is in a canary release. To use this version, submit a ticket.

  • Security hardening for kube-eventer and kube-event-init.

  • ACK dedicated clusters support security hardening using the enhanced mode for accessing ECS instance metadata. During authentication, the system accesses ECS instance metadata in enhanced mode to improve cluster security. For more information, see Enforce the enhanced mode to access ECS instance metadata.
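In enhanced mode, ECS instance metadata is read in two steps: a PUT request obtains a session token, and each metadata request then presents that token in a header. The sketch below builds both requests with the Python standard library. The endpoint 100.100.100.200, the X-aliyun-ecs-metadata-token header names, and the 21600-second token lifetime limit are based on the ECS instance metadata documentation; verify them against Enforce the enhanced mode to access ECS instance metadata before relying on them. This is not the code that ACK components use.

```python
"""Build ECS instance metadata requests in enhanced (token-based) mode.

Sketch based on the ECS documentation; verify the endpoint and header names
before use. Not the code used by ACK components.
"""
import urllib.request

METADATA_BASE = "http://100.100.100.200/latest"
TTL_HEADER = "X-aliyun-ecs-metadata-token-ttl-seconds"
TOKEN_HEADER = "X-aliyun-ecs-metadata-token"
MAX_TTL_SECONDS = 21600  # assumed upper bound for the token lifetime


def token_request(ttl_seconds: int = MAX_TTL_SECONDS) -> urllib.request.Request:
    """Step 1: a PUT request that asks the metadata server for a session token."""
    if not 1 <= ttl_seconds <= MAX_TTL_SECONDS:
        raise ValueError(f"ttl_seconds must be in [1, {MAX_TTL_SECONDS}]")
    return urllib.request.Request(
        f"{METADATA_BASE}/api/token",
        method="PUT",
        headers={TTL_HEADER: str(ttl_seconds)},
    )


def metadata_request(path: str, token: str) -> urllib.request.Request:
    """Step 2: a GET request for one metadata item that presents the token."""
    return urllib.request.Request(
        f"{METADATA_BASE}/meta-data/{path.lstrip('/')}",
        headers={TOKEN_HEADER: token},
    )


def read_metadata(path: str, timeout: float = 2.0) -> str:
    """Run both steps. Works only from inside an ECS instance."""
    with urllib.request.urlopen(token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(metadata_request(path, token), timeout=timeout) as resp:
        return resp.read().decode()
```

On an ECS instance, read_metadata("instance-id") would perform both steps. Requests that omit the token are what enforced enhanced mode is meant to reject.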

June 2025

Version: 1.2.26

Release date: 2025-06-11

Image addresses:

  • ack-node-problem-detector: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/node-problem-detector:v0.8.16-8d2193b-aliyun

  • npd-gpu: registry-__ACK_REGION_ID__-vpc.ack.aliyuncs.com/acs/npd-gpu-plugin:v0.4.1-7359b830-aliyun

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.12-c7c1896-aliyun

  • kube-event-init: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-init:v1.8-e43647f-aliyun

Description:

Note

This version is in a canary release. To use this version, submit a ticket.

  • Fixed an issue where the NvidiaDeviceRecoverd event was not emitted in some GPU self-healing scenarios.

  • The size of the ack-node-problem-detector image is optimized.
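To check whether a cluster already includes a fix, such as the NvidiaDeviceRecoverd fix in 1.2.26, compare version numbers numerically: as plain strings, "1.2.9" sorts after "1.2.11". The following illustrative helper is not an ACK API. It accepts versions with or without the leading "v" used in older entries and rejects tags such as v0.6.3-28-160499f that are not plain X.Y.Z versions.

```python
"""Compare plain X.Y.Z component versions numerically (illustrative helper)."""


def parse_version(version: str) -> tuple:
    """Parse '1.2.26' or 'v1.2.18' into a tuple of integers."""
    parts = version.strip().lstrip("vV").split(".")
    if not all(part.isdecimal() for part in parts):
        raise ValueError(f"not a plain X.Y.Z version: {version!r}")
    return tuple(int(part) for part in parts)


def at_least(installed: str, required: str) -> bool:
    """Return True if installed >= required, padding shorter versions with zeros."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

For example, at_least("v1.2.18", "1.2.26") is False, so a cluster on v1.2.18 does not yet include the fix.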

Version: 1.2.25

Release date: 2025-06-06

Image addresses:

  • ack-node-problem-detector: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/node-problem-detector:v0.8.16-8ed7053-aliyun

  • npd-gpu: registry-__ACK_REGION_ID__-vpc.ack.aliyuncs.com/acs/npd-gpu-plugin:v0.4.0-e434dc36-aliyun

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.12-c7c1896-aliyun

  • kube-event-init: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-init:v1.8-e43647f-aliyun

Description:

Note

This version is in a canary release. To use this version, submit a ticket.

  • The npd-gpu container is added for GPU fault detection.

  • Isolation of specified GPUs is supported when GPU faults are detected.

  • Multiple check items are supported, including NvidiaXID44Error, NvidiaXID61Error, NvidiaXID62Error, and NvidiaXID69Error. For more information, see GPU fault detection and automatic isolation.

  • You can use ack-node-problem-detector-config to specify which GPU check items are enabled.

  • The size of the ack-node-problem-detector image is optimized.
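The GPU check items are named after NVIDIA Xid error codes, which the NVIDIA driver writes to the kernel log in lines such as NVRM: Xid (PCI:0000:3b:00): 44, pid=1234, followed by details. The following sketch shows how such a line could be matched against a set of enabled check items. The log format follows common NVIDIA driver output, and ENABLED_CHECKS is a hypothetical stand-in for the settings in ack-node-problem-detector-config; this is not how npd-gpu is implemented.

```python
"""Match NVIDIA Xid kernel log lines against enabled GPU check items (sketch).

Assumptions: the log format follows common NVIDIA driver output, and
ENABLED_CHECKS is a hypothetical stand-in for ack-node-problem-detector-config.
"""
import re
from typing import Optional, Tuple

# Example line: "NVRM: Xid (PCI:0000:3b:00): 44, pid=1234, ..."
XID_LINE = re.compile(r"NVRM: Xid \((?:PCI:)?(?P<bus>[0-9A-Fa-f:.]+)\): (?P<code>\d+)")

ENABLED_CHECKS = {
    "NvidiaXID44Error",
    "NvidiaXID61Error",
    "NvidiaXID62Error",
    "NvidiaXID69Error",
}


def match_check_item(line: str, enabled=ENABLED_CHECKS) -> Optional[Tuple[str, str]]:
    """Return (check item, PCI bus ID) if the line reports an enabled Xid, else None."""
    m = XID_LINE.search(line)
    if m is None:
        return None
    item = f"NvidiaXID{int(m.group('code'))}Error"
    if item not in enabled:
        return None
    # The bus ID identifies which GPU on the node reported the error.
    return item, m.group("bus")
```

Returning the bus ID alongside the check item matters for per-GPU isolation: it tells the caller which device to isolate instead of the whole node.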

August 2024

Version: 1.2.20

Release date: 2024-08-20

Image addresses:

  • ack-node-problem-detector: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/node-problem-detector:v0.8.14-3c6002c-aliyun

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.11-0620284-aliyun

  • kube-event-init: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-init:v1.8-e43647f-aliyun

Description:

  • The GPU fault inspection feature is supported on ECS nodes.

  • The kube-eventer component is upgraded to optimize the performance bottleneck in scenarios where many events are reported in a cluster.

  • The kube-eventer component is upgraded to support the V4 signature algorithm for data transmission to Simple Log Service.

  • A new component parameter is added that lets you enable a local port on the ack-node-problem-detector DaemonSet pod. You can set the port to 20256 or 20257. The port is disabled by default.

December 2023

Version: v1.2.18

Release date: 2023-12-18

Image addresses:

  • ack-node-problem-detector: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/node-problem-detector:v0.8.13-003ac31-aliyun

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.8-27a468a-aliyun

  • kube-event-init: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-init:v1.7-48a2acc-aliyun

Description:

  • Fixed a bug where cached historical kernel logs caused false-positive reports during PodOOMKilling anomaly detection.

  • Custom component parameters are retained when you upgrade ack-node-problem-detector from an earlier version.
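A kernel log source such as /dev/kmsg replays records that were written before the reader started, so a naive reader can report old OOM kills as new ones. The following sketch illustrates one way to avoid this by ignoring records older than a cutoff. It parses the /dev/kmsg record format (priority,sequence,timestamp in microseconds since boot,flags;message). It illustrates the problem and is not the actual fix in ack-node-problem-detector; the function names and the cutoff parameter are hypothetical.

```python
"""Ignore kernel log records written before a cutoff when looking for OOM kills.

Illustrative sketch, not the fix shipped in ack-node-problem-detector.
Records use the /dev/kmsg format: "priority,sequence,timestamp_us,flags;message".
"""
import re

OOM_KILL = re.compile(r"out of memory: Kill(?:ed)? process (\d+)")


def parse_kmsg(record: str):
    """Return (sequence, timestamp in microseconds since boot, message)."""
    header, sep, message = record.partition(";")
    fields = header.split(",")
    if not sep or len(fields) < 4:
        raise ValueError(f"not a kmsg record: {record!r}")
    return int(fields[1]), int(fields[2]), message


def new_oom_pids(records, cutoff_us: int):
    """Return PIDs from OOM-kill records logged at or after cutoff_us."""
    pids = []
    for record in records:
        if not record or record.startswith(" "):
            continue  # continuation lines (" KEY=value") carry no message
        _, timestamp_us, message = parse_kmsg(record)
        if timestamp_us < cutoff_us:
            continue  # replayed history from the ring buffer, not a new event
        match = OOM_KILL.search(message)
        if match:
            pids.append(int(match.group(1)))
    return pids
```

Persisting the last processed sequence number is an alternative cutoff that survives detector restarts; the timestamp cutoff is used here only because it keeps the example short.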

August 2023

Version: v1.2.17

Release date: 2023-08-24

Image addresses:

  • ack-node-problem-detector: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/node-problem-detector:v0.8.12-bf8aff8-aliyun

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.8-27a468a-aliyun

  • kube-event-init: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-init:v1.7-48a2acc-aliyun

Description:

  • You can modify the component parameters on the component management page in the ACK console to update the configurations of the Project and Logstore instances in Simple Log Service.

  • You can attach additional tags, such as the cluster name, to the log data sent to Simple Log Service. These tags are displayed by default with the Simple Log Service data in the ACK event center.

June 2023

Version: v1.2.16

Release date: 2023-06-27

Image addresses:

  • ack-node-problem-detector: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/node-problem-detector:v0.8.12-bf8aff8-aliyun

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.8-019546c-aliyun

  • kube-event-init: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-init:v1.7-48a2acc-aliyun

Description:

You can configure the resource specification parameters of the component on the component management page in the ACK console.

Version: v1.2.15

Release date: 2023-06-06

Image addresses:

  • ack-node-problem-detector: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/node-problem-detector:v0.8.12-bf8aff8-aliyun

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.8-019546c-aliyun

  • kube-event-init: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-init:v1.7-48a2acc-aliyun

Description:

Reduced the load that ack-node-problem-detector places on the API server and etcd when PodOOMKilling events occur frequently in large-scale clusters.
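One common way to limit API server and etcd load under bursts of identical events is to report each (reason, object) pair at most once per time window and count the repeats. The following sketch shows that technique with a hypothetical EventAggregator class. The change log does not describe the mechanism that ack-node-problem-detector actually uses.

```python
"""Suppress repeated identical events within a time window (illustrative sketch)."""
import time
from collections import defaultdict


class EventAggregator:
    """Allow one event per (reason, object) key per window and count the repeats.

    Hypothetical class; keys are kept in memory indefinitely, so a real
    implementation would also expire old entries.
    """

    def __init__(self, window_s: float = 60.0, clock=time.monotonic):
        self.window_s = window_s
        self.clock = clock
        self._last_emitted = {}
        self.suppressed = defaultdict(int)

    def should_emit(self, reason: str, obj: str) -> bool:
        key = (reason, obj)
        now = self.clock()
        last = self._last_emitted.get(key)
        if last is not None and now - last < self.window_s:
            self.suppressed[key] += 1  # repeat inside the window: no API call
            return False
        self._last_emitted[key] = now
        return True
```

The injectable clock keeps the class testable; with a burst of PodOOMKilling events on one node, only the first event in each window would reach the API server.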

February 2023

Version: v1.2.14

Release date: 2023-02-03

Image addresses:

  • ack-node-problem-detector: registry.aliyuncs.com/acs/node-problem-detector:v0.8.11-edc7907-aliyun

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.6-bbf76f7-aliyun

  • kube-event-init: registry.{ .Values.controller.regionId }.aliyuncs.com/acs/kube-eventer-init:v1.7-48a2acc-aliyun

Description:

  • Improved the image pull speed of the component.

  • ACK Edge clusters are supported.

September 2022

Version: v1.2.11

Release date: 2022-09-30

Image addresses:

  • ack-node-problem-detector: registry.aliyuncs.com/acs/node-problem-detector:v0.8.11-edc7907-aliyun

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.6-bbf76f7-aliyun

  • kube-event-init: registry.{ .Values.controller.regionId }.aliyuncs.com/acs/kube-eventer-init:v1.7-48a2acc-aliyun

Description:

  • Optimized the performance of the ack-node-problem-detector inspection logic to reduce the load on the core components of the cluster.

  • Image security hardening.

February 2022

Version: v1.2.9

Release date: 2022-02-22

Image addresses:

  • ack-node-problem-detector: registry.aliyuncs.com/acs/node-problem-detector:v0.8.10-e0ff7d2

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-amd64:v1.2.6-f0efecf-aliyun

  • kube-event-init: registry.{ .Values.controller.regionId }.aliyuncs.com/acs/kube-eventer-init:v1.6-a92aba6-aliyun

Description:

  • Kernel inspection is supported.

  • Security hardening.

January 2022

Version: v1.2.8

Release date: 2022-01-20

Image addresses:

  • ack-node-problem-detector: registry.aliyuncs.com/acs/node-problem-detector:v0.8.10-e0ff7d2

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-amd64:v1.2.5-cc7ec54-aliyun

  • kube-event-init: registry.{ .Values.controller.regionId }.aliyuncs.com/acs/kube-eventer-init:v1.6-a92aba6-aliyun

Description:

  • Compatible with different modes of containerd.

  • Optimized the Quality of Service (QoS) limits for component resources to improve component stability.

November 2021

Version: v1.2.7

Release date: 2021-11-25

Image addresses:

  • ack-node-problem-detector: registry.aliyuncs.com/acs/node-problem-detector:v0.8.10-e0ff7d2

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-amd64:v1.2.5-cc7ec54-aliyun

  • kube-event-init: registry.{ .Values.controller.regionId }.aliyuncs.com/acs/kube-eventer-init:v1.6-a92aba6-aliyun

Description:

  • Compatible with the system services of operating systems such as Alibaba Cloud Linux 3 and CentOS 8.

  • ARM architecture environments are supported.

April 2021

Version: v1.2.5

Release date: 2021-04-25

Image addresses:

  • ack-node-problem-detector: registry.aliyuncs.com/acs/node-problem-detector:v0.6.3-28-160499f

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-amd64:v1.2.4-0f5aaee-aliyun

  • kube-event-init: registry.{ .Values.controller.regionId }.aliyuncs.com/acs/kube-eventer-init:1.5-5e0e7c1-aliyun

Description:

  • Fixed an issue where kube-event-init in the kube-system namespace returned a "414 Request Too Large" error when the event center was enabled.

  • Optimized the eventer list-watch mechanism to prevent excessive request traffic to etcd. For more information, see eventer list-watch.

  • Fixed an issue where kube-eventer incorrectly parsed the timestamps of some system events. For more information, see fix FailedScheduling event write to sls with wrong timestamp.
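Kubernetes events can carry their time in several fields. Events recorded through the newer events.k8s.io API may set eventTime while leaving firstTimestamp and lastTimestamp empty, so a consumer that reads only lastTimestamp gets a missing or wrong time. The following sketch picks the first usable timestamp from a fallback list. The field order is an assumption, and this is not kube-eventer's actual code.

```python
"""Pick a usable timestamp from a Kubernetes Event object (illustrative sketch)."""
from datetime import datetime
from typing import Optional


def _parse(ts: Optional[str]) -> Optional[datetime]:
    """Parse RFC 3339 strings such as 2021-04-25T10:00:00Z or 2021-04-25T10:00:00.123456Z."""
    if not ts:
        return None
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def event_timestamp(event: dict) -> datetime:
    """Return the first non-empty, non-zero timestamp in an assumed fallback order."""
    candidates = [
        (event.get("series") or {}).get("lastObservedTime"),
        event.get("lastTimestamp"),
        event.get("eventTime"),
        (event.get("metadata") or {}).get("creationTimestamp"),
    ]
    for raw in candidates:
        parsed = _parse(raw)
        # Treat zero times (year 1) as missing, the same as unset fields.
        if parsed is not None and parsed.year > 1:
            return parsed
    raise ValueError("event has no usable timestamp")
```

For a FailedScheduling event whose firstTimestamp and lastTimestamp are null, the helper falls through to eventTime instead of writing an empty or zero time to Simple Log Service.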

July 2020

Version: v0.6.3-28-160499f

Release date: 2020-07-27

Image address: registry.aliyuncs.com/acs/node-problem-detector:v0.6.3-28-160499f

Description:

  • Optimized OOM Killing event messages to include information such as the pod name, namespace, and UID.

  • Optimized the execution efficiency of the check_fd plugin.

  • Optimized event notifications for the node PID watermark.

  • Upgraded the network issue detection plugin.

  • Added a plugin to monitor and alert on the inode watermark of the node's system disk.
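When the kernel kills a process for OOM, it logs the memory cgroup of the killed task, and for pods that cgroup path embeds the pod UID. The following sketch extracts the UID from the oom-kill: summary line that recent kernels print, for both the cgroupfs layout (/kubepods/burstable/pod<uid>/...) and the systemd layout (kubepods-burstable-pod<uid with underscores>.slice). Resolving the pod name and namespace from the UID requires a separate lookup that is left out. The function name is hypothetical, and this is not the component's implementation.

```python
"""Extract the pod UID from a kernel oom-kill summary line (illustrative sketch)."""
import re
from typing import Optional

TASK_MEMCG = re.compile(r"task_memcg=([^,\s]+)")
# cgroupfs: .../pod<uid>/...   systemd: ...-pod<uid with underscores>.slice
POD_UID = re.compile(
    r"pod([0-9a-f]{8}[-_][0-9a-f]{4}[-_][0-9a-f]{4}[-_][0-9a-f]{4}[-_][0-9a-f]{12})"
)


def pod_uid_from_oom_line(line: str) -> Optional[str]:
    """Return the pod UID of the killed task, or None for non-pod processes."""
    memcg = TASK_MEMCG.search(line)
    if memcg is None:
        return None
    uid = POD_UID.search(memcg.group(1))
    # The systemd driver escapes "-" as "_" in slice names; restore the UID form.
    return uid.group(1).replace("_", "-") if uid else None
```

Processes outside kubepods, such as system services, return None, so they can be reported as node-level OOM events rather than pod events.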