Gateway with Inference Extension is an enhanced component based on the Kubernetes Gateway API and its Inference Extension specification. It supports Layer 4 and Layer 7 routing services in Kubernetes and provides intelligent load balancing for large language model (LLM) inference scenarios. This topic describes the Gateway with Inference Extension component, its usage, and its release notes.
Component information
Gateway with Inference Extension is built on the Envoy Gateway project. It is compatible with Gateway API features and integrates the Gateway API Inference Extension. It is primarily used to provide load balancing and routing for LLM inference services.
Usage notes
The Gateway with Inference Extension component depends on the custom resource definitions (CRDs) provided by the Gateway API component. Before you install this component, make sure that the Gateway API component is installed in your cluster. For more information, see Install components.
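Before installing the component, you can confirm that the Gateway API CRDs are present in the cluster. The following is a minimal sketch; it assumes `kubectl` access to the cluster and uses the standard upstream Gateway API CRD names, which may differ from the exact set shipped by your Gateway API component.

```shell
# Check for the core Gateway API CRDs that Gateway with Inference
# Extension depends on. If any of these are missing, install the
# Gateway API component first.
kubectl get crd \
  gatewayclasses.gateway.networking.k8s.io \
  gateways.gateway.networking.k8s.io \
  httproutes.gateway.networking.k8s.io
```

If the command reports `NotFound` for any CRD, install the Gateway API component and rerun the check before proceeding.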
Release notes
May 2025
| Version number | Release date | Changes | Impact |
| --- | --- | --- | --- |
| v1.4.0-aliyun.1 | May 27, 2025 |  | Upgrading from an earlier version restarts the gateway pod. Perform the upgrade during off-peak hours. |
April 2025
| Version number | Release date | Changes | Impact |
| --- | --- | --- | --- |
| v1.3.0-aliyun.2 | May 7, 2025 |  | Upgrading from an earlier version restarts the gateway pod. Perform the upgrade during off-peak hours. |
March 2025
| Version number | Release date | Changes | Impact |
| --- | --- | --- | --- |
| v1.3.0-aliyun.1 | March 12, 2025 |  | This upgrade does not affect your services. |