This topic describes how CloudMonitor automatically processes the status change events of Elastic Compute Service (ECS) instances by using Simple Message Queue (SMQ, formerly MNS) queues.
Prerequisites
A queue is created in the SMQ console, for example, ecs-cms-event.
For more information about how to create a queue, see Create a queue.
A system event-triggered alert rule is created in the CloudMonitor console. For more information about how to create a system event-triggered alert rule, see Manage system event-triggered alert rules (old).
Python dependencies are installed.
All code in this topic uses Python 3.7 as an example. You must install the MNS SDK for Python and the ECS SDK for Python. Practice 3 also imports the SLB SDK for Python.
For more information about how to install the Python SDK, see Install the Python SDK.
For other programming languages, see Download and use MNS SDKs and ECS SDK overview.
Background information
In addition to the existing system events, CloudMonitor supports the status change events for ECS. The status change events include interruption notification events that are applied to spot instances. A status change event is triggered when the status of an ECS instance changes. Instance status changes can be caused by operations that you perform using the ECS console or SDKs or by calling API operations. Instance status changes can also be caused by automatic scaling, overdue payments, or system exceptions.
CloudMonitor provides the following notification methods for system events: SMQ, Function Compute, callback URLs, and Simple Log Service. This topic uses SMQ as an example to describe three best practices for automatically processing the status change events of ECS instances.
Procedure
CloudMonitor sends all status change events of ECS instances to the SMQ queue. Your code then receives the messages from the queue and processes them.
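The listeners in this topic read only a few fields from each message. The following is a minimal sketch of that parsing step, assuming a message body reduced to the fields that the sample listeners access (name, and content with resourceId and state). The sample payload, the instance ID i-example, and the parse_event helper are hypothetical and used only for illustration. Real events carry additional fields.

```python
import json

# Hypothetical message body, reduced to the fields that the listeners in this topic read.
SAMPLE_BODY = json.dumps({
    'name': 'Instance:StateChange',
    'content': {'resourceId': 'i-example', 'state': 'Stopped'},
})


def parse_event(body):
    """Return (event name, resource ID, state) from a JSON message body."""
    event = json.loads(body)
    content = event.get('content', {})
    # Missing fields come back as None instead of raising KeyError.
    return event.get('name'), content.get('resourceId'), content.get('state')


name, resource_id, state = parse_event(SAMPLE_BODY)
print(name, resource_id, state)
```

Using dict.get here is a design choice for the sketch: a message that lacks a field yields None, so a listener can skip the message instead of failing in the receive loop.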
Practice 1: Record all creation and release events of ECS instances
You cannot query released ECS instances in the ECS console. To keep a record of them, store the status change events of all ECS instances in a database or Simple Log Service. When an ECS instance is created, CloudMonitor sends a Created event. When an ECS instance is released, CloudMonitor sends a Deleted event.
Create a Conf file.
The Conf file must contain the endpoint of Simple Message Queue (formerly MNS), the access_key and access_key_secret for your Alibaba Cloud account, the region_id (such as cn-beijing), and the queue_name. Save the class in a module named config.py, because the listeners later in this topic import it with from .config import Conf.
Note: To obtain the endpoint, go to the Queues page in the Simple Message Queue (formerly MNS) console and click Get Endpoint.
import os

# Make sure that the ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET
# environment variables are configured.
# If the project code is leaked, the AccessKey pair may be leaked and the security of
# all resources within your account may be compromised. The following sample code shows
# how to use environment variables to obtain an AccessKey pair. The sample code is for
# reference only. We recommend that you use a more secure method, such as using
# Security Token Service (STS).
class Conf:
    endpoint = 'http://<id>.mns.<region>.aliyuncs.com/'
    access_key = os.environ['ALIBABA_CLOUD_ACCESS_KEY_ID']
    access_key_secret = os.environ['ALIBABA_CLOUD_ACCESS_KEY_SECRET']
    region_id = 'cn-beijing'
    # Replace with the name of your queue, for example, ecs-cms-event.
    queue_name = 'test'
    vsever_group_id = 'your_vserver_group_id'
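Because the Conf class reads the AccessKey pair with os.environ[...], a missing variable raises a bare KeyError at import time. The following is a minimal sketch of a pre-check that reports which variables are missing. The missing_env_vars helper and the REQUIRED_VARS tuple are hypothetical names introduced for this example; only the two environment variable names come from the sample above.

```python
import os

# Environment variables that the Conf class reads.
REQUIRED_VARS = ('ALIBABA_CLOUD_ACCESS_KEY_ID', 'ALIBABA_CLOUD_ACCESS_KEY_SECRET')


def missing_env_vars(environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


missing = missing_env_vars()
if missing:
    print('Missing environment variables: ' + ', '.join(missing))
```

You could run a check like this before importing the Conf class so that a misconfigured environment produces a readable message.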
Use the SMQ SDK to develop an MNS client for receiving messages from SMQ.
# -*- coding: utf-8 -*-
import json
import logging

from mns.account import Account
from mns.mns_exception import MNSExceptionBase

from .config import Conf


class MNSClient(object):
    def __init__(self):
        self.account = Account(Conf.endpoint, Conf.access_key, Conf.access_key_secret)
        self.queue_name = Conf.queue_name
        # Maps an event name to the listeners registered for it.
        self.listeners = dict()

    def regist_listener(self, listener, eventname='Instance:StateChange'):
        self.listeners.setdefault(eventname, []).append(listener)

    def run(self):
        queue = self.account.get_queue(self.queue_name)
        while True:
            try:
                # Long-poll the queue for up to 5 seconds.
                message = queue.receive_message(wait_seconds=5)
                event = json.loads(message.message_body)
                for listener in self.listeners.get(event['name'], []):
                    listener.process(event)
                # Delete the message only after all listeners have processed it.
                queue.delete_message(receipt_handle=message.receipt_handle)
            except MNSExceptionBase as e:
                if e.type == 'QueueNotExist':
                    logging.error('Queue %s does not exist. Create the queue before you receive messages.', self.queue_name)
                    # Polling a queue that does not exist cannot succeed, so stop.
                    return
                elif e.type == 'MessageNotExist':
                    logging.debug('No message, continue waiting')
                else:
                    logging.error('Failed to receive or delete a message: %s', e)


class BasicListener(object):
    def process(self, event):
        pass
The preceding code receives messages from SMQ, calls the registered listeners to consume each message, and then deletes the message.
Register a listener to consume events. The listener generates a log entry each time the listener receives a Created or Deleted event.
# -*- coding: utf-8 -*-
import logging

from .mns_client import BasicListener


class ListenerLog(BasicListener):
    def process(self, event):
        state = event['content']['state']
        resource_id = event['content']['resourceId']
        # Log only creation and release events. Other state changes are ignored.
        if state in ('Created', 'Deleted'):
            logging.info(f'The instance {resource_id} state is {state}')
Add the following code to the Main function:
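import logging

# Show INFO-level log entries. By default, the root logger hides them.
logging.basicConfig(level=logging.INFO)

mns_client = MNSClient()
mns_client.regist_listener(ListenerLog())
mns_client.run()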
In the production environment, you can store the events in a database or Simple Log Service for subsequent queries and audits.
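The following is a minimal sketch of such a listener, assuming SQLite (from the Python standard library) as the database. The ListenerSQLite class, the instance_events table, and the instance ID i-example are hypothetical. The BasicListener class is repeated here only so that the example runs on its own.

```python
import sqlite3


class BasicListener(object):
    """Stand-in for the BasicListener class defined with the MNS client."""
    def process(self, event):
        pass


class ListenerSQLite(BasicListener):
    """Hypothetical listener that records Created and Deleted events."""

    def __init__(self, db_path=':memory:'):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            'CREATE TABLE IF NOT EXISTS instance_events ('
            'resource_id TEXT NOT NULL, state TEXT NOT NULL, '
            'recorded_at TEXT DEFAULT CURRENT_TIMESTAMP)'
        )

    def process(self, event):
        content = event['content']
        if content['state'] in ('Created', 'Deleted'):
            with self.conn:  # Commits on success, rolls back on error.
                self.conn.execute(
                    'INSERT INTO instance_events (resource_id, state) VALUES (?, ?)',
                    (content['resourceId'], content['state']),
                )

    def history(self, resource_id):
        """Return the recorded states of an instance in insertion order."""
        rows = self.conn.execute(
            'SELECT state FROM instance_events WHERE resource_id = ? ORDER BY rowid',
            (resource_id,),
        )
        return [row[0] for row in rows]


listener = ListenerSQLite()
listener.process({'content': {'resourceId': 'i-example', 'state': 'Created'}})
listener.process({'content': {'resourceId': 'i-example', 'state': 'Running'}})
listener.process({'content': {'resourceId': 'i-example', 'state': 'Deleted'}})
print(listener.history('i-example'))  # ['Created', 'Deleted']
```

A listener like this can be registered in the same way as ListenerLog. Pass a file path instead of ':memory:' to keep the records after the process exits.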
Practice 2: Automatically restart ECS instances that are shut down
In scenarios where ECS instances may be shut down unexpectedly, you may need to automatically restart the ECS instances.
You can reuse the MNS client developed in Practice 1 and create another listener. When the listener receives a Stopped event for an ECS instance, it calls the StartInstance operation to start the instance.
# -*- coding: utf-8 -*-
import logging

from alibabacloud_ecs20140526.client import Client as Ecs20140526Client
from alibabacloud_ecs20140526.models import StartInstanceRequest
from alibabacloud_tea_openapi.models import Config

from .config import Conf
from .mns_client import BasicListener


class ECSClient(object):
    def __init__(self, client):
        self.client = client

    # Start the ECS instance.
    def start_instance(self, instance_id):
        logging.info(f'Start instance {instance_id} ...')
        request = StartInstanceRequest(
            instance_id=instance_id
        )
        self.client.start_instance(request)


class ListenerStart(BasicListener):
    def __init__(self):
        ecs_config = Config(
            access_key_id=Conf.access_key,
            access_key_secret=Conf.access_key_secret,
            endpoint=f'ecs.{Conf.region_id}.aliyuncs.com'
        )
        client = Ecs20140526Client(ecs_config)
        self.ecs_client = ECSClient(client)

    def process(self, event):
        detail = event['content']
        instance_id = detail['resourceId']
        if detail['state'] == 'Stopped':
            self.ecs_client.start_instance(instance_id)
In the production environment, you can listen to Starting, Running, or Stopped events after the start command is run. Then, you can perform further O&M using a timer and a counter based on whether the ECS instance is started.
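The following is a minimal sketch of the timer and counter idea: it caps the number of restart attempts per instance within a time window so that an instance that keeps stopping is not restarted indefinitely. The RestartTracker class, its default limits, and the instance ID i-example are hypothetical and not part of any SDK.

```python
import time


class RestartTracker(object):
    """Hypothetical helper that limits restart attempts per instance within a time window."""

    def __init__(self, max_attempts=3, window_seconds=600, clock=time.monotonic):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self.clock = clock
        self.attempts = {}  # Instance ID -> timestamps of recent restart attempts.

    def should_restart(self, instance_id):
        """Record an attempt and return True if the instance may be restarted."""
        now = self.clock()
        # Keep only the attempts that are still inside the time window.
        recent = [t for t in self.attempts.get(instance_id, [])
                  if now - t < self.window_seconds]
        if len(recent) >= self.max_attempts:
            self.attempts[instance_id] = recent
            return False
        recent.append(now)
        self.attempts[instance_id] = recent
        return True

    def mark_running(self, instance_id):
        """Reset the counter after a Running event confirms the restart."""
        self.attempts.pop(instance_id, None)


tracker = RestartTracker(max_attempts=2, window_seconds=600)
print([tracker.should_restart('i-example') for _ in range(3)])  # [True, True, False]
```

In ListenerStart.process, you could call should_restart before start_instance and call mark_running when a Running event arrives. When should_restart returns False, send an alert instead of restarting the instance again.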
Practice 3: Automatically remove spot instances from SLB instances before the spot instances are released
An interruption notification event is triggered about 5 minutes before a spot instance is released. During the 5 minutes, you can perform specific operations to prevent your services from being interrupted. For example, you can remove the spot instance from a Server Load Balancer (SLB) instance.
You can reuse the MNS client developed in Practice 1 and create another listener. When the listener receives the interruption notification event for a spot instance, you can call the SLB SDK to remove the spot instance from an SLB instance.
# -*- coding: utf-8 -*-
import json

from alibabacloud_slb20140515.client import Client as Slb20140515Client
from alibabacloud_slb20140515.models import RemoveVServerGroupBackendServersRequest
from alibabacloud_tea_openapi.models import Config

from .config import Conf
from .mns_client import BasicListener


class SLBClient(object):
    def __init__(self):
        self.client = self.create_client()

    def create_client(self):
        config = Config()
        config.access_key_id = Conf.access_key
        config.access_key_secret = Conf.access_key_secret
        config.endpoint = 'slb.aliyuncs.com'
        return Slb20140515Client(config)

    def remove_vserver_group_backend_servers(self, vserver_group_id, instance_id):
        # Serialize the backend server list as JSON instead of concatenating quotes by hand.
        backend_servers = json.dumps([{'ServerId': instance_id, 'Port': '80', 'Weight': '100'}])
        request = RemoveVServerGroupBackendServersRequest(
            region_id=Conf.region_id,
            vserver_group_id=vserver_group_id,
            backend_servers=backend_servers
        )
        response = self.client.remove_vserver_group_backend_servers(request)
        return response


class ListenerSLB(BasicListener):
    def __init__(self, vsever_group_id):
        self.slb_caller = SLBClient()
        # Use the group ID passed by the caller instead of reading Conf again.
        self.vserver_group_id = vsever_group_id

    def process(self, event):
        detail = event['content']
        instance_id = detail['instanceId']
        if detail['action'] == 'delete':
            self.slb_caller.remove_vserver_group_backend_servers(self.vserver_group_id, instance_id)
Important: The event name for the spot instance interruption notification differs from the name of other status change events. The event name is Instance:PreemptibleInstanceInterruption. Register the listener with this event name:
mns_client.regist_listener(ListenerSLB(Conf.vsever_group_id), 'Instance:PreemptibleInstanceInterruption')
In the production environment, you can apply for another spot instance and add it as a backend server of the SLB instance to maintain the performance of your services.