CN106844165B

CN106844165B - Alarm method and device

Info

Publication number: CN106844165B
Application number: CN201611170786.7A
Authority: CN
Inventors: 刘胜; 赵波; 郑振宇
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Cloud Computing Technologies Co Ltd
Priority date: 2016-12-16
Filing date: 2016-12-16
Publication date: 2020-09-29
Anticipated expiration: 2036-12-16
Also published as: CN112214382A; CN106844165A

Abstract

The embodiment of the application provides an alarming method and an alarming device, relates to the technical field of communication, and solves the problem that an existing alarming mode in the prior art is low in efficiency in a large-scale resource scene. The method comprises the following steps: acquiring a target resource list meeting a resource filtering condition according to the resource filtering condition defined in a pre-established alarm rule, wherein the alarm rule defines the resource filtering condition, a monitoring parameter of the target resource and an alarm threshold of the monitoring parameter; for each target resource in the target resource list, respectively performing the following operations: acquiring the current value of the monitoring parameter of the target resource; determining whether the current value is within the alert threshold range; and if the current value is within the alarm threshold range, sending an alarm message.

Description

Alarm method and device

Technical Field

The present application relates to the field of communications technologies, and in particular, to an alarm method and apparatus.

Background

OpenStack is an open source project that provides software for building and managing public, private, and hybrid clouds, and is currently the largest open source cloud platform. OpenStack contains multiple projects, such as the Telemetry project. The Telemetry project comprises a Ceilometer sub-project and an Aodh sub-project. The Ceilometer sub-project is mainly responsible for collecting, storing, and querying metering and monitoring information in the Telemetry project; the Aodh sub-project is mainly responsible for the alarm service, including alarm definition, alarm evaluation, and alarm notification.

Currently, the Telemetry project provides three alarm modes: threshold alarms, composite alarms, and Gnocchi alarms. All three modes require an alarm rule to be created for each resource, which is inefficient in a large-scale resource scenario.

Therefore, how to improve the alarm efficiency in a large-scale resource scene under the OpenStack platform is an urgent problem to be solved at present.

Disclosure of Invention

The embodiment of the application provides an alarm method and an alarm device, which at least solve the problem that the existing alarm mode is low in efficiency in a large-scale resource scene.

In order to achieve the above purpose, the embodiment of the present application adopts the following technical solutions:

In one aspect, an alarm method applied to an OpenStack platform is provided. The method comprises the following steps: acquiring a target resource list that meets a resource filtering condition according to the resource filtering condition defined in a pre-created alarm rule, wherein the alarm rule defines the resource filtering condition, a monitoring parameter of the target resource, and an alarm threshold of the monitoring parameter; and, for each target resource in the target resource list, performing the following operations: acquiring the current value of the monitoring parameter of the target resource; determining whether the current value is within the alarm threshold range; and, if the current value is within the alarm threshold range, sending an alarm message. In a large-scale resource scenario on an OpenStack platform, the existing approach requires an alarm rule to be created for each resource in a group when an alarm is created for the same monitoring parameter of that group. By contrast, the alarm method provided by the application performs alarm monitoring with only one alarm rule, which reduces the redundancy caused by creating a large number of alarm rules, improves alarm efficiency in large-scale scenarios, and further reduces the management cost of the OpenStack platform.

In one possible design, the alarm rule further defines an aggregation function of the monitored parameters and a time span for monitoring the target resource; the obtaining of the current value of the monitoring parameter of the target resource includes: and calling the aggregation function to query a statistical database according to the identifier of the target resource, the monitoring parameter and the time span to obtain a current value of the monitoring parameter of the target resource, wherein the statistical database comprises the identifier of the target resource, the monitoring parameter and the corresponding relation of the time span.

In one possible design, the resource filtering condition includes: the type of the target resource; or the type of the target resource meeting a preset condition. Because the resource filtering condition defined in the alarm rule can be the type of the target resource meeting a preset condition, an alarm rule can be created to monitor a specific group of resources according to user requirements, which further improves the user experience.

In one possible design, the alert message includes an identification of the target resource.

In one possible design, the alarm rule also defines a grouping key; after the target resource list meeting the resource filtering condition is obtained according to the resource filtering condition defined in the pre-created alarm rule, the method further comprises: grouping the target resources in the target resource list according to the grouping key to obtain at least one group of target resources.

In one possible design, the alert message also includes a group identification of the group in which the target resource is located.

In another aspect, an embodiment of the present application provides an alarm device, where the alarm device is applied to an OpenStack platform, and the alarm device includes: the device comprises an acquisition module, a determination module and a sending module; the acquisition module is used for acquiring a target resource list meeting the resource filtering condition according to the resource filtering condition defined in the pre-established alarm rule, wherein the alarm rule defines the resource filtering condition, the monitoring parameter of the target resource and the alarm threshold of the monitoring parameter; for each target resource in the list of target resources: the acquisition module is also used for acquiring the current value of the monitoring parameter of the target resource; the determining module is configured to determine whether the current value is within the alarm threshold range; the sending module is used for sending an alarm message to the external device if the current value is within the alarm threshold range. Compared with the existing method that an alarm rule needs to be established for each resource in a group of resources when an alarm is established for the same monitoring parameter of the group of resources in a large-scale resource scene of an OpenStack platform, the alarm equipment provided by the application can perform alarm monitoring only by establishing one alarm rule, so that redundancy generated by establishing a large number of alarm rules is reduced, the alarm efficiency in the large-scale scene is improved, and the management cost of the OpenStack platform is further reduced.

In one possible design, the alarm rule further defines an aggregation function of the monitored parameters and a time span for monitoring the target resource; the obtaining module is further configured to obtain a current value of the monitoring parameter of the target resource, and specifically includes: and calling the aggregation function to query a statistical database according to the identifier of the target resource, the monitoring parameter and the time span to obtain a current value of the monitoring parameter of the target resource, wherein the statistical database comprises the identifier of the target resource, the monitoring parameter and the corresponding relation of the time span.

In one possible design, the alarm rule further defines a grouping key, and the alarm device further includes a grouping module; the grouping module is used for grouping the target resources in the target resource list according to the grouping keywords after the acquisition module acquires the target resource list meeting the resource filtering condition according to the resource filtering condition defined in the pre-established alarm rule, so as to obtain at least one group of target resources.

In another aspect, an embodiment of the present application provides an alert device, including: a processor, a memory, a bus, and a communication interface; the memory is used for storing computer execution instructions, the processor is connected with the memory through the bus, and when the alarm device runs, the processor executes the computer execution instructions stored in the memory so as to enable the alarm device to execute the alarm method.

In yet another aspect, the present application provides a computer storage medium for storing computer software instructions for an alarm method according to any one of the above methods, which includes a program designed to execute the alarm method according to any one of the above methods.

In still another aspect, the present application provides a computer program, where the computer program includes instructions, and when the computer program is executed by a computer, the computer may execute the flow in the alarm method of any one of the above.

In addition, the technical effects brought by any design mode in the above alarm device embodiments can be referred to the technical effects brought by different design modes in the above alarm method embodiments, and are not described herein again.

These and other aspects of the present application will be more readily apparent from the following description of the embodiments.

Drawings

Fig. 1 is a logic architecture diagram of OpenStack applied in the embodiment of the present application;

Fig. 2 is a schematic diagram of a Telemetry project according to an embodiment of the present application;

Fig. 3 is a schematic diagram of a computer device according to an embodiment of the present application;

Fig. 4 is a schematic flowchart of an alarm method according to an embodiment of the present application;

Fig. 5 is a schematic structural diagram of an alarm device according to an embodiment of the present application;

Fig. 6 is a schematic structural diagram of another alarm device provided in the embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.

Fig. 1 is a logic architecture diagram of OpenStack applied in the embodiment of the present application. As shown in Fig. 1, OpenStack includes a plurality of projects, which together constitute a fully functional OpenStack cloud environment. The core projects are Nova, Cinder, and Neutron, which provide the most basic functions of an OpenStack cloud environment.

Of course, in addition to the Nova, Cinder, and Neutron projects, there are some very important projects that provide sophisticated OpenStack cloud platform capabilities, such as Glance, Keystone, Horizon, Swift, Ironic, Trove, Sahara, Heat, and Ceilometer. These items in OpenStack are all running in Virtual Machines (VMs).

The functions of each project in OpenStack are briefly introduced as follows:

Nova: provides virtualization capability management of physical machine resources, as well as virtual machine lifecycle management.

Cinder: provides block storage device management for virtual machines.

Neutron: provides management of virtualized network resources, including networks, subnets, ports, and advanced services such as Virtual Private Networks (VPNs).

Glance: provides image management services, including the images required by virtual machines, snapshots, and the like.

Keystone: provides user management and authentication services.

Horizon: provides the dashboard (web page) service for OpenStack.

Swift: provides an object storage service.

Ironic: provides bare metal management services.

Trove: provides "database as a service" functionality.

Sahara: provides big data services.

Heat: provides service orchestration and software configuration services.

Ceilometer: provides a monitoring and metering service.

The Ceilometer project in Fig. 1 has been named the Telemetry project since OpenStack version M (OpenStack names its versions in alphabetical order). The Telemetry project splits the functions of the pre-M Ceilometer project into two parts. One is the Ceilometer sub-project, which is responsible for collecting, storing, and querying metering and monitoring information in the Telemetry project, for example collecting the metering and state information of virtual-machine-related resources such as virtual machines, volumes, and images, and of physical hosts, in an OpenStack environment. The other is the Aodh sub-project, which is responsible for defining, evaluating, and notifying alarms.

Fig. 2 is a diagram illustrating the structure of the Telemetry project. As shown in Fig. 2, the Telemetry project includes: a Ceilometer sub-project, a database, an Application Programming Interface (API), and an Aodh sub-project.

The functions of the modules in the Telemetry project are briefly introduced as follows:

Ceilometer sub-project: includes the polling agent (Polling Agents) service, the notification agent (Notification Agents) service, and the collector (Collectors) service.

Polling agent: a resource information collection agent that runs on the control nodes and on all compute nodes of an OpenStack deployment. Nodes that run management services are control nodes, and nodes that run virtual machines are compute nodes. When running on a control node, the polling agent obtains OpenStack resource information through the API of each OpenStack module, for example by calling the Glance API to obtain image size information. When running on a compute node, the polling agent collects information about the virtual machines on that node, such as the Central Processing Unit (CPU) usage, memory usage, and disk read/write rate of the virtual machines running on the host.

Notification agent: in OpenStack, each service sends a notification message to the notification bus (also called the message queue) when it processes a virtualized resource and the state of that resource changes. The notification agent receives the notification messages sent by other OpenStack components, and also receives and processes the messages that the polling agent sends over the notification bus.

Collector: after the notification agent processes a message, it sends the collected message back to the notification bus. The collector listens on the notification bus, receives the collected message, formats it into a sample record, and stores the sample record in the database.

API: including ceilometer-api and aodh-api.

Among them, the Ceilometer-API provides API services for Ceilometer sub-items. The main APIs include: an API for querying metering data (sample-list), an API for querying metering indicators (meter-list), an API for querying statistics of metering data (static-list), an API for querying resource object lists (resource-list) that have already been collected, and the like.

The Aodh-API provides API services for Aodh sub-items.

Database: stores the data collected by the Ceilometer collector and the data entered through the API.

Aodh sub-project: includes the Aodh evaluation (aodh-evaluator) service and the Aodh notification (aodh-notifier) service.

The alarm method provided by the embodiment of the application is mainly realized through an Aodh sub-item. In the embodiment of the present application, a device running an Aodh sub-item is referred to as an alarm device, and of course, the device running the Aodh sub-item may also be referred to by other names, which is not specifically limited in the embodiment of the present application.

It should be noted that "/" in this document means "or"; for example, A/B may mean A or B. "And/or" merely describes an association between associated objects and indicates that three relationships are possible; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. "Plurality" means two or more.

As used in this application, the terms "component," "module," "system," and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, or software in execution. For example, a component may be, but is not limited to being: a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of example, both an application running on a computing device and the computing device can be a component. One or more components can reside within a process and/or thread of execution and a component can be localized on one computer and/or distributed between two or more computers. In addition, these components can execute from various computer readable media having various data structures thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, distributed system, and/or across a network such as the internet with other systems by way of the signal).

It should be noted that in the embodiments of the present application, words such as "exemplary" or "for example" are used to indicate examples, illustrations or explanations. Any embodiment or design described herein as "exemplary" or "e.g.," is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present concepts related in a concrete fashion.

In the embodiments of the present application, the terms "of", "relevant", and "corresponding" may sometimes be used interchangeably; it should be noted that their intended meanings are consistent when the difference is not emphasized.

The alarm device in the embodiment of the present application may be implemented by the computer device (or system) shown in Fig. 3.

Fig. 3 is a schematic diagram of a computer device according to an embodiment of the present application. The computer device 300 comprises at least one processor 301, a communication bus 302, a memory 303 and at least one communication interface 304.

The processor 301 may be a general-purpose Central Processing Unit (CPU), a microprocessor, an Application-Specific Integrated Circuit (ASIC), or one or more Integrated circuits for controlling the execution of programs according to the present disclosure.

The communication bus 302 may include a path that conveys information between the aforementioned components.

The communication interface 304 may use any transceiver-type apparatus to communicate with other devices or communication networks, such as Ethernet, a Radio Access Network (RAN), or a Wireless Local Area Network (WLAN).

Memory 303 may be a Read-Only Memory (ROM) or other type of static Memory device that can store static information and instructions, a Random Access Memory (RAM) or other type of dynamic Memory device that can store information and instructions, an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical disk storage, optical disk storage (including Compact Disc, laser Disc, optical Disc, digital versatile Disc, blu-ray Disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory may be self-contained and coupled to the processor via a bus. The memory may also be integral to the processor.

The memory 303 is used for storing application program codes for executing the scheme of the application, and the processor 301 controls the execution. The processor 301 is configured to execute the application program code stored in the memory 303, thereby implementing the alarm method in the embodiment of the present application.

In a specific implementation, as an example, the processor 301 may include one or more CPUs, such as CPU0 and CPU1 in Fig. 3.

In a specific implementation, as an example, the computer device 300 may include multiple processors, such as the processor 301 and the processor 308 in Fig. 3. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).

In particular implementations, computer device 300 may also include an output device 305 and an input device 306, as one embodiment. The output device 305 is in communication with the processor 301 and may display information in a variety of ways. For example, the output device 305 may be a Liquid Crystal Display (LCD), a Light Emitting Diode (LED), a Cathode Ray Tube (CRT), a projector, or the like. The input device 306 is in communication with the processor 301 and can accept user input in a variety of ways. For example, the input device 306 may be a mouse, a keyboard, a touch screen device, or a sensing device, among others.

The computer device 300 may be a general purpose computer device or a special purpose computer device. In a specific implementation, the computer device 300 may be a desktop computer, a portable computer, a web server, a Personal Digital Assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, a communication device, an embedded device, or a device with a similar structure as in fig. 3. The embodiment of the present application does not limit the type of the computer apparatus 300.

As shown in fig. 4, a flowchart of an alarm method provided in the embodiment of the present application includes steps S401 to S402:

S401: the alarm device acquires a target resource list that meets the resource filtering condition, according to the resource filtering condition defined in a pre-created alarm rule.

In the embodiment of the application, the pre-created alarm rule defines a resource filtering condition, a monitoring parameter of a target resource and an alarm threshold of the monitoring parameter.

Of course, in the embodiment of the present application, information such as a level of the alarm rule, an alarm state of the alarm rule, a tenant identifier, and a user identifier may also be defined in the alarm rule created in advance, which is not specifically limited in the embodiment of the present application.

For example, the monitoring parameter of the target resource may be a CPU usage rate of the virtual machine, a hard disk usage rate of the virtual machine, a disk read-write rate of the virtual machine, a network rate of the virtual machine, and the like, which is not specifically limited in this embodiment of the present application.
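The contents of such an alarm rule can be pictured as a small record. The following Python sketch is illustrative only: the class name `AlarmRule`, its field names, and the `comparison` field (which states how the threshold is interpreted) are assumptions made for this example and are not taken from the embodiment or from the Aodh API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlarmRule:
    """One rule that covers every resource matching resource_filter (hypothetical layout)."""
    name: str
    resource_filter: dict               # resource filtering condition, e.g. {"type": "vm"}
    meter: str                          # monitoring parameter, e.g. "cpu_util"
    comparison: str                     # assumed: "gt", "ge", "lt" or "le"
    threshold: float                    # alarm threshold of the monitoring parameter
    severity: str = "low"               # optional: level of the alarm rule
    project_id: Optional[str] = None    # optional: tenant identifier
    user_id: Optional[str] = None       # optional: user identifier


# A single rule for the CPU usage of all virtual machines.
rule = AlarmRule(
    name="vm-cpu-high",
    resource_filter={"type": "vm"},
    meter="cpu_util",
    comparison="gt",
    threshold=80.0,
)
```

The optional fields mirror the additional information (level, tenant identifier, user identifier) that the rule may carry; the alarm state is left out here because it changes at evaluation time.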

S402, for each target resource in the target resource list, the alarm device respectively executes the following operations:

T1: the alarm device obtains the current value of the monitoring parameter of the target resource.

T2: the alarm device determines whether the current value of the monitoring parameter of the target resource is within the alarm threshold range.

T3: if the current value of the monitoring parameter of the target resource is within the alarm threshold range, the alarm device sends an alarm message to the external device.

Optionally, the alarm message may include an identifier of the target resource, so that after receiving the alarm message, the external device may query a location of the resource where the alarm occurs according to the identifier of the target resource, and further process the resource where the alarm occurs.
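Steps S401 and S402 (T1 to T3) can be sketched as a single loop. The sketch below is a minimal illustration under stated assumptions, not the Aodh implementation: the rule is a plain dictionary, the filter is an exact match on resource attributes, and `get_current_value` and `send_alarm` are hypothetical callbacks standing in for the statistics query and the notification to the external device.

```python
import operator

# Assumed encodings of "within the alarm threshold range".
COMPARATORS = {"gt": operator.gt, "ge": operator.ge,
               "lt": operator.lt, "le": operator.le}


def evaluate_rule(rule, resources, get_current_value, send_alarm):
    """Run S401 and S402 for one rule and return the IDs of resources that alarmed."""
    # S401: build the target resource list from the filtering condition.
    targets = [r for r in resources
               if all(r.get(k) == v for k, v in rule["resource_filter"].items())]
    in_range = COMPARATORS[rule["comparison"]]
    alarmed = []
    for resource in targets:                                        # S402
        value = get_current_value(resource["id"], rule["meter"])    # T1
        if value is not None and in_range(value, rule["threshold"]):  # T2
            # T3: the message carries the identifier of the target resource.
            send_alarm({"rule": rule["name"],
                        "resource_id": resource["id"],
                        "value": value})
            alarmed.append(resource["id"])
    return alarmed


# Usage with made-up data: one rule covers both virtual machines.
resources = [{"id": "vm-1", "type": "vm"},
             {"id": "vm-2", "type": "vm"},
             {"id": "disk-1", "type": "disk"}]
values = {"vm-1": 91.0, "vm-2": 35.0, "disk-1": 99.0}
rule = {"name": "vm-cpu-high", "resource_filter": {"type": "vm"},
        "meter": "cpu_util", "comparison": "gt", "threshold": 80.0}
sent = []
alarmed = evaluate_rule(rule, resources,
                        lambda rid, meter: values.get(rid), sent.append)
```

With this data only `vm-1` exceeds the threshold; `disk-1` is never evaluated because it does not pass the filter.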

Optionally, in this embodiment of the application, after the alarm device determines whether the current value of the monitoring parameter of the target resource is within the alarm threshold range, it may also send an update message to the database that stores the monitoring parameter of the target resource. The update message carries the alarm state corresponding to the monitoring parameter of the target resource and the identifier of the target resource, so that the database updates the stored alarm state accordingly. If one alarm rule applies to M resources in the resource list, where M is a positive integer greater than or equal to 1, the alarm states corresponding to the monitoring parameters of all M resources need to be updated. The external device can then query the identifier of the resource where the alarm occurred according to the updated alarm state, determine the location of that resource according to its identifier, and process the resource. The alarm state generally takes one of three values: "insufficient data", "normal", and "alarm". In the embodiment of the application, if the current value of the monitoring parameter of the target resource is within the alarm threshold range, the corresponding alarm state is "alarm"; if the current value is not within the alarm threshold range, the corresponding alarm state is "normal"; if the current value is missing, so that it cannot be determined whether it is within the alarm threshold range, the corresponding alarm state is "insufficient data".
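The mapping from a current value to one of the three alarm states, and the per-resource state update, can be sketched as follows. This is a hedged illustration: the threshold range is modeled as an inclusive interval `[lower, upper]`, the database is replaced by an in-memory dictionary, and the names `AlarmState`, `alarm_state`, and `update_states` are hypothetical.

```python
from enum import Enum


class AlarmState(Enum):
    INSUFFICIENT_DATA = "insufficient data"
    NORMAL = "normal"
    ALARM = "alarm"


def alarm_state(current_value, lower, upper):
    """Map a current value to an alarm state; the range bounds are inclusive (assumption)."""
    if current_value is None:          # value missing: cannot be judged
        return AlarmState.INSUFFICIENT_DATA
    if lower <= current_value <= upper:
        return AlarmState.ALARM
    return AlarmState.NORMAL


def update_states(state_table, rule_name, readings, lower, upper):
    """Record one state per (rule, resource), standing in for the update message."""
    for resource_id, value in readings.items():
        state_table[(rule_name, resource_id)] = alarm_state(value, lower, upper)
    return state_table


# One rule applied to M = 3 resources yields three state entries.
states = update_states({}, "vm-cpu-high",
                       {"vm-1": 91.0, "vm-2": 35.0, "vm-3": None},
                       lower=80.0, upper=100.0)
```

Keying the table by rule and resource reflects the point made above: a single rule that applies to M resources still needs M separately tracked alarm states.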

Compared with the existing method that an alarm rule needs to be established for each resource in a group of resources when an alarm is established for the same monitoring parameter of the group of resources in a large-scale resource scene of an OpenStack platform, the alarm method provided by the application can perform alarm monitoring only by establishing one alarm rule, so that redundancy generated by establishing a large number of alarm rules is reduced, the alarm efficiency in the large-scale scene is improved, and the management cost of the OpenStack platform is further reduced.

Further, the alarm rule may further define an aggregation function of the monitoring parameters of the target resource and a time span of monitoring the target resource, and the obtaining, by the alarm device, the current value of the monitoring parameters of the target resource may specifically include: and the alarm device calls an aggregation function of the monitoring parameters of the target resource to query the statistical database according to the identification of the target resource, the monitoring parameters of the target resource and the time span to obtain the current value of the monitoring parameters of the target resource. The statistical database comprises the corresponding relation among the identification of the target resource, the monitoring parameters of the target resource and the time span.

Optionally, the aggregation function may be an average function, a minimum function, a maximum function, or a variance function, which is not specifically limited in this embodiment of the present application.
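A minimal sketch of obtaining the current value with an aggregation function and a time span is shown below. It assumes that the statistical database can be represented as a list of sample records with `resource_id`, `meter`, `timestamp` (in seconds), and `value` fields; these field names and the function `current_value` are illustrative, not part of the embodiment. Population variance is chosen here only so that a single sample is still valid input.

```python
import statistics

# The aggregation functions named in the text, keyed by assumed short names.
AGGREGATES = {
    "avg": statistics.mean,
    "min": min,
    "max": max,
    "variance": statistics.pvariance,
}


def current_value(samples, resource_id, meter, now, time_span, aggregate):
    """Aggregate one meter of one resource over the window [now - time_span, now]."""
    window = [s["value"] for s in samples
              if s["resource_id"] == resource_id
              and s["meter"] == meter
              and now - time_span <= s["timestamp"] <= now]
    if not window:
        return None                    # no data in the window
    return AGGREGATES[aggregate](window)


samples = [
    {"resource_id": "vm-1", "meter": "cpu_util", "timestamp": 100, "value": 50.0},
    {"resource_id": "vm-1", "meter": "cpu_util", "timestamp": 400, "value": 70.0},
    {"resource_id": "vm-1", "meter": "cpu_util", "timestamp": 500, "value": 90.0},
]
avg_cpu = current_value(samples, "vm-1", "cpu_util",
                        now=500, time_span=300, aggregate="avg")
```

Here the sample at timestamp 100 falls outside the 300-second window, so the average is taken over the last two samples only. Returning `None` for an empty window lets the caller treat the case as missing data.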

In one possible implementation, the resource filtering condition may include: a type of target resource; or the type of the target resource meeting the preset condition.

For example, the type of target resource may be a virtual machine, hard disk, server, etc.

In the embodiment of the application, the type of the target resource may be directly defined in the resource filtering condition, or the type of the target resource may be determined by defining some parameters related to the type of the target resource in the resource filtering condition, for example, when the type of the target resource is a virtual machine, some parameters related to the virtual machine may be defined in the resource filtering condition, and these parameters may uniquely determine that the type of the target resource is the virtual machine; or, when the type of the target resource is a hard disk, some parameters related to the hard disk may be defined in the resource filtering condition, and the parameters may uniquely determine that the type of the target resource is a hard disk; or, when the target resource type is a server, some parameters related to the server may be defined in the resource filtering condition, and these parameters may uniquely determine that the target resource type is the server; the embodiment of the present application is not particularly limited to this.

The type of the target resource meeting the preset condition may be, for example, a group of virtual machines to which the user has added a "vip_vm" tag.

The resource filtering condition defined in the alarm rule can be the type of the target resource meeting the preset condition, so that the alarm rule can be established for monitoring a group of special resources according to the user requirement, and the user experience is further improved.

Optionally, when a virtual machine is newly added to the OpenStack platform, the user may add the "vip_vm" tag to it. The alarm rule stored in the database that already monitors a given parameter of the virtual machines carrying the "vip_vm" tag then also covers the same parameter of the newly added virtual machine. It is no longer necessary to create another alarm rule for that monitoring parameter, as the existing alarm method requires. This reduces the redundancy caused by creating extra alarm rules, improves alarm efficiency, and further reduces the management cost of the OpenStack platform.
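The effect of a tag-based filtering condition can be sketched as follows. The sketch assumes that each resource record has a `type` and a list of `tags`, and that the condition lists the required type and tags; the functions `matches` and `target_resource_list` are hypothetical names for this example.

```python
def matches(resource, condition):
    """True if the resource has the required type and carries every required tag."""
    if "type" in condition and resource.get("type") != condition["type"]:
        return False
    return set(condition.get("tags", [])) <= set(resource.get("tags", []))


def target_resource_list(inventory, condition):
    """Recompute the target list from the current inventory on each evaluation."""
    return [r["id"] for r in inventory if matches(r, condition)]


condition = {"type": "vm", "tags": ["vip_vm"]}
inventory = [
    {"id": "vm-1", "type": "vm", "tags": ["vip_vm"]},
    {"id": "vm-2", "type": "vm", "tags": []},
]
before = target_resource_list(inventory, condition)

# A newly added, tagged virtual machine is covered without a new rule.
inventory.append({"id": "vm-3", "type": "vm", "tags": ["vip_vm"]})
after = target_resource_list(inventory, condition)
```

Because the target list is derived from the filter at evaluation time rather than stored in the rule, tagging a new virtual machine is enough to bring it under the existing rule.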

Further, the alarm rule may also define a grouping key. In this case, after the alarm device obtains the target resource list meeting the resource filtering condition according to the resource filtering condition defined in the pre-created alarm rule (step S401), the method may further include: the alarm device groups the target resources in the target resource list according to the grouping key to obtain at least one group of target resources.

For example, in the embodiment of the present application, the target resources in the target resource list may be grouped by using the unique identifier of the resource as a grouping key to obtain at least one group of target resources, where each group of target resources includes one target resource. The unique identifier of the resource may be, for example, a Universal Unique Identifier (UUID) of the virtual machine, a UUID of the disk, or the like.

Of course, in this embodiment of the present application, the grouping keyword may be something other than the unique identifier, and a group among the at least one group of target resources obtained after grouping may then include multiple target resources, which is not specifically limited in this embodiment of the present application.
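A minimal sketch of the grouping step, assuming each target resource is represented as a dictionary. The `host` attribute and the `group_resources` helper are hypothetical and only illustrate the two cases described above: grouping by UUID gives one resource per group, while another grouping keyword can give groups with several resources.

```python
from collections import defaultdict


def group_resources(resources, grouping_keyword):
    """Group resources by the value each one holds under grouping_keyword."""
    groups = defaultdict(list)
    for resource in resources:
        groups[resource[grouping_keyword]].append(resource)
    return dict(groups)


targets = [
    {"uuid": "a1", "host": "host-1"},
    {"uuid": "b2", "host": "host-1"},
    {"uuid": "c3", "host": "host-2"},
]

by_uuid = group_resources(targets, "uuid")   # one target resource per group
by_host = group_resources(targets, "host")   # a group may hold several resources
```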

Optionally, the alarm message sent by the alarm device to the external device may further include a group identifier of the group in which the target resource is located. Thus, after receiving the alarm message, the external device can quickly locate the group containing the resource that generated the alarm by means of the group identifier, then locate the resource itself within that group by means of the resource identifier, and handle that resource accordingly.
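The two-step lookup on the receiving side might look like the sketch below. The message fields (`group_id`, `resource_id`, `metric`, `value`) and both helper functions are assumptions made for illustration; the embodiment does not fix a message format.

```python
def build_alarm_message(group_id, resource_id, metric, value):
    """Assemble an alarm message carrying both group and resource identifiers."""
    return {"group_id": group_id, "resource_id": resource_id,
            "metric": metric, "value": value}


def locate_resource(groups, message):
    """Receiver side: find the group first, then the resource inside it."""
    group = groups[message["group_id"]]
    for resource in group:
        if resource["uuid"] == message["resource_id"]:
            return resource
    return None


groups = {
    "host-1": [{"uuid": "a1"}, {"uuid": "b2"}],
    "host-2": [{"uuid": "c3"}],
}
message = build_alarm_message("host-2", "c3", "cpu_util", 91.0)
found = locate_resource(groups, message)
```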

It should be noted that the alarm method in the embodiment of the present application may also be applied to the existing Composite manner, for example, monitoring a group of resources by defining an alarm rule that combines, through "and", "or" and the like, multiple threshold ranges of multiple monitoring parameters of the group of resources. For details, reference may be made to the description of the method embodiment, which is not repeated here.
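As an illustration of combining threshold conditions with "and" and "or", the sketch below evaluates a nested expression against the current values of one resource. The dictionary layout, the metric names, and the thresholds are hypothetical; this is not the rule schema of Aodh or of the embodiment.

```python
import operator

COMPARATORS = {">=": operator.ge, ">": operator.gt,
               "<=": operator.le, "<": operator.lt}


def evaluate(expr, values):
    """Recursively evaluate an and/or tree of threshold conditions."""
    if "and" in expr:
        return all(evaluate(sub, values) for sub in expr["and"])
    if "or" in expr:
        return any(evaluate(sub, values) for sub in expr["or"])
    compare = COMPARATORS[expr["op"]]
    return compare(values[expr["metric"]], expr["threshold"])


# cpu_util >= 80 AND (memory_util >= 90 OR disk_util >= 85)
composite = {"and": [
    {"metric": "cpu_util", "op": ">=", "threshold": 80},
    {"or": [
        {"metric": "memory_util", "op": ">=", "threshold": 90},
        {"metric": "disk_util", "op": ">=", "threshold": 85},
    ]},
]}

alarm_a = evaluate(composite, {"cpu_util": 85, "memory_util": 50, "disk_util": 88})
alarm_b = evaluate(composite, {"cpu_util": 85, "memory_util": 50, "disk_util": 60})
```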

In summary, in a large-scale resource scene of the OpenStack platform, when an alarm is created for the same monitoring parameter of a group of resources, compared with the existing method that an alarm rule needs to be created for each resource in a group of resources, the alarm method provided by the application can perform alarm monitoring only by creating one alarm rule, so that redundancy caused by creating a large number of alarm rules is reduced, the alarm efficiency in the large-scale scene is improved, and the management cost of the OpenStack platform is reduced.

For example, assume that system A includes 5 virtual machines and 5 hard disks, and each virtual machine includes 1 CPU. Table 1 shows the CPU utilization of the 5 virtual machines of system A at time 1, and Table 2 shows the utilization of the 5 hard disks of system A at time 1.

TABLE 1

TABLE 2

Resource list	Monitoring parameter
Hard disk 1 of system A	The utilization of hard disk 1 is 67%
Hard disk 2 of system A	The utilization of hard disk 2 is 54%
Hard disk 3 of system A	The utilization of hard disk 3 is 85%
Hard disk 4 of system A	The utilization of hard disk 4 is 49%
Hard disk 5 of system A	The utilization of hard disk 5 is 23%

Suppose that the user needs to monitor two groups of resources of system A:

1. Monitoring whether the CPU utilization of the 5 virtual machines of system A at a given moment is greater than or equal to 80%; if so, triggering an alarm and sending a message requesting release of resources to the external device.

2. Monitoring whether the utilization of the 5 hard disks of system A at a given moment is greater than or equal to 80%; if so, triggering an alarm.

According to the existing method for creating alarm rules, the user needs to create 10 alarm rules for the two groups of resources in system A, although the first 5 of these 10 alarm rules are identical except for their UUIDs, and the last 5 are likewise identical except for their UUIDs.
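The redundancy can be made concrete with a short sketch. The rule layout, the metric names `cpu_util` and `disk_util`, and the placeholder UUID strings are assumptions for illustration only.

```python
def per_resource_rules(uuids, metric, threshold):
    """Existing approach: one rule per resource, differing only in the UUID."""
    return [{"resource_id": u, "metric": metric, "op": ">=", "threshold": threshold}
            for u in uuids]


vm_uuids = [f"vm-uuid-{i}" for i in range(1, 6)]
disk_uuids = [f"disk-uuid-{i}" for i in range(1, 6)]
legacy_rules = (per_resource_rules(vm_uuids, "cpu_util", 80)
                + per_resource_rules(disk_uuids, "disk_util", 80))


def without_uuid(rule):
    """Rule content with the resource identifier stripped, as a hashable tuple."""
    return tuple(sorted((k, v) for k, v in rule.items() if k != "resource_id"))


# Only two distinct rule bodies remain once the UUID is ignored.
distinct_bodies = {without_uuid(rule) for rule in legacy_rules}
```

Ten rules collapse into two distinct rule bodies, which is the number of rules needed when a rule targets a group of resources through a resource filtering condition.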

Taking monitoring of the CPU utilization of the 5 virtual machines of system A as an example, the alarm method provided in the above embodiment may process this group of virtual machines as follows:

Firstly, the alarm device creates an alarm rule according to the user's requirement. The alarm rule may specifically be: trigger an alarm when the CPU utilization of a virtual machine of system A at a given moment is greater than or equal to 80%, and after the alarm is triggered, send a message requesting release of resources to the external device. Here, the resource filtering condition defined in the alarm rule is "virtual machine of system A", the monitoring parameter of the target resource is the CPU utilization of the virtual machine, the alarm threshold of the monitoring parameter is "greater than or equal to 80%", and the action executed after the alarm is triggered is sending a message requesting release of resources to the external device.

It should be noted that, in the embodiment of the present application, when the alarm device creates the alarm rule, it usually needs to check whether each defined parameter is correct; if all parameters are correct, the alarm rule is created and stored in the database.
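A minimal sketch of such a check at creation time, assuming a dictionary-based rule and an in-memory list standing in for the database. The field names, the set of allowed operators, and both helper functions are hypothetical; the embodiment does not specify which checks are performed.

```python
ALLOWED_OPERATORS = {">=", ">", "<=", "<", "=="}
REQUIRED_FIELDS = ("resource_filter", "metric", "op", "threshold")


def validate_rule(rule):
    """Return a list of problems found in the rule; empty means valid."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in rule]
    if "op" in rule and rule["op"] not in ALLOWED_OPERATORS:
        errors.append(f"unsupported operator: {rule['op']}")
    if "threshold" in rule and not isinstance(rule["threshold"], (int, float)):
        errors.append("threshold must be a number")
    return errors


def create_rule(rule, database):
    """Store the rule only if every defined parameter passes validation."""
    errors = validate_rule(rule)
    if errors:
        raise ValueError("; ".join(errors))
    database.append(dict(rule))
    return rule


database = []  # stand-in for the rule database
create_rule({"resource_filter": {"type": "virtual_machine"},
             "metric": "cpu_util", "op": ">=", "threshold": 80}, database)

try:
    create_rule({"resource_filter": {"type": "virtual_machine"},
                 "metric": "cpu_util", "op": "~", "threshold": "high"}, database)
    rejected = False
except ValueError:
    rejected = True
```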

It should be noted that the alarm rule in the embodiment of the present application may or may not define the action to be executed after the alarm is triggered, and this is not specifically limited in the embodiment of the present application.

Secondly, according to the resource filtering condition "virtual machine of system A" defined in the pre-created alarm rule, the alarm device obtains the target resource list meeting the resource filtering condition, namely the first column of Table 1. The target resource list includes the 5 virtual machines of system A.

Thirdly, the alarm device performs the following operations on each of the 5 virtual machines in the target resource list meeting the resource filtering condition:

For virtual machine 1 of system A: first, the alarm device acquires the current value of the utilization of CPU1 of virtual machine 1; as can be seen from Table 1, the utilization of CPU1 is 40% at time 1. Next, the alarm device determines whether the current value of the utilization of CPU1 is greater than or equal to 80%; if it is, an alarm is triggered. Since 40% is less than 80%, the current value of the utilization of CPU1 is not within the alarm threshold range, and therefore no alarm message is sent. Optionally, the alarm device may send an update message to the database storing the monitoring parameters of the target resource, where the update message carries the identifier of virtual machine 1 and the alarm state "normal" for the utilization of CPU1 of virtual machine 1, so that the database updates the stored alarm state for the utilization of CPU1 of virtual machine 1 to "normal".

For virtual machine 2 to virtual machine 4 of system A: as can be seen from Table 1, the utilization of CPU2 of virtual machine 2, the utilization of CPU3 of virtual machine 3, and the utilization of CPU4 of virtual machine 4 are not within the alarm threshold range at time 1. They are therefore processed in the same way as virtual machine 1 above, and the details are not repeated here.

For virtual machine 5 of system A: first, the alarm device acquires the current value of the utilization of CPU5 of virtual machine 5; as can be seen from Table 1, the utilization of CPU5 is 81% at time 1. Next, the alarm device determines whether the current value of the utilization of CPU5 is greater than or equal to 80%; if it is, an alarm is triggered. Since 81% is greater than 80%, the utilization of CPU5 is within the alarm threshold range, and thus the alarm device sends an alarm message to the external device. Optionally, the alarm device may send an update message to the database storing the monitoring parameters of the target resource, where the update message carries the identifier of virtual machine 5 and the alarm state "alarm" for the utilization of CPU5 of virtual machine 5, so that the database updates the stored alarm state for the utilization of CPU5 of virtual machine 5 to "alarm". Because the alarm rule also defines that a message requesting release of resources is sent to the external device after the alarm is triggered, the alarm device may, after sending the alarm message, also send the message requesting release of resources to the external device. Optionally, the alarm message sent by the alarm device to the external device may carry an identifier of virtual machine 5 of system A, to indicate that the current alarm originates from virtual machine 5 of system A.

Similarly, when the user needs to monitor the utilization of the 5 hard disks of system A, the alarm device may create an alarm rule according to the user's requirement. The alarm rule may specifically be: trigger an alarm when the utilization of a hard disk of system A at a given moment is greater than or equal to 80%. Here, the resource filtering condition defined in the alarm rule is "hard disk of system A", the monitoring parameter of the target resource is the utilization of the hard disk, and the alarm threshold of the monitoring parameter is "greater than or equal to 80%". For the method of monitoring the utilization of the 5 hard disks of system A based on this alarm rule, reference may be made to the method for monitoring the CPU utilization, which is not repeated here.
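The hard disk case can be walked through with the values of Table 2. The sketch below is an illustration under stated assumptions: the rule is a plain dictionary, the function `evaluate_rule` and the message fields are hypothetical, and a Python list stands in for the external device. Only the utilization figures and the 80% threshold come from the example above.

```python
import operator

COMPARATORS = {">=": operator.ge, ">": operator.gt,
               "<=": operator.le, "<": operator.lt}

# Utilization at time 1, in percent, copied from Table 2.
DISK_USAGE = {
    "hard disk 1": 67,
    "hard disk 2": 54,
    "hard disk 3": 85,
    "hard disk 4": 49,
    "hard disk 5": 23,
}

# One rule covers the whole group of hard disks.
rule = {"resource_filter": "hard disk of system A",
        "metric": "disk utilization", "op": ">=", "threshold": 80}


def evaluate_rule(rule, target_list, read_value, send_alarm):
    """For each target: read the current value, test the threshold, alarm if hit."""
    in_range = COMPARATORS[rule["op"]]
    states = {}
    for resource_id in target_list:
        value = read_value(resource_id)
        if in_range(value, rule["threshold"]):
            send_alarm({"resource_id": resource_id,
                        "metric": rule["metric"], "value": value})
            states[resource_id] = "alarm"
        else:
            states[resource_id] = "normal"
    return states


sent = []  # stand-in for the external device
states = evaluate_rule(rule, list(DISK_USAGE), DISK_USAGE.get, sent.append)
```

With the Table 2 values, only hard disk 3 (85%) falls within the alarm threshold range; the other four disks are recorded as "normal".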

It can be seen from the above example that, in a large-scale resource scene of the OpenStack platform, compared with the existing method that an alarm rule needs to be created for each resource in a group of resources, the alarm method provided by the present application can perform alarm monitoring only by creating one alarm rule, thereby reducing redundancy caused by creating a large number of alarm rules, improving alarm efficiency in a large-scale scene, and further reducing management cost of the OpenStack platform.

The above description mainly introduces the scheme provided by the present application from the perspective of the alarm method. It is understood that, in order to implement the above functions, the alarm device includes corresponding hardware structures and/or software modules for performing each function. Those of skill in the art will readily appreciate that the modules and method steps of the various examples described in connection with the embodiments disclosed herein may be implemented as hardware or a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

In the embodiment of the present application, the alarm device may be divided into the functional modules according to the method example, for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. It should be noted that, in the embodiment of the present application, the division of the module is schematic, and is only one logic function division, and there may be another division manner in actual implementation.

In the case where each functional module is divided according to each function, fig. 5 shows a possible structural diagram of the alarm device involved in the above embodiment. The alarm device 500 includes: an acquisition module 501, a determining module 502 and a sending module 503. The acquisition module 501 is configured to support the alarm device 500 in executing step S401 and T1 in step S402 in fig. 4; the determining module 502 is configured to support the alarm device 500 in executing T2 in step S402 in fig. 4; the sending module 503 is configured to support the alarm device 500 in executing T3 in step S402 in fig. 4.

Optionally, the alarm rule further defines an aggregation function of the monitoring parameter of the target resource and a time span for monitoring the target resource. The acquisition module 501 obtaining the current value of the monitoring parameter of the target resource may specifically include: calling the aggregation function of the monitoring parameter of the target resource to query a statistical database according to the identifier of the target resource, the monitoring parameter of the target resource and the time span for monitoring the target resource, to obtain the current value of the monitoring parameter of the target resource, wherein the statistical database includes the correspondence among the identifier of the target resource, the monitoring parameter of the target resource and the time span for monitoring the target resource.
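One way to picture this query is sketched below, with a list of tuples standing in for the statistical database. The sample layout `(resource id, metric, timestamp in seconds, value)`, the aggregation names, and the sample values are all assumptions for illustration; the embodiment does not prescribe a storage format.

```python
from statistics import mean

AGGREGATIONS = {"avg": mean, "max": max, "min": min}


def current_value(samples, resource_id, metric, aggregation, time_span, now):
    """Aggregate the samples of one resource and metric inside the time span."""
    window = [value for (rid, name, timestamp, value) in samples
              if rid == resource_id and name == metric
              and now - time_span < timestamp <= now]
    if not window:
        return None  # no data in the window; the caller decides how to treat this
    return AGGREGATIONS[aggregation](window)


# Stand-in for the statistical database.
samples = [
    ("vm-5", "cpu_util", 0, 50.0),
    ("vm-5", "cpu_util", 60, 80.0),
    ("vm-5", "cpu_util", 120, 90.0),
    ("vm-1", "cpu_util", 120, 40.0),
]

avg_value = current_value(samples, "vm-5", "cpu_util", "avg", time_span=120, now=120)
max_value = current_value(samples, "vm-5", "cpu_util", "max", time_span=120, now=120)
```

The aggregated result is then used as the current value that is compared against the alarm threshold range.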

Optionally, the alarm rule further defines a grouping keyword. As shown in fig. 5, the alarm device 500 may also include a grouping module 504. The grouping module 504 is configured to, after the acquisition module 501 obtains the target resource list meeting the resource filtering condition defined in the pre-created alarm rule, group the target resources in the target resource list according to the grouping keyword to obtain at least one group of target resources.
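The division into functional modules could be sketched as follows. The class names mirror modules 501 to 504, but their methods, the dictionary-based inventory, and the fixed "greater than or equal to" comparison are simplifying assumptions, not the structure mandated by fig. 5.

```python
from collections import defaultdict


class AcquisitionModule:
    """Obtains the target resource list (S401) and current values (T1)."""

    def __init__(self, inventory, metrics):
        self.inventory = inventory  # resource id -> attributes
        self.metrics = metrics      # (resource id, metric name) -> current value

    def target_list(self, rule):
        wanted = rule["resource_filter"]
        return [rid for rid, attrs in self.inventory.items()
                if all(attrs.get(k) == v for k, v in wanted.items())]

    def current_value(self, resource_id, rule):
        return self.metrics[(resource_id, rule["metric"])]


class GroupingModule:
    """Groups target resources by the grouping keyword."""

    def group(self, inventory, targets, grouping_keyword):
        groups = defaultdict(list)
        for rid in targets:
            groups[inventory[rid][grouping_keyword]].append(rid)
        return dict(groups)


class DeterminingModule:
    """Decides whether a value is within the alarm threshold range (T2)."""

    def in_alarm_range(self, value, rule):
        return value >= rule["threshold"]  # simplification: ">=" only


class SendingModule:
    """Sends alarm messages (T3); a list stands in for the external device."""

    def __init__(self):
        self.outbox = []

    def send(self, message):
        self.outbox.append(message)


class AlarmDevice:
    def __init__(self, inventory, metrics):
        self.acquisition = AcquisitionModule(inventory, metrics)
        self.grouping = GroupingModule()
        self.determining = DeterminingModule()
        self.sending = SendingModule()

    def run(self, rule):
        targets = self.acquisition.target_list(rule)
        groups = self.grouping.group(self.acquisition.inventory, targets,
                                     rule["grouping_keyword"])
        for group_id, members in groups.items():
            for rid in members:
                value = self.acquisition.current_value(rid, rule)
                if self.determining.in_alarm_range(value, rule):
                    self.sending.send({"group_id": group_id,
                                       "resource_id": rid, "value": value})


inventory = {
    "vm-1": {"type": "virtual_machine", "host": "host-1"},
    "vm-2": {"type": "virtual_machine", "host": "host-2"},
    "disk-1": {"type": "disk", "host": "host-1"},
}
metrics = {("vm-1", "cpu_util"): 40, ("vm-2", "cpu_util"): 81}
device = AlarmDevice(inventory, metrics)
device.run({"resource_filter": {"type": "virtual_machine"}, "metric": "cpu_util",
            "threshold": 80, "grouping_keyword": "host"})
```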

All relevant contents of each step related to the above method embodiment may be referred to the functional description of the corresponding functional module, and are not described herein again.

Compared with the existing method that an alarm rule needs to be established for each resource in a group of resources in a large-scale resource scene of an OpenStack platform, the alarm method provided by the application can perform alarm monitoring only by establishing one alarm rule, so that redundancy generated by establishing a large number of alarm rules is reduced, the alarm efficiency in the large-scale scene is improved, and the management cost of the OpenStack platform is further reduced.

In the case of an integrated unit, fig. 6 shows a possible structural diagram of the alarm device involved in the above embodiment. The alarm device 600 includes: a processing unit 601 and a communication unit 602. The processing unit 601 is configured to support the alarm device 600 in executing step S401 and T1 and T2 in step S402 in fig. 4; the communication unit 602 is configured to support the alarm device 600 in executing T3 in step S402 in fig. 4.

Optionally, the alarm rule further defines an aggregation function of the monitoring parameter of the target resource and a time span for monitoring the target resource. The processing unit 601 is further configured to call the aggregation function of the monitoring parameter of the target resource to query the statistical database according to the identifier of the target resource, the monitoring parameter of the target resource, and the time span for monitoring the target resource, so as to obtain the current value of the monitoring parameter of the target resource, where the statistical database includes the correspondence among the identifier of the target resource, the monitoring parameter of the target resource, and the time span for monitoring the target resource.

Optionally, the alarm rule further defines a grouping keyword. The processing unit 601 is further configured to, after obtaining the target resource list meeting the resource filtering condition defined in the pre-created alarm rule, group the target resources in the target resource list according to the grouping keyword to obtain at least one group of target resources.

Compared with the existing method that an alarm rule needs to be established for each resource in a group of resources in a large-scale resource scene of an OpenStack platform, the alarm device provided by the application can perform alarm monitoring only by establishing one alarm rule, so that redundancy generated by establishing a large number of alarm rules is reduced, the alarm efficiency in the large-scale scene is improved, and the management cost of the OpenStack platform is further reduced.

The embodiment of the present application further provides a computer storage medium for storing computer software instructions used by the above alarm device, the instructions including a program designed for executing the above method embodiment. The alarm method may be implemented by executing the stored program.

The embodiment of the present application further provides a computer program including instructions which, when the computer program is executed by a computer, cause the computer to execute the procedures of the above method embodiments.

While the present application has been described in connection with various embodiments, other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed application, from a review of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the word "a" or "an" does not exclude a plurality. A single processor or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, apparatus (device), or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. A computer program stored/distributed on a suitable medium supplied together with or as part of other hardware, may also take other distributed forms, such as via the Internet or other wired or wireless telecommunication systems.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (devices) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Although the present application has been described in conjunction with specific features and embodiments thereof, it will be evident that various modifications and combinations can be made thereto without departing from the spirit and scope of the application. Accordingly, the specification and figures are merely exemplary of the present application as defined in the appended claims and are intended to cover any and all modifications, variations, combinations, or equivalents within the scope of the present application. It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims

1. An alarm method, which is applied to a system, is characterized in that the method comprises the following steps:

acquiring a target resource list meeting a resource filtering condition according to the resource filtering condition defined in a pre-established alarm rule, wherein the alarm rule defines the resource filtering condition, a monitoring parameter of the target resource and an alarm threshold of the monitoring parameter; the alarm rule also defines a grouping keyword; grouping the target resources in the target resource list according to the grouping keywords to obtain at least one group of target resources;

for each target resource in each group of target resources in the target resource list, respectively performing the following operations:

acquiring the current value of the monitoring parameter of the target resource;

determining whether the current value is within the alarm threshold range;

and if the current value is within the alarm threshold range, sending an alarm message.

2. The method of claim 1, wherein the alarm rule further defines an aggregation function of the monitoring parameter and a time span for monitoring the target resource;

the obtaining of the current value of the monitoring parameter of the target resource includes:

and calling the aggregation function to query a statistical database according to the identification of the target resource, the monitoring parameters and the time span to obtain the current value of the monitoring parameters of the target resource, wherein the statistical database comprises the corresponding relation among the identification of the target resource, the monitoring parameters and the time span.

3. The method of claim 1 or 2, wherein the resource filtering condition comprises: a type of the target resource; or the type of the target resource which meets the preset condition.

4. The method of claim 1, wherein the alert message comprises an identification of the target resource.

5. The method of claim 1, wherein the alert message further comprises a group identification of the group in which the target resource is located.

6. An alarm device, which is applied to a system, the alarm device comprising: the device comprises an acquisition module, a determination module, a sending module and a grouping module;

the acquisition module is used for acquiring a target resource list meeting the resource filtering condition according to the resource filtering condition defined in a pre-established alarm rule, wherein the alarm rule defines the resource filtering condition, the monitoring parameter of the target resource and the alarm threshold of the monitoring parameter; the alarm rule also defines a grouping keyword; the grouping module is used for grouping the target resources in the target resource list according to the grouping keywords to obtain at least one group of target resources;

for each target resource in each set of target resources in the target resource list:

the acquisition module is further used for acquiring the current value of the monitoring parameter of the target resource;

the determining module is used for determining whether the current value is within the alarm threshold range;

and the sending module is used for sending an alarm message to the external equipment if the current value is within the alarm threshold range.

7. The apparatus of claim 6, wherein the alarm rule further defines an aggregation function of the monitoring parameter and a time span for monitoring the target resource;

the obtaining module is further configured to obtain a current value of the monitoring parameter of the target resource, and specifically includes:

8. The apparatus of claim 6 or 7, wherein the resource filtering condition comprises: a type of the target resource; or the type of the target resource which meets the preset condition.

9. The apparatus of claim 6, wherein the alert message comprises an identification of the target resource.

10. The apparatus of claim 6, wherein the alert message further comprises a group identification of the group in which the target resource is located.

11. An alert device, comprising: a processor, a memory, a bus, and a communication interface;

the memory is used for storing computer execution instructions, the processor is connected with the memory through the bus, and when the alarm device runs, the processor executes the computer execution instructions stored by the memory so as to enable the alarm device to execute the alarm method according to any one of claims 1-5.

12. A computer storage medium storing computer software instructions for an alerting method of any of claims 1-5, comprising a program designed for performing the alerting method of any of claims 1-5.

13. A computer program product, characterized in that the computer program product comprises instructions which, when executed by a computer, cause the computer to carry out the procedure in the alerting method of any one of claims 1-5.