CN114185744A - Alarm information aggregation method, device, monitoring system and storage medium - Google Patents

Alarm information aggregation method, device, monitoring system and storage medium

Info

Publication number
CN114185744A
Authority
CN
China
Prior art keywords
alarm
alarm information
attribute
information
aggregation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111524740.1A
Other languages
Chinese (zh)
Inventor
李子佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Pingan Payment Technology Service Co Ltd
Original Assignee
Pingan Payment Technology Service Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pingan Payment Technology Service Co Ltd
Priority to CN202111524740.1A
Publication of CN114185744A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3072 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
    • G06F11/3082 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting the data filtering being achieved by aggregating or compressing the monitored data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The embodiments of the present application are applicable to the technical field of infrastructure operation and maintenance, and provide an alarm information aggregation method, an alarm information aggregation device, a monitoring system and a storage medium. The alarm information aggregation method is applied to the monitoring system and comprises the following steps: acquiring alarm information generated when a system is abnormal, the alarm information comprising attribute values respectively corresponding to a plurality of alarm attributes; decomposing the alarm information step by step according to the plurality of alarm attributes to update an aggregation tree, the aggregation tree comprising a plurality of child nodes, each child node corresponding to an attribute value; for any current terminal child node, if the current terminal child node and other terminal child nodes are sibling nodes, aggregating the alarm information corresponding to the current terminal child node and the other terminal child nodes to obtain aggregated information; and sending the aggregated information to a user terminal of a worker. By adopting the method, the cost incurred by the monitoring system in sending alarm information can be reduced.

Description

Alarm information aggregation method, device, monitoring system and storage medium
Technical Field
The application belongs to the technical field of infrastructure operation and maintenance, and particularly relates to an alarm information aggregation method, an alarm information aggregation device, a monitoring system and a storage medium.
Background
A cluster system is a system that handles complex computational problems by connecting multiple machines together, which can provide uninterrupted service for many applications. For a cluster system with a complex structure, such as a virtual machine cluster system used for cloud computing, a corresponding monitoring system is usually required to be equipped to monitor the operation conditions of each virtual machine device in the virtual machine cluster system in real time. Generally, when a monitoring system finds that a virtual machine device is abnormal, alarm information is generated and sent to operation and maintenance personnel.
However, in the prior art, the generated alarm information generally contains a large amount of redundant information. In particular, when the virtual machine cluster system produces a large amount of alarm information in a short time, the monitoring system still sends the generated alarm information to the operation and maintenance personnel one by one. As a result, the operation and maintenance personnel spend a lot of time reading the alarm information, and the cost of sending the alarm information from the monitoring system also increases.
Disclosure of Invention
The embodiments of the present application provide an alarm information aggregation method, an alarm information aggregation device, a monitoring system and a storage medium, which can solve the problem that the sending cost is high when the monitoring system sends alarm information.
In a first aspect, an embodiment of the present application provides an alarm information aggregation method, which is applied to a monitoring system, and the method includes:
acquiring alarm information generated when a system is abnormal; the alarm information comprises attribute values corresponding to a plurality of alarm attributes respectively;
decomposing the alarm information step by step according to the plurality of alarm attributes to update the aggregation tree; the aggregation tree comprises a plurality of child nodes, and each child node corresponds to an attribute value of one alarm attribute;
for any current terminal child node, if the current terminal child node and other terminal child nodes are sibling nodes, aggregating the alarm information corresponding to the current terminal child node and the other terminal child nodes to obtain aggregated information;
and sending the aggregated information to a user terminal of a worker.
In a second aspect, an embodiment of the present application provides an alarm information aggregation apparatus, which is applied to a monitoring system, and the apparatus includes:
the acquisition module is used for acquiring alarm information generated when the system is abnormal; the alarm information comprises attribute values corresponding to a plurality of alarm attributes respectively;
the updating module is used for decomposing the alarm information step by step according to the plurality of alarm attributes so as to update the aggregation tree; the aggregation tree comprises a plurality of child nodes, and each child node corresponds to an attribute value of one alarm attribute;
the aggregation module is used for, for any current terminal child node, aggregating the alarm information corresponding to the current terminal child node and other terminal child nodes to obtain aggregated information if the current terminal child node and the other terminal child nodes are sibling nodes;
and the sending module is used for sending the aggregated information to the user terminal of the worker.
In a third aspect, an embodiment of the present application provides a monitoring system, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the method according to the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium in which a computer program is stored, where the computer program, when executed by a processor, implements the method of the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product, which, when run on a monitoring system, causes the monitoring system to perform the method of the first aspect.
Compared with the prior art, the embodiments of the present application have the following advantages: for the alarm information generated when the system is abnormal, the monitoring system can decompose the alarm information step by step according to the alarm attributes to generate an aggregation tree. The monitoring system can then aggregate the alarm information corresponding to the terminal child nodes that share the same parent node in the aggregation tree, to obtain one piece of aggregated information. Therefore, the monitoring system does not need to store and send each piece of alarm information separately, and only needs to send the one piece of aggregated information. In this way, the monitoring system achieves greater information compression and reduces the cost of sending a large amount of alarm information, while retaining the full set of alarm information and discarding nothing.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart illustrating an implementation of an alarm information aggregation method according to an embodiment of the present application;
Fig. 2 is a flowchart illustrating an implementation of an alarm information aggregation method according to another embodiment of the present application;
Fig. 3 is a schematic structural diagram of an aggregation tree in an alarm information aggregation method according to an embodiment of the present application;
Fig. 4 is a schematic diagram illustrating an implementation manner of S103 of a method for aggregating alarm information according to an embodiment of the present application;
Fig. 5 is a schematic diagram illustrating an implementation manner of S102 of a method for aggregating alarm information according to an embodiment of the present application;
Fig. 6 is a schematic diagram illustrating an implementation manner of S2 of an alarm information aggregation method according to an embodiment of the present application;
Fig. 7 is a schematic diagram illustrating an implementation manner of S104 of a method for aggregating alarm information according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of an alarm information aggregation apparatus according to an embodiment of the present application;
Fig. 9 is a schematic structural diagram of a monitoring system according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Referring to fig. 1, fig. 1 shows a flowchart of an implementation of an alarm information aggregation method provided in an embodiment of the present application, where the method includes the following steps:
S101, the monitoring system acquires alarm information generated when the system is abnormal; the alarm information includes attribute values respectively corresponding to a plurality of alarm attributes.
In an embodiment, the system includes, but is not limited to, a centralized cluster system, a load imbalance system, and a virtualized cluster system. For example, the system may be a virtualized cluster system that provides cloud computing functionality and includes a plurality of running virtual machine devices. The executing body of steps S101-S104 may be a monitoring system that monitors the operating condition of the entire virtualized cluster system in real time. When the monitoring system detects that the system is abnormal, it generates alarm information and sends it to a terminal used by a worker.
In one embodiment, the plurality of alarm attributes include, but are not limited to, alarm source, alarm type, alarm level, application name, IP address, text description information, alarm time, and event identifier. The alarm source is the monitoring platform that discovered the abnormality. Specifically, the monitoring system generally includes two monitoring platforms: for example, one platform that monitors the resource usage of each virtual machine device in the virtualized cluster system, and another platform that monitors the access volume of each virtual machine device in the virtualized cluster system and/or whether access requests time out.
In an embodiment, the alarm level identifies the urgency of the alarm information and may be assigned according to a level rule preset by a worker. The alarm level can be classified into three types, A, B and C, where the urgency is A > B > C.
In an embodiment, the application name is the name of the application in which the abnormality occurs. The IP address is the IP address of the virtual machine device on which the abnormal application is located. The text description information describes the abnormality and can be generated by the virtualized cluster system according to a pre-configured alarm template. The alarm time is the time when the abnormality occurs. The event identifier is the identifier of the event corresponding to the alarm information; it is unique and can be recognized by the virtualized cluster system.
In an embodiment, the alarm information generated when the system is abnormal generally includes a large amount of redundant information when it is acquired; that is, it includes not only the alarm attributes mentioned above but also a number of other non-essential alarm attributes. In this case, the monitoring system can also parse and standardize the acquired alarm information: the alarm source, the alarm type, the alarm level, the application name, the IP address and the text description information are retained as the plurality of alarm attributes, and the attribute values corresponding to the other non-essential alarm attributes are deleted.
In an embodiment, an attribute value is the specific value that an alarm attribute takes in a piece of alarm information. For example, for the alarm source attribute the attribute value may be alarm source 1, and for the application name attribute the attribute value may be the name of the application in which the abnormality occurred.
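The standardization described above can be sketched as a simple filter over a dictionary. This is only an illustrative sketch: the key names (`alarm_source`, `ip_address`, and so on), the sample values, and the function name `normalize_alarm` are assumptions made for the example and do not appear in the application.

```python
# Hypothetical key names for the six retained alarm attributes.
RETAINED_ATTRIBUTES = (
    "alarm_source",
    "alarm_type",
    "alarm_level",
    "application_name",
    "ip_address",
    "description",
)


def normalize_alarm(raw_alarm: dict) -> dict:
    """Keep only the retained alarm attributes and drop all others."""
    return {
        key: raw_alarm[key]
        for key in RETAINED_ATTRIBUTES
        if key in raw_alarm
    }


raw = {
    "alarm_source": "alarm source 1",
    "alarm_type": "cpu",
    "alarm_level": "A",
    "application_name": "application name 1",
    "ip_address": "IP1",
    "description": "CPU usage above threshold",
    "hostname": "vm-01",         # non-essential attribute, removed
    "collector_version": "2.3",  # non-essential attribute, removed
}
normalized = normalize_alarm(raw)
```

After the call, `normalized` contains only the six retained attributes, so every later step can treat alarm information as a fixed set of attribute/value pairs.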
In one embodiment, each piece of alarm information has a unique event identifier, and the alarm attributes further indicate whether the alarm information is a new alarm or an alarm recovery. Referring to fig. 2, after the alarm information generated when the system is abnormal is acquired in S101, the following steps S11-S12 are further performed to process the generated alarm information:
S11, if the alarm information is a new alarm and no alarm-recovery alarm information with the same event identifier is received within a preset time period, the monitoring system decomposes the alarm information of the new alarm step by step.
S12, if the alarm information is a new alarm and other alarm information that has the same event identifier and is an alarm recovery is received within the preset time period, the monitoring system deletes both the alarm information and the other alarm information.
In an embodiment, the event identifier has been explained in S101 above and is not described again. It should be noted that alarm information is generally divided into new alarms and alarm recoveries. Some abnormalities in the virtualized cluster system recover quickly within a certain time. In that case, the virtualized cluster system generates both kinds of alarm information in succession for the same abnormality: the two pieces of alarm information have the same event identifier, are a "new alarm" followed by an "alarm recovery", and have alarm times that are close together.
In this case, if the interval between the two pieces of alarm information is short enough and the abnormality has no actual impact on the services running on the virtualized cluster system, the two pieces of alarm information may be considered to cancel each other out. That is, both the new-alarm alarm information and the alarm-recovery alarm information can be deleted.
It can be understood that, if no alarm-recovery alarm information with the same event identifier is received within the preset time period, the previously generated alarm information of the new alarm may be inserted into the aggregation tree, or that alarm information and the multiple pieces of alarm information already in the aggregation tree may be decomposed step by step again to generate a new aggregation tree.
The preset time period can be set by a worker according to actual conditions.
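Steps S11-S12 can be illustrated with a small batch filter. This is a sketch under stated assumptions, not the application's implementation: alarms are assumed to be dictionaries with hypothetical keys `event_id`, `status` (`"new"` or `"recovery"`) and `time` (seconds), and the whole batch is assumed to be available at once rather than arriving as a stream.

```python
from collections import defaultdict


def filter_transient_alarms(alarms, window_seconds):
    """Return the new alarms that were not recovered within the window.

    A new alarm is dropped together with its recovery (S12) when a
    recovery with the same event identifier follows it within
    `window_seconds`; otherwise it is kept for decomposition (S11).
    """
    # Collect the recovery times seen for each event identifier.
    recovery_times = defaultdict(list)
    for alarm in alarms:
        if alarm["status"] == "recovery":
            recovery_times[alarm["event_id"]].append(alarm["time"])

    kept = []
    for alarm in alarms:
        if alarm["status"] != "new":
            continue  # recoveries themselves are never forwarded
        cancelled = any(
            0 <= t - alarm["time"] <= window_seconds
            for t in recovery_times.get(alarm["event_id"], [])
        )
        if not cancelled:
            kept.append(alarm)
    return kept


alarms = [
    {"event_id": "e1", "status": "new", "time": 0},
    {"event_id": "e1", "status": "recovery", "time": 20},
    {"event_id": "e2", "status": "new", "time": 5},
    {"event_id": "e3", "status": "new", "time": 10},
    {"event_id": "e3", "status": "recovery", "time": 500},
]
kept_ids = [a["event_id"] for a in filter_transient_alarms(alarms, 60)]
```

With a 60-second window, event `e1` recovers in time and is dropped, while `e2` (never recovered) and `e3` (recovered too late) are kept. A streaming system would instead hold each new alarm until the preset time period expires.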
It should be added that newly generated alarm information also needs to be compared with the multiple pieces of alarm information already in the aggregation tree; if an existing piece of alarm information has the same attribute value as the new alarm information for every alarm attribute, the new alarm information may be deleted. This reduces the amount of alarm information the monitoring system needs to process.
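The duplicate check above amounts to comparing alarm information on the tuple of all its attribute values. A minimal sketch follows; the function name `deduplicate` and the attribute keys are hypothetical.

```python
def deduplicate(alarms, attributes):
    """Drop alarms whose values match an earlier alarm on every attribute."""
    seen = set()
    unique = []
    for alarm in alarms:
        key = tuple(alarm.get(attribute) for attribute in attributes)
        if key not in seen:
            seen.add(key)
            unique.append(alarm)
    return unique


attributes = ("alarm_source", "application_name", "ip_address")
alarms = [
    {"alarm_source": "alarm source 1", "application_name": "application name 1", "ip_address": "IP1"},
    {"alarm_source": "alarm source 1", "application_name": "application name 1", "ip_address": "IP1"},
    {"alarm_source": "alarm source 1", "application_name": "application name 1", "ip_address": "IP2"},
]
unique_alarms = deduplicate(alarms, attributes)
```

The second alarm is identical to the first on all listed attributes and is removed, leaving two pieces of alarm information.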
S102, the monitoring system decomposes the alarm information step by step according to a plurality of alarm attributes so as to update the aggregation tree; the aggregation tree comprises a plurality of child nodes, and each child node corresponds to an attribute value of one type of alarm attribute.
In an embodiment, the aggregation tree is a tree structure built according to the plurality of alarm attributes. Each child node in the aggregation tree corresponds to an attribute value of one alarm attribute, and along the path from the root node to any terminal child node, the child nodes on the path correspond to the attribute values of the alarm attributes contained in one piece of alarm information.
For example, fig. 3 is a schematic structural diagram of an aggregation tree. The root node is not shown in fig. 3, and the child nodes directly below the root node correspond to the alarm source attribute. That is, the multiple pieces of alarm information are first divided by the alarm source attribute, into a child node for alarm source 1 and a child node for alarm source 2. The alarm information belonging to alarm source 1 is then divided by the application name attribute, into a child node for application name 1 through a child node for application name N, and the division is repeated on the alarm information contained in each child node using the remaining alarm attributes. Finally, the alarm information divided by the IP address attribute is split into the alarm information corresponding to IP address 1, IP address 2, IP address 3, IP address 4, IP address 5, and so on.
Based on this, it can be understood that the attribute values such as alarm source 1 and application name 1 in fig. 3 are all child nodes of the aggregation tree, while IP address 1, IP address 2, IP address 3, IP address 4 and IP address 5 are terminal child nodes of the aggregation tree. The attribute values corresponding to the child nodes on the path from the root node to IP address 1 are then the attribute values of all alarm attributes contained in one piece of alarm information.
Updating the aggregation tree may consist of inserting the alarm information into the aggregation tree along the existing child nodes according to its attribute values, generating a new terminal child node.
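One simple way to picture the aggregation tree and its update is a nested dictionary in which each level corresponds to one alarm attribute. The sketch below assumes a fixed attribute order and hypothetical key names; the application itself selects the order of attributes by information entropy, which is not modelled here.

```python
# Assumed fixed order of alarm attributes, from the root downwards.
ATTRIBUTE_ORDER = ("alarm_source", "application_name", "ip_address")


def insert_alarm(tree, alarm, attribute_order=ATTRIBUTE_ORDER):
    """Walk or create one child node per attribute value of the alarm."""
    node = tree
    for attribute in attribute_order:
        # Reuse the existing child node for this value or create a new one.
        node = node.setdefault(alarm[attribute], {})


tree = {}  # the root node; its children are keyed by alarm source
for ip in ("IP1", "IP2", "IP3"):
    insert_alarm(tree, {
        "alarm_source": "alarm source 1",
        "application_name": "application name 1",
        "ip_address": ip,
    })
insert_alarm(tree, {
    "alarm_source": "alarm source 2",
    "application_name": "application name N",
    "ip_address": "IP9",
})
siblings = sorted(tree["alarm source 1"]["application name 1"])
```

Here `siblings` lists the terminal child nodes that share the parent node "application name 1", which are the candidates for aggregation in S103.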
S103, for any current terminal child node, if the current terminal child node and other terminal child nodes are sibling nodes, the monitoring system aggregates the alarm information corresponding to the current terminal child node and the other terminal child nodes to obtain aggregated information.
In one embodiment, if the previous child node contains both the current terminal child node and other terminal child nodes, the other terminal child nodes are siblings of the current terminal child node; that is, the current terminal child node and the other terminal child nodes have the same parent node. As can be seen from fig. 3, if the current terminal child node is IP address 1, then IP address 2, IP address 3, IP address 4 and IP address 5 are all siblings of IP address 1.
In addition, it can be understood that the alarm information corresponding to IP address 1 differs from the alarm information corresponding to its sibling nodes only in the attribute value of the IP address; the attribute values of all other alarm attributes must be the same. For example, all of them belong to alarm source 1 and were generated by application name 1, and so on.
In this case, the monitoring system does not need to send the alarm information corresponding to the current terminal child node and its sibling nodes to the user terminal one by one; instead, it can aggregate them into one piece of aggregated information and send that. This reduces the number of transmissions by the monitoring system and achieves greater information compression while retaining the full information and discarding nothing.
Specifically, the aggregation tree includes a root node; referring to fig. 4, the monitoring system may aggregate the alarm information through the following steps S1031 to S1033 to obtain aggregated information:
S1031, the monitoring system determines the attribute values of the alarm attributes respectively corresponding to the child nodes between the root node and the previous child node; the previous child node is the parent node of the current terminal child node.
S1032, the monitoring system generates an attribute value set of the alarm attribute corresponding to the current terminal child node and the other terminal child nodes.
In an embodiment, the monitoring system may aggregate the attribute values of the alarm attribute corresponding to the current terminal child node and the other terminal child nodes by storing the attribute value of each terminal child node in a set or list, thereby generating the attribute value set of that alarm attribute.
For example, in fig. 3 the alarm attribute corresponding to the terminal child nodes is the IP address. Its corresponding attribute value set may therefore be: IP address list = [IP1, IP2, IP3, IP4, IP5]. That is, the monitoring system may record the 5 attribute values of the alarm attribute together in the form of an IP address list.
S1033, for any one piece of the alarm information, the monitoring system keeps the attribute values of the alarm attributes corresponding to the child nodes between the root node and the parent node unchanged, and replaces the attribute value of the alarm attribute corresponding to the terminal child node with the attribute value set, to obtain the aggregated information.
In one embodiment, as explained in S103 above, the current terminal child node and the other terminal child nodes are sibling nodes and share a parent node. Therefore, for every terminal child node, the attribute values of the alarm attributes on the path from the root node to the parent node are the same, and the monitoring system can store them only once. That is, the monitoring system may take the attribute values of the alarm attributes from the root node to the parent node from any one piece of the alarm information.
For example, in fig. 3 the aggregation tree contains 5 pieces of alarm information whose attribute values are the same for all alarm attributes (alarm source, application name, ...) except the IP address. The 5 pieces of alarm information can therefore be aggregated into the following aggregated information: alarm source 1, application 1, …, IP address list = [IP1, IP2, IP3, IP4, IP5].
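Steps S1031-S1033 can be sketched by grouping alarm information on the attribute values along the path to the parent node and collecting the terminal attribute values into a list. The function name `aggregate_siblings`, the `_list` suffix on the output key, and the attribute keys are assumptions made for this illustration.

```python
def aggregate_siblings(alarms, attribute_order):
    """Merge alarms that differ only in the last (terminal) attribute."""
    *path_attributes, leaf_attribute = attribute_order
    groups = {}  # path values (root to parent node) -> terminal values
    for alarm in alarms:
        path = tuple(alarm[attribute] for attribute in path_attributes)
        values = groups.setdefault(path, [])
        if alarm[leaf_attribute] not in values:
            values.append(alarm[leaf_attribute])

    aggregated = []
    for path, values in groups.items():
        # Shared attribute values are stored once (S1031, S1033).
        info = dict(zip(path_attributes, path))
        # The terminal attribute value is replaced by the value set (S1032).
        info[leaf_attribute + "_list"] = values
        aggregated.append(info)
    return aggregated


order = ("alarm_source", "application_name", "ip_address")
alarms = [
    {"alarm_source": "alarm source 1",
     "application_name": "application name 1",
     "ip_address": ip}
    for ip in ("IP1", "IP2", "IP3", "IP4", "IP5")
]
aggregated = aggregate_siblings(alarms, order)
```

The five pieces of alarm information collapse into a single piece of aggregated information whose `ip_address_list` holds all five IP addresses, mirroring the fig. 3 example.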
S104, the monitoring system sends the aggregated information to the user terminal of the worker.
In an embodiment, as explained in S1033, when the alarm information corresponding to the current terminal child node and the other terminal child nodes is aggregated, the attribute values of the alarm attributes from the root node to the parent node need to be taken from only one piece of the alarm information. The attribute values corresponding to the terminal child nodes are then stored in a set or list to generate the attribute value set of the corresponding alarm attribute.
In one embodiment, when multiple pieces of alarm information are aggregated, the attribute values that differ are recorded in a list, which reduces the effort and time workers spend reading a large amount of alarm information and reduces the cost of sending it.
In this way, for the alarm information generated when the system is abnormal, the monitoring system can decompose the alarm information step by step according to the alarm attributes to generate the aggregation tree. The monitoring system can then aggregate the alarm information corresponding to the terminal child nodes that share the same parent node in the aggregation tree, to obtain one piece of aggregated information. Therefore, the monitoring system does not need to store and send each piece of alarm information separately, and only needs to send the one piece of aggregated information. The monitoring system thus achieves greater information compression and reduces the cost of sending a large amount of alarm information, while retaining the full set of alarm information and discarding nothing.
In one embodiment, each piece of alarm information has a unique event identifier, and the alarm attributes further indicate whether the alarm information is a new alarm or an alarm recovery. Referring to fig. 5, the step-by-step decomposition of the alarm information according to the plurality of alarm attributes to update the aggregation tree in S102 may be implemented by the following sub-steps S1-S5:
S1, the monitoring system initializes a root node; the root node comprises an alarm information set formed by multiple pieces of alarm information and an alarm attribute set formed by a plurality of alarm attributes.
S2, the monitoring system determines the best attribute from the alarm attribute set according to the attribute values corresponding to the alarm attributes respectively.
In an embodiment, initializing the root node means collecting all the alarm information existing in the monitoring system and generating an alarm information set comprising the multiple pieces of alarm information and an alarm attribute set comprising the alarm attributes they contain. The monitoring system may then determine the best attribute from the plurality of alarm attributes; this attribute determines the child nodes directly below the initialized root node.
It should be added that if the alarm information were grouped by alarm attributes chosen at random, the aggregation tree finally generated by the monitoring system might be overly complex, which hinders the subsequent lossless compression of the alarm information in the aggregation tree. Namely can not be realized
Therefore, referring to FIG. 6, the monitoring system further needs to determine the best attribute from the set of alarm attributes through the following steps S21-S24:
S21, for any current alarm attribute, the monitoring system determines the plurality of attribute values included in the current alarm attribute.
S22, the monitoring system respectively counts the number of the alarm information corresponding to each attribute value.
In an embodiment, taking the alarm attributes described in S101 above (alarm source, alarm type, alarm level, application name, IP address, and text description information) as an example, the current alarm attribute is the one currently being processed among these 6 alarm attributes. The attribute values are the values that the current alarm attribute takes when the alarm information is grouped by it. For example, if the current alarm attribute is the alarm source, the attribute values may be alarm source 1, alarm source 2, and so on. Then, for the set of alarm information corresponding to each attribute value, the monitoring system counts the number of pieces of alarm information belonging to the set.
S23, the monitoring system calculates the information entropy of the current alarm attribute according to the number and the total number of the alarm information; the information entropy is used for measuring the aggregation degree when a plurality of alarm information is grouped by the current alarm attribute.
S24, the monitoring system determines the current alarm attribute corresponding to the minimum value in the information entropy as the optimal attribute.
In an embodiment, the monitoring system may calculate the information entropy of the current alarm attribute by the following steps: for any current attribute value of the current alarm attribute, calculating the ratio of the number of pieces of alarm information corresponding to the current attribute value to the total number of pieces of alarm information, to obtain the probability that the attribute value in any piece of alarm information is the current attribute value; calculating the initial information entropy of the current attribute value according to the probability; and summing the initial information entropies of all attribute values of the current alarm attribute to obtain the information entropy of the current alarm attribute.
The initial information entropy of the current attribute value is calculated from the probability as follows: the product of the probability and the logarithm of the probability is calculated, and the negative of that product is taken as the initial information entropy of the current attribute value. Specifically, the monitoring system may apply a minimum information entropy formula to the per-value counts and the total number of pieces of alarm information to obtain the information entropy of the current alarm attribute. The minimum information entropy formula is as follows:
$$a^{*}_{i,j} = \arg\min_{a \in A_{i,j}} H(S_{i,j}, a) \tag{1}$$
$$H(S_{i,j}, a) = -\sum_{k=1}^{V} P(a = a_k)\,\log P(a = a_k) \tag{2}$$
wherein $p_{i,j}$, the node to which $S_{i,j}$ and $A_{i,j}$ belong, is the jth child node of the ith layer in the aggregation tree; $H(S_{i,j}, a)$ is the information entropy of the alarm information set $S_{i,j}$ with respect to the alarm attribute a; and $P(a = a_k)$ is the probability that a piece of alarm information has the kth attribute value of the alarm attribute a.
Specifically, the monitoring system recursively decomposes the alarm information set layer by layer, and the root node may be defined as layer 0. The child node $p_{i,j}$ is the jth child node of the ith layer, where i is the layer index, i = 0, 1, 2, …, 6, indicating the layer in which the child node is located within the whole aggregation tree; the node sequence number j ≥ 0 indicates the position of the node among the nodes of the ith layer, that is, the jth group obtained after the alarm information set of that layer is grouped by the corresponding alarm attribute. Taking layer 0 as an example, the alarm information set formed may be $S_{0,0}$, the corresponding alarm attribute set may be $A_{0,0}$, and the root node is initialized with these two sets.
Here, $H(S, a)$ denotes the information entropy of the alarm information set S with respect to the alarm attribute a. The alarm attribute a takes V attribute values in the alarm information set S, denoted as the set $\{a_1, a_2, \dots, a_V\}$, and $P(a = a_k) \in (0, 1]$ is the probability that a piece of alarm information in the set S has the attribute value $a_k$ for the alarm attribute a.
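As a minimal sketch of steps S21 to S23, the information entropy of one alarm attribute can be computed as follows. The function name `attribute_entropy`, the representation of each piece of alarm information as a Python dict keyed by attribute name, the sample values, and the use of the natural logarithm are all assumptions of this example; the text does not fix the logarithm base or any data layout.

```python
import math
from collections import Counter

def attribute_entropy(alarms, attribute):
    """Information entropy of `attribute` over a list of alarm dicts (S21-S23)."""
    total = len(alarms)
    if total == 0:
        return 0.0
    # S21/S22: find the attribute values and count the alarms for each value.
    counts = Counter(alarm[attribute] for alarm in alarms)
    entropy = 0.0
    for count in counts.values():
        p = count / total            # probability of this attribute value
        entropy += -p * math.log(p)  # initial entropy of this value (natural log assumed)
    return entropy

# Hypothetical alarm records, for illustration only.
alarms = [
    {"source": "source-1", "level": "critical"},
    {"source": "source-1", "level": "warning"},
    {"source": "source-1", "level": "critical"},
    {"source": "source-1", "level": "info"},
]
print(attribute_entropy(alarms, "source"))  # one value only, so the entropy is 0.0
print(attribute_entropy(alarms, "level"))   # three values, so the entropy is positive
```

The attribute with a single value yields zero entropy, which matches the V = 1 case discussed in the description.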
The above formula is specifically expressed as follows. A node $p_{i,j}$ has an alarm information set $S_{i,j}$ and an alarm attribute set $A_{i,j}$. The best alarm attribute $a^{*}_{i,j}$ is selected from the alarm attribute set $A_{i,j}$ as the grouping basis. According to the different attribute values of the best alarm attribute, the alarm information in $S_{i,j}$ having the same attribute value is placed into the same group, recorded as a subset $S_{i+1,j'}$, and assigned to the same child node $p_{i+1,j'}$; at the same time, the attribute $a^{*}_{i,j}$ is removed from $A_{i,j}$ to form the attribute subset $A_{i+1,j'}$, where $j' \ge 0$.
Based on this, for the initialized root node, i and j are both 0, i.e., there is only one root node $p_{0,0}$, and therefore only one alarm information set and one alarm attribute set. The root node corresponds to the alarm information set $S_{0,0}$. For any current alarm attribute a having V attribute values (for example, if the alarm source is the current alarm attribute a, there are V alarm sources), the V attribute values are each used as a child node to group the alarm information set $S_{0,0}$ into V groups corresponding to $\{a_1, a_2, \dots, a_V\}$. Then $P(a = a_k)$ is calculated from the alarm information set $S_{0,0}$, i.e., the probability that a piece of alarm information in $S_{0,0}$ belongs to the group corresponding to the attribute value $a_k$. Then, according to formula 2, the initial information entropies corresponding to the attribute values of the current alarm attribute a are summed to obtain the information entropy of the grouping by the current alarm attribute a.
In addition, since the initialized alarm attribute set $A_{0,0}$ contains 6 alarm attributes, 6 information entropies are finally obtained. The monitoring system can then take the alarm attribute corresponding to the minimum of the 6 information entropies as the best attribute.
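The selection in step S24 can be sketched as a minimum search over the candidate attributes. The names `entropy` and `best_attribute`, the dict-based alarm records, the sample values, the natural logarithm, and the tie-breaking rule (the first listed attribute wins) are assumptions of this example rather than details given in the text.

```python
import math
from collections import Counter

def entropy(alarms, attribute):
    """Entropy of one attribute over a list of alarm dicts (natural log assumed)."""
    total = len(alarms)
    counts = Counter(alarm[attribute] for alarm in alarms)
    result = 0.0
    for count in counts.values():
        p = count / total
        result += -p * math.log(p)
    return result

def best_attribute(alarms, attributes):
    """S24: pick the attribute with the minimum entropy (first one wins on ties)."""
    return min(attributes, key=lambda attr: entropy(alarms, attr))

# Hypothetical alarm records with three of the alarm attributes.
alarms = [
    {"source": "source-1", "type": "cpu", "ip": "10.0.0.1"},
    {"source": "source-1", "type": "cpu", "ip": "10.0.0.2"},
    {"source": "source-1", "type": "disk", "ip": "10.0.0.3"},
    {"source": "source-2", "type": "disk", "ip": "10.0.0.4"},
]
print(best_attribute(alarms, ["source", "type", "ip"]))  # source
```

Here the alarm source splits the set 3 to 1, the alarm type splits it 2 to 2, and every IP address is distinct, so the alarm source has the lowest entropy and is chosen.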
It can be understood that there may be an alarm attribute a for which all the alarm information in the corresponding alarm information set S has the same attribute value, that is, V = 1. In this case, it can be inferred that:
$$P(a = a_1) = 1$$
$$H(S, a) = -P(a = a_1)\,\log P(a = a_1) = -1 \cdot \log 1 = 0$$
That is, when the alarm information is grouped by the attribute values of the alarm attribute a and only one attribute value exists, the corresponding information entropy reaches its minimum value of 0. In other words, the alarm information set S carries the minimum amount of information on the alarm attribute a, and grouping all the alarm information in S by the current alarm attribute a yields the highest degree of aggregation.
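As a hedged worked example (the counts are hypothetical, and the natural logarithm is assumed because the text does not fix the base), consider a set S of 4 pieces of alarm information in which the alarm source takes a single value for all 4, while the alarm level takes one value for 3 of them and another value for the remaining one:

$$H(S, \text{source}) = -1 \cdot \log 1 = 0$$
$$H(S, \text{level}) = -\left(\tfrac{3}{4}\log\tfrac{3}{4} + \tfrac{1}{4}\log\tfrac{1}{4}\right) \approx 0.562$$

Since 0 < 0.562, step S24 would select the alarm source as the best attribute for this set.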
S3, grouping the alarm information sets based on the attribute values respectively corresponding to the optimal attributes to obtain a new alarm information set and an alarm attribute set; and the attribute values corresponding to the optimal attributes in each new alarm information set are the same, and each attribute value corresponding to the optimal attribute is respectively used as a child node.
And S4, for any new alarm information set and alarm attribute set, repeating the steps S2-S3 to group the new alarm information set and alarm attribute set until the terminal child node is obtained.
It can be understood that, after the calculation in steps S21-S24 above is completed, the node $p_{0,0}$ in the aggregation tree is split, yielding a series of child nodes $p_{1,j'}$ of the node $p_{0,0}$, together with the alarm information set $S_{1,j'}$ and the alarm attribute set $A_{1,j'}$ corresponding to each child node.
Because the monitoring system has already selected the best attribute from the 6 alarm attributes, each resulting alarm attribute set now contains only 5 alarm attributes. The above procedure is then repeated to group the alarm information set and the alarm attribute set corresponding to each child node, until the alarm attribute set $A_{i+1,j'}$ is empty. That is, referring to fig. 3, the last grouping uses the IP address as the alarm attribute. At this point, for the alarm information set under IP address 1, the corresponding alarm attribute set is empty, i.e., there is no 7th alarm attribute. Therefore, the alarm information set belonging to IP address 1 does not need to be grouped again.
Finally, the attribute values of the best attribute determined at each step serve as the child nodes of the corresponding layer of the aggregation tree. That is, referring to fig. 3, when the best attribute determined for the first layer is the alarm source, its child nodes may be alarm sources 1, ..., N. The monitoring system may then determine the best attribute again for the alarm information set under each of the child nodes alarm source 1, ..., N, until the terminal child nodes are obtained, namely the terminal child nodes produced by grouping with the IP address as the alarm attribute.
And S5, the monitoring system generates an aggregation tree according to the root node, the child nodes and the terminal child nodes.
In an embodiment, after the root node, the child nodes, and the terminal child nodes are obtained by successive grouping, they may be connected in order along the paths followed during grouping to generate the aggregation tree.
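Steps S1 to S5 can be sketched as a recursive split on the minimum-entropy attribute. The nested-dict tree layout (`split_on`, `children`, `alarms`), the function names, the sample records, and the natural logarithm are assumptions made for this illustration; the application does not prescribe a concrete data structure.

```python
import math
from collections import Counter, defaultdict

def entropy(alarms, attribute):
    """Entropy of one attribute over a list of alarm dicts (natural log assumed)."""
    total = len(alarms)
    counts = Counter(alarm[attribute] for alarm in alarms)
    result = 0.0
    for count in counts.values():
        p = count / total
        result += -p * math.log(p)
    return result

def build_tree(alarms, attributes):
    """S1-S5: split recursively until the alarm attribute set is empty."""
    if not attributes:
        return {"alarms": alarms}  # terminal child node: nothing left to group by
    # S2: best attribute = minimum entropy.
    best = min(attributes, key=lambda attr: entropy(alarms, attr))
    # S3: group alarms by their value of the best attribute.
    groups = defaultdict(list)
    for alarm in alarms:
        groups[alarm[best]].append(alarm)
    remaining = [attr for attr in attributes if attr != best]
    # S4: repeat for each new (alarm set, attribute set) pair.
    return {
        "split_on": best,
        "children": {value: build_tree(group, remaining)
                     for value, group in groups.items()},
    }

# Hypothetical alarm records with two attributes.
alarms = [
    {"source": "source-1", "ip": "10.0.0.1"},
    {"source": "source-1", "ip": "10.0.0.2"},
    {"source": "source-2", "ip": "10.0.0.3"},
]
tree = build_tree(alarms, ["source", "ip"])
print(tree["split_on"])  # source
```

In this sketch the two alarms under `source-1` end up as sibling terminal child nodes that differ only in IP address, which is the situation in which the method aggregates them.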
It should be noted that the aggregation tree may be updated once every preset time period, for example every 0.5 s or 1 s. The updating methods include, but are not limited to, the following: when new alarm information is obtained, it is inserted into the aggregation tree according to the attribute value corresponding to each child node in the aggregation tree; or all the alarm information currently existing in the aggregation tree is re-initialized, i.e., steps S1-S5 are executed again, so that the regenerated aggregation tree is better suited to lossless compression of all the alarm information existing at that moment.
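The first updating method, inserting new alarm information along the existing child nodes, might look like the sketch below. The nested-dict tree layout (`split_on`, `children`, `alarms`), the function names, and the rule that a newly created branch uses the remaining attributes in their listed order are assumptions of this example; the text does not say how a branch for a previously unseen attribute value is built.

```python
def new_branch(alarm, attributes):
    """Create a fresh chain of nodes for an attribute value not yet in the tree."""
    if not attributes:
        return {"alarms": [alarm]}
    first, rest = attributes[0], attributes[1:]
    return {"split_on": first, "children": {alarm[first]: new_branch(alarm, rest)}}

def insert_alarm(node, alarm, attributes):
    """Walk down the tree by the alarm's attribute values and insert it."""
    if "split_on" not in node:           # terminal child node reached
        node["alarms"].append(alarm)
        return
    attr = node["split_on"]
    remaining = [a for a in attributes if a != attr]
    child = node["children"].get(alarm[attr])
    if child is None:                    # unseen attribute value: add a new child
        node["children"][alarm[attr]] = new_branch(alarm, remaining)
    else:
        insert_alarm(child, alarm, remaining)

# Hypothetical existing tree holding one alarm.
first = {"source": "source-1", "ip": "10.0.0.1"}
tree = {"split_on": "source", "children": {
    "source-1": {"split_on": "ip", "children": {
        "10.0.0.1": {"alarms": [first]}}}}}

insert_alarm(tree, {"source": "source-1", "ip": "10.0.0.2"}, ["source", "ip"])
insert_alarm(tree, {"source": "source-2", "ip": "10.0.0.9"}, ["source", "ip"])
print(sorted(tree["children"]))  # ['source-1', 'source-2']
```

Incremental insertion keeps the existing split order, so it is cheap but may drift from the minimum-entropy structure; the second updating method, re-running S1-S5, restores it.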
In one embodiment, each alarm message includes a life cycle; referring to fig. 7, the step of sending the aggregation information to the user terminal of the staff member at S104 includes the following steps S1041-S1043, which are detailed as follows:
S1041, the monitoring system determines the updating time used when the alarm information is updated to the aggregation tree.
S1042, the monitoring system determines the remaining time of the alarm information according to the updating time and the life cycle.
And S1043, when the remaining time is exhausted, the monitoring system sends the aggregated information containing the alarm information to a user terminal of a worker.
In an embodiment, the life cycle is the time period from the generation of the alarm information to its sending to the user terminal. The monitoring system needs a certain amount of time to parse and standardize the alarm information and to update the aggregation tree. Based on this, the monitoring system can calculate the remaining time of each piece of alarm information according to the updating time and the life cycle. The alarm information whose remaining time is exhausted is then sent to the user terminal.
It should be noted that, when the alarm information with the exhausted remaining time is sent to the user terminal, if the alarm information has other sibling nodes, the alarm information should be aggregated with the alarm information corresponding to the sibling nodes, and then the aggregated information is sent. Therefore, the alarm information with the exhausted residual time and other alarm information similar to the alarm information can be processed by the staff in time.
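Steps S1041 to S1043 can be sketched as follows, assuming that the remaining time is simply the life cycle minus the updating time. That subtraction, the field names `life_cycle_s` and `update_time_s`, and the function names are assumptions of this example; the text only states that the remaining time is determined from the two quantities.

```python
def remaining_time(life_cycle_s, update_time_s):
    """S1042: time left before the alarm must be sent (assumed: life cycle minus update time)."""
    return max(0.0, life_cycle_s - update_time_s)

def due_alarms(alarms, elapsed_s):
    """S1043: ids of alarms whose remaining time is exhausted after `elapsed_s` seconds."""
    return [
        alarm["id"]
        for alarm in alarms
        if remaining_time(alarm["life_cycle_s"], alarm["update_time_s"]) <= elapsed_s
    ]

# Hypothetical alarms with their life cycle and tree-update time in seconds.
alarms = [
    {"id": "event-1", "life_cycle_s": 5.0, "update_time_s": 0.5},
    {"id": "event-2", "life_cycle_s": 10.0, "update_time_s": 1.0},
]
print(due_alarms(alarms, 4.5))  # ['event-1']
```

An alarm returned by `due_alarms` would then be aggregated with its sibling nodes, if any, before the aggregated information is sent.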
Referring to fig. 8, fig. 8 is a block diagram of an alarm information aggregation device according to an embodiment of the present application. The alarm information aggregation device in this embodiment includes modules for executing the steps in the embodiments corresponding to fig. 1 and fig. 3 to fig. 7; for details, refer to the related descriptions of those embodiments. For convenience of explanation, only the portions related to the present embodiment are shown. Referring to fig. 8, the alarm information aggregation device 800 may include: an obtaining module 810, an updating module 820, an aggregating module 830, and a sending module 840, wherein:
An obtaining module 810, configured to acquire alarm information generated when a system is abnormal; the alarm information includes attribute values corresponding to a plurality of alarm attributes, respectively.
An updating module 820, configured to decompose the alarm information step by step according to the multiple alarm attributes, so as to update the aggregation tree; the aggregation tree comprises a plurality of child nodes, and each child node corresponds to an attribute value of one type of alarm attribute.
The aggregation module 830 is configured to, for any current end child node, aggregate the alarm information corresponding to the current end child node and the other end child nodes to obtain aggregated information if the current end child node and the other end child nodes are sibling nodes.
A sending module 840, configured to send the aggregation information to a user terminal of a worker.
In one embodiment, each piece of alarm information has a unique event identifier; the alarm attributes further include a newly added alarm and an alarm recovery; the alarm information aggregation device 800 further includes:
A decomposition module, configured to decompose the newly added alarm information step by step if the alarm information is a newly added alarm and no alarm information that has the same event identifier and is an alarm recovery is received within a preset time period.
A deleting module, configured to delete the alarm information and the remaining alarm information if the alarm information is a newly added alarm and remaining alarm information that has the same event identifier and is an alarm recovery is received within the preset time period.
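The behavior of the decomposition and deleting modules can be sketched as a filter over incoming alarm information. The field names `event_id`, `state`, and `time`, the state labels `"new"` and `"recovery"`, and the function name are assumptions of this example. The text does not say what happens to a recovery that arrives after the preset time period, so this sketch simply never forwards recovery messages.

```python
def alarms_to_decompose(alarms, window_s):
    """Keep new alarms that have no recovery with the same event id within window_s."""
    # Collect recovery timestamps per event identifier.
    recoveries = {}
    for alarm in alarms:
        if alarm["state"] == "recovery":
            recoveries.setdefault(alarm["event_id"], []).append(alarm["time"])
    kept = []
    for alarm in alarms:
        if alarm["state"] != "new":
            continue  # recovery messages themselves are not decomposed (assumption)
        recovered = any(
            0 <= t - alarm["time"] <= window_s
            for t in recoveries.get(alarm["event_id"], [])
        )
        if not recovered:
            kept.append(alarm)  # no timely recovery: decompose step by step
        # otherwise the new alarm and its recovery are both dropped
    return kept

# Hypothetical stream of alarm information, times in seconds.
stream = [
    {"event_id": "e1", "state": "new", "time": 0},
    {"event_id": "e1", "state": "recovery", "time": 2},
    {"event_id": "e2", "state": "new", "time": 1},
    {"event_id": "e3", "state": "new", "time": 0},
    {"event_id": "e3", "state": "recovery", "time": 30},
]
print([a["event_id"] for a in alarms_to_decompose(stream, window_s=5)])  # ['e2', 'e3']
```

Event `e1` recovers within the window and is dropped, while `e3` recovers only after the window has passed and is therefore still decomposed.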
In one embodiment, the update module 820 is further configured to:
S1, initializing a root node, wherein the root node comprises an alarm information set formed by a plurality of alarm information and an alarm attribute set formed by a plurality of alarm attributes; S2, determining the optimal attribute from the alarm attribute set according to the attribute values respectively corresponding to the alarm attributes; S3, grouping the alarm information sets based on the attribute values respectively corresponding to the optimal attributes to obtain a new alarm information set and an alarm attribute set; the attribute values corresponding to the optimal attributes in each new alarm information set are the same, and each attribute value corresponding to the optimal attribute is respectively used as a child node; S4, for any new alarm information set and alarm attribute set, repeating the steps S2-S3 to group the new alarm information set and the alarm attribute set until a terminal child node is obtained; and S5, generating the aggregation tree according to the root node, the child nodes and the terminal child nodes.
In one embodiment, the update module 820 is further configured to:
for any current alarm attribute, determining a plurality of attribute values included in the current alarm attribute; respectively counting the number of alarm information corresponding to each attribute value; calculating the information entropy of the current alarm attribute according to the number and the total number of the alarm information; the information entropy is used for measuring the aggregation degree when a plurality of alarm information are grouped according to the current alarm attribute; and determining the current alarm attribute corresponding to the minimum value in the information entropy as the optimal attribute.
In one embodiment, the update module 820 is further configured to:
for any current attribute value in the current alarm attribute, calculating the ratio of the number of the alarm information corresponding to the current attribute value to the total number to obtain the probability that the attribute value in any alarm information is the current attribute value; calculating the initial information entropy of the current attribute value according to the probability; and summing the initial information entropies corresponding to each current attribute value in the current alarm attribute to obtain the information entropy of the current alarm attribute.
In an embodiment, the aggregation tree includes a root node; the aggregation module 830 is further configured to:
determining the attribute values of the alarm attributes corresponding to each child node between the root node and a previous child node; the previous child node is the parent node of the current terminal child node; generating an attribute value set of the alarm attributes respectively corresponding to the current terminal child node and the other terminal child nodes; for any one piece of the alarm information, keeping the attribute values of the alarm attributes corresponding to the child nodes in the alarm information unchanged; and replacing the attribute value of the alarm attribute corresponding to the terminal child node in the alarm information with the attribute value set to obtain the aggregated information.
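The sibling aggregation performed by the aggregation module can be sketched as below: the attribute values along the path from the root node to the parent node are kept, and the attribute of the terminal child nodes is replaced by the set of sibling values. The function name, the dict representation, and the choice of a sorted list to represent the attribute value set are assumptions of this example.

```python
def aggregate_siblings(path_values, terminal_attribute, sibling_values):
    """Merge sibling terminal child nodes into one piece of aggregated information."""
    aggregated = dict(path_values)  # values shared by all siblings stay unchanged
    # Replace the terminal attribute by the set of values found on the siblings.
    aggregated[terminal_attribute] = sorted(set(sibling_values))
    return aggregated

# Hypothetical path from the root node to the parent of the sibling terminal nodes.
path = {"source": "source-1", "type": "cpu", "level": "critical"}
merged = aggregate_siblings(path, "ip", ["10.0.0.2", "10.0.0.1", "10.0.0.2"])
print(merged["ip"])  # ['10.0.0.1', '10.0.0.2']
```

Because every shared value is kept and the differing values are listed in full, no information from the individual alarms is lost, which is consistent with the lossless compression goal stated in the description.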
In one embodiment, each alarm message includes a life cycle; the sending module 840 is further configured to:
determining the updating time used when the alarm information is updated to the aggregation tree; determining the remaining time of the alarm information according to the updating time and the life cycle; and when the remaining time is exhausted, sending the aggregated information containing the alarm information to a user terminal of a worker.
It should be understood that, in the structural block diagram of the alarm information aggregation device shown in fig. 8, each module is used to execute the steps in the embodiments corresponding to fig. 1 and fig. 3 to fig. 7. Those steps have been explained in detail in the above embodiments; for details, refer to the related descriptions of the embodiments corresponding to fig. 1 and fig. 3 to fig. 7, which are not repeated here.
Fig. 9 is a block diagram of a monitoring system according to an embodiment of the present application. As shown in fig. 9, the monitoring system 900 of this embodiment includes: a processor 910, a memory 920 and a computer program 930, such as a program of an alarm information aggregation method, stored in the memory 920 and executable at the processor 910. The processor 910, when executing the computer program 930, implements the steps in the embodiments of the alarm information aggregation methods described above, such as S101 to S104 shown in fig. 1. Alternatively, the processor 910, when executing the computer program 930, implements the functions of the modules in the embodiment corresponding to fig. 8, for example, the functions of the modules 810 to 840 shown in fig. 8, please refer to the related description in the embodiment corresponding to fig. 8.
Illustratively, the computer program 930 may be divided into one or more modules, and the one or more modules are stored in the memory 920 and executed by the processor 910 to implement the alarm information aggregation method provided by the embodiment of the present application. One or more of the modules may be a series of computer program instruction segments that can perform particular functions and that describe the execution of computer program 930 in monitoring system 900. For example, the computer program 930 may implement the alarm information aggregation method provided by the embodiment of the present application.
The monitoring system 900 may include, but is not limited to, a processor 910 and a memory 920. Those skilled in the art will appreciate that fig. 9 is merely an example of the monitoring system 900 and does not limit it; the monitoring system 900 may include more or fewer components than shown, combine some components, or use different components. For example, the monitoring system may also include input/output devices, network access devices, buses, etc.
The processor 910 may be a central processing unit, or may be another general purpose processor, a digital signal processor, an application specific integrated circuit, a field-programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like. A general purpose processor may be a microprocessor or any conventional processor.
The memory 920 may be an internal storage unit of the monitoring system 900, such as a hard disk or a memory of the monitoring system 900. The memory 920 may also be an external storage device of the monitoring system 900, such as a plug-in hard disk, a smart card, a flash memory card, etc. provided on the monitoring system 900. Further, the memory 920 may include both internal storage units and external storage devices of the monitoring system 900.
The embodiment of the present application provides a monitoring system, which includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, and when the processor executes the computer program, the alarm information aggregation method in the above embodiments is implemented.
The embodiment of the present application provides a computer-readable storage medium in which a computer program is stored; when the computer program is executed by a processor, the alarm information aggregation method in the above embodiments is implemented.
The embodiment of the present application provides a computer program product, which, when running on a monitoring system, enables the monitoring system to execute the alarm information aggregation method in the above embodiments.
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. An alarm information aggregation method is applied to a monitoring system, and the method comprises the following steps:
acquiring alarm information generated when a system is abnormal; the alarm information comprises attribute values corresponding to a plurality of alarm attributes respectively;
decomposing the alarm information step by step according to the plurality of alarm attributes to update the aggregation tree; the aggregation tree comprises a plurality of child nodes, and each child node corresponds to an attribute value of an alarm attribute;
for any current terminal child node, if the current terminal child node and other terminal child nodes are sibling nodes, aggregating the alarm information corresponding to the current terminal child node and the other terminal child nodes to obtain aggregated information;
and sending the aggregation information to a user terminal of a worker.
2. The alarm information aggregation method according to claim 1, wherein each of the alarm information has a unique event identifier; the alarm attribute also comprises a newly added alarm and an alarm recovery; after the alarm information generated when the system is abnormal is obtained, the method further comprises the following steps:
if the alarm information is the newly added alarm, and no alarm information that has the same event identifier and is the alarm recovery is received within a preset time period, decomposing the alarm information of the newly added alarm step by step;
and if the alarm information is the newly added alarm, and remaining alarm information that has the same event identifier and is the alarm recovery is received within the preset time period, deleting the alarm information and the remaining alarm information.
3. The method for aggregating alarm information according to claim 1, wherein the progressively decomposing the alarm information according to the plurality of alarm attributes to update the aggregation tree comprises:
S1, initializing a root node, wherein the root node comprises an alarm information set formed by a plurality of alarm information and an alarm attribute set formed by a plurality of alarm attributes;
S2, determining the optimal attribute from the alarm attribute set according to the attribute values corresponding to the alarm attributes respectively;
S3, grouping the alarm information sets based on the attribute values respectively corresponding to the optimal attributes to obtain a new alarm information set and an alarm attribute set; wherein, the attribute values corresponding to the optimal attributes in each new alarm information set are the same, and each attribute value corresponding to the optimal attribute is respectively used as a child node;
S4, for any new alarm information set and any new alarm attribute set, repeating the steps S2-S3 to group the new alarm information set and the new alarm attribute set until a terminal child node is obtained;
S5, generating the aggregation tree according to the root node, the child nodes and the terminal child nodes.
4. The method for aggregating alarm information according to claim 3, wherein the determining an optimal attribute from the alarm attribute set according to the attribute values corresponding to the plurality of alarm attributes respectively comprises:
determining a plurality of attribute values included in any current alarm attribute;
respectively counting the number of alarm information corresponding to each attribute value;
calculating the information entropy of the current alarm attribute according to the number and the total number of the alarm information; the information entropy is used for measuring the aggregation degree when a plurality of alarm information is grouped by the current alarm attribute;
and determining the current alarm attribute corresponding to the minimum value in the information entropy as the optimal attribute.
5. The alarm information aggregation method according to claim 4, wherein the calculating an information entropy of the current alarm attribute according to the number and the total number of the alarm information includes:
calculating the ratio of the number of the alarm information corresponding to the current attribute value to the total number aiming at any current attribute value in the current alarm attribute to obtain the probability that the attribute value in any alarm information is the current attribute value;
calculating the initial information entropy of the current attribute value according to the probability;
and summing the initial information entropies corresponding to each current attribute value in the current alarm attribute to obtain the information entropies of the current alarm attribute.
6. The alarm information aggregation method according to claim 1, wherein the aggregation tree includes a root node;
the aggregating the alarm information corresponding to the current terminal child node and the other terminal child nodes to obtain aggregated information includes:
determining attribute values of the alarm attribute corresponding to each child node between the root node and the previous child node; the previous child node is a parent node of the current terminal child node;
generating an attribute value set of the alarm attributes corresponding to the current terminal child node and the other terminal child nodes respectively;
for any one piece of the alarm information, keeping the attribute values of the alarm attributes corresponding to the child nodes in the alarm information unchanged; and replacing the attribute value of the alarm attribute corresponding to the terminal child node in the alarm information with the attribute value set to obtain the aggregated information.
7. The method for aggregating alarm information according to any one of claims 1-6, wherein each of the alarm information includes a life cycle;
the sending the aggregation information to the user terminal of the staff comprises the following steps:
determining the updating time used when the alarm information is updated to the aggregation tree;
determining the remaining time of the alarm information according to the updating time and the life cycle;
and when the remaining time is exhausted, sending the aggregated information containing the alarm information to a user terminal of the worker.
8. An alarm information aggregation device, applied to a monitoring system, the device comprising:
the acquisition module is used for acquiring alarm information generated when the system is abnormal; the alarm information comprises attribute values corresponding to a plurality of alarm attributes respectively;
the updating module is used for decomposing the alarm information step by step according to the plurality of alarm attributes so as to update the aggregation tree; the aggregation tree comprises a plurality of child nodes, and each child node corresponds to an attribute value of an alarm attribute;
the aggregation module is used for, for any current terminal child node, aggregating the alarm information corresponding to the current terminal child node and other terminal child nodes to obtain aggregated information if the current terminal child node and the other terminal child nodes are sibling nodes;
and the sending module is used for sending the aggregation information to a user terminal of a worker.
9. A monitoring system comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
CN202111524740.1A 2021-12-14 2021-12-14 Alarm information aggregation method, device, monitoring system and storage medium Pending CN114185744A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111524740.1A CN114185744A (en) 2021-12-14 2021-12-14 Alarm information aggregation method, device, monitoring system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111524740.1A CN114185744A (en) 2021-12-14 2021-12-14 Alarm information aggregation method, device, monitoring system and storage medium

Publications (1)

Publication Number Publication Date
CN114185744A true CN114185744A (en) 2022-03-15

Family

ID=80543664

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111524740.1A Pending CN114185744A (en) 2021-12-14 2021-12-14 Alarm information aggregation method, device, monitoring system and storage medium

Country Status (1)

Country Link
CN (1) CN114185744A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114760186A (en) * 2022-03-23 2022-07-15 深信服科技股份有限公司 Alarm analysis method and device, electronic equipment and storage medium
CN114760186B (en) * 2022-03-23 2024-05-28 深信服科技股份有限公司 Alarm analysis method, alarm analysis device, electronic equipment and storage medium
CN116886448A (en) * 2023-09-07 2023-10-13 卓望数码技术(深圳)有限公司 DDoS attack alarm studying and judging method and device based on semi-supervised learning
CN116886448B (en) * 2023-09-07 2023-12-01 卓望数码技术(深圳)有限公司 DDoS attack alarm studying and judging method and device based on semi-supervised learning

Similar Documents

Publication Publication Date Title
US10691728B1 (en) Transforming a data stream into structured data
US10108411B2 (en) Systems and methods of constructing a network topology
CN103513983B (en) method and system for predictive alert threshold determination tool
WO2021129367A1 (en) Method and apparatus for monitoring distributed storage system
US20180288129A1 (en) Introspection driven monitoring of multi-container applications
CN114185744A (en) Alarm information aggregation method, device, monitoring system and storage medium
CN110650038B (en) Security event log collecting and processing method and system for multiple classes of supervision objects
US10361943B2 (en) Methods providing performance management using a proxy baseline and related systems and computer program products
US10878335B1 (en) Scalable text analysis using probabilistic data structures
US20210406288A1 (en) Novelty detection system
CN110740061A (en) Fault early warning method and device and computer storage medium
CN112527848B (en) Report data query method, device and system based on multiple data sources and storage medium
CN113472555B (en) Fault detection method, system, device, server and storage medium
CN113806191A (en) Data processing method, device, equipment and storage medium
CN114443437A (en) Alarm root cause output method, apparatus, device, medium, and program product
CN111427749B (en) Monitoring tool and method for Ironic service in OpenStack environment
CN110309206B (en) Order information acquisition method and system
CN111352930A (en) Template data processing method and device, server and storage medium
CN111274032A (en) Task processing system and method, and storage medium
CN116804957A (en) System monitoring method and device
CN113285837A (en) Carrier network service fault diagnosis method and device based on topology sensing
CN114706893A (en) Fault detection method, device, equipment and storage medium
CN114936150A (en) Big data stream synchronization and monitoring test method, device and storage medium
CN111143318A (en) Information processing method and device, electronic equipment and storage medium
CN114422324B (en) Alarm information processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination