CN116860507A - Alarm root cause determining method, device, equipment and medium - Google Patents

Alarm root cause determining method, device, equipment and medium

Info

Publication number
CN116860507A
Authority
CN
China
Prior art keywords
alarm
directed graph
sample
alarms
alert
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311068710.3A
Other languages
Chinese (zh)
Inventor
荣翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202311068710.3A
Publication of CN116860507A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 Display of status information
    • G06F11/327 Alarm or error message display

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Alarm Systems (AREA)

Abstract

The disclosure provides a method, an apparatus, a device and a storage medium for determining an alarm root cause, which can be applied to the fields of big data, information security and financial technology. The method comprises the following steps: acquiring real-time alarm data generated by an information system in a first preset period, wherein the real-time alarm data comprises N alarms and unique identification codes respectively corresponding to the N alarms, and N is an integer greater than or equal to 2; inputting the real-time alarm data into a generation model and outputting an initial alarm directed graph, wherein the generation model is used for establishing an association relation among the N alarms based on the unique identification codes, and the initial alarm directed graph comprises nodes corresponding to the N alarms and directed edges formed between associated alarms; pruning the initial alarm directed graph to obtain a target alarm directed graph; and determining the root cause of the alarms generated by the information system based on the target alarm directed graph.

Description

Alarm root cause determining method, device, equipment and medium
Technical Field
The present disclosure relates to the field of big data, the field of information security, and the field of financial technology, and in particular, to a method, apparatus, device, medium, and program product for determining an alert root cause.
Background
With the continuous development of big data and cloud computing, the volume of data that the alarm platform of an information system must process keeps growing, and this volume reduces the efficiency of locating the root cause alarm of a fault. In actual operation and maintenance, alarms do not occur regularly and are not evenly distributed, so a large number of alarms is often generated within a certain period, yet the alarms are displayed along a single dimension and repetitively. Faced with this mass of alarms, operation and maintenance personnel cannot quickly extract the valuable alarm information and find the root cause of the alarms.
Disclosure of Invention
In view of the foregoing, the present disclosure provides an alarm root cause determining method, apparatus, device, medium, and program product.
According to a first aspect of the present disclosure, there is provided an alarm root cause determining method, including:
acquiring real-time alarm data generated by an information system in a first preset period, wherein the real-time alarm data comprises N alarms and N unique identification codes respectively corresponding to the N alarms, and N is an integer greater than or equal to 2;
inputting the real-time alarm data into a generation model and outputting an initial alarm directed graph, wherein the generation model is used for establishing the association relation among the N alarms based on the unique identification codes, and the initial alarm directed graph comprises nodes corresponding to the N alarms and directed edges formed between associated alarms;
pruning the initial alarm directed graph to obtain a target alarm directed graph; and
determining the root cause of the alarms generated by the information system based on the target alarm directed graph.
According to an embodiment of the present disclosure, the training method of the generation model includes:
acquiring historical alarm data generated by the information system in a second preset period, wherein the historical alarm data comprises M first alarm samples, the occurrence times of the first alarm samples and unique identification codes respectively corresponding to the M first alarm samples, and M is an integer greater than or equal to 2;
inputting the first alarm samples into an initial model so as to execute the following operations:
classifying the first alarm samples to obtain classified first alarm samples;
determining, for each type of first alarm sample, the sub-alarm samples corresponding to each time that first alarm sample occurred within the second preset period;
traversing all the sub-alarm samples, and determining a second alarm sample set generated by the information system in a third preset period, wherein the second alarm sample set comprises all second alarm samples whose type differs from that of the first alarm sample, and the third preset period is determined according to the alarm time of the sub-alarm sample and the second preset period;
outputting an initial alarm sample directed graph based on the second alarm sample set; and
adjusting model parameters of the initial model according to the initial alarm sample directed graph and the real alarm sample directed graph so as to obtain the generation model.
According to an embodiment of the present disclosure, the second alarm sample set further includes: a time tag of the second alarm sample relative to the sub-alarm sample;
wherein the traversing all the sub-alarm samples and determining a second alarm sample set generated by the information system in the third preset period includes:
for each of the sub-alarm samples, repeatedly performing the following operations:
acquiring all the second alarm samples generated by the information system in the third preset period;
determining a first accumulated alarm count before the occurrence time of the sub-alarm sample and a second accumulated alarm count after the occurrence time of the sub-alarm sample; and
determining the time tag according to the first accumulated alarm count and the second accumulated alarm count.
According to an embodiment of the disclosure, the initial alarm sample directed graph includes nodes corresponding to the first alarm sample and the second alarm samples, directed edges between nodes having an association relationship, and first weight values corresponding to the directed edges and used for representing the degree of association, and the second alarm sample set further includes: a time tag of the second alarm sample relative to the sub-alarm sample;
wherein the outputting an initial alarm sample directed graph based on the second alarm sample set includes:
determining, according to the time tag, the direction of the line segment formed by the node corresponding to the first alarm sample and the node corresponding to each second alarm sample;
constructing, according to the direction and the nodes, the directed edge pointing from the earlier alarm to the later alarm;
determining the first weight value corresponding to the directed edge according to the frequency with which each second alarm sample is generated in the third preset period; and
outputting the initial alarm sample directed graph.
According to an embodiment of the present disclosure, the initial alarm directed graph further includes: second weight values corresponding to the directed edges;
the pruning the initial alarm directed graph to obtain a target alarm directed graph includes:
in a case where a second weight value does not meet the threshold value, cutting the directed edge corresponding to that second weight value to obtain the target alarm directed graph.
According to an embodiment of the disclosure, the determining, based on the target alarm directed graph, a root cause of the alarms generated by the information system includes:
determining a target root node according to the target alarm directed graph; and
determining the root cause of the alarms generated by the information system according to the target root node.
According to an embodiment of the present disclosure, the inputting the real-time alarm data into the generation model and outputting an initial alarm directed graph includes:
inputting the real-time alarm data into the generation model to perform the following operations:
classifying the alarms according to the unique identification codes to obtain classified alarms;
determining, for each type of alarm, the sub-alarms corresponding to each time that alarm occurred within the first preset period;
traversing all the sub-alarms, and determining an alarm set generated by the information system in a fourth preset period, wherein the alarm set comprises all associated alarms whose type differs from that of the alarm, and the fourth preset period is determined according to the alarm time of the sub-alarm and the first preset period; and
outputting the initial alarm directed graph based on the alarm set.
A second aspect of the present disclosure provides an alarm root cause determining apparatus, including:
a first acquisition module, configured to acquire real-time alarm data generated by an information system in a first preset period, wherein the real-time alarm data comprises N alarms and N unique identification codes respectively corresponding to the N alarms, and N is an integer greater than or equal to 2;
a processing module, configured to input the real-time alarm data into a generation model and output an initial alarm directed graph, wherein the generation model is used for establishing the association relation among the N alarms based on the unique identification codes, and the initial alarm directed graph comprises nodes corresponding to the N alarms and directed edges formed between associated alarms;
a pruning module, configured to prune the initial alarm directed graph to obtain a target alarm directed graph; and
a first determining module, configured to determine the root cause of the alarms generated by the information system based on the target alarm directed graph.
A third aspect of the present disclosure provides an electronic device, comprising: one or more processors; and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the alarm root cause determining method described above.
A fourth aspect of the present disclosure also provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the alarm root cause determining method described above.
A fifth aspect of the present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements the alarm root cause determining method described above.
According to the embodiment of the disclosure, the acquired real-time alarm data are input into the generation model and the initial alarm directed graph is output, so that the massive alarms and the relations among them are expressed in a form that a computer can operate on, which optimizes the alarm processing flow. Further, the initial alarm directed graph is pruned to obtain the target alarm directed graph, which effectively addresses the low efficiency and low accuracy of searching for the root cause alarm in the initial alarm directed graph when the number of alarms is huge. Because the root cause of the alarms generated by the information system is determined according to the target alarm directed graph, the search for the root cause alarm becomes more efficient and less time-consuming while an accurate association relation among the alarms is preserved, and the accuracy of root cause determination is improved. Operation and maintenance personnel can therefore resolve the alarmed problem in time, which in turn ensures normal processing of the services associated with the alarms.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be more apparent from the following description of embodiments of the disclosure with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates an application scenario diagram of an alarm root cause determining method, apparatus, device, medium, and program product according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of an alarm root cause determining method according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates an alarm directed relationship diagram according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a schematic diagram of a partial initial alarm sample directed graph according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a schematic diagram of pruning an initial alarm directed graph according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates an architecture diagram of an alarm root cause determining method according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates a block diagram of an alarm root cause determining apparatus according to an embodiment of the present disclosure; and
FIG. 8 schematically illustrates a block diagram of an electronic device adapted to implement an alarm root cause determining method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where expressions like "at least one of A, B and C, etc." are used, the expressions should generally be interpreted in accordance with the meaning commonly understood by those skilled in the art (e.g., "a system having at least one of A, B and C" shall include, but not be limited to, a system having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.).
In the technical solution of the present disclosure, the user information involved (including, but not limited to, user personal information, user image information, and user equipment information such as location information) and the data involved (including, but not limited to, data for analysis, stored data, and displayed data) are information and data authorized by the user or fully authorized by all parties. The collection, storage, use, processing, transmission, provision, disclosure and application of such data comply with the relevant laws, regulations and standards of the relevant countries and regions, necessary security measures are taken, public order and good customs are not violated, and corresponding operation entries are provided for the user to grant or refuse authorization.
In the technical scheme of the embodiment of the disclosure, the authorization or consent of the user is obtained before the personal information of the user is obtained or acquired.
In the course of implementing the present disclosure, it was found that with the rapid development of cloud computing and big data, the monitoring and alarm system of a platform becomes more and more important, while the ever-increasing amount of data reduces the efficiency of locating the root cause alarm of a fault. In actual operation and maintenance, alarms do not occur regularly and are not evenly distributed, so a large number of alarms is often generated within a certain period, yet the alarms are displayed along a single dimension and repetitively: only alarms such as excessive CPU load, application inaccessible, or bandwidth utilization are sent. Faced with this mass of alarms, operation and maintenance personnel cannot quickly extract the valuable alarm information and find the root cause of the alarms. The most common approach at present is to look up the reason for an alarm by issuing commands after the alarm is received, which is extremely inefficient and also leads to problems such as high load on the alarm platform and low service throughput.
The embodiment of the disclosure provides an alarm root cause determining method, which comprises the following steps: acquiring real-time alarm data generated by an information system in a first preset period, wherein the real-time alarm data comprises N alarms and unique identification codes respectively corresponding to the N alarms, and N is an integer greater than or equal to 2; inputting the real-time alarm data into a generation model and outputting an initial alarm directed graph, wherein the generation model is used for establishing an association relation among the N alarms based on the unique identification codes, and the initial alarm directed graph comprises nodes corresponding to the N alarms and directed edges formed between associated alarms; pruning the initial alarm directed graph to obtain a target alarm directed graph; and determining the root cause of the alarms generated by the information system based on the target alarm directed graph.
FIG. 1 schematically illustrates an application scenario diagram of an alert root determination method, apparatus, device, medium, and program product according to an embodiment of the present disclosure.
As shown in fig. 1, an application scenario 100 according to this embodiment may include a first terminal device 101, a second terminal device 102, a third terminal device 103, a network 104, and a server 105. The network 104 is a medium used to provide a communication link between the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using at least one of the first terminal device 101, the second terminal device 102, and the third terminal device 103. Alarms may occur on the first terminal device 101, the second terminal device 102, the third terminal device 103, the network 104 and the server 105.
The first terminal device 101, the second terminal device 102, the third terminal device 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server providing support for websites browsed by the user using the first terminal device 101, the second terminal device 102, and the third terminal device 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that the alarm root cause determining method provided by the embodiments of the present disclosure may generally be performed by the server 105. Accordingly, the alarm root cause determining apparatus provided by the embodiments of the present disclosure may generally be provided in the server 105. The alarm root cause determining method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and/or the server 105. Accordingly, the alarm root cause determining apparatus provided by the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and/or the server 105.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
The alarm root cause determining method of the embodiments of the present disclosure will be described in detail below with reference to fig. 2 to 8, based on the scenario described in fig. 1.
FIG. 2 schematically illustrates a flow chart of an alarm root cause determining method according to an embodiment of the present disclosure.
As shown in fig. 2, the alarm root cause determining method 200 of this embodiment includes operations S210 to S240.
In operation S210, real-time alarm data generated by the information system in a first preset period is acquired, where the real-time alarm data includes N alarms and unique identification codes corresponding to the N alarms, and N is an integer greater than or equal to 2.
According to an embodiment of the present disclosure, the first preset period is a preset sliding time window that defines the time range over which real-time alarms are collected. The first preset period can be configured according to actual requirements. For example, if the first preset period is T1, the real-time alarms collected may be the alarms generated within any period of length T1.
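The sliding time window described above can be sketched as follows. This is a minimal illustration only: the names `Alarm`, `SlidingAlarmWindow`, `alarm_id` and `timestamp` are hypothetical, and it is assumed that alarms arrive in non-decreasing time order and that timestamps are expressed in seconds.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    alarm_id: str     # unique identification code (hypothetical field name)
    timestamp: float  # occurrence time in seconds (assumed unit)


class SlidingAlarmWindow:
    """Keeps only the alarms that occurred within the last t1 seconds."""

    def __init__(self, t1):
        self.t1 = t1
        self._alarms = deque()

    def add(self, alarm):
        # Assumption: alarms are added in non-decreasing time order.
        self._alarms.append(alarm)
        # Drop alarms that fell out of the window ending at the newest alarm.
        while self._alarms[0].timestamp <= alarm.timestamp - self.t1:
            self._alarms.popleft()

    def snapshot(self):
        """Return the alarms currently inside the window, oldest first."""
        return list(self._alarms)
```

With T1 set to 60 seconds, adding alarms at 0 s, 30 s and 70 s leaves only the last two in the window, which would then form the real-time alarm data for the next step.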
According to embodiments of the present disclosure, the real-time alarm may be any kind of alarm in an information system, for example a system crash, a broken network link, a storage device failure, insufficient disk space, excessive memory usage, the number of concurrent connections reaching its maximum limit, excessive CPU utilization, an error log, an abnormal behavior log, and the like.
The information system may include, among other things, database systems, log and monitoring systems, servers and operating systems, etc.
According to an embodiment of the disclosure, key fields in the real-time alarm data may include the alarm occurrence time, the alarm unique ID, the application to which the alarm belongs, the IP address, the index name, and the like, where the unique identification code may include a hash value of three fields: the application to which the alarm belongs, the IP address, and the index name.
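As an illustration of a unique identification code derived from these three fields, the sketch below hashes them together. The disclosure does not name a hash function, so SHA-256, the separator character and the function name `unique_identification_code` are all assumptions made for this example.

```python
import hashlib


def unique_identification_code(home_application, ip_address, index_name):
    """Hash the three key fields into one identifier (SHA-256 is an assumption)."""
    # A separator avoids collisions such as ("ab", "c") versus ("a", "bc").
    key = "\x1f".join((home_application, ip_address, index_name))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()
```

Two alarms raised by the same application, on the same IP address, for the same index name then receive the same code, which is what allows alarms to be grouped by type later on.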
According to the embodiment of the disclosure, the alarm root cause determining method can be applied to alarm root cause determination facing massive alarm data.
In operation S220, the real-time alert data is input into a generation model, and an initial alert directed graph is output, wherein the generation model is used for establishing an association relationship between N alerts based on the unique identification code, and the initial alert directed graph includes nodes corresponding to the N alerts and directed edges formed between the associated alerts.
According to an embodiment of the present disclosure, the generation model is used to generate the association relationship among the alarms, and this association relationship is characterized in the form of a directed graph. The association relationship may be the time order in which the alarms occur.
According to the embodiment of the disclosure, the initial alarm directed graph characterizes the association relationship among the N alarms, and this relationship can be determined on the basis of the dependency relationship among the alarms and the event sequence of the alarms. The dependency relationship between alarms is determined by the alarm source in the alarm unique identification code, and the association relationship between alarms is determined by analyzing the dependency relationship between the components and modules that generate the alarms.
For example, a port failure of switch a will trigger a network connection disruption alarm for server B connected to that port, indicating that the port failure of switch a is the cause of a network connection disruption for server B, and that there is a causal relationship between them, so that the association between alarms can be determined by such causal relationship. When the initial alarm directed graph is constructed, the port fault alarm of the switch A is taken as a Target Node (Target Node), the network connection interruption alarm of the server B is taken as a Source Node, and the association relation between the port fault alarm of the switch A and the network connection interruption alarm of the server B is established.
The association relation between alarms is determined through the event sequence of the alarms, and can be realized through other attributes such as the occurrence time of the alarms, the alarm level and the like. Specifically, the sequence of alarms can be determined by the time of the alarms, and the association relationship can be determined by the sequence of the alarms. The causal relationship between alarms can also be determined by the alarm levels, and thus the association relationship between alarms, because alarms at a high level are often triggered by alarms at a lower level.
For example, if alarm a occurs before alarm b in time, then when the initial alarm directed graph is constructed, alarm a is taken as the initial node and alarm b as the target node, and the association relationship between alarm a and alarm b is established.
For example, if a medium-level hard disk fault alarm occurs in the server B, a high-level network connection interrupt alarm of the server B may be triggered, and when an initial alarm directed graph is constructed, the medium-level hard disk fault alarm is taken as an initial node, the high-level network connection interrupt alarm is taken as a target node, and an association relationship between the medium-level hard disk fault alarm and the high-level network connection interrupt alarm is established.
According to the embodiment of the disclosure, the initial alarm directed graph is finally obtained by determining the association relation among the nodes.
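The time-order rule above (the earlier alarm becomes the initial node, the later alarm the target node) can be sketched as follows. This is not the trained generation model of the disclosure; it is a simplified stand-in that links any two alarms of different types occurring within a hypothetical `max_gap` of each other, and the function name `build_initial_graph` is likewise an assumption.

```python
from collections import defaultdict


def build_initial_graph(alarms, max_gap):
    """alarms: iterable of (code, timestamp) pairs.

    Returns {(initial_node, target_node): co-occurrence count}.
    """
    ordered = sorted(alarms, key=lambda a: a[1])
    edges = defaultdict(int)
    for i, (src, t_src) in enumerate(ordered):
        for dst, t_dst in ordered[i + 1:]:
            if t_dst - t_src > max_gap:
                break  # every later alarm is even further away in time
            if dst != src and t_dst > t_src:
                edges[(src, dst)] += 1  # edge points from earlier to later
    return dict(edges)
```

The count attached to each edge is one simple candidate for an edge weight; the disclosure derives its weight values from alarm frequency within a preset period.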
In operation S230, the initial alert directed graph is cut to obtain a target alert directed graph.
According to the embodiment of the disclosure, the unimportant edges in the initial alarm directed graph are cut to obtain a more simplified target alarm directed graph. Pruning may be performed based on the alarm source: only certain alarm source nodes and the edges associated with them are retained, and other nodes and edges are removed. This manner of pruning divides the graph into sub-graphs centered on a particular alarm source. Pruning may also be performed based on the dependency relationship: a certain node and the nodes that depend on it are retained, and other nodes and edges are removed. This manner of pruning divides the initial alarm directed graph into one or more sub-graphs, each representing an independent alarm event chain or fault chain, so that attention can be focused on a particular chain.
According to the embodiment of the disclosure, the initial alarm directed graph can be cut according to the weight value of each edge, wherein the weight value can be used for representing the association strength between nodes in the initial alarm directed graph and can also be used for representing the possibility of fault occurrence.
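Weight-based pruning can be sketched in a few lines. The example assumes that a weight "meets" the threshold when it is greater than or equal to it, and that the graph is held as a dictionary from edge to weight; both the representation and the name `prune_edges` are assumptions for illustration.

```python
def prune_edges(weighted_edges, threshold):
    """Keep only the directed edges whose weight is at least the threshold.

    weighted_edges: {(initial_node, target_node): weight}
    """
    return {edge: w for edge, w in weighted_edges.items() if w >= threshold}
```

For instance, with edges A→B (0.9), A→C (0.2) and B→D (0.5) and a threshold of 0.5, only A→B and B→D survive, and node C is left without any incoming edge from A.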
In operation S240, a root cause of an alarm generated by the information system is determined based on the target alarm directed graph.
According to the embodiment of the disclosure, based on the simplified target alarm directed graph, the in-degree and out-degree of each node in the target alarm directed graph are analyzed, and the root node of the target alarm directed graph is then traced back recursively. The alarm represented by the root node is the root cause alarm to be determined. The recursive tracing can be implemented as an algorithm.
FIG. 3 schematically illustrates an alert directed relationship diagram according to an embodiment of the present disclosure.
As shown in fig. 3, the alarm directed relationship diagram 300 in this embodiment characterizes the association relation among alarm A, alarm B, alarm C, alarm D, alarm E and alarm F, wherein the in-degree of node A is 0, the in-degree of node B is 1, the in-degree of node C is 1, the in-degree of node D is 1, the in-degree of node E is 1, and the in-degree of node F is also 0. To find the root cause that induces a certain alarm, the root node must be found by a recursive method. For example, to find the root cause alarm node of node E, the paths leading to E are traced back. As shown in fig. 3, E has only one traceable path, namely A→B→D→E, so it can be concluded that node A is the root cause alarm node of node E, and the alarm represented by node A is the root cause alarm of the alarm represented by node E. Node A and node F, whose in-degree is 0, are the root nodes of the directed graph, that is, the alarm root nodes of the alarm directed graph.
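The recursive trace-back can be sketched as below. The edge list is an assumption chosen to reproduce the in-degrees and the path A→B→D→E stated for fig. 3; in particular, the edge F→C is a guess, since the text does not say which node points to C. The function names are hypothetical.

```python
def in_degrees(nodes, edges):
    """Count incoming edges per node; edges are (initial, target) pairs."""
    degree = {n: 0 for n in nodes}
    for _, dst in edges:
        degree[dst] += 1
    return degree


def trace_roots(node, edges, _seen=None):
    """Walk edges backwards from node and return its in-degree-0 ancestors."""
    seen = set() if _seen is None else _seen
    if node in seen:  # already explored (also guards against cycles)
        return set()
    seen.add(node)
    parents = [src for src, dst in edges if dst == node]
    if not parents:
        return {node}  # no incoming edge: this is a root node
    roots = set()
    for parent in parents:
        roots |= trace_roots(parent, edges, seen)
    return roots


# Assumed edges consistent with the description of fig. 3 (F -> C is a guess).
NODES = ["A", "B", "C", "D", "E", "F"]
EDGES = [("A", "B"), ("B", "D"), ("D", "E"), ("F", "C")]
```

Under these assumed edges, `in_degrees(NODES, EDGES)` gives 0 for A and F and 1 for the other nodes, and `trace_roots("E", EDGES)` returns the set containing only A, matching the conclusion drawn in the text.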
According to an embodiment of the disclosure, the acquired real-time alarm data are input into the generation model and the initial alarm directed graph is output, so that massive numbers of alarms and the relationships among them are represented in a form a computer can operate on, which optimizes the alarm processing flow. Further, the initial alarm directed graph is sheared to obtain the target alarm directed graph, which effectively alleviates the low efficiency and low accuracy of searching for the root cause alarm in the generated initial alarm directed graph when the number of alarms is huge. The root cause of the alarm generated by the information system is determined according to the target alarm directed graph. On the basis of ensuring that accurate association relationships are established between alarms, this improves the efficiency of searching for the root cause alarm, saves search time, and improves the accuracy of root cause determination, so that operation and maintenance personnel can resolve the alarm problem in time and the normal processing of the services associated with the alarm is ensured.
According to an embodiment of the present disclosure, a training method for generating a model includes: acquiring historical alarm data generated by an information system in a second preset period, wherein the historical alarm data comprises M first alarm samples, the occurrence time of the first alarm samples and unique identification codes corresponding to the M first alarm samples, and M is an integer greater than or equal to 2; the historical alert data is entered into the initial model to perform the following operations: classifying the first alarm sample to obtain a classified first alarm sample; determining sub-alarm samples corresponding to all alarm times of the first alarm samples in a second preset period aiming at each type of first alarm samples; traversing all sub-alarm samples, and determining a second alarm sample set generated by the information system in a third preset period, wherein the second alarm sample set comprises all second alarm samples with different types from the first alarm samples, and the third preset period is determined according to the alarm time of the sub-alarm samples and the second preset period; based on the second alarm sample set, outputting an initial alarm sample directed graph; and adjusting model parameters of the initial model according to the initial alarm sample directed graph and the real alarm sample directed graph so as to obtain a generated model.
According to an embodiment of the disclosure, the second preset period is a preset sliding time window, and the second preset period is different from the first preset period and is used for limiting a time range for collecting the historical alarm data. The second preset period can be configured according to the actual situation in the training process.
For example, in the training process, the second preset period is 15 days, and it is found that the causal relationship between alarms cannot be comprehensively known, so that the second preset period can be increased, and the time of the second preset period is changed from 15 days to 30 days.
According to an embodiment of the present disclosure, the initial model is an untrained model, and the generated model is obtained by training the initial model on a large amount of historical alarm data.
According to an embodiment of the disclosure, when the initial model is trained with historical alarm data, the first alarm samples are first classified according to their unique identification codes, and the classification criterion is independent of the occurrence time of the alarm. For example, the first alarm samples include alarm c1 and alarm c2, which have the same unique identification code. Although the occurrence times of alarm c1 and alarm c2 may differ, they are classified as the same type of alarm, and alarm c1 and alarm c2 may be collectively referred to as class C alarms. As another example, if the unique identification codes of alarm c2 and alarm d1 are different, alarm c2 and alarm d1 are classified as different types of alarms.
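The classification step above amounts to grouping alarms by their unique identification code while ignoring occurrence time. A minimal sketch follows; the `Alarm` record, its field names and the `classify` function are hypothetical names introduced only for illustration.

```python
from collections import defaultdict
from typing import NamedTuple

class Alarm(NamedTuple):
    uid: str      # unique identification code (assumed here to be a string)
    time: float   # occurrence time, in arbitrary units

def classify(alarms):
    """Group alarms by unique identification code; occurrence time plays no role."""
    groups = defaultdict(list)
    for alarm in alarms:
        groups[alarm.uid].append(alarm)
    return dict(groups)

# c1 and c2 share a code and fall into one class; d1 forms another class.
samples = [Alarm("C", 1.0), Alarm("C", 7.5), Alarm("D", 3.0)]
classes = classify(samples)
```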
According to an embodiment of the present disclosure, the sub-alarm samples are all alarms of the same type within the time range defined by the second preset period. For example, alarm c1 and alarm c2 are both class C alarms and both occur within the time period [t1, t2]; alarm c1 and alarm c2 can then be regarded as sub-alarm samples of the class C alarm, where the difference between t2 and t1 equals the second preset period.
According to an embodiment of the disclosure, after all the first alarm samples are classified, the sub-alarm samples of each type of first alarm sample are searched for within the time range defined by the second preset period.
According to an embodiment of the present disclosure, the third preset period is a preset sliding time window, so named to distinguish it from the first preset period and the second preset period. The second preset period and the alarm time of the sub-alarm sample jointly determine the third preset period. For example, if the second preset period is T2 and the alarm time of the sub-alarm sample is t1, the third preset period is [t1 - T2, t1 + T2]. Therefore, in the process of traversing all the sub-alarm samples to determine the second alarm sample sets generated in the information system, the time range searched differs as the occurrence time of the sub-alarm sample changes.
According to the embodiment of the disclosure, a type of alarm is selected from the first alarm samples, all sub-alarm samples of the type of the first alarm samples are traversed, and a set of second alarm samples of each sub-alarm sample is searched. The second alarm sample set comprises all alarm samples with different types from the first alarm sample selected at the time. It is necessary to find a set of its second alert samples separately for each type of first alert sample.
For example, the first alarm samples include the following types: class A, class B and class C. The sub-alarm samples of the class A first alarm sample include a1, a2 and a3, the sub-alarm samples of the class B first alarm sample include b1, b2 and b3, and the sub-alarm samples of the class C first alarm sample include c1 and c2. First, the class A first alarm sample is selected and all of its sub-alarms are traversed. During the traversal, if alarm a1 occurs at time t1 and the second preset period is T2, all alarms within the time range [t1 - T2, t1 + T2] whose type differs from that of sub-alarm sample a1, that is, all class B and class C alarms, are found and form the second alarm sample set of a1. All class A sub-alarm samples are then traversed by the same search method. Likewise, for the class B and class C alarms, the second alarm sample set of each of their sub-alarm samples is found.
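The window search in this example can be sketched as below. The sketch assumes each alarm is a `(type, time)` pair and that the window bounds are inclusive; the function name and the sample data are hypothetical.

```python
def second_alarm_sample_set(sub_alarm, alarms, period):
    """Return alarms of a different type that fall within [t - period, t + period]."""
    alarm_type, t = sub_alarm
    return [
        (other_type, other_t)
        for other_type, other_t in alarms
        if other_type != alarm_type and t - period <= other_t <= t + period
    ]

# Hypothetical data: a1 occurs at time 10 and the second preset period is 5.
alarms = [("A", 10), ("B", 8), ("B", 13), ("C", 30), ("A", 11)]
found = second_alarm_sample_set(("A", 10), alarms, 5)
```

Repeating the call for every sub-alarm sample of every class gives one second alarm sample set per sub-alarm sample, each with its own window.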
According to an embodiment of the disclosure, a directed graph is constructed from the sub-alarm samples and their corresponding second alarm sample sets, in which the order of occurrence of the alarms is determined chronologically and the association relationships between alarms are established. An initial alarm sample directed graph is then output, with the alarm samples as nodes and the association relationships between alarms as edges. The term initial alarm sample directed graph is used to distinguish this graph from the initial alarm directed graph in the above-described embodiments.
According to an embodiment of the disclosure, the real alarm sample directed graph is obtained by manual analysis and reflects the association relationships between the alarm samples. The model parameters comprise the first preset period, the second preset period and the third preset period. By analyzing the difference between the alarm sample directed graph produced by the current initial model and the real alarm sample directed graph, the first preset period, the second preset period and the third preset period are adjusted respectively to keep optimizing the initial model, until the initial alarm sample directed graph output by the initial model is the same as the real alarm sample directed graph, whereby the generated model is obtained.
According to the embodiment of the disclosure, the first alarm samples are classified, the second alarm sample set of the sub-alarm samples in each type of the first alarm samples is searched, an initial alarm directed graph containing the association relationship between each sub-alarm sample and other alarm samples is obtained, and the direction of the association relationship between alarms is determined. The model parameters are adjusted to continuously optimize the model to obtain a final generation model, the generation model is used for realizing the construction of the directed graph, the time for constructing the alarm directed graph is better saved, the directed graph is shown in a computer operation mode, and the alarm processing flow is optimized.
According to an embodiment of the present disclosure, the second set of alert samples further includes: a time tag of the second alert sample relative to the sub-alert sample; wherein traversing all sub-alert samples, determining a second alert sample set generated by the information system within a third preset period, comprising: for each sub-alarm sample, the following operations are repeatedly performed: acquiring all second alarm samples generated by the information system in a third preset period; determining a first accumulated alarm number of times of the second alarm sample before the sub-alarm sample occurs and a second accumulated alarm number of times of the second alarm sample after the sub-alarm sample occurs; and determining a time tag according to the first accumulated alarm times and the second accumulated alarm times.
According to an embodiment of the present disclosure, the first accumulated alarm number characterizes the number of times a certain type of alarm sample occurs before a given sub-alarm sample. The second accumulated alarm number characterizes the number of times that type of alarm sample occurs after the given sub-alarm sample. For example, when the second alarm sample set of sub-alarm sample a1 is searched for, the set includes first alarm samples of type B, where sub-alarm sample b1 occurs before sub-alarm sample a1, and sub-alarm samples b2 and b3 occur after sub-alarm sample a1. The first accumulated alarm number is then 1, and the second accumulated alarm number is 2.
According to embodiments of the present disclosure, the time tag characterizes the occurrence time of a certain type of alarm sample. Specifically, the time tag indicates whether that type of alarm sample occurs before or after a given sub-alarm sample.
According to an embodiment of the disclosure, the time tag of an alarm is determined by comparing the first accumulated alarm number with the second accumulated alarm number and selecting the side with the larger number of occurrences.
For example, the sub-alarm sample is a1. In the process of traversing the type B alarms, the first accumulated alarm number is 11 and the second accumulated alarm number is 200. Because the second accumulated alarm number is greater than the first, the time tag indicates that the type B first alarm sample occurs after sub-alarm sample a1.
According to an embodiment of the disclosure, comparing the first accumulated alarm number with the second accumulated alarm number determines whether a certain type of alarm occurs before or after the sub-alarm sample, and the time tag is set accordingly. The more often an alarm occurs on one side, the more likely it is to occur on that side, so determining the relative time of a type of alarm from its number of occurrences better reflects the likely order of that type of alarm and the current sub-alarm sample. The relative order among the various types of alarms is thereby determined, which to some extent solves the problem that the relative relationships among the nodes of a directed graph cannot be determined.
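The time tag rule can be sketched as a simple count comparison. The function name `time_tag` and the tag values `"before"` and `"after"` are hypothetical; the text does not say what happens when the two counts are equal, so returning `None` in that case is purely an assumption of this sketch.

```python
def time_tag(sub_alarm_time, other_times):
    """Tag another alarm type as occurring 'before' or 'after' a sub-alarm sample."""
    first_count = sum(1 for t in other_times if t < sub_alarm_time)   # occurrences before
    second_count = sum(1 for t in other_times if t > sub_alarm_time)  # occurrences after
    if first_count > second_count:
        return "before"
    if second_count > first_count:
        return "after"
    return None  # tie: behaviour not specified in the text (assumption)

# b1 occurs before a1; b2 and b3 occur after it, so the counts are 1 and 2.
tag = time_tag(10, [8, 12, 13])
```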
According to an embodiment of the present disclosure, the initial alert sample directed graph includes nodes corresponding to the first alert sample and the second alert sample, directed edges having an association relationship between the nodes, and first weight values corresponding to the directed edges for characterizing the association degree, where the second alert sample set further includes: a time tag of the second alert sample relative to the sub-alert sample; wherein outputting an initial alert sample directed graph based on the alert sample set, comprises: determining the direction of a line segment formed by a node corresponding to the first alarm sample and a node corresponding to each second alarm sample according to the time tag; constructing a directional edge with time from front to back according to the direction and the nodes; determining a first weight value corresponding to the directed edge according to the alarm frequency generated by each second alarm sample in a third preset period; and outputting an initial alarm sample directed graph.
According to an embodiment of the disclosure, the precedence relationship between the node corresponding to the first alarm sample and the node corresponding to the second alarm sample is determined according to the time tag, and the occurrence time of the second alarm sample is determined from the occurrence times of its sub-alarm samples on the side indicated by the time tag. Specifically, the occurrence time of the sub-alarm sample closest in time to the first alarm sample may be taken as the occurrence time of the second alarm sample. The target node and the start node of each edge in the initial alarm sample directed graph are thereby determined.
For example, the first alarm samples include class A alarm samples, and a sub-alarm sample of the class A alarm is a1. If, in the process of traversing the class B alarms, the time tag of the class B second alarm sample indicates that class B alarms occur after sub-alarm sample a1, then the occurrence time of sub-alarm sample b1, the class B sub-alarm sample closest in time to a1, can be selected as the occurrence time of that alarm sample.
According to an embodiment of the disclosure, the first weight value characterizes the degree of association between the nodes in the initial alarm sample directed graph. The first weight value may be determined by the alarm frequency generated by the second alarm sample within the third preset period; specifically, it may be determined according to the number of sub-alarm samples and the total number of alarms of the start node. It should be noted that there is more than one way to determine the first weight value. For example, it may also be determined from the number of sub-alarm samples of the start node and the total number of sub-alarm samples of the start node.
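A directed edge and its first weight value might be assembled as in the sketch below. The function name, the tag values and the weight formula (occurrences of the second alarm type divided by a total alarm count) are assumptions; the text allows more than one way of computing the weight, and this is only one possible reading.

```python
def build_edge(first_type, second_type, tag, second_count, total_count):
    """Return (start, target, weight), with the edge pointing from earlier to later."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    weight = second_count / total_count   # assumed frequency-based weight
    if tag == "before":                   # second alarm type occurs first
        return (second_type, first_type, weight)
    return (first_type, second_type, weight)

# Hypothetical numbers: type B occurs after type A, 30 times out of 120 alarms.
edge = build_edge("A", "B", "after", 30, 120)
```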
Fig. 4 schematically illustrates a partial initial alarm sample directed graph according to an embodiment of the present disclosure.
As shown in fig. 4, the partial initial alarm sample directed graph 400 of this embodiment includes: node a1, node b1, a directed line segment from node b1 to node a1, and a weight value characterizing the association relationship between node a1 and node b1.
The weight value characterizing the association relationship between node a1 and node b1 may be determined by the ratio of the number of occurrences of the alarm represented by node b1 to the number of all alarm samples.
For example, the number of all alarm samples in the initial alarm sample directed graph may be 200, that is, the number of all nodes in the initial alarm sample directed graph may be 200. If the alarm represented by node b1 occurs 50 times, the ratio of that number of occurrences to the number of all alarm samples gives a weight value of 50/200 = 0.25 for the alarm directed relationship.
According to an embodiment of the disclosure, the relationships between the nodes in the initial alarm sample directed graph are determined from the time tag of the second alarm sample and the occurrence times of the sub-alarms in the second alarm sample, which at least partially solves the problem that association relationships between alarms are difficult to determine. In addition, the first weight value on each directed edge is determined according to the alarm frequency generated by the second alarm sample within the third preset period. Because it is calculated from that alarm frequency, the first weight value characterizes the likelihood that the start node and the target node of the corresponding edge are associated, so setting the first weight value increases the practical value of the initial alarm sample directed graph.
According to an embodiment of the present disclosure, the initial alarm directed graph further includes: a second weight value corresponding to each directed edge. Shearing the initial alarm directed graph to obtain the target alarm directed graph includes: in the case that the second weight value does not meet the threshold, shearing the directed edge corresponding to that second weight value to obtain the target alarm directed graph.
According to an embodiment of the disclosure, the second weight value is so named to distinguish it from the first weight value, and characterizes the degree of association between the nodes of the initial alarm directed graph. The weight value on the initial alarm directed graph may be calculated in other ways; for example, the second weight value may be set according to the similarity between the start alarm and the target alarm of an edge. The alarm source of an alarm is obtained from the IP address contained in the alarm's unique identification code, and the more similar the functions of the alarm sources are, the higher the similarity between the alarms. The method is not limited to this; the similarity between alarms may also be determined by matching the text of the alarms.
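The text does not specify how text similarity between alarms is measured. As one possible illustration only, the sketch below uses `difflib.SequenceMatcher` from the Python standard library, which returns a ratio between 0 and 1. The function name and the sample alarm texts are hypothetical, and a production system could use a quite different similarity measure.

```python
from difflib import SequenceMatcher

def text_similarity(text_a: str, text_b: str) -> float:
    """Return a similarity score in [0, 1] for two alarm texts."""
    return SequenceMatcher(None, text_a, text_b).ratio()

# Hypothetical alarm texts: the first pair differs in one character only.
close = text_similarity("disk full on db01", "disk full on db02")
far = text_similarity("disk full", "network timeout")
```

Such a score could serve directly as the second weight value of the edge between the two alarms, under the assumption that textual similarity reflects association strength.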
According to embodiments of the present disclosure, based on the second weight values on the initial alarm directed graph, the threshold may be set by analyzing the initial alarm directed graph, or the average of the second weight values may be calculated by an algorithm and used as the threshold. The initial alarm directed graph is sheared according to the threshold to obtain the target alarm directed graph. After shearing, several directed graphs and several isolated nodes may appear, and the directed graphs with more than 2 nodes may be retained as target alarm directed graphs.
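Two steps of this paragraph can be sketched in code: taking the mean of the second weight values as the threshold, and keeping only the sub-graphs with more than 2 nodes after shearing. The function names are hypothetical, and treating a sub-graph as a weakly connected component (edge direction ignored when grouping nodes) is an assumption of this sketch.

```python
from collections import defaultdict
from statistics import mean

def mean_threshold(weights):
    """Use the average of all second weight values as the shearing threshold."""
    return mean(weights.values())

def large_components(edges, min_nodes=3):
    """Return node sets of weakly connected components with at least min_nodes nodes."""
    neighbours = defaultdict(set)
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    seen, result = set(), []
    for start in list(neighbours):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbours[node] - component)
        seen |= component
        if len(component) >= min_nodes:   # "more than 2 nodes"
            result.append(component)
    return result

# Hypothetical edges remaining after shearing: one 3-node graph, one 2-node graph.
kept = large_components([("a", "b"), ("b", "c"), ("d", "e")])
```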
FIG. 5 schematically illustrates an initial alert directed graph cut schematic in accordance with an embodiment of the present disclosure.
As shown in FIG. 5, the initial alarm directed graph shearing schematic 500 of this embodiment shows one example of shearing an initial alarm directed graph to obtain a target alarm directed graph. The initial alarm directed graph includes alarm a, alarm b, alarm c, alarm d, alarm e and alarm f, and the threshold is set to 0.2. The initial alarm directed graph is sheared according to the threshold: the edges whose weights are smaller than the threshold are cut and removed, isolated nodes are deleted, and the target alarm directed graph shown in the figure is finally obtained.
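The shearing step with a threshold of 0.2 can be sketched as follows. The function name `shear` is hypothetical, and the edge weights are invented for illustration because the weights shown in FIG. 5 are not given in the text.

```python
def shear(nodes, weighted_edges, threshold):
    """Drop edges whose weight is below the threshold, then drop isolated nodes."""
    kept_edges = {edge: w for edge, w in weighted_edges.items() if w >= threshold}
    connected = {node for edge in kept_edges for node in edge}
    kept_nodes = [node for node in nodes if node in connected]
    return kept_nodes, kept_edges

# Hypothetical weights for alarms a to f.
nodes = ["a", "b", "c", "d", "e", "f"]
weighted_edges = {("a", "b"): 0.5, ("b", "c"): 0.1, ("c", "d"): 0.3, ("e", "f"): 0.15}
target_nodes, target_edges = shear(nodes, weighted_edges, 0.2)
```

With these assumed weights, the edges b→c and e→f are removed, and nodes e and f are deleted because no remaining edge touches them.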
According to embodiments of the present disclosure, if a bidirectional relationship exists between two nodes of the initial alarm directed graph or the target alarm directed graph, the two nodes are considered to have an alarm symbiotic relationship. If an alarm root node has a node in an alarm symbiotic relationship with it, each of the two alarms is a cause of the other, and the node in a symbiotic relationship with the alarm root node may also be taken as an alarm root cause of the initial alarm directed graph.
According to an embodiment of the disclosure, the target alarm directed graph obtained in the foregoing embodiment may contain a cyclic directed graph, that is, a directed graph with no root node. In that case, the second weight values of all the edges forming the cycle are compared, the smallest second weight value is found, and the edge corresponding to it is deleted, so as to break the cycle in the target alarm directed graph.
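Breaking a cycle by deleting its lowest-weight edge can be sketched as below. The depth-first search used to locate a cycle is a choice made for this sketch, since the text does not say how cycles are detected; the function names and weights are hypothetical.

```python
def find_cycle(edges):
    """Return the edges of one directed cycle, or None if the graph is acyclic."""
    succ = {}
    for src, dst in edges:
        succ.setdefault(src, []).append(dst)
    state, path = {}, []          # state: 1 = on the DFS stack, 2 = finished

    def dfs(node):
        state[node] = 1
        path.append(node)
        for nxt in succ.get(node, []):
            if state.get(nxt) == 1:                     # back edge closes a cycle
                cycle = path[path.index(nxt):] + [nxt]
                return list(zip(cycle, cycle[1:]))
            if nxt not in state:
                found = dfs(nxt)
                if found:
                    return found
        state[node] = 2
        path.pop()
        return None

    for start in list(succ):
        if start not in state:
            found = dfs(start)
            if found:
                return found
    return None

def break_cycles(weighted_edges):
    """Repeatedly delete the smallest-weight edge of a cycle until none remain."""
    remaining = dict(weighted_edges)
    while (cycle := find_cycle(remaining)) is not None:
        weakest = min(cycle, key=lambda edge: remaining[edge])
        del remaining[weakest]
    return remaining

# Hypothetical cycle a -> b -> c -> a, whose weakest edge is b -> c.
result = break_cycles({("a", "b"): 0.5, ("b", "c"): 0.3, ("c", "a"): 0.9, ("c", "d"): 0.4})
```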
According to the embodiment of the disclosure, the edge corresponding to the second weight value smaller than the threshold value is deleted from the initial alarm directed graph to obtain the target alarm directed graph, and the edge with smaller association degree in the initial alarm directed graph and the related node are cut off to obtain the relatively simplified alarm directed graph, so that the time for searching the alarm root cause is saved, the efficiency for searching the alarm root cause is improved, the accuracy for determining the alarm root cause is improved, the alarm problem is solved in time by operation and maintenance personnel, and the normal processing of the service associated with the alarm is further ensured.
According to an embodiment of the present disclosure, determining a root cause of an alert generated by an information system based on a target alert directed graph includes: determining a target root node according to the target alarm directed graph; and determining the root cause of the alarm generated by the information system according to the target root node.
According to an embodiment of the present disclosure, the target root node is a node whose in-degree is zero in the target alarm directed graph. The target alarm directed graph may be searched with an in-degree table algorithm to find the target root nodes, and the root cause of the alarm generated by the information system is determined according to the alarms corresponding to the target root nodes. It should be noted that one alarm may have multiple root causes.
Specifically, an in-degree table is first initialized to record the in-degree of each node, with the in-degree of every node initially set to 0. All edges of the directed graph are then traversed and the in-degree table is updated. For example, when the edge <A, B> is traversed, the in-degree of node B in the in-degree table is incremented by 1. After all edges of the target alarm directed graph have been traversed in turn, the final in-degree table is obtained; the nodes with in-degree 0 are looked up in the table, and the alarms represented by those nodes are determined to be root cause alarms.
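The table-based search described above can be sketched as follows. The function name `root_nodes` and the example graph are hypothetical.

```python
def root_nodes(nodes, edges):
    """Build the in-degree table and return the nodes whose in-degree is 0."""
    in_degree = {node: 0 for node in nodes}   # every in-degree starts at 0
    for _src, dst in edges:                   # traversing <A, B> increments B
        in_degree[dst] += 1
    return [node for node in nodes if in_degree[node] == 0]

# Hypothetical target alarm directed graph with two root nodes.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("D", "C")]
roots = root_nodes(nodes, edges)
```

Because the table is filled in a single pass over the edges, the cost is proportional to the number of nodes plus the number of edges.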
According to the embodiment of the disclosure, the root cause of the alarm generated by the information system is determined by searching the target root node in the target alarm directed graph, so that the burden of manually processing a large number of alarms is reduced.
According to embodiments of the present disclosure, the foregoing embodiments describe the training method of the generation model, and the following describes how the generation model is used. Inputting the real-time alarm data into the generation model and outputting the initial alarm directed graph includes: inputting the real-time alarm data into the generation model to perform the following operations: classifying the alarms according to the unique identification codes to obtain classified alarms; for each type of alarm, determining the sub-alarms corresponding to all alarm times within the first preset period; traversing all sub-alarms and determining the alarm set generated by the information system within a fourth preset period, where the alarm set includes all associated alarms whose types differ from that of the alarm, and the fourth preset period is determined according to the alarm time of the sub-alarm and the first preset period; and outputting the initial alarm directed graph based on the alarm set.
According to an embodiment of the present disclosure, the fourth preset period is determined in a manner similar to the third preset period. For example, if the first preset period is T1 and the alarm time of the sub-alarm is t1, the fourth preset period is [t1 - T1, t1 + T1].
Fig. 6 schematically illustrates an architecture diagram of an alert root determination method according to an embodiment of the present disclosure.
As shown in fig. 6, the architecture 600 of the alert root determining method of this embodiment includes an information system 601, historical alert data 602, an initial alert sample directed graph 603, real-time alert data 604, an initial alert directed graph 605, and training and application processes S610 and S620.
In the training process S610, the historical alarm data 602 generated by the information system 601 is collected, the initial model is trained using the historical alarm data 602, and the model parameters are adjusted according to the initial alarm sample directed graph output by the initial model, so that the initial model is optimized into the generated model. In the application process S620, in which the trained model is applied, the real-time alarm data 604 generated by the information system 601 is acquired and input into the generated model to obtain the initial alarm directed graph 605.
According to the embodiment of the disclosure, the real-time alarms are acquired by using the trained generation model, the alarms are classified according to the unique identification codes, sub-samples of other types of alarms are traversed aiming at sub-alarms of each type of alarms, the alarm set generated by the information system in the fourth preset period is determined, and the initial alarm directed graph is output based on the alarm set.
Based on the alarm root cause determining method, the disclosure also provides an alarm root cause determining device. The device will be described in detail below in connection with fig. 7.
Fig. 7 schematically illustrates a block diagram of a configuration of an alert root determining apparatus according to an embodiment of the present disclosure.
As shown in fig. 7, the alarm root determining apparatus 700 of this embodiment includes a first acquisition module 710, a processing module 720, a clipping module 730, and a first determining module 740.
The first obtaining module 710 is configured to obtain real-time alarm data generated by the information system in a first preset period, where the real-time alarm data includes N alarms and unique identification codes corresponding to the N alarms, and N is an integer greater than or equal to 2.
The processing module 720 is configured to input the real-time alarm data into the generating model, and output an initial alarm directed graph, where the generating model is configured to establish an association relationship between N alarms based on the unique identifier, and the initial alarm directed graph includes nodes corresponding to the N alarms and directed edges formed between the associated alarms.
And a shearing module 730, configured to shear the initial alarm directed graph to obtain a target alarm directed graph.
A first determining module 740, configured to determine a root cause of an alarm generated by the information system based on the target alarm directed graph.
According to an embodiment of the present disclosure, the processing module 720 includes a generation model training sub-module, and the generation model training sub-module includes: a first acquisition unit, a first input unit, a classifying unit, a first determining unit, a traversing unit, an output unit and a second acquisition unit.
The first acquisition unit is used for acquiring historical alarm data generated by the information system in a second preset period, wherein the historical alarm data comprises M first alarm samples, the occurrence time of the first alarm samples and unique identification codes corresponding to the M first alarm samples, and M is an integer greater than or equal to 2.
And the first input unit is used for inputting the historical alarm data into the initial model.
The classifying unit is used for classifying the first alarm samples to obtain classified first alarm samples.
The first determining unit is used for determining sub-alarm samples corresponding to all alarm times of the first alarm samples in a second preset period aiming at each type of the first alarm samples.
The traversing unit is used for traversing all the sub-alarm samples and determining a second alarm sample set generated by the information system in a third preset period, wherein the second alarm sample set comprises all second alarm samples with different types from the first alarm samples, and the third preset period is determined according to the alarm time of the sub-alarm samples and the second preset period.
And the output unit is used for outputting an initial alarm sample directed graph based on the second alarm sample set.
And the second acquisition unit is used for adjusting model parameters of the initial model according to the initial alarm sample directed graph and the real alarm sample directed graph so as to obtain a generated model.
According to an embodiment of the present disclosure, the second set of alert samples further includes: time stamp of the second alert sample relative to the sub-alert sample.
The traversing unit includes: a repeating subunit.
A repeating subunit, configured to repeatedly perform, for each sub-alarm sample, the following operations: acquiring all second alarm samples generated by the information system in a third preset period; determining a first accumulated alarm number of times of the second alarm sample before the sub-alarm sample occurs and a second accumulated alarm number of times of the second alarm sample after the sub-alarm sample occurs; and determining a time tag according to the first accumulated alarm times and the second accumulated alarm times.
According to an embodiment of the present disclosure, the initial alarm sample directed graph includes nodes corresponding to the first alarm sample and the second alarm samples, directed edges between nodes that have an association relationship, and first weight values that correspond to the directed edges and characterize the degree of association. The second alarm sample set further includes a time tag of the second alarm sample relative to the sub-alarm sample.
The output unit includes: a first determining subunit, a second determining subunit, a third determining subunit, and an output subunit.
The first determining subunit is used for determining, according to the time tag, the direction of the line segment formed by the node corresponding to the first alarm sample and the node corresponding to each second alarm sample;
the second determining subunit is used for constructing, according to the direction and the nodes, a directed edge that points from the earlier alarm to the later alarm;
the third determining subunit is used for determining a first weight value corresponding to the directed edge according to the alarm frequency generated by each second alarm sample in a third preset period;
and the output subunit is used for outputting the initial alarm sample directed graph.
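The four subunits above can be read as one graph-building routine. The sketch below assumes that each second alarm sample already carries a time tag of "before" or "after" and an occurrence count within the third preset period, and that the first weight value is the count normalized by the total count; the normalization, the tag strings, and the name `build_alarm_graph` are illustrative assumptions rather than details given in the disclosure.

```python
def build_alarm_graph(first_type, associated):
    """Build weighted directed edges between one alarm type and its associated types.

    first_type: identifier of the first alarm sample type.
    associated: dict mapping other alarm type -> (time_tag, count in the window).
    Returns a dict mapping (source, target) -> weight.
    """
    total = sum(count for _, count in associated.values())
    edges = {}
    for other_type, (tag, count) in associated.items():
        # Edges point from the earlier alarm to the later alarm.
        if tag == "before":
            edge = (other_type, first_type)
        elif tag == "after":
            edge = (first_type, other_type)
        else:
            continue  # no clear ordering, so no edge is created (assumed handling)
        # Assumed weighting: relative alarm frequency within the window.
        edges[edge] = count / total if total else 0.0
    return edges
```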
According to an embodiment of the present disclosure, the initial alarm directed graph further includes a second weight value corresponding to the directed edge.
The clipping module 730 includes a second determining unit.
The second determining unit is used for clipping the directed edge corresponding to the second weight value, when the second weight value is determined not to meet the threshold, to obtain the target alarm directed graph.
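Clipping by weight amounts to filtering the edge set. The sketch below assumes that "meeting the threshold" means the weight is greater than or equal to it; the disclosure does not fix the comparison direction, and the name `clip_edges` is hypothetical.

```python
def clip_edges(edges, threshold):
    """Keep only the directed edges whose weight meets the threshold.

    edges: dict mapping (source, target) -> second weight value.
    threshold: minimum weight an edge needs to survive (assumed to be inclusive).
    """
    return {edge: weight for edge, weight in edges.items() if weight >= threshold}
```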
According to an embodiment of the present disclosure, the first determining module 740 includes a third determining unit and a fourth determining unit.
The third determining unit is used for determining the target root node according to the target alarm directed graph.
The fourth determining unit is used for determining the root cause of the alarm generated by the information system according to the target root node.
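The disclosure does not spell out how the target root node is chosen here. One common reading, used in the sketch below as an assumption, is that a root node is a node with no incoming edges in the target alarm directed graph, since no other alarm precedes it. The name `root_nodes` is hypothetical.

```python
def root_nodes(edges):
    """Return nodes that have outgoing or incoming edges but no incoming edge.

    edges: iterable of (source, target) pairs from the clipped directed graph.
    """
    nodes, targets = set(), set()
    for source, target in edges:
        nodes.update((source, target))
        targets.add(target)
    # Nodes never appearing as a target have in-degree zero.
    return sorted(nodes - targets)
```

If the graph contains a cycle that covers every node, this returns an empty list, and a different selection rule would be needed.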
According to an embodiment of the present disclosure, the processing module 720 further includes: a first input unit.
The first input unit is used for inputting the real-time alarm data into the generation model to perform the following operations: classifying the alarms according to the unique identification codes to obtain classified alarms; for each type of alarm, determining the sub-alarms corresponding to all alarm times of that alarm in the first preset period; traversing all the sub-alarms and determining an alarm set generated by the information system in a fourth preset period, wherein the alarm set includes all associated alarms whose type differs from that of the alarm, and the fourth preset period is determined according to the alarm time of the sub-alarm and the first preset period; and outputting the initial alarm directed graph based on the alarm set.
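The classification and traversal steps can be sketched as follows. The sketch assumes each alarm is a pair of unique identification code and timestamp, and that the fourth preset period is a symmetric window around each sub-alarm time; the window shape, the parameter `half_window`, and both function names are assumptions, since the disclosure only says the period depends on the sub-alarm time and the first preset period.

```python
from collections import defaultdict


def classify_alarms(alarms):
    """Group alarm timestamps by unique identification code.

    alarms: iterable of (uid, timestamp).
    Returns a dict mapping uid -> sorted list of timestamps (the sub-alarms).
    """
    groups = defaultdict(list)
    for uid, ts in alarms:
        groups[uid].append(ts)
    return {uid: sorted(times) for uid, times in groups.items()}


def associated_alarms(alarms, uid, sub_alarm_time, half_window):
    """Collect alarms of other types that fall in the window around one sub-alarm."""
    lo, hi = sub_alarm_time - half_window, sub_alarm_time + half_window
    return [(other, ts) for other, ts in alarms if other != uid and lo <= ts <= hi]
```

Calling `associated_alarms` once per sub-alarm of each type yields the alarm sets from which the initial alarm directed graph is built.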
According to an embodiment of the present disclosure, any number of the first acquisition module 710, the processing module 720, the clipping module 730, and the first determination module 740 may be combined and implemented in one module, or any one of the modules may be split into a plurality of modules. Alternatively, at least some of the functionality of one or more of these modules may be combined with at least some of the functionality of other modules and implemented in one module. According to embodiments of the present disclosure, at least one of the first acquisition module 710, the processing module 720, the clipping module 730, and the first determination module 740 may be implemented at least in part as a hardware circuit, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on a chip, a system on a substrate, a system on a package, or an application specific integrated circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable way of integrating or packaging a circuit, or in any one of, or a suitable combination of, software, hardware, and firmware. Alternatively, at least one of the first acquisition module 710, the processing module 720, the clipping module 730, and the first determination module 740 may be at least partially implemented as a computer program module which, when executed, performs the corresponding functions.
Fig. 8 schematically illustrates a block diagram of an electronic device adapted to implement an alarm root cause determining method according to an embodiment of the present disclosure.
As shown in fig. 8, an electronic device 800 according to an embodiment of the present disclosure includes a processor 801 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. The processor 801 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 801 may also include on-board memory for caching purposes. The processor 801 may include a single processing unit or multiple processing units for performing the different actions of the method flows according to embodiments of the disclosure.
In the RAM 803, various programs and data required for the operation of the electronic device 800 are stored. The processor 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. The processor 801 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 802 and/or the RAM 803. Note that the program may be stored in one or more memories other than the ROM 802 and the RAM 803. The processor 801 may also perform various operations of the method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 800 may also include an input/output (I/O) interface 805, which is also connected to the bus 804. The electronic device 800 may also include one or more of the following components connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), and a speaker; a storage section 808 including a hard disk or the like; and a communication section 809 including a network interface card such as a LAN card or a modem. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as needed, so that a computer program read from it is installed into the storage section 808 as needed.
The present disclosure also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include ROM 802 and/or RAM 803 and/or one or more memories other than ROM 802 and RAM 803 described above.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the methods shown in the flowcharts. The program code, when executed in a computer system, causes the computer system to perform the methods provided by embodiments of the present disclosure.
The above-described functions defined in the system/apparatus of the embodiments of the present disclosure are performed when the computer program is executed by the processor 801. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may be transmitted and distributed in the form of a signal over a network medium, and downloaded and installed via the communication section 809 and/or installed from the removable medium 811. The program code contained in the computer program may be transmitted using any appropriate network medium, including but not limited to wireless, wired, or any suitable combination of the foregoing.
In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 809, and/or installed from the removable medium 811.
According to embodiments of the present disclosure, program code for executing the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages. In particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or in assembly/machine languages. The programming languages include, but are not limited to, Java, C++, Python, C, and similar languages. The program code may execute entirely on the user's computing device, partly on the user's device and partly on a remote computing device, or entirely on the remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, via the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments of the disclosure and/or in the claims may be combined and/or integrated in a variety of ways, even if such combinations or integrations are not explicitly recited in the disclosure. In particular, the features recited in the various embodiments of the present disclosure and/or the claims may be combined and/or integrated in various ways without departing from the spirit and teachings of the present disclosure. All such combinations and/or integrations fall within the scope of the present disclosure.
The embodiments of the present disclosure are described above. However, these embodiments are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the disclosure, and such alternatives and modifications are intended to fall within the scope of the disclosure.

Claims (11)

1. An alarm root cause determining method, comprising:
acquiring real-time alarm data generated by an information system in a first preset period, wherein the real-time alarm data comprises N alarms and N unique identification codes respectively corresponding to the N alarms, and N is an integer greater than or equal to 2;
inputting the real-time alarm data into a generation model and outputting an initial alarm directed graph, wherein the generation model is used for establishing association relationships among the N alarms based on the unique identification codes, and the initial alarm directed graph comprises N nodes corresponding to the alarms and directed edges formed between associated alarms;
clipping the initial alarm directed graph to obtain a target alarm directed graph; and
determining the root cause of the alarm generated by the information system based on the target alarm directed graph.
2. The method of claim 1, wherein a training method of the generation model comprises:
acquiring historical alarm data generated by the information system in a second preset period, wherein the historical alarm data comprises M first alarm samples, the occurrence time of the first alarm samples and unique identification codes corresponding to the M first alarm samples, and M is an integer greater than or equal to 2;
inputting the historical alert data into an initial model to perform the following operations:
classifying the first alarm samples to obtain classified first alarm samples;
for each type of first alarm sample, determining sub-alarm samples corresponding to all alarm times of the first alarm samples in the second preset period;
traversing all the sub-alarm samples, and determining a second alarm sample set generated by the information system in a third preset period, wherein the second alarm sample set comprises all second alarm samples with different types from the first alarm samples, and the third preset period is determined according to the alarm time of the sub-alarm samples and the second preset period;
outputting an initial alarm sample directed graph based on the second alarm sample set;
and adjusting model parameters of the initial model according to the initial alarm sample directed graph and the real alarm sample directed graph so as to obtain the generated model.
3. The method of claim 2, wherein the second alarm sample set further comprises: a time tag of the second alarm sample relative to the sub-alarm sample;
wherein the traversing all the sub-alarm samples and determining a second alarm sample set generated by the information system in the third preset period comprises:
for each sub-alarm sample, repeating the following operations:
acquiring all the second alarm samples generated by the information system in the third preset period;
determining a first accumulated alarm count of the second alarm sample before the sub-alarm sample occurs and a second accumulated alarm count of the second alarm sample after the sub-alarm sample occurs;
and determining the time tag according to the first accumulated alarm count and the second accumulated alarm count.
4. The method of claim 2, wherein the initial alarm sample directed graph comprises nodes corresponding to the first alarm sample and the second alarm samples, directed edges between nodes that have an association relationship, and first weight values that correspond to the directed edges and characterize a degree of association, the second alarm sample set further comprising: a time tag of the second alarm sample relative to the sub-alarm sample;
wherein the outputting an initial alarm sample directed graph based on the second alarm sample set comprises:
determining, according to the time tag, the direction of a line segment formed by the node corresponding to the first alarm sample and the node corresponding to each second alarm sample;
constructing, according to the direction and the nodes, the directed edge pointing from the earlier alarm to the later alarm;
determining the first weight value corresponding to the directed edge according to the alarm frequency generated by each second alarm sample in the third preset period;
and outputting the initial alarm sample directed graph.
5. The method of claim 1, wherein the initial alarm directed graph further comprises: a second weight value corresponding to the directed edge;
the clipping the initial alarm directed graph to obtain a target alarm directed graph comprises:
under the condition that the second weight value does not meet a threshold value, clipping the directed edge corresponding to the second weight value to obtain the target alarm directed graph.
6. The method of claim 1, wherein the determining the root cause of the alarm generated by the information system based on the target alarm directed graph comprises:
determining a target root node according to the target alarm directed graph;
and determining the root cause of the alarm generated by the information system according to the target root node.
7. The method of claim 1, wherein the inputting the real-time alarm data into a generation model and outputting an initial alarm directed graph comprises:
inputting the real-time alarm data into the generation model to perform the following operations:
classifying the alarms according to the unique identification codes to obtain classified alarms;
for each type of alarm, determining sub-alarms corresponding to all alarm times of the alarm in the first preset period;
traversing all the sub-alarms, and determining an alarm set generated by the information system in a fourth preset period, wherein the alarm set comprises all associated alarms whose type differs from that of the alarm, and the fourth preset period is determined according to the alarm time of the sub-alarms and the first preset period;
and outputting the initial alarm directed graph based on the alarm set.
8. An alarm root cause determining apparatus, comprising:
a first acquisition module, used for acquiring real-time alarm data generated by an information system in a first preset period, wherein the real-time alarm data comprises N alarms and N unique identification codes respectively corresponding to the N alarms, and N is an integer greater than or equal to 2;
a processing module, used for inputting the real-time alarm data into a generation model and outputting an initial alarm directed graph, wherein the generation model is used for establishing association relationships among the N alarms based on the unique identification codes, and the initial alarm directed graph comprises N nodes corresponding to the alarms and directed edges formed between associated alarms;
a clipping module, used for clipping the initial alarm directed graph to obtain a target alarm directed graph; and
a first determining module, used for determining the root cause of the alarm generated by the information system based on the target alarm directed graph.
9. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-7.
10. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method according to any of claims 1-7.
11. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 7.
CN202311068710.3A 2023-08-23 2023-08-23 Alarm root cause determining method, device, equipment and medium Pending CN116860507A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311068710.3A CN116860507A (en) 2023-08-23 2023-08-23 Alarm root cause determining method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311068710.3A CN116860507A (en) 2023-08-23 2023-08-23 Alarm root cause determining method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN116860507A true CN116860507A (en) 2023-10-10

Family

ID=88234374

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311068710.3A Pending CN116860507A (en) 2023-08-23 2023-08-23 Alarm root cause determining method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN116860507A (en)

Similar Documents

Publication Publication Date Title
US11586972B2 (en) Tool-specific alerting rules based on abnormal and normal patterns obtained from history logs
CN113965389B (en) Network security management method, device and medium based on firewall log
US11645540B2 (en) Deep graph de-noise by differentiable ranking
CN115913710A (en) Abnormality detection method, apparatus, device and storage medium
CN115174353A (en) Fault root cause determination method, device, equipment and medium
CN115237804A (en) Performance bottleneck assessment method, performance bottleneck assessment device, electronic equipment, medium and program product
US20210064813A1 (en) Event detection based on text streams
CN116739605A (en) Transaction data detection method, device, equipment and storage medium
CN116860507A (en) Alarm root cause determining method, device, equipment and medium
CN115514618A (en) Alarm event processing method and device, electronic equipment and medium
CN115269315A (en) Abnormity detection method, device, equipment and medium
CN114443437A (en) Alarm root cause output method, apparatus, device, medium, and program product
CN113900905A (en) Log monitoring method and device, electronic equipment and storage medium
CN111880959A (en) Abnormity detection method and device and electronic equipment
KR102656541B1 (en) Device, method and program that analyzes large log data using a distributed method for each log type
CN113138903B (en) Method and apparatus for tracking performance of a storage system
CN116225746A (en) Method, apparatus, device, storage medium and program product for determining system problem
CN115484150B (en) Alarm information processing method, system, equipment and storage medium
CN115378746B (en) Network intrusion detection rule generation method, device, equipment and storage medium
CN113656271B (en) Method, device, equipment and storage medium for processing abnormal behaviors of user
CN114237856A (en) Operation type identification method and device, electronic equipment and storage medium
CN117785625A (en) Method, device, equipment and storage medium for predicting server performance
CN117853136A (en) Customer examination processing method and device
CN114996119A (en) Fault diagnosis method, fault diagnosis device, electronic equipment and storage medium
CN115687284A (en) Information processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination