CN111404740A - Fault analysis method and device, electronic equipment and computer readable storage medium

Info

Publication number
CN111404740A
Authority
CN
China
Prior art keywords
monitoring
alarm information
target
alarm
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010169755.XA
Other languages
Chinese (zh)
Inventor
董雷
彭建春
汪文超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Tower Co Ltd
Original Assignee
China Tower Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Tower Co Ltd
Priority to CN202010169755.XA
Publication of CN111404740A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/064 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis involving time analysis
    • H04L41/0677 Localisation of faults

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Alarm Systems (AREA)

Abstract

The invention provides a fault analysis method, a fault analysis device, electronic equipment and a computer readable storage medium, wherein the method comprises the following steps: performing correlation analysis on the alarm information acquired in the monitoring time window to acquire at least two first monitoring objects with correlation; wherein, the first monitoring object is a monitoring object with alarm in the monitoring time window; determining at least one alarm message of a target monitoring object in the associated link of the at least two first monitoring objects; wherein, the target monitoring object is the first called monitoring object in the associated link; and determining a fault result of the at least two first monitoring objects which are alarmed based on at least one piece of alarm information of the target monitoring object. The embodiment of the invention can avoid analyzing the alarm information in the monitoring time window one by one, thereby reducing troubleshooting and recovery time.

Description

Fault analysis method and device, electronic equipment and computer readable storage medium
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a fault analysis method and apparatus, an electronic device, and a computer-readable storage medium.
Background
An alarm monitoring system is an important component of a network communication system. It detects alarm information in the network communication system so that faults can be avoided or predicted.
At present, an alarm monitoring system generally works as follows: an agent is installed on each monitored host; the agent periodically collects various metrics and sends them to a server; when a metric exceeds its threshold, an alarm is triggered and alarm information is sent to operation and maintenance personnel.
However, existing alarm monitoring systems are generally responsible only for triggering alarms, sending alarm messages, and producing statistical reports. When a large amount of alarm information is generated at the same time, operation and maintenance personnel must either distribute the alarm information to the responsible persons one by one according to resource ownership, or analyze it one by one based on personal experience, so troubleshooting and recovery take a long time.
Disclosure of Invention
The embodiment of the invention provides a fault analysis method, a fault analysis device, electronic equipment and a computer readable storage medium, and aims to solve the problem that fault troubleshooting and recovery time are long.
In a first aspect, an embodiment of the present invention provides a fault analysis method, where the method includes:
performing correlation analysis on the alarm information acquired in the monitoring time window to acquire at least two first monitoring objects with correlation; wherein, the first monitoring object is a monitoring object with alarm in the monitoring time window;
determining at least one alarm message of a target monitoring object in the associated link of the at least two first monitoring objects; wherein, the target monitoring object is the first called monitoring object in the associated link;
and determining a fault result of the at least two first monitoring objects which are alarmed based on at least one piece of alarm information of the target monitoring object.
In a second aspect, an embodiment of the present invention further provides a fault analysis apparatus, where the apparatus includes:
the analysis module is used for carrying out correlation analysis on the alarm information acquired in the monitoring time window to acquire at least two first monitoring objects with correlation; wherein, the first monitoring object is a monitoring object with alarm in the monitoring time window;
the first determining module is used for determining at least one piece of alarm information of a target monitoring object in an associated link of the at least two first monitoring objects; wherein, the target monitoring object is the first called monitoring object in the associated link;
and the second determining module is used for determining a fault result of the alarm of the at least two first monitoring objects based on at least one piece of alarm information of the target monitoring object.
In a third aspect, an embodiment of the present invention further provides an electronic device, which includes a processor, a memory, and a computer program stored on the memory and operable on the processor, where the computer program, when executed by the processor, implements the steps of the fault analysis method described above.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements the steps of the fault analysis method.
In the embodiment of the invention, correlation analysis is carried out on the alarm information acquired in the monitoring time window to acquire at least two first monitoring objects with correlation; wherein, the first monitoring object is a monitoring object with alarm in the monitoring time window; determining at least one alarm message of a target monitoring object in the associated link of the at least two first monitoring objects; wherein, the target monitoring object is the first called monitoring object in the associated link; and determining a fault result of the at least two first monitoring objects which are alarmed based on at least one piece of alarm information of the target monitoring object.
Therefore, the alarm occurring in the monitoring time window is converged by performing correlation analysis on the alarm information in the monitoring time window so as to determine a target monitoring object causing the alarm of the first monitoring object in the monitoring time window, and a fault result causing the alarm of the first monitoring object in the monitoring time window is determined based on the alarm information of the target monitoring object. Therefore, the alarm information in the monitoring time window can be prevented from being analyzed one by one, and the troubleshooting and recovery time can be further reduced.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a method for fault analysis provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of alarm messages received within a monitoring time window;
FIG. 3 is a diagram of call relationships between systems involved in an alarm work order;
FIG. 4 is a structural diagram of a fault analysis apparatus provided by an embodiment of the present invention;
FIG. 5 is a block diagram of an electronic device provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
First, a fault analysis method provided in an embodiment of the present invention is explained.
It should be noted that the fault analysis method provided by the embodiment of the present invention may be applied to an electronic device. The method performs correlation analysis on the alarm information acquired in a monitoring time window to determine the target monitoring object that caused the first monitoring objects to alarm in that window, and then determines, based on the alarm information of the target monitoring object, the fault result that caused those alarms, thereby reducing troubleshooting and recovery time.
Referring to fig. 1, fig. 1 is a flowchart of a fault analysis method provided by an embodiment of the present invention, as shown in fig. 1, including the following steps:
step 101, performing correlation analysis on alarm information acquired in a monitoring time window to acquire at least two first monitoring objects with correlation; wherein, the first monitoring object is a monitoring object with alarm in the monitoring time window;
step 102, determining at least one alarm message of a target monitoring object in an associated link of the at least two first monitoring objects; wherein, the target monitoring object is the first called monitoring object in the associated link;
step 103, determining a fault result of the at least two first monitoring objects that are alarmed based on at least one alarm message of the target monitoring object.
In step 101, time windows may be divided according to a certain time interval, where the monitoring time window is any one of the divided time windows, and the alarm information monitored in the alarm monitoring system is obtained in the monitoring time window. The monitoring time window may be set according to actual conditions, for example, may be set to 1 hour.
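The window division described above can be sketched as follows. This is a minimal illustration only, not the patented implementation; the function name `bucket_by_window` and the representation of an alarm as a `(timestamp, description)` pair are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def bucket_by_window(alarms, window=timedelta(hours=1)):
    """Group (timestamp, description) alarms by the start of their time window.

    Hypothetical helper: windows are aligned to multiples of `window`
    counted from datetime.min, so a 1-hour window starts on the hour.
    """
    buckets = defaultdict(list)
    for ts, description in alarms:
        # Distance from the window start to this alarm.
        offset = (ts - datetime.min) % window
        buckets[ts - offset].append((ts, description))
    return dict(buckets)


alarms = [
    (datetime(2020, 3, 12, 10, 5), "module C1 error"),
    (datetime(2020, 3, 12, 10, 55), "module B1 error"),
    (datetime(2020, 3, 12, 11, 0), "module E3 error"),
]
windows = bucket_by_window(alarms)
```

Each key of `windows` is the start of one monitoring time window, and any one of these windows can serve as the monitoring time window analyzed in the later steps.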
The alarm monitoring system can be used to monitor whether the network system is abnormal and to send alarm information when a monitoring object in the network system is abnormal. The alarm information may include an alarm time and an alarm description. The alarm description describes the abnormal event of the monitoring object; for example, the alarm description may be a click error report for module C1 in system C.
The monitoring object may be a system, or may also be an electronic device, such as a server, a network device, and the like, which is not limited specifically herein.
Alarm information is continuously collected within the monitoring time window, and correlation analysis is performed on the collected alarm information. Before the correlation analysis, repeated alarm information and unimportant alarm information can be filtered out according to the alarm description in the alarm information.
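A possible form of this pre-filtering is sketched below. The text does not say how "unimportant" alarms are recognized, so the keyword list `ignored_keywords` and the dictionary fields `object` and `description` are purely hypothetical choices for illustration.

```python
def filter_alarms(alarms, ignored_keywords=("heartbeat",)):
    """Drop repeated alarms and alarms whose description looks unimportant.

    Assumption: an alarm is a dict with "object" and "description" keys,
    and importance is judged by a hypothetical keyword list.
    """
    seen = set()
    kept = []
    for alarm in alarms:
        description = alarm["description"]
        if any(word in description.lower() for word in ignored_keywords):
            continue  # treated as unimportant
        key = (alarm["object"], description)
        if key in seen:
            continue  # repeated alarm
        seen.add(key)
        kept.append(alarm)
    return kept


raw = [
    {"object": "C", "description": "module C1 error"},
    {"object": "C", "description": "module C1 error"},
    {"object": "E", "description": "Heartbeat missed"},
    {"object": "B", "description": "module B1 error"},
]
kept = filter_alarms(raw)
```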
There are many ways to perform correlation analysis on the alarm information. For example, a preset association rule base may be queried to check whether the monitoring objects that alarmed are associated, so as to obtain at least two associated first monitoring objects in the monitoring time window. Specifically, the systems and modules involved in the alarm information can first be extracted automatically based on a deep learning algorithm; the association rule base is then queried for an association link that includes at least two of the monitoring objects that alarmed; finally, if such an association link is matched, the at least two associated first monitoring objects are obtained. The at least two first monitoring objects are present in the association link.
For example, based on a deep learning algorithm, system B, system C, system E and system M are extracted from the alarm information as the systems involved in the alarms, and the association rule base contains the association link system E -> system D -> system C -> system B. It can then be concluded that the alarm information of system B, system C and system E is associated, while the alarm information of system M is filtered out.
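The rule-base lookup in this example can be sketched as below. The sketch assumes the extraction step has already produced the set of alarmed systems, and it models the association rule base as a plain list of call chains; `ASSOCIATION_LINKS` and `match_link` are hypothetical names.

```python
# Hypothetical rule base: each link lists systems in call order (caller first).
ASSOCIATION_LINKS = [["E", "D", "C", "B"]]


def match_link(alarmed, links=ASSOCIATION_LINKS, min_objects=2):
    """Return (link, first_monitoring_objects) for the link covering the most
    alarmed objects, or None if no link contains at least `min_objects`."""
    best = None
    for link in links:
        hits = [obj for obj in link if obj in alarmed]
        if len(hits) >= min_objects and (best is None or len(hits) > len(best[1])):
            best = (link, hits)
    return best


match = match_link({"B", "C", "E", "M"})
link, first_objects = match
filtered_out = {"B", "C", "E", "M"} - set(first_objects)
```

Here `first_objects` contains E, C and B, and `filtered_out` contains only M, which matches the example in the text.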
The preset association rule base stores association links with strong association relationships. A strong association relationship can be understood as an association relationship that is known and definite, for example a call chain relationship between applications, a relationship between a database and an application server, a relationship between network devices, a relationship between a network device and an application server, or a relationship between a host and a virtual machine.
In an optional embodiment, in many application scenarios, the monitored objects related to many alarm information do not have strong association, and the association is unknown. In this application scenario, an implicit relationship between alarm information needs to be found through a machine learning algorithm.
Specifically, two machine learning algorithms, namely, an association rule algorithm and a neural network algorithm, may be used, and these are briefly described below.
The association rule approach mainly applies the association analysis algorithms Apriori and FP-Growth. The two algorithms serve a similar purpose: both find frequent item sets. In practice, FP-Growth was found to be more efficient than Apriori.
The association rule algorithm divides time windows at a fixed interval, counts how often each kind of alarm information occurs in each time window, and analyzes the association relationships of the monitoring objects involved in the alarm information by finding temporal connections between alarms.
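To make the idea of frequent item sets over time windows concrete, the following sketch counts how often pairs of monitoring objects alarm in the same window. It is only the pair-counting step that such algorithms build on, not an implementation of Apriori or FP-Growth, and the support threshold `min_support` is an assumed parameter.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(windows, min_support=0.5):
    """Return {pair: support} for object pairs that alarm together in at
    least `min_support` of the time windows.

    `windows` is a list with one collection of alarmed objects per window.
    """
    counts = Counter()
    for objects in windows:
        # Count each unordered pair once per window.
        for pair in combinations(sorted(set(objects)), 2):
            counts[pair] += 1
    total = len(windows)
    return {pair: n / total for pair, n in counts.items() if n / total >= min_support}


history = [["B", "C", "E"], ["B", "C"], ["M"], ["B", "C", "M"]]
pairs = frequent_pairs(history)
```

With this made-up history, only the pair (B, C) co-occurs in at least half of the windows, which would suggest an implicit association between those two objects.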
The neural network algorithm can process alarm information over long time spans by using a long short-term memory (LSTM) network. A large number of alarms caused by a given fault are correlated in time, so historical alarm information can be used as samples to build an alarm correlation analysis model with an LSTM network. The association relationships of the monitoring objects involved in the alarm information are then analyzed based on this model.
Step 102 specifically includes:
generating alarm information of a second monitored object; wherein the second monitoring object is a monitoring object in the associated link except for the at least two first monitoring objects;
and determining at least one piece of alarm information of the target monitored object based on the alarm information of the at least two first monitored objects or the alarm information of the second monitored object.
Specifically, the alarm information of the second monitoring object is derived from the alarm information in the monitoring time window, and the second monitoring object is located in the associated link. Although no alarm from the second monitoring object was observed in the monitoring time window, its alarm may have gone unobserved because of time delay or for some reason in the alarm monitoring system; in this case, the alarm information of the second monitoring object is generated.
If multiple associated monitoring objects alarm in the same monitoring time window, the alarms are very likely associated with one another. Alarms that occur in the same monitoring time window and are linked by the association relationships between monitoring objects may share the same root cause, and that root cause can be converged to a fault in the first called monitoring object in the associated link. Therefore, at least one piece of alarm information of the target monitoring object may be determined based on the alarm information of the at least two first monitoring objects or the alarm information of the second monitoring object.
In an optional embodiment, the second monitoring object may be a first called monitoring object in an associated link, and at this time, at least one alarm information of the target monitoring object may be determined based on the alarm information of the second monitoring object. In another optional embodiment, the second monitoring object is not the first called monitoring object in the associated link, and at this time, a target monitoring object may be determined in the at least two first monitoring objects, and the alarm information of the target monitoring object may be obtained.
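The two cases above can be sketched as follows. The sketch assumes, as in the application embodiments later in this description, that the target monitoring object sits at the end of the call chain; the function `pick_target` and the placeholder text of the derived alarms are hypothetical.

```python
def pick_target(link, alarms_by_object):
    """Derive alarms for silent objects in the link and pick the target.

    `link` lists objects in call order (caller first). Assumption: the
    target monitoring object is the last element of the link.
    """
    # Second monitoring objects: in the link but without an observed alarm.
    derived = {
        obj: [f"derived alarm for {obj}"]
        for obj in link
        if obj not in alarms_by_object
    }
    target = link[-1]
    if target in alarms_by_object:
        target_alarms = alarms_by_object[target]  # target is a first monitoring object
    else:
        target_alarms = derived[target]  # target is a second monitoring object
    return target, target_alarms, sorted(derived)


# Target observed directly (link E -> D -> C -> B, D is silent).
t1 = pick_target(["E", "D", "C", "B"], {"E": ["E3"], "C": ["C1"], "B": ["B1", "B3"]})
# Target is silent and its alarm is derived (link D -> C -> B -> A).
t2 = pick_target(["D", "C", "B", "A"], {"D": ["timeout"], "C": ["timeout"], "B": ["timeout"]})
```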
In step 103, the target monitoring object includes a first monitoring sub-object, and at least one alarm information of the target monitoring object includes an alarm information corresponding to the first monitoring sub-object;
the step 103 specifically includes:
determining target alarm information from the alarm information corresponding to the first monitoring sub-object; the target alarm information is the alarm information with the maximum alarm weight in the alarm information corresponding to the first monitoring sub-object;
and determining the fault result of the at least two first monitoring objects which are alarmed based on the target alarm information.
Specifically, the first monitoring sub-object may include at least one module, and an alarm weight may be set for the alarms caused by each module; the target alarm information is the alarm information with the largest alarm weight among the alarm information corresponding to the first monitoring sub-object. For example, the first monitoring sub-object includes module 1, module 2 and module 3, whose alarm weights are set to 80, 10 and 10 respectively. Since the alarm weight of module 1 is the largest, the target alarm information is the alarm information of module 1.
In an optional embodiment, after convergence to the target monitoring object, if the target monitoring object includes only one monitoring sub-object, the abnormality of the first monitoring sub-object is the fault result of the alarms of the at least two first monitoring objects; that is, the abnormality of the first monitoring sub-object is the root cause of those alarms. More specifically, the abnormality of the module corresponding to the alarm with the largest alarm weight in the first monitoring sub-object is the root cause.
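The weight-based selection can be sketched in a few lines. The alarm representation (a dict with a `module` key) and the default weight of 0 for unconfigured modules are assumptions for the example; the weights 80, 10 and 10 come from the text.

```python
# Alarm weights from the example in the text.
ALARM_WEIGHTS = {"module 1": 80, "module 2": 10, "module 3": 10}


def pick_target_alarm(alarms, weights=ALARM_WEIGHTS):
    """Return the alarm whose module has the largest configured weight.

    Assumption: modules without a configured weight count as weight 0.
    """
    return max(alarms, key=lambda alarm: weights.get(alarm["module"], 0))


alarms = [
    {"module": "module 2", "description": "slow response"},
    {"module": "module 1", "description": "request failed"},
    {"module": "module 3", "description": "queue backlog"},
]
target_alarm = pick_target_alarm(alarms)
```

On a tie, `max` returns the first alarm with the largest weight, so a real configuration might need an explicit tie-breaking rule, which the text does not specify.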
In another optional embodiment, the target monitoring object further includes a second monitoring sub-object, and at least one alarm information of the target monitoring object further includes alarm information corresponding to the second monitoring sub-object;
the determining, based on the target alarm information, a failure result of the at least two first monitored objects being alarmed includes:
determining whether the target alarm information is associated with the alarm information corresponding to the second monitoring sub-object or not based on the association relationship between the first monitoring sub-object and the second monitoring sub-object;
determining that the fault result is an abnormality of the second monitoring sub-object under the condition that the target alarm information is associated with the alarm information corresponding to the second monitoring sub-object;
and determining that the fault result is an abnormality of the first monitoring sub-object under the condition that the target alarm information is not associated with the alarm information corresponding to the second monitoring sub-object.
When the target monitoring object further includes a second monitoring sub-object, the first and second monitoring sub-objects may be associated; for example, the first monitoring sub-object calls or uses the second monitoring sub-object. For instance, the first monitoring sub-object is an application and the second monitoring sub-object is a database that the application queries.
If the first monitoring sub-object and the second monitoring sub-object are associated, the target alarm information is very likely associated with the alarm information corresponding to the second monitoring sub-object. For example, if the target alarm information is that application A's database query timed out, and the alarm information corresponding to the second monitoring sub-object is a database abnormality, both pieces of alarm information relate to the database and are consistent, so the fault result converges to an abnormality of the second monitoring sub-object. That is, the abnormality of the second monitoring sub-object is the root cause of the alarms of the at least two first monitoring objects.
Conversely, when the target alarm information is not associated with the alarm information corresponding to the second monitoring sub-object, the fault result is determined to be an abnormality of the first monitoring sub-object; that is, the abnormality of the first monitoring sub-object is the root cause of the alarms of the at least two first monitoring objects. For example, if the target alarm information is a GC (garbage collection) alarm and the alarm information corresponding to the second monitoring sub-object is a database abnormality, the two are unrelated and inconsistent, so the fault result is determined to be an abnormality of the first monitoring sub-object.
If the first monitoring sub-object and the second monitoring sub-object are not associated, the fault result is likewise determined to be an abnormality of the first monitoring sub-object.
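The decision rule described in the preceding paragraphs can be sketched as follows. The text does not define how two pieces of alarm information are judged to be associated, so the sketch uses a hypothetical `topic` field on each alarm as a stand-in for that check.

```python
def determine_fault(target_alarm, second_alarms, sub_objects_associated):
    """Decide which monitoring sub-object is reported as abnormal.

    Assumption: two alarms are "associated" when their hypothetical
    "topic" fields match (e.g. both concern the database).
    """
    if sub_objects_associated:
        for alarm in second_alarms:
            if alarm["topic"] == target_alarm["topic"]:
                return "second monitoring sub-object abnormal"
    return "first monitoring sub-object abnormal"


db_alarm = {"description": "database abnormal", "topic": "database"}
timeout = {"description": "application A database query timed out", "topic": "database"}
gc_alarm = {"description": "GC alarm", "topic": "gc"}

case_consistent = determine_fault(timeout, [db_alarm], True)
case_inconsistent = determine_fault(gc_alarm, [db_alarm], True)
case_unrelated = determine_fault(timeout, [db_alarm], False)
```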
In the embodiment, correlation analysis is performed on the alarm information acquired in the monitoring time window to acquire at least two first monitoring objects with correlation; wherein, the first monitoring object is a monitoring object with alarm in the monitoring time window; determining at least one alarm message of a target monitoring object in the associated link of the at least two first monitoring objects; wherein, the target monitoring object is the first called monitoring object in the associated link; and determining a fault result of the at least two first monitoring objects which are alarmed based on at least one piece of alarm information of the target monitoring object.
Therefore, the alarm occurring in the monitoring time window is converged by performing correlation analysis on the alarm information in the monitoring time window so as to determine a target monitoring object causing the alarm of the first monitoring object in the monitoring time window, and a fault result causing the alarm of the first monitoring object in the monitoring time window is determined based on the alarm information of the target monitoring object. Therefore, the alarm information in the monitoring time window can be prevented from being analyzed one by one, and the troubleshooting and recovery time can be further reduced.
In order to better understand the whole process, the fault analysis method provided by the embodiment of the invention is described in detail in the following by way of specific embodiments.
Application embodiment 1
Referring to fig. 2, fig. 2 is a schematic diagram of alarm information received in a monitoring time window. As shown in fig. 2, an alarm work order platform receives multiple pieces of alarm information originating from multiple systems. A deep learning algorithm is then used to automatically extract the systems and modules involved in each alarm work order from the alarm description in the work order; here the alarm work orders involve system B, system C, system E and system M.
Referring to fig. 3, fig. 3 is a diagram of the call relationships between the systems involved in the alarm work orders. As shown in fig. 3, the associated systems involved in the associated alarms are system B, system C and system E, where the modules that alarmed in system B are module B1 and module B3, the module that alarmed in system C is module C1, and the module that alarmed in system E is module E3. Meanwhile, the alarm information of system M is filtered out.
An alarm for system D is derived from the alarms of system B, system C and system E. No alarm from system D has been received so far, which may be due to the user or to the alarm monitoring system, so system D is presumed to have alarmed.
If multiple strongly associated monitoring objects alarm in the same monitoring time window, the alarms are very likely associated. From the call relationships between the systems and the fact that the alarms occurred in the same monitoring time window, it can be inferred that the alarms may stem from a single root fault, which may lie in any one of system B, system C, system D and system E.
The alarm at the end of the call chain, i.e. the alarm of system B, is found according to the call chain. The root alarm is then chosen according to the preconfigured alarm weights of the modules; for example, module B1 is configured with an alarm weight of 80, module B2 with 10 and module B3 with 10, so the alarm generated by module B1 of system B is most likely the root alarm.
And generating a root alarm list, and directly sending the root alarm list to a responsible person of the system B, wherein the responsible person of the system B can directly process the root alarm list. Therefore, communication between the processing personnel and each system developer is avoided, so that the processing personnel can be assisted to quickly locate the alarm root, and the troubleshooting and recovery time is shortened.
Application embodiment 2
Multiple systems call each other's services through remote procedure calls, and the call relationship is as follows:
System D -> System C -> System B -> System A
Each system includes an application and a database. When a database query in system A times out, the alarm propagates layer by layer, so that multiple timeout alarms are generated in system B, system C and system D. At this time, if multiple strongly associated monitoring objects alarm in the same monitoring time window, the alarms are very likely associated.
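The layer-by-layer propagation in this scenario can be illustrated with a small sketch. It assumes a strictly linear call chain, as in this embodiment, and the function name `timeout_alarms` is hypothetical.

```python
def timeout_alarms(call_chain, failing):
    """Return the callers that would raise timeout alarms, nearest caller first.

    `call_chain` lists systems in call order (caller first), e.g. D -> C -> B -> A.
    """
    index = call_chain.index(failing)
    # Every system before the failing one depends on it, directly or indirectly.
    return list(reversed(call_chain[:index]))


affected = timeout_alarms(["D", "C", "B", "A"], "A")
```

For the chain in this embodiment, `affected` lists system B, then system C, then system D, which is the order in which the timeout would surface.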
At this time, an alarm for system A is derived from the alarms of system B, system C and system D. The derived alarms are a database abnormality alarm of the database type and an application-type alarm indicating that the application's database query is abnormal.
The alarm at the end, i.e. the alarm of system a, is found from the call chain. And determining the alarm with high alarm weight as the target alarm according to the alarm weight configured by each alarm in the system A.
For example, there are two types of alarms in the system a, which are database alarms and GC alarms, respectively. The alarm weight of the database alarm is 90, the alarm weight of the GC alarm is 10, and at the moment, the database alarm is determined as the target alarm.
Because application A and the database are associated, and the database-type database abnormality alarm is consistent with the application-type alarm that application A's database query is abnormal, the two alarms are merged, and the database abnormality is determined to be the root alarm for system B, system C and system D.
A root alarm list is generated and sent directly to the person responsible for system A, who can handle it directly. This avoids back-and-forth communication between the handling personnel and the developers of each system, helps the handling personnel locate the root of the alarms quickly, and shortens troubleshooting and recovery time.
The following describes a fault analysis device provided in an embodiment of the present invention.
Referring to fig. 4, fig. 4 is a structural diagram of a fault analysis device according to an embodiment of the present invention, which can implement the details of the fault analysis method and achieve the same effect. As shown in fig. 4, the fault analysis device 400 includes:
the analysis module 401 is configured to perform association analysis on the alarm information acquired in the monitoring time window to obtain at least two first monitoring objects with associations; wherein, the first monitoring object is a monitoring object with alarm in the monitoring time window;
a first determining module 402, configured to determine at least one alarm information of a target monitoring object in an associated link of the at least two first monitoring objects; wherein, the target monitoring object is the first called monitoring object in the associated link;
a second determining module 403, configured to determine, based on the at least one alarm information of the target monitoring object, a fault result for the alarms raised by the at least two first monitoring objects.
Optionally, the target monitoring object includes a first monitoring sub-object, and at least one alarm information of the target monitoring object includes alarm information corresponding to the first monitoring sub-object;
the second determining module 403 includes:
the first determining unit is used for determining target alarm information from the alarm information corresponding to the first monitoring sub-object; the target alarm information is the alarm information with the maximum alarm weight in the alarm information corresponding to the first monitoring sub-object;
and the second determining unit is used for determining the fault result of the alarm of the at least two first monitoring objects based on the target alarm information.
Optionally, the target monitoring object further includes a second monitoring sub-object, and the at least one alarm information of the target monitoring object further includes alarm information corresponding to the second monitoring sub-object;
the second determining unit is specifically configured to: determine, based on the association relationship between the first monitoring sub-object and the second monitoring sub-object, whether the target alarm information is associated with the alarm information corresponding to the second monitoring sub-object; determine that the fault result is an abnormality of the second monitoring sub-object when the target alarm information is associated with the alarm information corresponding to the second monitoring sub-object; and determine that the fault result is an abnormality of the first monitoring sub-object when the target alarm information is not associated with the alarm information corresponding to the second monitoring sub-object.
Optionally, the first determining module 402 includes:
the generating unit is used for generating the alarm information of the second monitored object; wherein the second monitoring object is a monitoring object in the associated link except for the at least two first monitoring objects;
a third determining unit, configured to determine at least one piece of alarm information of the target monitored object based on the alarm information of the at least two first monitored objects or the alarm information of the second monitored object.
The fault analysis device 400 can implement each process implemented by the electronic device in the fault analysis method embodiment and can achieve the same technical effect. To avoid repetition, details are not described here again.
Referring to fig. 5, fig. 5 is a structural diagram of an electronic device provided in an embodiment of the present invention. The electronic device shown in fig. 5 includes a processor 501, a memory 502, and a computer program stored on the memory 502 and executable on the processor 501. The components of the electronic device are coupled together by a bus interface 503, and the computer program, when executed by the processor 501, implements the following steps:
performing correlation analysis on the alarm information acquired in the monitoring time window to acquire at least two first monitoring objects with correlation; wherein, the first monitoring object is a monitoring object with alarm in the monitoring time window;
determining at least one alarm message of a target monitoring object in the associated link of the at least two first monitoring objects; wherein, the target monitoring object is the first called monitoring object in the associated link;
and determining a fault result of the at least two first monitoring objects which are alarmed based on at least one piece of alarm information of the target monitoring object.
Optionally, the target monitoring object includes a first monitoring sub-object, and at least one alarm information of the target monitoring object includes alarm information corresponding to the first monitoring sub-object;
in determining, based on the at least one alarm information of the target monitoring object, the fault result for the alarms raised by the at least two first monitoring objects, the processor 501 is specifically configured to perform:
determining target alarm information from the alarm information corresponding to the first monitoring sub-object; the target alarm information is the alarm information with the maximum alarm weight in the alarm information corresponding to the first monitoring sub-object;
and determining the fault result of the at least two first monitoring objects which are alarmed based on the target alarm information.
Optionally, the target monitoring object further includes a second monitoring sub-object, and the at least one alarm information of the target monitoring object further includes alarm information corresponding to the second monitoring sub-object;
the processor 501 is specifically configured to:
determining whether the target alarm information is associated with the alarm information corresponding to the second monitoring sub-object or not based on the association relationship between the first monitoring sub-object and the second monitoring sub-object;
determining that the fault result is an abnormality of the second monitoring sub-object when the target alarm information is associated with the alarm information corresponding to the second monitoring sub-object;
and determining that the fault result is an abnormality of the first monitoring sub-object when the target alarm information is not associated with the alarm information corresponding to the second monitoring sub-object.
Optionally, the processor 501 is specifically configured to:
generating alarm information of a second monitored object; wherein the second monitoring object is a monitoring object in the associated link except for the at least two first monitoring objects;
and determining at least one piece of alarm information of the target monitored object based on the alarm information of the at least two first monitored objects or the alarm information of the second monitored object.
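The optional step above, in which alarm information is generated for a monitoring object on the associated link that did not itself report an alarm, might look like the sketch below. The derivation rule is not specified in this section, so the placeholder rule, the function name `fill_link_alarms` and the alarm strings are purely assumptions for illustration.

```python
def fill_link_alarms(chain, reported):
    """Return alarms for every object on the link, deriving the missing ones.

    chain:    objects ordered from the first caller to the last callee.
    reported: dict mapping object -> list of alarm names actually received.
    """
    alarms = {}
    for obj in chain:
        if reported.get(obj):
            # A first monitoring object: keep the alarms it reported.
            alarms[obj] = list(reported[obj])
        else:
            # Assumed rule: a second monitoring object (no alarm of its own)
            # receives a placeholder alarm marked as derived.
            alarms[obj] = [f"derived alarm for {obj}"]
    return alarms


reported = {"D": ["timeout"], "C": ["timeout"], "B": ["timeout"]}
link_alarms = fill_link_alarms(["D", "C", "B", "A"], reported)
print(link_alarms["A"])  # alarm information of the object at the end of the link
```

Here system A reported nothing, so its alarm information is derived and can then be used as the alarm information of the target monitoring object.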
Preferably, an embodiment of the present invention further provides an electronic device, which includes a processor, a memory, and a computer program that is stored in the memory and can be run on the processor, and when the computer program is executed by the processor, the computer program implements each process of the fault analysis method according to any one of the above method embodiments, and can achieve the same technical effect, and in order to avoid repetition, details are not described here again.
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements each process of the fault analysis method, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a logical division, and other divisions are possible in practice: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above description covers only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method of fault analysis, the method comprising:
performing correlation analysis on the alarm information acquired in the monitoring time window to acquire at least two first monitoring objects with correlation; wherein, the first monitoring object is a monitoring object with alarm in the monitoring time window;
determining at least one alarm message of a target monitoring object in the associated link of the at least two first monitoring objects; wherein, the target monitoring object is the first called monitoring object in the associated link;
and determining a fault result of the at least two first monitoring objects which are alarmed based on at least one piece of alarm information of the target monitoring object.
2. The method according to claim 1, wherein the target monitoring object includes a first monitoring sub-object, and the at least one alarm message of the target monitoring object includes an alarm message corresponding to the first monitoring sub-object;
the determining, based on the at least one alarm information of the target monitored object, a failure result of the at least two first monitored objects being alarmed includes:
determining target alarm information from the alarm information corresponding to the first monitoring sub-object; the target alarm information is the alarm information with the maximum alarm weight in the alarm information corresponding to the first monitoring sub-object;
and determining the fault result of the at least two first monitoring objects which are alarmed based on the target alarm information.
3. The method according to claim 2, wherein the target monitoring object further includes a second monitoring sub-object, and the at least one alarm message of the target monitoring object further includes an alarm message corresponding to the second monitoring sub-object;
the determining, based on the target alarm information, a failure result of the at least two first monitored objects being alarmed includes:
determining whether the target alarm information is associated with the alarm information corresponding to the second monitoring sub-object or not based on the association relationship between the first monitoring sub-object and the second monitoring sub-object;
determining that the fault result is an abnormality of the second monitoring sub-object when the target alarm information is associated with the alarm information corresponding to the second monitoring sub-object;
and determining that the fault result is an abnormality of the first monitoring sub-object when the target alarm information is not associated with the alarm information corresponding to the second monitoring sub-object.
4. The method of claim 1, wherein the determining at least one alarm message of the target monitoring object in the associated link of the at least two first monitoring objects comprises:
generating alarm information of a second monitored object; wherein the second monitoring object is a monitoring object in the associated link except for the at least two first monitoring objects;
and determining at least one piece of alarm information of the target monitored object based on the alarm information of the at least two first monitored objects or the alarm information of the second monitored object.
5. A fault analysis device, characterized in that the device comprises:
the analysis module is used for carrying out correlation analysis on the alarm information acquired in the monitoring time window to acquire at least two first monitoring objects with correlation; wherein, the first monitoring object is a monitoring object with alarm in the monitoring time window;
the first determining module is used for determining at least one piece of alarm information of a target monitoring object in an associated link of the at least two first monitoring objects; wherein, the target monitoring object is the first called monitoring object in the associated link;
and the second determining module is used for determining a fault result of the alarm of the at least two first monitoring objects based on at least one piece of alarm information of the target monitoring object.
6. The apparatus according to claim 5, wherein the target monitoring object includes a first monitoring sub-object, and the at least one alarm information of the target monitoring object includes an alarm information corresponding to the first monitoring sub-object;
the second determining module includes:
the first determining unit is used for determining target alarm information from the alarm information corresponding to the first monitoring sub-object; the target alarm information is the alarm information with the maximum alarm weight in the alarm information corresponding to the first monitoring sub-object;
and the second determining unit is used for determining the fault result of the alarm of the at least two first monitoring objects based on the target alarm information.
7. The apparatus according to claim 6, wherein the target monitoring object further includes a second monitoring sub-object, and the at least one alarm information of the target monitoring object further includes an alarm information corresponding to the second monitoring sub-object;
the second determining unit is specifically configured to: determine, based on the association relationship between the first monitoring sub-object and the second monitoring sub-object, whether the target alarm information is associated with the alarm information corresponding to the second monitoring sub-object; determine that the fault result is an abnormality of the second monitoring sub-object when the target alarm information is associated with the alarm information corresponding to the second monitoring sub-object; and determine that the fault result is an abnormality of the first monitoring sub-object when the target alarm information is not associated with the alarm information corresponding to the second monitoring sub-object.
8. The apparatus of claim 5, wherein the first determining module comprises:
the generating unit is used for generating the alarm information of the second monitored object; wherein the second monitoring object is a monitoring object in the associated link except for the at least two first monitoring objects;
a third determining unit, configured to determine at least one piece of alarm information of the target monitored object based on the alarm information of the at least two first monitored objects or the alarm information of the second monitored object.
9. An electronic device, comprising a processor, a memory and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the fault analysis method as claimed in any one of claims 1 to 4.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the fault analysis method according to any one of claims 1 to 4.
CN202010169755.XA 2020-03-12 2020-03-12 Fault analysis method and device, electronic equipment and computer readable storage medium Pending CN111404740A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010169755.XA CN111404740A (en) 2020-03-12 2020-03-12 Fault analysis method and device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010169755.XA CN111404740A (en) 2020-03-12 2020-03-12 Fault analysis method and device, electronic equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN111404740A true CN111404740A (en) 2020-07-10

Family

ID=71432358

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010169755.XA Pending CN111404740A (en) 2020-03-12 2020-03-12 Fault analysis method and device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111404740A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113886182A (en) * 2021-09-29 2022-01-04 深圳市金蝶天燕云计算股份有限公司 Alarm convergence method and device, electronic equipment and storage medium
CN114118453A (en) * 2020-08-24 2022-03-01 南京南瑞继保电气有限公司 Power grid multi-source alarm unified management and control and intelligent analysis method
CN115529219A (en) * 2022-09-16 2022-12-27 中国工商银行股份有限公司 Alarm analysis method and device, computer readable storage medium and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105243004A (en) * 2015-09-15 2016-01-13 浪潮集团有限公司 Failure resource detection method and apparatus
CN106034051A (en) * 2015-03-12 2016-10-19 腾讯科技(深圳)有限公司 Network monitoring data processing method and network monitoring data processing device
US20180324029A1 (en) * 2016-02-03 2018-11-08 Tencent Technology (Shenzhen) Company Limited Alarm information processing method and apparatus, system, and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 101, floors 1-3, building 14, North District, yard 9, dongran North Street, Haidian District, Beijing 100029

Applicant after: CHINA TOWER Co.,Ltd.

Address before: 100142 19th floor, 73 Fucheng Road, Haidian District, Beijing

Applicant before: CHINA TOWER Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20200710