CN110493042B - Fault diagnosis method and device and server - Google Patents
- Publication number
- CN110493042B CN201910757406.7A
- Authority
- CN
- China
- Prior art keywords
- node
- fault
- alarm
- topology
- alarm information
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0631—Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
- H04L41/0677—Localisation of faults
- H04L41/12—Discovery or management of network topologies
Abstract
The invention provides a fault diagnosis method, a fault diagnosis device and a server. The fault diagnosis method of the invention comprises the steps of establishing the topology of a network system; acquiring alarm information in a network system; associating the alarm information with the nodes in the topology to determine a fault link; and determining a fault root node in the fault link according to a preset fault probability calculation rule. The method determines the fault link by associating the alarm information with the topology of the network system, thereby rapidly acquiring the service influence range and degree caused by the alarm, and determining the fault root node in the fault link according to the preset rule, thereby realizing the automatic positioning and accurate positioning of the fault and improving the efficiency of fault processing.
Description
Technical Field
The present invention relates to network technologies, and in particular, to a fault diagnosis method, apparatus, and server.
Background
Network systems contain large numbers of network devices of different types. When a device fails it generates alarm information, which maintenance personnel review in order to find and handle faults in the system promptly.
In existing alarm processing systems, each alarm generated on a device is usually sent out as a separate alarm message for that device; if a device raises the same alarm repeatedly for the same reason, the repeated alarms are merged into a single alarm message for that device.
In a network system with a complex structure and a large number of devices, the alarm information sent by the devices is not correlated. The alarm information of one device therefore says nothing about the operating state of the devices associated with it, and to locate a fault, maintenance personnel often have to review a large amount of alarm information from many devices before they can determine the root cause, which is inefficient.
Disclosure of Invention
The invention provides a fault diagnosis method, a fault diagnosis device and a server, which are used for quickly determining a fault root node in a network system and improving the fault positioning efficiency.
The invention provides a fault diagnosis method, which comprises the following steps:
establishing the topology of a network system;
acquiring alarm information in a network system;
associating the alarm information with the nodes in the topology to determine a fault link;
and determining a fault root node in the fault link according to a preset fault probability calculation rule.
Optionally, the associating the alarm information with the node in the topology to determine the failed link includes:
associating the alarm information with the nodes in the topology to determine the alarm nodes in the topology;
and determining the link formed by an alarm node and its parent node and/or child nodes as a fault link.
Optionally, the determining a failure root node in the failed link according to a preset failure probability calculation rule includes:
calculating the fault probability of each node according to the alarm information of each node in the fault link, the fault probability of the child node of each node and the number of the child nodes of each node;
respectively judging whether the fault probability of each node is greater than or equal to the fault threshold corresponding to each node;
if the fault probability of the first node is larger than or equal to the fault threshold corresponding to the first node, adding the first node into a candidate list;
and determining the node with the highest level in the candidate list as the fault root node.
Optionally, the calculating the failure probability of each node according to the alarm information of each node in the failed link, the failure probability of the child node of each node, and the number of child nodes includes:
calculating the failure probability of each node in the failure link according to the following formula:
where P is the failure probability of a node; A is 0 or 1, with A = 1 if the node has alarm information and A = 0 if it has none; Sum is the sum of the failure probabilities of the node's child nodes; and Count is the number of the node's child nodes.
Optionally, the establishing a topology of the network system includes:
acquiring network element information of each device in a network system within a preset time period, wherein the network element information of each device comprises a hierarchy of each device in the network system;
performing upper and lower level association on the network element information of different types of equipment;
and generating a visual topology according to the associated network element information.
Optionally, the performing upper and lower level association on the network element information of the different types of devices includes:
determining, by traversal, the next-level devices of each device layer by layer, from the highest-level devices down to the lowest level, so as to associate the network element information of the different types of devices across levels.
Optionally, the method further includes:
and merging the alarm information of the fault root node and the child nodes of the fault root node to carry out associated alarm.
The invention provides a fault diagnosis device, comprising:
the establishing module is used for establishing the topology of the network system;
the acquisition module is used for acquiring alarm information in a network system;
the association module is used for associating the alarm information with the nodes in the topology and determining a fault link;
and the determining module is used for determining a fault root node in the fault link according to a preset fault probability calculation rule.
Optionally, the association module is configured to:
associating the alarm information with the nodes in the topology, and determining the alarm nodes in the topology;
and determining the link formed by an alarm node and its parent node and/or child nodes as a fault link.
Optionally, the determining module is configured to:
calculating the fault probability of each node according to the alarm information of each node in the fault link, the fault probability of the child node of each node and the number of the child nodes of each node;
respectively judging whether the fault probability of each node is greater than or equal to the fault threshold corresponding to each node;
if the fault probability of the first node is larger than or equal to the fault threshold corresponding to the first node, adding the first node into a candidate list;
and determining the node with the highest level in the candidate list as the fault root node.
Optionally, the determining module is specifically configured to:
calculating the failure probability of each node in the failed link according to the following formula:
where P is the failure probability of a node; A is 0 or 1, with A = 1 if the node has alarm information and A = 0 if it has none; Sum is the sum of the failure probabilities of the node's child nodes; and Count is the number of the node's child nodes.
Optionally, the establishing module is configured to:
acquiring network element information of each device in a network system within a preset time period, wherein the network element information of each device comprises a hierarchy of each device in the network system;
performing upper and lower level association on the network element information of different types of equipment;
and generating a visualized topology according to the associated network element information.
Optionally, the establishing module is specifically configured to:
determining, by traversal, the next-level devices of each device layer by layer, from the highest-level devices down to the lowest level, so as to associate the network element information of the different types of devices across levels.
Optionally, the apparatus further comprises:
and the alarm module is used for combining the alarm information of the fault root node and the alarm information of the child nodes of the fault root node to carry out associated alarm.
The present invention provides a server comprising: a memory and a processor; the memory is connected with the processor;
the memory for storing a computer program;
the processor is configured to implement the fault diagnosis method as described above when the computer program is executed.
The present invention provides a storage medium having stored thereon a computer program which, when executed by a processor, implements the fault diagnosis method as described above.
The invention provides a fault diagnosis method, a fault diagnosis device and a server. The method comprises the steps of establishing the topology of a network system; acquiring alarm information in a network system; associating the alarm information with the nodes in the topology to determine a fault link; and determining a fault root node in the fault link according to a preset fault probability calculation rule. The method determines the fault link by associating the alarm information with the topology of the network system, thereby rapidly acquiring the service influence range and degree caused by the alarm, and determining the fault root node in the fault link according to the preset rule, thereby realizing the automatic positioning and accurate positioning of the fault and improving the efficiency of fault processing.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in their description are briefly introduced below. The drawings described below show some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a first schematic flow chart of a fault diagnosis method provided by the present invention;
Fig. 2 is a second schematic flow chart of a fault diagnosis method provided by the present invention;
Fig. 3 is a schematic topology diagram of a network system provided by the present invention;
Fig. 4 is a third schematic flow chart of a fault diagnosis method provided by the present invention;
Fig. 5 is a schematic structural diagram of a fault diagnosis device provided by the present invention;
Fig. 6 is a schematic structural diagram of a server provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
When a network device fails it generates alarm information, which maintenance personnel review in order to find and handle faults in the system. In existing alarm processing systems, each alarm generated on a device is usually sent out as a separate alarm message for that device; if a device raises the same alarm repeatedly for the same reason, the repeated alarms are merged into a single alarm message for that device.
In a network system with a complex structure and a large number of devices, the alarm information sent by the devices is not correlated. The alarm information of one device therefore says nothing about the operating state of the devices associated with it, and to locate a fault, maintenance personnel often have to review a large amount of alarm information from many devices before they can determine the root cause, which is inefficient. To solve these problems, the present invention provides a fault diagnosis method.
Fig. 1 is a first schematic flow chart of a fault diagnosis method provided by the present invention. The method is executed by a fault diagnosis device, which may be implemented in software and/or hardware; for example, the device may be a server. As shown in fig. 1, the method includes:
S101, establishing the topology of the network system.
The topology of the network system is presented as a multi-branch tree that visually shows how the devices in the network system are connected. The network element information of each device includes the device type, the level of the device in the network system, its association with other devices, and so on. Based on the acquired network element information, devices of the same type and level are placed in the same layer, and the layers are associated one by one to form the multi-branch tree topology. In practice, because the structure of the network system changes dynamically, the network element information can be acquired per time slice to establish the topology; the length of the time slice can be set as needed.
S102, acquiring alarm information in the network system.
S103, associating the alarm information with the nodes in the topology, and determining a fault link.
The network system may have an alarm acquisition system that collects the fault alarm information generated on each device, and the fault diagnosis apparatus may obtain the alarm information for a period of time from it. The alarm information is then associated with the nodes in the topology to determine the alarm nodes, and the link formed by an alarm node and its parent node and/or child nodes is determined as a fault link. Within a given period, one or more devices may generate alarms. Once their alarm information is associated with nodes in the topology, the alarm nodes together with the nodes associated with them form a fault link, and some node in that link may be the root cause node of the fault.
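A minimal Python sketch of this association step, under stated assumptions: the `Node` class, the `fault_link` function, and the reading of "parent node and/or child nodes" as "all ancestors and all descendants of the alarm node" are illustrative choices, not taken from the patent.

```python
from typing import Dict, Iterable, List, Optional, Set


class Node:
    """One device in the topology tree (hypothetical structure)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.parent: Optional["Node"] = None
        self.children: List["Node"] = []

    def add_child(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def fault_link(index: Dict[str, Node], alarm_ids: Iterable[str]) -> Set[str]:
    """Return the names of the alarm nodes plus their ancestors and descendants."""
    link: Set[str] = set()
    for device_id in alarm_ids:
        node = index.get(device_id)
        if node is None:  # alarm from a device that is not in the topology
            continue
        cur: Optional[Node] = node
        while cur is not None:  # the alarm node itself, then its parents
            link.add(cur.name)
            cur = cur.parent
        stack = list(node.children)
        while stack:  # every node below the alarm node
            child = stack.pop()
            link.add(child.name)
            stack.extend(child.children)
    return link


# Small illustrative topology (device names are made up)
switch1 = Node("SWITCH1")
olt1 = switch1.add_child(Node("OLT1"))
olt2 = switch1.add_child(Node("OLT2"))
mdu1 = olt1.add_child(Node("MDU1"))
cam1 = mdu1.add_child(Node("CAMERA1"))
index = {n.name: n for n in (switch1, olt1, olt2, mdu1, cam1)}

link = fault_link(index, ["MDU1", "UNKNOWN"])
```

Here an alarm on MDU1 yields a link containing MDU1, its ancestors OLT1 and SWITCH1, and its child CAMERA1; the unknown device ID is ignored.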
S104, determining a fault root node in the fault link according to a preset fault probability calculation rule.
The fault link includes the nodes that generated alarms and the parent and/or child nodes associated with them, and one of these nodes may be the root cause node of the fault. The failure probability of each node is calculated from the obtained alarm information according to the preset failure probability rule, and the fault root node is determined from the resulting probabilities. The failure probability rule can be set according to the actual condition of the network system.
The fault diagnosis method provided by the embodiment comprises the steps of establishing a topology of a network system; acquiring alarm information in a network system; associating the alarm information with the nodes in the topology to determine a fault link; and determining a fault root node in the fault link according to a preset fault probability calculation rule. The method determines the fault link by associating the alarm information with the topology of the network system, thereby rapidly acquiring the service influence range and degree caused by the alarm, and realizes the automatic positioning and accurate positioning of the fault by determining the fault root node in the fault link according to the preset rule, thereby improving the efficiency of fault processing.
Building on the embodiment shown in fig. 1, the method for establishing the topology of the network system in S101 is described with a specific example. Fig. 2 is a second schematic flow chart of a fault diagnosis method provided by the present invention. As shown in fig. 2, establishing the topology of the network system in S101 includes:
S201, acquiring network element information of each device in the network system within a preset time period, wherein the network element information of each device includes the level of the device in the network system.
The present embodiment may further include a resource acquisition system, also called a resource system, which collects the network element information of each device from the physical devices, such as port, IP, service type, device, and bandwidth rate. The data types of the network element information differ between device types, and the network element information is data in Simple Network Management Protocol (SNMP) format. The fault diagnosis device obtains the network element information of each device within a preset time period from the resource acquisition system. In a specific implementation, the network element information obtained from the resource acquisition system is placed in a cache per time slice. The cached data is parsed and checked against the preset rules for device network element information, and the records that conform are temporarily stored in a List. The List is passed as a parameter to a classification function, which first checks whether the List contains any elements; if it does, the data type of each element is determined according to the data format agreed with the resource acquisition system, and the network element information, classified by data type, is encapsulated and stored in a Map.
For example, a monitoring system contains four types of devices: SWITCH (switch), Optical Line Terminal (OLT), Message Decoder Unit (MDU), and CAMERA (camera). The SWITCH performs in hardware the filtering, learning, and forwarding tasks that a network bridge performs in software. It can also divide the network into branches, segment network data flows, and isolate faults that occur within a branch, which reduces the data traffic of each branch, makes each network more effective, and improves the efficiency of the whole network. The OLT is the terminal equipment used to connect the optical fiber trunk. The MDU allows two or more physical links to be established between two switching devices; it binds all physical connections between the two devices into one virtual transmission link, over which the switches exchange data.
The resource acquisition system can collect the network element information of each device in the monitoring system. The SWITCH is at the highest level of the monitoring system, the OLT is connected below the SWITCH, the MDU below the OLT, and the CAMERA below the MDU. After obtaining the network element information of the four types of devices from the resource acquisition system, the fault diagnosis device classifies and encapsulates it as described above. The following is an example of code for acquiring the network element information data and classifying and encapsulating it:
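The original listing is not reproduced in this text. The sketch below illustrates the described steps in Python (validate each cached record, collect the valid ones in a list, then group them by type into a map). The record layout (`id`, `type`, `parent_id`) and the names `is_valid` and `classify` are assumptions made for illustration; the actual format agreed with the resource acquisition system is not given.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical record layout; the real agreed format is not given in the text.
REQUIRED_KEYS = {"id", "type", "parent_id"}
KNOWN_TYPES = {"SWITCH", "OLT", "MDU", "CAMERA"}


def is_valid(record: dict) -> bool:
    """Check a cached record against the (assumed) network element rules."""
    return REQUIRED_KEYS.issubset(record) and record["type"] in KNOWN_TYPES


def classify(records: List[dict]) -> Dict[str, List[dict]]:
    """Group the valid records by device type."""
    valid = [r for r in records if is_valid(r)]  # the temporary list
    by_type: Dict[str, List[dict]] = defaultdict(list)
    if not valid:  # nothing to classify
        return dict(by_type)
    for record in valid:
        by_type[record["type"]].append(record)
    return dict(by_type)


records = [
    {"id": "SWITCH1", "type": "SWITCH", "parent_id": None},
    {"id": "OLT1", "type": "OLT", "parent_id": "SWITCH1"},
    {"id": "MDU1", "type": "MDU", "parent_id": "OLT1"},
    {"id": "CAMERA1", "type": "CAMERA", "parent_id": "MDU1"},
    {"id": "X1", "type": "ROUTER", "parent_id": None},  # unknown type, dropped
    {"id": "BROKEN"},                                    # missing fields, dropped
]
by_type = classify(records)
```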
S202, performing upper and lower level association on the network element information of the different types of devices.
S203, generating a visual topology according to the associated network element information.
The network element information of the equipment comprises the level of the equipment in the network system and other equipment associated with the equipment, and the next-level equipment of each equipment is determined layer by layer from the equipment at the highest level to the lowest level by adopting a traversal method so as to perform upper and lower level association on the network element information of different types of equipment.
The network topology is built as chains that extend upward from the lower layers of the multi-branch tree; the multi-branch tree model is obtained by traversing the data and hooking it onto these chains. In a concrete implementation, the network element information in the cache is read into a RootNode instance. When a topology is created, one RootNode instance is copied, and direct traversal and/or parallel traversal is chosen according to the data volume: when the data volume is small and the data structure simple, direct traversal is used to reduce program overhead; when the data volume is large and the data structure complex, parallel traversal is used to speed up the traversal. Time slices and a thread lock are used when traversing the network element information acquired in a time slice: the thread lock is checked first, and if it is held, the traversal is skipped and the multi-branch tree is not updated for the time being, while the data is processed as a stream. The data, classified and encapsulated by type, is passed as parameters to the corresponding hooking functions, which associate the network element information of the different types and restore the nodes of the multi-branch tree; the visual topology is generated once hooking is complete.
For example, the monitoring system has 18000 CAMERAs, so parallel traversal is used when matching MDUs with CAMERAs; there are more than 100 MDUs, so parallel traversal is also used when matching MDUs with OLTs; when matching OLTs with SWITCHes, direct traversal is used because the number of installed devices is small. For the traversal of the whole tree, the traversal method is chosen to balance data volume against program cost, and the hooking SWITCHes → OLTs → MDUs → CAMERAs is completed in the end. After traversal and hooking are completed, a visual topology graph is obtained. Fig. 3 is a schematic topology diagram of a network system provided by the present invention. Fig. 3 shows only 2 switches and their child nodes; the structure of the topology obtained in practice is determined by the actual devices in the network system. An example of code for restoring a topology is as follows:
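The original listing is not reproduced in this text. A minimal Python sketch of layer-by-layer hooking follows, assuming records that carry `id` and `parent_id` fields and a cut-over of 100 records between direct and parallel matching; the field names, the threshold, and the function names are all assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional

LAYERS = ["SWITCH", "OLT", "MDU", "CAMERA"]  # highest level first
PARALLEL_THRESHOLD = 100                     # hypothetical cut-over point


def attach_layer(parents: Dict[str, dict], children: List[dict]) -> None:
    """Hook every child record under its parent record, if the parent exists."""

    def find_parent(child: dict) -> Optional[dict]:
        return parents.get(child["parent_id"])

    if len(children) > PARALLEL_THRESHOLD:
        # Parallel traversal: look up parents concurrently, results keep input order.
        with ThreadPoolExecutor() as pool:
            matched = list(pool.map(find_parent, children))
    else:
        # Direct traversal for small layers.
        matched = [find_parent(c) for c in children]

    for child, parent in zip(children, matched):
        if parent is not None:
            parent["children"].append(child)


def build_topology(by_type: Dict[str, List[dict]]) -> List[dict]:
    """Link SWITCH -> OLT -> MDU -> CAMERA and return the top-level nodes.

    The records are modified in place: each gains a "children" list.
    """
    for records in by_type.values():
        for record in records:
            record.setdefault("children", [])
    for upper, lower in zip(LAYERS, LAYERS[1:]):
        parents = {r["id"]: r for r in by_type.get(upper, [])}
        attach_layer(parents, by_type.get(lower, []))
    return by_type.get("SWITCH", [])


by_type = {
    "SWITCH": [{"id": "SWITCH1", "parent_id": None}],
    "OLT": [{"id": "OLT1", "parent_id": "SWITCH1"}],
    "MDU": [{"id": "MDU1", "parent_id": "OLT1"}, {"id": "MDU2", "parent_id": "OLT1"}],
    "CAMERA": [{"id": f"CAMERA{i}", "parent_id": "MDU1"} for i in range(150)],
}
roots = build_topology(by_type)
```

The parallel branch only performs the lookups concurrently and attaches the children afterwards in a single thread, which keeps the tree updates free of races.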
In the fault diagnosis method provided by this embodiment, the network element information within the preset time period is acquired from the resource acquisition system, and the network element information of the different types of devices is linked by traversal, so that the topology of the network system for the preset time period can be generated accurately.
Based on the embodiment shown in fig. 1, the determination of the failure root node in the failed link according to the preset failure probability calculation rule in S104 is further described. Fig. 4 is a third schematic flow chart of a fault diagnosis method provided by the present invention. As shown in fig. 4, determining a failure root node in a failed link according to a preset failure probability calculation rule in S104 includes:
S401, calculating the fault probability of each node according to the alarm information of each node in the fault link, the fault probability of the child nodes of each node, and the number of the child nodes of each node.
The fault probability of each node in the topology is related to not only the alarm information of the node itself, but also the child nodes of the node, and it can be understood that if a plurality of child nodes of the node all generate alarm information, the probability that the node generates a fault is also high. The failure probability of each node in the failed link can be specifically calculated by the following formula:
where P is the failure probability of a node; A is 0 or 1, with A = 1 if the node has alarm information and A = 0 if it has none; Sum is the sum of the failure probabilities of the node's child nodes; and Count is the number of the node's child nodes.
When the failure probabilities of the nodes in a fault link are calculated, the calculation starts from the nodes at the lowest level. Once the failure probability of a node is determined, it is reported to the node's parent, whose failure probability is then determined from its own alarm information, the failure probabilities of its child nodes, and the number of its child nodes. This continues until all nodes have been calculated.
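The bottom-up pass can be sketched in Python as a post-order traversal. The formula itself is not present in this text, so the sketch assumes P = 100·A + Sum/Count (with the second term taken as 0 for a node without children), a form chosen because it reproduces the values 100 and 140/3 quoted in the example. The node counts below are also assumptions, since fig. 3 is not available.

```python
from typing import Dict, List, Set


def failure_probabilities(children: Dict[str, List[str]], root: str,
                          alarms: Set[str]) -> Dict[str, float]:
    """Compute P for every node under `root`, lowest level first.

    ASSUMED rule: P = 100 * A + Sum / Count, where A is 1 for an alarm node.
    """
    probs: Dict[str, float] = {}

    def visit(name: str) -> float:
        kids = children.get(name, [])
        child_sum = sum(visit(k) for k in kids)  # children are evaluated first
        average = child_sum / len(kids) if kids else 0.0
        probs[name] = 100.0 * (name in alarms) + average
        return probs[name]  # "reported" to the parent

    visit(root)
    return probs


# Hypothetical layout: 3 OLTs under SWITCH1, 5 MDUs under OLT1
children = {
    "SWITCH1": ["OLT1", "OLT2", "OLT3"],
    "OLT1": ["MDU1", "MDU2", "MDU3", "MDU4", "MDU5"],
    "MDU1": ["CAMERA1", "CAMERA2"],
    "MDU2": ["CAMERA3"],
}
probs = failure_probabilities(children, "SWITCH1", {"OLT1", "MDU1", "MDU2"})
```

Under these assumptions MDU1 and MDU2 evaluate to 100, OLT1 to 140, and SWITCH1 to 140/3.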
For example, assume that in the topology shown in fig. 3, alarm information is generated on OLT1, MDU1, and MDU2, and on no other node. The link formed by SWITCH1 and its child nodes is the fault link. In this link, the failure probability of the CAMERAs at the lowest level is 0, the failure probability of MDU1 and of MDU2 is 100, and the failure probability of the other MDUs is 0. The failure probability of OLT1 is obtained from its own alarm and the failure probabilities of its MDU child nodes, the failure probability of OLT2 is 0, and the failure probability of SWITCH1 is 140/3.
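The formula is not present in this text. As an assumption only, one form that is consistent with the variable definitions and with the values quoted in this example is the following, where the 100 weight, the child counts, and the value of OLT1 are inferred rather than stated:

```latex
% ASSUMED form, inferred from the worked example; not quoted from the patent.
P = 100 \cdot A + \frac{\mathrm{Sum}}{\mathrm{Count}}
    \qquad (\text{second term taken as } 0 \text{ when } \mathrm{Count} = 0)

% MDU1 (alarm, all CAMERA children at 0):
P_{\mathrm{MDU1}} = 100 \cdot 1 + 0 = 100

% SWITCH1 (no alarm), assuming three OLT children of which only OLT1 is non-zero:
P_{\mathrm{SWITCH1}} = 100 \cdot 0 + \frac{P_{\mathrm{OLT1}} + 0 + 0}{3} = \frac{140}{3}
    \;\Longrightarrow\; P_{\mathrm{OLT1}} = 140

% which in turn matches OLT1 (alarm) with five MDU children, two of them at 100:
P_{\mathrm{OLT1}} = 100 \cdot 1 + \frac{100 + 100 + 0 + 0 + 0}{5} = 140
```

If fig. 3 shows different child counts, the weights in this reconstruction would need to be adjusted accordingly.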
S402, respectively judging whether the fault probability of each node is larger than or equal to the fault threshold corresponding to each node.
S403, if the fault probability of the first node reaches the fault threshold corresponding to the first node, adding the first node into a candidate list.
S404, determining the node with the highest level in the candidate list as the fault root node.
Nodes at different levels of the topology, that is, different types of devices, have different failure thresholds. For each node in the fault link, it is determined whether its failure probability is greater than or equal to the failure threshold of its level. If the failure probability of a first node is greater than or equal to the threshold of its level, meaning the node is likely to have failed, the first node is added to a candidate list; the nodes in the candidate list are the candidates for the fault root node. The node at the highest level in the candidate list is then determined as the fault root node, which may also be called the maximum common point.
Assuming that the failure threshold of the SWITCH is 70, the failure threshold of the OLT is 120, and the failure threshold of the MDU is 100, the candidate list in the above example includes MDU1, MDU2, and OLT1, where OLT1 is the highest node in the hierarchy, and thus OLT1 is determined as the failure root node.
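The threshold comparison and root selection can be sketched in Python as follows. The thresholds are the ones given above; the probability of 140 for OLT1 is an assumed value (it is not stated in this text, only that OLT1 reaches its threshold of 120), and nodes whose type has no threshold, such as CAMERA, are simply never treated as candidates in this sketch.

```python
from typing import Dict, List, Optional, Tuple

LEVEL = {"SWITCH": 0, "OLT": 1, "MDU": 2, "CAMERA": 3}  # 0 is the highest level
THRESHOLD = {"SWITCH": 70, "OLT": 120, "MDU": 100}       # values from the example


def pick_root(probabilities: Dict[str, float],
              node_type: Dict[str, str]) -> Tuple[Optional[str], List[str]]:
    """Return (fault root node, candidate list)."""
    candidates = [
        name for name, p in probabilities.items()
        if node_type[name] in THRESHOLD and p >= THRESHOLD[node_type[name]]
    ]
    if not candidates:
        return None, []
    # Highest level wins; on a tie the first candidate found is returned.
    root = min(candidates, key=lambda name: LEVEL[node_type[name]])
    return root, candidates


probabilities = {
    "SWITCH1": 140 / 3,
    "OLT1": 140.0,  # assumed value, see lead-in
    "OLT2": 0.0,
    "MDU1": 100.0,
    "MDU2": 100.0,
    "MDU3": 0.0,
}
node_type = {
    "SWITCH1": "SWITCH", "OLT1": "OLT", "OLT2": "OLT",
    "MDU1": "MDU", "MDU2": "MDU", "MDU3": "MDU",
}
root, candidates = pick_root(probabilities, node_type)
```

With these inputs SWITCH1 stays below its threshold of 70, the candidate list contains OLT1, MDU1, and MDU2, and OLT1 is selected as the root.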
According to the fault diagnosis method provided by the embodiment, the fault probability of each node is calculated according to the alarm information of each node in the fault link, the fault probability of the child node of each node and the number of the child nodes of each node, the fault probability of each node is compared and judged, and the fault root node is determined, so that the root cause of the fault is quickly positioned in a complex network system, and the fault maintenance efficiency is improved.
The following is an example of code for hooking alarm information onto the topology and calculating failure probabilities to determine the maximum common point:
1. Match and hook the alarm information onto the topology, checking whether the ID of the device that received the alarm is in the topology.
2. Calculate the failure probability, report it, and cache the alarm nodes.
3. Traverse the candidate list; if the parent node of the current node has no alarm, the current node is the root.
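The original listing is not reproduced in this text. Steps 1 and 3 can be sketched in Python as below, with the topology represented as a hypothetical child-to-parent map and alarm records carrying a hypothetical `device_id` field; the function names are illustrative.

```python
from typing import Dict, Iterable, List, Optional, Set


def match_alarms(alarm_records: Iterable[dict],
                 parent: Dict[str, Optional[str]]) -> Set[str]:
    """Step 1: keep only alarms whose device ID exists in the topology."""
    return {a["device_id"] for a in alarm_records if a["device_id"] in parent}


def find_roots(candidates: Iterable[str],
               parent: Dict[str, Optional[str]],
               alarm_nodes: Set[str]) -> List[str]:
    """Step 3: a candidate is a root if its parent has no alarm."""
    roots = []
    for node in candidates:
        p = parent.get(node)
        if p is None or p not in alarm_nodes:
            roots.append(node)
    return roots


# Child -> parent map of a small illustrative topology
parent = {
    "SWITCH1": None,
    "OLT1": "SWITCH1",
    "OLT2": "SWITCH1",
    "MDU1": "OLT1",
    "MDU2": "OLT1",
}
alarm_records = [
    {"device_id": "OLT1"},
    {"device_id": "MDU1"},
    {"device_id": "MDU2"},
    {"device_id": "NOT_IN_TOPOLOGY"},
]
alarm_nodes = match_alarms(alarm_records, parent)
roots = find_roots(["MDU1", "MDU2", "OLT1"], parent, alarm_nodes)
```

MDU1 and MDU2 are rejected because their parent OLT1 has an alarm, while OLT1 is kept because SWITCH1 has none.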
On the basis of the above embodiment, the fault diagnosis method further includes: merging the alarm information of the fault root node with that of its child nodes to raise an associated alarm. For example, the alarm information may include abnormal operation of board software, failure of a cluster stack member, the delay difference of an Mp-group member link exceeding its threshold, loss of signal (LOS) on an Ethernet physical interface (ETPI), system power failure, and the like. With parent and child alarms associated, the alarm hierarchy can be expanded and presented layer by layer, and alarm associations across devices can be displayed together.
The fault root node determined in the above embodiment is the OLT1, and the child nodes MDU1 and MDU2 also generate alarm information, and perform a correlated alarm on the alarm information of the OLT1, MDU1 and MDU2, so that the root cause of the fault and the influence range of the fault can be displayed more intuitively, and a maintenance person can perform fault maintenance in time. Meanwhile, through the correlated alarm, the alarm number in the network system can be reduced, so that maintenance personnel can conveniently browse alarm information, and the maintenance efficiency is improved.
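One way to picture the associated alarm is a single record that carries the root node's alarms together with the alarms of the nodes below it. The Python sketch below is illustrative only: the record layout and the name `merge_alarms` are assumptions, and the alarm texts are sample strings.

```python
from typing import Dict, List


def merge_alarms(root: str, children: Dict[str, List[str]],
                 alarms: Dict[str, List[str]]) -> dict:
    """Combine the alarms of `root` and of all alarming nodes below it."""
    merged = {
        "root": root,
        "root_alarms": list(alarms.get(root, [])),
        "child_alarms": {},
    }
    stack = list(children.get(root, []))
    while stack:
        node = stack.pop()
        if node in alarms:
            merged["child_alarms"][node] = list(alarms[node])
        stack.extend(children.get(node, []))
    return merged


children = {"OLT1": ["MDU1", "MDU2", "MDU3"], "MDU1": ["CAMERA1"]}
alarms = {
    "OLT1": ["system power failure"],
    "MDU1": ["loss of signal (LOS)"],
    "MDU2": ["loss of signal (LOS)"],
}
associated = merge_alarms("OLT1", children, alarms)
```

Three separate alarm messages are reduced to one associated record here, which is the reduction in alarm count that the paragraph above describes.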
Fig. 5 is a schematic structural diagram of a fault diagnosis device provided by the present invention. As shown in fig. 5, the failure diagnosis apparatus 50 includes:
an establishing module 501, configured to establish a topology of a network system;
an obtaining module 502, configured to obtain alarm information in a network system;
an association module 503, configured to associate the alarm information with nodes in the topology and determine a fault link;
a determining module 504, configured to determine a fault root node in the fault link according to a preset fault probability calculation rule.
Optionally, the association module 503 is configured to:
associate the alarm information with the nodes in the topology to determine the alarm nodes in the topology;
and determine the links formed by each alarm node and its parent node and/or child nodes as fault links.
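One possible reading of this step, sketched under assumptions: each fault link is represented as a (parent, child) pair, and the `parent`/`children` maps and device names are hypothetical.

```python
def fault_links(alarm_nodes, parent, children):
    """Collect the links between each alarm node and its parent and child nodes."""
    links = set()
    for node in alarm_nodes:
        up = parent.get(node)
        if up is not None:
            links.add((up, node))          # link toward the parent node
        for child in children.get(node, []):
            links.add((node, child))       # links toward the child nodes
    return links


parent = {"SW1": None, "OLT1": "SW1", "MDU1": "OLT1", "MDU2": "OLT1"}
children = {"SW1": ["OLT1"], "OLT1": ["MDU1", "MDU2"]}
links = fault_links({"OLT1"}, parent, children)
print(sorted(links))
```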
Optionally, the determining module 504 is configured to:
calculate the fault probability of each node according to the alarm information of each node in the fault link, the fault probabilities of the child nodes of each node, and the number of child nodes of each node;
judge, for each node, whether its fault probability is greater than or equal to the fault threshold corresponding to that node;
if the fault probability of a first node is greater than or equal to the fault threshold corresponding to the first node, add the first node to a candidate list;
and determine the node with the highest level in the candidate list as the fault root node.
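The threshold check and the selection of the highest-level candidate can be sketched as follows. The probabilities, per-node thresholds and level numbers (0 meaning the highest level) are hypothetical values chosen for illustration.

```python
def pick_fault_root(probabilities, thresholds, level):
    """Build the candidate list, then return the candidate at the highest level."""
    candidates = [n for n, p in probabilities.items() if p >= thresholds[n]]
    if not candidates:
        return None, candidates
    root = min(candidates, key=lambda n: level[n])  # smaller number = higher level
    return root, candidates


probabilities = {"SW1": 0.5, "OLT1": 1.0, "MDU1": 1.0, "MDU2": 1.0}
thresholds = {"SW1": 0.8, "OLT1": 0.8, "MDU1": 0.8, "MDU2": 0.8}
level = {"SW1": 0, "OLT1": 1, "MDU1": 2, "MDU2": 2}

root, candidates = pick_fault_root(probabilities, thresholds, level)
print(root, candidates)
```

SW1 falls below its threshold and never enters the candidate list, so OLT1 is the highest-level candidate.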
Optionally, the determining module 504 is specifically configured to:
calculate the fault probability of each node in the fault link according to the following formula:
where P is the fault probability of the node; A is 0 or 1, with A = 1 if the node has alarm information and A = 0 if it has none; Sum is the sum of the fault probabilities of the child nodes of the node; and Count is the number of child nodes of the node.
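The formula itself is not reproduced in this text, so the sketch below uses a stand-in that is purely an assumption: P = (A + Sum/Count) / 2 for a node with child nodes, and P = A for a node without any. It only illustrates how A, Sum and Count could be combined bottom-up; the actual rule may differ.

```python
def fault_probability(node, alarmed, children):
    """Combine A, Sum and Count recursively (ASSUMED formula, not the patent's)."""
    a = 1.0 if node in alarmed else 0.0           # A: 1 if the node has an alarm
    kids = children.get(node, [])
    if not kids:
        return a                                  # assumed rule for leaf nodes
    total = sum(fault_probability(k, alarmed, children) for k in kids)  # Sum
    count = len(kids)                             # Count
    return (a + total / count) / 2                # assumed combination


children = {"SW1": ["OLT1"], "OLT1": ["MDU1", "MDU2"]}
alarmed = {"OLT1", "MDU1", "MDU2"}
for name in ("MDU1", "OLT1", "SW1"):
    print(name, fault_probability(name, alarmed, children))
```

Under this stand-in, an alarmed node whose child nodes are all alarmed scores 1.0, while an unalarmed parent scores at most 0.5.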
Optionally, the establishing module 501 is configured to:
acquire network element information of each device in the network system within a preset time period, wherein the network element information of each device includes the level of that device in the network system;
associate the network element information of different types of devices between upper and lower levels;
and generate a visualized topology according to the associated network element information.
Optionally, the establishing module 501 is specifically configured to:
determine, by a traversal method, the next-level devices of each device layer by layer from the highest-level devices down to the lowest level, so as to associate the network element information of different types of devices between upper and lower levels.
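A layer-by-layer traversal of this kind can be sketched as a breadth-first walk. The record fields `id`, `level` and `uplink` are hypothetical; the patent only states that the network element information includes each device's level.

```python
from collections import deque


def build_topology(elements):
    """Map each device to its next-level devices, walking from the top level down."""
    by_uplink = {}
    for e in elements:
        by_uplink.setdefault(e["uplink"], []).append(e["id"])

    top = min(e["level"] for e in elements)       # smallest number = highest level
    queue = deque(e["id"] for e in elements if e["level"] == top)
    topology = {}
    while queue:
        device = queue.popleft()
        topology[device] = by_uplink.get(device, [])  # next-level devices
        queue.extend(topology[device])
    return topology


elements = [
    {"id": "SW1", "level": 0, "uplink": None},
    {"id": "OLT1", "level": 1, "uplink": "SW1"},
    {"id": "MDU1", "level": 2, "uplink": "OLT1"},
    {"id": "MDU2", "level": 2, "uplink": "OLT1"},
]
topology = build_topology(elements)
print(topology)
```

The resulting parent-to-children map is the structure that a visualized topology could then be drawn from.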
Optionally, the apparatus 50 further comprises:
an alarm module 505, configured to merge the alarm information of the fault root node and of the child nodes of the fault root node to raise an associated alarm.
The apparatus of this embodiment may be used to execute the technical solutions of the method embodiments shown in fig. 1, fig. 2, or fig. 4, and the implementation principles and technical effects thereof are similar and will not be described herein again.
Fig. 6 is a schematic structural diagram of a server according to the present invention. As shown in fig. 6, the server 60 includes: a memory 601 and a processor 602; the memory 601 is connected to the processor 602.
The memory 601 is configured to store a computer program;
the processor 602 is configured to implement the fault diagnosis method described above when the computer program is executed.
The present invention provides a storage medium having stored thereon a computer program which, when executed by a processor, implements the fault diagnosis method as described above.
Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the method embodiments described above. The storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical disks.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and these modifications or substitutions do not depart from the spirit of the corresponding technical solutions of the embodiments of the present invention.
Claims (7)
1. A fault diagnosis method, comprising:
establishing the topology of a network system;
acquiring alarm information in a network system;
associating the alarm information with the nodes in the topology to determine the alarm nodes in the topology;
determining links formed by the alarm nodes and the parent nodes and/or child nodes of the alarm nodes as fault links;
calculating the fault probability of each node according to the alarm information of each node in the fault link, the fault probability of the child node of each node and the number of the child nodes of each node;
respectively judging whether the fault probability of each node is greater than or equal to a fault threshold corresponding to each node;
if the fault probability of a first node is larger than or equal to a fault threshold corresponding to the first node, adding the first node into a candidate list;
determining a node with the highest level in the candidate list as a fault root node;
wherein the calculating of the fault probability of each node according to the alarm information of each node in the fault link, the fault probabilities of the child nodes of each node and the number of child nodes of each node comprises:
calculating the fault probability of each node in the fault link according to the following formula:
wherein P is the fault probability of the node; A is 0 or 1, with A = 1 if the node has alarm information and A = 0 if it has none; Sum is the sum of the fault probabilities of the child nodes of the node; and Count is the number of child nodes of the node.
2. The method of claim 1, wherein establishing the topology of the network system comprises:
acquiring network element information of each device in a network system within a preset time period, wherein the network element information of each device comprises a hierarchy of each device in the network system;
performing upper and lower level association on the network element information of different types of equipment;
and generating a visualized topology according to the associated network element information.
3. The method of claim 2, wherein the performing the upper and lower level association on the network element information of the different types of devices comprises:
determining, by a traversal method, the next-level equipment of each piece of equipment layer by layer from the highest-level equipment down to the lowest level, so as to associate the network element information of different types of equipment between upper and lower levels.
4. The method according to any one of claims 1-3, further comprising:
and merging the alarm information of the fault root node and the child nodes of the fault root node to carry out associated alarm.
5. A fault diagnosis device, comprising:
the establishing module is used for establishing the topology of the network system;
the acquisition module is used for acquiring alarm information in a network system;
the association module is used for associating the alarm information with nodes in the topology to determine a fault link;
the determining module is used for determining a fault root node in a fault link according to a preset fault probability calculation rule;
the determining module is specifically configured to: calculating the fault probability of each node according to the alarm information of each node in the fault link, the fault probability of the child node of each node and the number of the child nodes of each node;
respectively judging whether the fault probability of each node is greater than or equal to a fault threshold corresponding to each node;
if the fault probability of the first node is larger than or equal to the fault threshold corresponding to the first node, adding the first node into a candidate list;
determining a node with the highest level in the candidate list as the fault root node;
the association module is specifically configured to:
associating the alarm information with the nodes in the topology to determine the alarm nodes in the topology;
determining a link formed by the alarm node and a parent node and/or a child node of the alarm node as a fault link;
the determining module is specifically configured to:
calculating the fault probability of each node in the fault link according to the following formula:
wherein P is the fault probability of the node; A is 0 or 1, with A = 1 if the node has alarm information and A = 0 if it has none; Sum is the sum of the fault probabilities of the child nodes of the node; and Count is the number of child nodes of the node.
6. A server, comprising: a memory and a processor; the memory is connected with the processor;
the memory for storing a computer program;
the processor is configured to implement the fault diagnosis method of any one of claims 1-4 when executing the computer program.
7. A storage medium having stored thereon a computer program which, when executed by a processor, implements the fault diagnosis method according to any one of claims 1-4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910757406.7A CN110493042B (en) | 2019-08-16 | 2019-08-16 | Fault diagnosis method and device and server |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110493042A (en) | 2019-11-22 |
CN110493042B (en) | 2022-09-13 |
Family
ID=68551384
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910757406.7A Active CN110493042B (en) | 2019-08-16 | 2019-08-16 | Fault diagnosis method and device and server |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110493042B (en) |
Families Citing this family (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112953738B (en) * | 2019-11-26 | 2022-06-10 | 中国移动通信集团山东有限公司 | Root cause alarm positioning system, method and device and computer equipment |
CN110995482B (en) * | 2019-11-27 | 2022-06-21 | 深圳市商汤科技有限公司 | Alarm analysis method and device, computer equipment and computer readable storage medium |
CN111082994A (en) * | 2019-12-25 | 2020-04-28 | 北京同有飞骥科技股份有限公司 | Distributed resource state rapid tracking method and system |
CN111107158B (en) * | 2019-12-26 | 2023-02-17 | 远景智能国际私人投资有限公司 | Alarm method, device, equipment and medium for Internet of things equipment cluster |
CN111342997B (en) * | 2020-02-06 | 2022-08-09 | 烽火通信科技股份有限公司 | Construction method of deep neural network model, fault diagnosis method and system |
CN113347654B (en) * | 2020-03-03 | 2023-04-07 | 中国移动通信集团贵州有限公司 | Method and device for determining fault type of out-of-service base station |
CN113395108B (en) * | 2020-03-12 | 2022-12-27 | 华为技术有限公司 | Fault processing method, device and system |
CN111722952A (en) * | 2020-05-25 | 2020-09-29 | 中国建设银行股份有限公司 | Fault analysis method, system, equipment and storage medium of business system |
CN111858123B (en) * | 2020-07-29 | 2023-09-26 | 中国工商银行股份有限公司 | Fault root cause analysis method and device based on directed graph network |
CN114095335B (en) * | 2020-08-03 | 2023-11-03 | 中国移动通信集团山东有限公司 | Network alarm processing method and device and electronic equipment |
CN114285730A (en) * | 2020-09-18 | 2022-04-05 | 华为技术有限公司 | Method and device for determining fault root cause and related equipment |
CN112468400A (en) * | 2020-11-09 | 2021-03-09 | 青岛海信网络科技股份有限公司 | Fault positioning method, device, equipment and medium |
CN114500244A (en) * | 2020-11-13 | 2022-05-13 | 中兴通讯股份有限公司 | Network fault diagnosis method and device, computer equipment and readable medium |
CN112583644B (en) * | 2020-12-14 | 2022-10-18 | 华为技术有限公司 | Alarm processing method, device, equipment and readable storage medium |
CN112543126A (en) * | 2020-12-22 | 2021-03-23 | 武汉联影医疗科技有限公司 | Cloud platform monitoring method and device, computer equipment and storage medium |
CN115086154A (en) * | 2021-03-11 | 2022-09-20 | 中国电信股份有限公司 | Fault delimitation method and device, storage medium and electronic equipment |
CN112988525B (en) * | 2021-03-22 | 2022-07-22 | 新华三技术有限公司 | Method and device for matching alarm association rules |
CN113037570B (en) * | 2021-04-29 | 2022-12-13 | 中国联合网络通信集团有限公司 | Alarm processing method and equipment |
CN115542067A (en) * | 2021-06-30 | 2022-12-30 | 华为技术有限公司 | Fault detection method and device |
US20230239206A1 (en) * | 2022-01-24 | 2023-07-27 | Rakuten Mobile, Inc. | Topology Alarm Correlation |
CN115442255B (en) * | 2022-03-11 | 2024-02-06 | 北京罗克维尔斯科技有限公司 | Ethernet detection method, system, device, electronic equipment and storage medium |
CN114710532B (en) * | 2022-04-02 | 2023-10-03 | 中国科学院水生生物研究所 | Method and device for suppressing security electricity utilization alarm of museum |
CN114710396B (en) * | 2022-04-08 | 2023-06-23 | 中国联合网络通信集团有限公司 | Network alarm processing method and server |
CN115086143A (en) * | 2022-04-28 | 2022-09-20 | 阿里巴巴(中国)有限公司 | Fault early warning method and device |
CN115102844A (en) * | 2022-06-09 | 2022-09-23 | 摩拜(北京)信息技术有限公司 | Fault monitoring and processing method and device and electronic equipment |
CN116017516B (en) * | 2023-03-24 | 2023-06-27 | 广州世炬网络科技有限公司 | Node connection configuration method and device based on link interference |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107679716A (en) * | 2017-09-19 | 2018-02-09 | 西南交通大学 | Consider the risk assessment of interconnected network cascading failure and the alarm method of communication fragile degree |
CN108521346A (en) * | 2018-04-07 | 2018-09-11 | 中南大学 | Method for positioning abnormal nodes of telecommunication bearer network based on terminal data |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101707537B (en) * | 2009-11-18 | 2012-01-25 | 华为技术有限公司 | Positioning method of failed link and alarm root cause analyzing method, equipment and system |
CN104796273B (en) * | 2014-01-20 | 2018-11-16 | 中国移动通信集团山西有限公司 | A kind of method and apparatus of network fault root diagnosis |
CN106603317A (en) * | 2017-02-20 | 2017-04-26 | 山东浪潮商用系统有限公司 | Alarm monitoring strategy analysis method based on data mining technology |
CN107633307B (en) * | 2017-09-08 | 2021-08-31 | 国家计算机网络与信息安全管理中心 | Power supply and distribution system root alarm detection method, device, terminal and computer storage medium |
CN108494591A (en) * | 2018-03-16 | 2018-09-04 | 北京京东金融科技控股有限公司 | system alarm processing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||