CN115913911A - Network fault detection method, equipment and storage medium - Google Patents

Publication number
CN115913911A
Authority
CN
China
Prior art keywords
fault
network
network topology
alarm information
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111163538.0A
Other languages
Chinese (zh)
Inventor
刘雪峰
胡艳艳
陈爱东
郑灵武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

An embodiment of the invention provides a network fault detection method, a device, and a storage medium, belonging to the field of communications. The method comprises the following steps: acquiring network topology information of a virtualization platform and alarm information of a plurality of network faults; determining a network topology path corresponding to each network fault according to the network topology information and the alarm information of each network fault; determining correlation information among the network topology paths, and aggregating the network topology paths according to the correlation information to obtain at least one target network topology path; and acquiring a plurality of suspected fault nodes from the at least one target network topology path, and performing fault location analysis on the plurality of suspected fault nodes to obtain a fault location analysis result. The technical solution of the embodiment can improve the efficiency and accuracy of locating fault nodes in network faults.

Description

Network fault detection method, equipment and storage medium
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a network fault detection method, device, and storage medium.
Background
As data centers and networks continue to grow in scale, the probability of faults occurring during network operation also increases. At present, analyzing network faults requires human intervention in most cases: professional technicians investigate a fault after it occurs. Because faults are difficult to delimit, a large number of technicians and a large amount of time are needed, which seriously reduces customer satisfaction. In addition, most monitoring indicators of an operation and maintenance system are set by experience. Network characteristics differ across regions and time periods, so a single alarm threshold cannot suit every site, which leads to monitoring failures or false alarms. Therefore, how to detect network faults efficiently and accurately is a problem that urgently needs to be solved.
Disclosure of Invention
Embodiments of the present invention mainly aim to provide a method, a device, and a storage medium for detecting a network fault, which aim to improve efficiency and accuracy of detecting a network fault node.
In a first aspect, an embodiment of the present invention provides a network fault detection method, including:
acquiring network topology information of a virtualization platform and alarm information of a plurality of network faults;
determining a network topology path corresponding to each network fault according to the network topology information and the alarm information of each network fault;
determining correlation information among the network topology paths, and aggregating the network topology paths according to the correlation information among the network topology paths to obtain at least one target network topology path;
and acquiring a plurality of suspected fault nodes from at least one target network topology path, and performing fault location analysis on the plurality of suspected fault nodes to obtain a fault location analysis result.
In a second aspect, the embodiment of the present invention further provides a computer device, which includes a processor, a memory, a computer program stored on the memory and executable by the processor, and a data bus for implementing connection communication between the processor and the memory, wherein when the computer program is executed by the processor, the steps of any one of the network fault detection methods provided in the present specification are implemented.
In a third aspect, an embodiment of the present invention further provides a storage medium for computer-readable storage, where the storage medium stores one or more programs, and the one or more programs are executable by one or more processors to implement the steps of any one of the methods for detecting a network fault provided in the present specification.
Embodiments of the invention provide a network fault detection method, a computer device and a storage medium. Network topology information of a virtualization platform and alarm information of a plurality of network faults are acquired; a network topology path corresponding to each network fault is determined according to the network topology information and the alarm information of each network fault; correlation information among the network topology paths is determined, and the network topology paths are aggregated according to the correlation information to obtain at least one target network topology path; a plurality of suspected fault nodes are then acquired from the at least one target network topology path, and fault location analysis is performed on them to obtain a fault location analysis result. Because the network topology paths are built from the network topology information and the alarm information of each network fault and are then aggregated into at least one target network topology path, fault nodes can be determined more efficiently, and performing fault location analysis on the suspected fault nodes allows the fault location analysis result to be obtained accurately.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a schematic flowchart of a network fault detection method according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of sub-steps of the network fault detection method of Fig. 1;
Fig. 3 is a schematic diagram of a scenario for grouping alarm information according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a scenario for generating a network topology path according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of a scenario for aggregating a plurality of network topology paths according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of another scenario for aggregating a plurality of network topology paths according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of another scenario for aggregating a plurality of network topology paths according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of a scenario for determining a suspected fault node according to an embodiment of the present invention;
Fig. 9 is a schematic diagram of another scenario for determining a suspected fault node according to an embodiment of the present invention;
Fig. 10 is a schematic block diagram of the structure of a computer device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The flowcharts shown in the figures are illustrative only and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It is to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
Embodiments of the invention provide a network fault detection method, a computer device and a storage medium. The network fault detection method can be applied to a computer device, which may be a server, a mobile phone, a tablet computer, a notebook computer, a desktop computer or an edge detection device. For example, when the method is applied to a server, the server acquires network topology information of a virtualization platform and alarm information of a plurality of network faults; determines a network topology path corresponding to each network fault according to the network topology information and the alarm information of each network fault; determines correlation information among the network topology paths, and aggregates the plurality of network topology paths according to the correlation information to obtain at least one target network topology path; and acquires a plurality of suspected fault nodes from the at least one target network topology path and performs fault location analysis on them to obtain a fault location analysis result.
Some embodiments of the invention are described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
Referring to fig. 1, fig. 1 is a schematic flow chart of a network fault detection method according to an embodiment of the present invention.
As shown in fig. 1, the network failure detection method includes steps S101 to S104.
Step S101, network topology information of a virtualization platform and alarm information of a plurality of network faults are obtained.
The network topology information of the virtualization platform comprises attribute information of the hosts, the virtual machines and the switches, and the connection relations among the hosts, the virtual machines and the switches. The alarm information is generated when network faults occur in the virtualization platform.
In an embodiment, network topology information of a virtualization platform is acquired, and when a network fault occurs while a device in the virtualization platform is operating, the corresponding alarm information generated by the network fault is acquired. How the alarm information is generated may be set according to the actual situation, which is not specifically limited in this embodiment.
Illustratively, the virtualization platform includes host 1, host 2, virtual machine 1, virtual machine 2, virtual machine 3, virtual machine 4, virtual machine 5, switch 1 and switch 2. The network topology information includes one set of connection relationships in which host 1 connects to virtual machine 1, virtual machine 1 connects to virtual machine 2, virtual machine 2 connects to switch 1, and switch 1 connects to host 2, and another set in which host 1 connects to virtual machine 3, virtual machine 3 connects to switch 2, switch 2 connects to host 2, and host 2 connects to virtual machine 5. When host 1 in the virtualization platform fails, alarm information for the network fault is generated.
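The connection relationships in this example can be held in a simple adjacency structure. The following is a minimal Python sketch, not the patent's implementation: the function name `build_topology` and the choice of an undirected adjacency dictionary are assumptions made for illustration, and node attribute information is omitted.

```python
from collections import defaultdict

def build_topology(links):
    """Build an undirected adjacency map from (node, node) connection pairs.

    Hypothetical helper: the source does not prescribe a data structure.
    """
    adjacency = defaultdict(set)
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)

# Connection relationships taken from the example above.
links = [
    ("host 1", "virtual machine 1"),
    ("virtual machine 1", "virtual machine 2"),
    ("virtual machine 2", "switch 1"),
    ("switch 1", "host 2"),
    ("host 1", "virtual machine 3"),
    ("virtual machine 3", "switch 2"),
    ("switch 2", "host 2"),
    ("host 2", "virtual machine 5"),
]
topology = build_topology(links)
print(sorted(topology["host 2"]))
```

Virtual machine 4 has no listed connection in the example, so it does not appear in this adjacency map.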
In an embodiment, the operation statistical data may be processed by the preset alarm detection model as follows: the operation statistical data is processed through a preset mathematical model to obtain an alarm detection result of the operation statistical data. The mathematical model may be selected according to the actual situation, which is not specifically limited in this embodiment; for example, it may be a box plot algorithm or an anomaly detection algorithm based on extreme value theory (the SPOT algorithm).
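As an illustration of the box plot option, the sketch below flags values that fall outside the usual whisker range of Q1 - 1.5 * IQR to Q3 + 1.5 * IQR. The function name, the factor 1.5 and the sample data are assumptions chosen for the example; the source does not specify them.

```python
from statistics import quantiles

def boxplot_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the sample
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical operation statistics, e.g. a latency series with one spike.
samples = [10, 12, 11, 13, 12, 11, 95]
print(boxplot_outliers(samples))
```

A fixed factor such as 1.5 is only a common convention; in practice it would be tuned to the indicator being monitored.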
In an embodiment, the method for processing the operation statistical data by using the preset alarm detection model to obtain the alarm detection result of the operation statistical data may be as follows: the preset alarm detection model can be a neural network model, and the operation statistical data is input into the neural network model for processing to obtain an alarm detection result of the operation statistical data. The alarm detection result of the operation statistical data can be accurately determined through the neural network model.
In one embodiment, the neural network model may be established as follows: historical operation statistical data and the alarm detection results corresponding to it are obtained, the historical operation statistical data is labeled according to the alarm detection results to construct sample data, and the neural network model is iteratively trained on the sample data until it converges, thereby obtaining the alarm detection model. The neural network model may be a convolutional neural network model, a recurrent neural network model or a recurrent convolutional neural network model; of course, other network models can also be trained to obtain the alarm detection model. This embodiment is not particularly limited thereto.
In one embodiment, operation statistical data of a virtualization platform is obtained, and data cleaning is performed on the operation statistical data to obtain target operation statistical data; the target operation statistical data is input into a preset alarm detection model to obtain an alarm detection result of the target operation statistical data; and whether the target operation statistical data is abnormal is determined according to the alarm detection result, with alarm information being generated when the target operation statistical data is abnormal. The data cleaning method may be selected according to the actual situation, which is not specifically limited in this embodiment; for example, the data cleaning may consist of filling in missing values and removing high pulses. Cleaning the operation statistical data can improve the accuracy of fault detection.
In an embodiment, the alarm detection result includes an abnormal deviation score and an abnormal duration score, and the manner of determining whether the operation statistic data is abnormal according to the alarm detection result may be: acquiring a first weight value and a second weight value, multiplying the first weight value by the abnormal deviation score to obtain a first parameter, multiplying the second weight value by the abnormal duration score to obtain a second parameter, and adding the first parameter and the second parameter to obtain an abnormal degree score; and determining whether the abnormal degree score is larger than a preset abnormal degree score, if the abnormal degree score is larger than the preset abnormal degree score, determining that the operation statistical data is abnormal, and if the abnormal degree score is smaller than or equal to the preset abnormal degree score, determining that the operation statistical data is not abnormal. The preset abnormal degree score may be set according to an actual situation, which is not specifically limited in this embodiment. Whether the operation statistical data are abnormal or not can be accurately known according to the alarm detection result.
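The weighted combination described above can be written out directly. This is a minimal sketch; the function name and the example weights, scores and threshold are hypothetical values chosen only to show the arithmetic.

```python
def is_abnormal(deviation_score, duration_score,
                first_weight, second_weight, preset_score):
    """Combine the two scores and compare against the preset abnormal degree score.

    Returns (abnormal, degree_score). Data is abnormal only when the
    degree score is strictly greater than the preset score.
    """
    first_parameter = first_weight * deviation_score
    second_parameter = second_weight * duration_score
    degree_score = first_parameter + second_parameter
    return degree_score > preset_score, degree_score

# Hypothetical values: weights 0.6 and 0.4, threshold 0.7.
abnormal, score = is_abnormal(0.9, 0.5, 0.6, 0.4, 0.7)
print(abnormal, round(score, 2))
```

A score exactly equal to the preset abnormal degree score is treated as not abnormal, matching the "less than or equal to" branch in the text.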
Step S102, according to the network topology information and the alarm information of each network fault, determining a network topology path corresponding to each network fault.
In an embodiment, when the alarm information of a plurality of network faults is acquired, the alarm information is deduplicated to obtain deduplicated alarm information. Deduplicating the alarm information reduces the number of alarm messages that have to be processed and improves the efficiency of determining the fault node.
In an embodiment, the alarm information may be deduplicated as follows: the identification information and the fault occurrence time of each alarm message are acquired; alarm messages that have the same identification information and either the same fault occurrence time or fault occurrence times that differ by less than a preset time are treated as repeated alarm messages; and the repeated alarm messages are removed so that only one copy is kept. The identification information may be set according to the actual situation; for example, it may be the identification of the node corresponding to the alarm information. The preset time may also be set according to the actual situation, which is not specifically limited in this embodiment; for example, it may be set to 1 second. Deduplicating the alarm information improves the efficiency of fault node detection.
Illustratively, there are alarm information 1, alarm information 2, alarm information 3, alarm information 4 and alarm information 5. The identification information of alarm information 1 is identification 1 and its fault occurrence time is 10:05:20 on May 10, 2021; the identification information of alarm information 2 is identification 2 and its fault occurrence time is 10:05:20 on May 10, 2021; the identification information of alarm information 3 is identification 1 and its fault occurrence time is 10:05:20 on May 10, 2021; the identification information of alarm information 4 is identification 2 and its fault occurrence time is 10:05:20 on May 10, 2021; and the identification information of alarm information 5 is identification 3 and its fault occurrence time is 10:05:25 on May 10, 2021. Alarm information with the same identification information and the same fault occurrence time is treated as repeated alarm information. According to the identification information and fault occurrence times of the five alarm messages, alarm information 1 and alarm information 3 are determined to be repeated alarm information, as are alarm information 2 and alarm information 4. One alarm message of each repeated pair is removed, so that three alarm messages remain after deduplication.
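The deduplication rule can be sketched as follows. The tuple layout, the function name and the choice to keep the earliest alarm of each repeated set are assumptions; the source only requires that a single copy be kept.

```python
from datetime import datetime, timedelta

def deduplicate(alarms, preset_time=timedelta(seconds=1)):
    """Drop repeated alarms.

    alarms: list of (name, identification, fault_time) tuples.
    An alarm is a repeat if an already kept alarm with the same
    identification occurred less than preset_time before it.
    """
    kept = []
    last_kept_time = {}  # identification -> fault time of the last kept alarm
    for name, ident, fault_time in sorted(alarms, key=lambda a: a[2]):
        previous = last_kept_time.get(ident)
        if previous is not None and fault_time - previous < preset_time:
            continue  # same identification, same or nearly the same time
        last_kept_time[ident] = fault_time
        kept.append((name, ident, fault_time))
    return kept

t = datetime(2021, 5, 10, 10, 5, 20)
alarms = [
    ("alarm 1", "identification 1", t),
    ("alarm 2", "identification 2", t),
    ("alarm 3", "identification 1", t),
    ("alarm 4", "identification 2", t),
    ("alarm 5", "identification 3", t + timedelta(seconds=5)),
]
print([name for name, _, _ in deduplicate(alarms)])
```

Applied to the five alarms of the example, this sketch keeps alarm 1, alarm 2 and alarm 5.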
In one embodiment, as shown in fig. 2, step S102 includes sub-steps S1021 through S1022.
And a substep S1021, grouping the alarm information of the plurality of network faults according to the alarm code in the alarm information of each network fault to obtain a plurality of alarm information groups.
The alarm code is an identification code generated by a network element in the virtualization platform when a network fault occurs, and different network elements generate different alarm codes.
In one embodiment, the alarm codes in each alarm message are determined, and the alarm messages corresponding to the same alarm codes are divided into one alarm message group to obtain a plurality of alarm message groups. The alarm information is grouped according to the alarm code, and the efficiency of detecting the network fault points can be improved.
In one embodiment, the time of occurrence of the fault in the alarm information of each network fault is obtained, and the alarm information of a plurality of network faults is grouped according to a preset time window, an alarm code and the time of occurrence of the fault in each alarm information to obtain a plurality of alarm information groups. The preset time window may be set according to an actual situation, which is not specifically limited in this embodiment, for example, the preset time window may be set to 1 second. By grouping the alarm information, the efficiency of network fault node detection can be improved.
In an embodiment, the alarm information of the network faults may be grouped according to the preset time window, the alarm code and the fault occurrence time in each alarm information as follows: alarm information that has the same alarm code and whose fault occurrence time falls within the same preset time window is divided into one alarm information group, thereby obtaining a plurality of alarm information groups. Dividing the alarm information that corresponds to the same alarm code within a preset time window into one group improves the efficiency of processing the alarm information.
For example, as shown in fig. 3, the virtual machine generates alarm information A at time t1, alarm information B at time t2, alarm information C at time t3, alarm information D at time t4, alarm information E at time t5, and alarm information F at time t6, and all six share the same alarm code. Alarm information A, B and C, whose fault occurrence times fall within the preset time window P0, are divided into one alarm information group to obtain alarm information group M. Alarm information D, E and F, whose fault occurrence times fall within a subsequent preset time window, are divided into a second alarm information group. Grouping the 6 alarm messages into two alarm information groups improves the efficiency of alarm information processing.
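Grouping by alarm code and time window can be sketched as below. The sketch assumes fixed, non-overlapping windows aligned at time zero and fault times expressed in seconds; the function name, the alarm code string and the timestamps are hypothetical, since the source does not say how windows are aligned.

```python
from collections import defaultdict

def group_alarms(alarms, window_seconds=1.0):
    """Group alarms that share an alarm code and fall in the same time window.

    alarms: list of (name, alarm_code, fault_time_in_seconds) tuples.
    Returns a dict keyed by (alarm_code, window_index).
    """
    groups = defaultdict(list)
    for name, code, fault_time in alarms:
        window_index = int(fault_time // window_seconds)
        groups[(code, window_index)].append(name)
    return dict(groups)

# Six alarms with the same (hypothetical) alarm code, three per window.
alarms_by_time = [
    ("A", "code-1", 0.1), ("B", "code-1", 0.4), ("C", "code-1", 0.8),
    ("D", "code-1", 1.2), ("E", "code-1", 1.5), ("F", "code-1", 1.9),
]
print(group_alarms(alarms_by_time))
```

With a 1-second window, the six alarms fall into two groups, as in the example above.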
And a substep S1022, determining a network topology path corresponding to the network fault to which the alarm code belongs in each alarm information group according to the network topology information and the alarm information in each alarm information group.
In one embodiment, according to the node parameters in the alarm information group of each network fault, a plurality of topology nodes corresponding to each network fault and the connection relations among these topology nodes are acquired from the network topology information; a network topology path corresponding to each network fault is then generated according to the plurality of topology nodes and the connection relations among them. The node parameters are device information and address information in the virtualization platform, such as the host name, virtual machine name, switch name, Internet Protocol (IP) address and MAC address. Constructing the network topology path from the connection relations among the topology nodes improves the efficiency of fault node detection.
Illustratively, the node parameters of the alarm information group include a node parameter 1, a node parameter 2, a node parameter 3, and a node parameter 4, the plurality of topology nodes corresponding to the network fault are obtained from the network topology information according to the node parameter 1, the node parameter 2, the node parameter 3, and the node parameter 4, and the plurality of topology nodes include a host 1, a host 2, a virtual machine 1, a virtual machine 2, and a switch 1, and the connection relationships among the plurality of topology nodes are as follows: the host 1 is connected with the virtual machine 1, the virtual machine 1 is connected with the switch 1, the switch 1 is connected with the virtual machine 2, the virtual machine 2 is connected with the host 2, and the host 1, the host 2, the virtual machine 1, the virtual machine 2 and the switch 1 are connected according to the connection relationship among a plurality of topology nodes to obtain a network topology path as shown in fig. 4.
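One way to turn the selected topology nodes and their connection relationships into an ordered path is sketched below. It assumes the node parameters have already been resolved to node names and that the selected nodes form a simple chain; both the function name and these assumptions go beyond what the source states.

```python
def build_fault_path(topology_links, fault_nodes):
    """Order the fault-related nodes into a path using the topology links.

    Only links whose two ends are both fault-related nodes are used.
    Assumes the resulting sub-graph is a simple chain.
    """
    nodes = set(fault_nodes)
    adjacency = {node: [] for node in nodes}
    for a, b in topology_links:
        if a in nodes and b in nodes:
            adjacency[a].append(b)
            adjacency[b].append(a)
    # A chain has exactly two endpoints with a single neighbour.
    endpoints = sorted(n for n in nodes if len(adjacency[n]) == 1)
    if not endpoints:
        raise ValueError("fault nodes do not form a simple chain")
    path, visited = [endpoints[0]], {endpoints[0]}
    while True:
        unvisited = [n for n in adjacency[path[-1]] if n not in visited]
        if not unvisited:
            return path
        path.append(unvisited[0])
        visited.add(unvisited[0])

topology_links = [
    ("host 1", "virtual machine 1"),
    ("virtual machine 1", "switch 1"),
    ("switch 1", "virtual machine 2"),
    ("virtual machine 2", "host 2"),
    ("host 1", "virtual machine 3"),  # not related to this fault
]
fault_nodes = ["host 1", "host 2", "virtual machine 1", "virtual machine 2", "switch 1"]
print(build_fault_path(topology_links, fault_nodes))
```

For the example above this yields the chain from host 1 through virtual machine 1, switch 1 and virtual machine 2 to host 2.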
Step S103, determining correlation information among the network topology paths, and aggregating the plurality of network topology paths according to the correlation information among the network topology paths to obtain at least one target network topology path.
The correlation information comprises first correlation information and second correlation information. The first correlation information indicates that at least one same topology node exists between two network topology paths, and the second correlation information indicates that no same topology node exists between the two network topology paths.
In one embodiment, according to the correlation information among the network topology paths, dividing the network topology paths into at least one network topology path group, wherein each network topology path in the network topology path group comprises the same topology node; and aggregating all network topology paths in the network topology path group to obtain a target network topology path corresponding to the network topology path group. The correlation information among the network topology paths in the network topology path group is first correlation information. By aggregating the network topology paths, the efficiency of determining the network fault node can be improved.
Illustratively, as shown in fig. 5, network topology path 1 includes the sequentially connected topology nodes host 1, virtual machine 1, switch 1, virtual machine 2 and host 2; network topology path 2 includes the sequentially connected topology nodes host 3, virtual machine 4, switch 1 and host 4; network topology path 3 includes the sequentially connected topology nodes host 5, virtual machine 6, switch 2, virtual machine 6, virtual machine 7 and host 6; and network topology path 4 includes the sequentially connected topology nodes host 5, virtual machine 8, switch 3, virtual machine 9 and host 7. The same topology node, switch 1, exists in network topology path 1 and network topology path 2, so these two paths are divided into network topology path group 1. The same topology node, host 5, exists in network topology path 3 and network topology path 4, so these two paths are divided into network topology path group 2. Network topology path 1 and network topology path 2 in network topology path group 1 are aggregated to obtain the target network topology path 1 shown in fig. 6, and network topology path 3 and network topology path 4 in network topology path group 2 are aggregated to obtain the target network topology path 2 shown in fig. 7.
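The grouping step in this example can be sketched as merging paths whose node sets overlap. Representing each aggregated target path as a set of node names is an assumption made for brevity (the figures show merged graphs), and the function name is hypothetical.

```python
def aggregate_paths(paths):
    """Group paths that share at least one topology node.

    paths: dict mapping a path name to its list of nodes.
    Returns a list of (path_names, merged_nodes) pairs, one per group.
    Existing groups are kept pairwise disjoint, so one pass is enough.
    """
    groups = []
    for name, nodes in paths.items():
        merged_names, merged_nodes = {name}, set(nodes)
        remaining = []
        for group_names, group_nodes in groups:
            if group_nodes & merged_nodes:  # first correlation: shared node
                merged_names |= group_names
                merged_nodes |= group_nodes
            else:                           # second correlation: no shared node
                remaining.append((group_names, group_nodes))
        remaining.append((merged_names, merged_nodes))
        groups = remaining
    return groups

paths = {
    "path 1": ["host 1", "virtual machine 1", "switch 1", "virtual machine 2", "host 2"],
    "path 2": ["host 3", "virtual machine 4", "switch 1", "host 4"],
    "path 3": ["host 5", "virtual machine 6", "switch 2", "virtual machine 6",
               "virtual machine 7", "host 6"],
    "path 4": ["host 5", "virtual machine 8", "switch 3", "virtual machine 9", "host 7"],
}
for names, nodes in aggregate_paths(paths):
    print(sorted(names), len(nodes))
```

On the four example paths this produces two groups: paths 1 and 2 (sharing switch 1) and paths 3 and 4 (sharing host 5).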
Step S104, a plurality of suspected fault nodes are obtained from at least one target network topological path, and fault positioning analysis is carried out on the suspected fault nodes to obtain a fault positioning analysis result.
In one embodiment, the number of repeated occurrences of each topology node in the at least one target network topology path is counted, and a plurality of suspected fault nodes are acquired from the at least one target network topology path according to these counts. Selecting suspected fault nodes by how often each topology node recurs in the target network topology path can improve the accuracy of determining the fault nodes.
In an embodiment, the plurality of suspected fault nodes may be acquired as follows: a topology node whose number of repeated occurrences is greater than or equal to a preset value is taken as a suspected fault node. The preset value may be set according to the actual situation, which is not specifically limited in this embodiment; for example, the preset value may be set to 3.
In an embodiment, the method for acquiring a plurality of suspected failed nodes may be: and sequencing all the topology nodes in the target network topology path according to the repeated occurrence times of the topology nodes to obtain a topology node sequencing queue, and sequentially selecting a preset number of topology nodes from the topology node sequencing queue from large to small as suspected fault nodes. The topology nodes can be sequenced from large to small according to the repeated occurrence times to obtain a topology node sequencing queue, and the topology nodes can also be sequenced from small to large according to the repeated occurrence times to obtain the topology node sequencing queue. The preset number may be set according to actual conditions, which is not specifically limited in this embodiment, for example, the preset number may be set to 3.
Illustratively, as shown in fig. 8, the target network topology path includes host 1, virtual machine 2, virtual machine 3, virtual machine 4, switch 1, switch 2, virtual machine 5, virtual machine 6, host 4, virtual machine 7, host 2, host 3 and host 5. Counting the repeated occurrences of each topology node gives 4 occurrences for host 1, 3 occurrences for switch 1, and 1 occurrence for each remaining topology node. Topology nodes that occur 3 or more times are taken as suspected fault nodes, so host 1 and switch 1 are obtained as the suspected fault nodes.
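The threshold rule of this example can be sketched as follows. Counting an occurrence as "the node appears in one of the aggregated paths" is an assumption, as are the function name and the sample paths; the threshold of 3 follows the example.

```python
from collections import Counter

def suspected_by_threshold(member_paths, min_count=3):
    """Return nodes that appear in at least min_count of the aggregated paths."""
    counts = Counter(node for path in member_paths for node in set(path))
    return sorted(node for node, count in counts.items() if count >= min_count)

# Hypothetical member paths: host 1 appears 4 times, switch 1 appears 3 times.
member_paths = [
    ["host 1", "virtual machine 2", "switch 1", "host 2"],
    ["host 1", "virtual machine 3", "switch 1", "host 3"],
    ["host 1", "virtual machine 4", "switch 1", "host 4"],
    ["host 1", "virtual machine 5", "switch 2", "host 5"],
]
print(suspected_by_threshold(member_paths))
```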
Illustratively, as shown in fig. 9, the target network topology path includes host 1, a virtual machine, switch 1, switch 2, switch 3, switch 4, switch 5, switch 6, switch 7, switch 8, switch 9, host 2, host 3, host 4 and host 5. Counting the occurrences of each topology node gives 4 occurrences for host 1, 4 occurrences for the virtual machine, 3 occurrences for switch 1, and 1 occurrence for each remaining topology node. The topology nodes are sorted by number of occurrences to obtain a queue in which host 1 and the virtual machine rank first, switch 1 ranks second, and the remaining topology nodes rank third. The first three topology nodes in the queue are taken as suspected fault nodes, so host 1, the virtual machine and switch 1 are the suspected fault nodes.
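Selecting a preset number of nodes from the sorted queue can be sketched as below. Breaking ties by node name is an assumption added to make the result deterministic, and the function name and the shortened count table are hypothetical.

```python
def suspected_top_n(occurrence_counts, preset_number=3):
    """Sort nodes by occurrence count (largest first) and take the first N."""
    ranked = sorted(occurrence_counts.items(), key=lambda item: (-item[1], item[0]))
    return [node for node, _ in ranked[:preset_number]]

# Occurrence counts following the example above (remaining nodes abbreviated).
occurrence_counts = {
    "host 1": 4, "virtual machine": 4, "switch 1": 3,
    "switch 2": 1, "host 2": 1,
}
print(suspected_top_n(occurrence_counts))
```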
In an embodiment, data exchange information of each suspected fault node within a preset time period of its fault occurrence time is acquired, and whether each suspected fault node is a target fault node is determined according to its data exchange information. The data exchange information includes information such as message transmission rate, packet loss rate, central processing unit utilization rate, and memory utilization rate. According to the data exchange information of each suspected fault node, whether the suspected fault node is the target fault node to be detected can be accurately determined.
In an embodiment, whether each suspected fault node is a target fault node may be determined from its data exchange information as follows: a preset rule is acquired, and the data exchange information of each suspected fault node is verified based on the preset rule to determine target fault nodes and non-fault nodes. The preset rule is a preset check rule for data exchange information and may be set according to actual conditions, which is not specifically limited in this embodiment. The target fault nodes among the suspected fault nodes can be accurately determined through the preset rule.
In an embodiment, the data exchange information of each suspected fault node may be verified based on the preset rule to determine target fault nodes and non-fault nodes as follows: the data exchange information of each suspected fault node is verified based on the preset rule; if the data exchange information of a suspected fault node passes the verification, the suspected fault node is a non-fault node; and if the data exchange information fails the verification, the suspected fault node is a target fault node. The preset rule is an empirical rule obtained in advance from statistical analysis of the data exchange information of suspected fault nodes, and the specific process of generating the empirical rule may be set according to actual conditions, which is not specifically limited in this embodiment. By verifying the data exchange information of a suspected fault node, whether the suspected fault node is a target fault node can be accurately determined, which improves the accuracy of determining the network fault node.
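The rule-based verification described above can be sketched as follows. The embodiment does not specify the preset rule, so the field names, the limits in `RULES`, and the sample values are all hypothetical; only the pass/fail logic (pass means non-fault node, fail means target fault node) follows the text.

```python
from dataclasses import dataclass


@dataclass
class ExchangeInfo:
    """Data exchange information sampled around the fault occurrence time."""
    tx_rate_mbps: float  # message transmission rate
    packet_loss: float   # packet loss rate, as a fraction in [0, 1]
    cpu_util: float      # CPU utilization, as a fraction in [0, 1]
    mem_util: float      # memory utilization, as a fraction in [0, 1]


# Hypothetical preset rule: each metric must satisfy its check to pass.
RULES = {
    "tx_rate_mbps": lambda v: v >= 1.0,
    "packet_loss": lambda v: v <= 0.01,
    "cpu_util": lambda v: v <= 0.90,
    "mem_util": lambda v: v <= 0.90,
}


def passes_rules(info, rules=RULES):
    """Return True when every metric of info satisfies its check."""
    return all(check(getattr(info, name)) for name, check in rules.items())


def classify(suspects, rules=RULES):
    """Split suspected fault nodes into (target fault nodes, non-fault nodes)."""
    faulty, healthy = [], []
    for node, info in suspects.items():
        (healthy if passes_rules(info, rules) else faulty).append(node)
    return faulty, healthy


suspects = {
    "host 1": ExchangeInfo(tx_rate_mbps=0.2, packet_loss=0.15, cpu_util=0.97, mem_util=0.60),
    "switch 1": ExchangeInfo(tx_rate_mbps=850.0, packet_loss=0.001, cpu_util=0.35, mem_util=0.40),
}
faulty, healthy = classify(suspects)
print(faulty, healthy)
```

Keeping the rule as a table of per-metric checks makes it easy to replace the limits with values derived from statistical analysis, as the embodiment suggests.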
In one embodiment, one or more candidate failed nodes are determined from a plurality of suspected failed nodes; acquiring operation statistical data of a virtualization platform, and current alarm information and historical alarm information of the candidate fault node, wherein the historical alarm information is historical alarm information before the current moment; and carrying out fault location analysis on the operation statistical data, the current alarm information and the historical alarm information of the candidate fault node to obtain a fault location analysis result. By carrying out fault positioning analysis on the current alarm information and the historical alarm information, a fault positioning analysis result can be accurately obtained.
In an embodiment, the fault location analysis on the operation statistical data, the current alarm information, and the historical alarm information of the candidate fault node may be performed as follows: historical operation statistical data corresponding to the historical alarm information is acquired, and it is determined whether the operation error range between the operation statistical data and the historical operation statistical data is less than or equal to a preset threshold. If the operation error range is less than or equal to the preset threshold, the candidate fault node is a target fault node; if the operation error range is greater than the preset threshold, the candidate fault node is a non-fault node. The preset threshold may be set according to actual conditions, which is not specifically limited in this embodiment. By carrying out fault location analysis on the current alarm information and the historical alarm information, a fault location analysis result can be accurately obtained.
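The comparison against historical operation statistics can be sketched as follows. The embodiment does not define how the operation error range is computed, so this sketch assumes it is the largest relative deviation over the metrics present in both data sets; the metric names, the threshold of 0.1, and the sample values are likewise hypothetical.

```python
def operation_error(current, historical):
    """Largest relative deviation between current and historical metrics.

    Assumption: the 'operation error range' is taken to be the maximum,
    over shared metrics, of |current - historical| / |historical|.
    """
    errors = []
    for metric, hist_value in historical.items():
        if metric in current:
            denom = abs(hist_value) or 1.0  # avoid dividing by zero
            errors.append(abs(current[metric] - hist_value) / denom)
    # With no shared metric there is nothing to compare: treat as not matching.
    return max(errors, default=float("inf"))


def is_target_fault_node(current, historical, threshold=0.1):
    """A candidate is a target fault node when its current statistics stay
    within the threshold of the statistics recorded with the historical alarm."""
    return operation_error(current, historical) <= threshold


historical = {"cpu_util": 0.93, "packet_loss": 0.08}
print(is_target_fault_node({"cpu_util": 0.95, "packet_loss": 0.08}, historical))
print(is_target_fault_node({"cpu_util": 0.40, "packet_loss": 0.08}, historical))
```

The first candidate closely matches the statistics seen during the historical alarm and is reported as a target fault node; the second deviates by more than the threshold and is not.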
In the network fault detection method of this embodiment, network topology information of a virtualization platform and alarm information of a plurality of network faults are acquired; the network topology path corresponding to each network fault is determined according to the network topology information and the alarm information of each network fault; correlation information among the network topology paths is then determined, and the network topology paths are aggregated according to the correlation information to obtain at least one target network topology path; and a plurality of suspected fault nodes are acquired from the at least one target network topology path and subjected to fault location analysis to obtain a fault location analysis result. In this scheme, the network topology paths are established from the network topology information and the alarm information of each network fault, and at least one target network topology path is obtained by aggregating them, which can improve the efficiency of determining fault nodes; the fault location analysis of the suspected fault nodes then allows the fault location analysis result to be obtained accurately.
Referring to fig. 10, fig. 10 is a schematic block diagram of a computer device according to an embodiment of the present invention.
As shown in fig. 10, the computer device 200 includes a processor 201 and a memory 202, and the processor 201 and the memory 202 are connected by a bus 203 such as an I2C (Inter-integrated Circuit) bus.
In particular, the processor 201 is used to provide computing and control capabilities, supporting the operation of the entire computer device. The Processor 201 may be a Central Processing Unit (CPU), and the Processor 201 may also be other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, etc. Wherein a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
Specifically, the memory 202 may be a Flash chip, a Read-Only Memory (ROM), a magnetic disk, an optical disk, a USB disk, or a removable hard disk.
Those skilled in the art will appreciate that the configuration shown in fig. 10 is a block diagram of only the portion of the configuration associated with an embodiment of the present invention and does not limit the computer device to which an embodiment of the present invention may be applied; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
The processor is configured to run a computer program stored in the memory, and when executing the computer program, implement any one of the network failure detection methods provided by the embodiments of the present invention.
In an embodiment, the processor is configured to run a computer program stored in the memory and to implement the following steps when executing the computer program:
acquiring network topology information of a virtualization platform and alarm information of a plurality of network faults;
determining a network topology path corresponding to each network fault according to the network topology information and the alarm information of each network fault;
determining correlation information among the network topology paths, and aggregating the network topology paths according to the correlation information among the network topology paths to obtain at least one target network topology path;
and acquiring a plurality of suspected fault nodes from at least one target network topology path, and performing fault location analysis on the plurality of suspected fault nodes to obtain a fault location analysis result.
In an embodiment, when the processor determines, according to the network topology information and the alarm information of each network fault, a network topology path corresponding to each network fault, the processor is configured to:
grouping a plurality of alarm information of the network faults according to the alarm codes in the alarm information of each network fault to obtain a plurality of alarm information groups;
and determining a network topology path of the network fault corresponding to the alarm code of each alarm information group according to the network topology information and the alarm information in each alarm information group.
In an embodiment, when the processor implements grouping the alarm information of the plurality of network faults according to the alarm code in the alarm information of each network fault to obtain a plurality of alarm information groups, the processor is configured to implement:
acquiring the fault occurrence time in the alarm information of each network fault;
and dividing the alarm information that has the same alarm code and whose fault occurrence time falls within the same preset time window into one alarm information group.
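The grouping step above can be sketched as follows. The alarm record layout (`code`, `time` in seconds), the 300-second window, and the choice of fixed windows aligned to time zero are assumptions made for illustration; the embodiment only requires that alarms in one group share an alarm code and a preset time window.

```python
from collections import defaultdict


def group_alarms(alarms, window_seconds=300):
    """Group alarms by (alarm code, index of the time window they fall in)."""
    groups = defaultdict(list)
    for alarm in alarms:
        window_index = int(alarm["time"] // window_seconds)
        groups[(alarm["code"], window_index)].append(alarm)
    return dict(groups)


# Hypothetical alarms: code, fault occurrence time in seconds, reporting node.
alarms = [
    {"code": "A", "time": 10, "node": "host 1"},
    {"code": "A", "time": 200, "node": "switch 1"},
    {"code": "A", "time": 400, "node": "host 2"},
    {"code": "B", "time": 20, "node": "host 3"},
]
groups = group_alarms(alarms)
for key, members in groups.items():
    print(key, [a["node"] for a in members])
```

A sliding window anchored at the first alarm of each code would be an equally valid reading of "the same preset time window"; fixed windows are used here only because they are the simplest to show.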
In an embodiment, when determining, according to the network topology information and the alarm information in each alarm information group, the network topology path of the network fault to which the alarm code of each alarm information group belongs, the processor is configured to implement:
acquiring a plurality of topology nodes corresponding to each network fault and a connection relation between the plurality of topology nodes from the network topology information according to the node parameter in the alarm information group of each network fault;
and generating a network topology path corresponding to each network fault according to the plurality of topology nodes corresponding to each network fault and the connection relationship between the plurality of topology nodes.
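The two steps above can be sketched as follows. This is an illustrative reading, not the embodiment's data model: the network topology information is assumed to be a list of undirected links, each alarm is assumed to carry a `nodes` list as its node parameter, and the resulting path is represented as the ordered nodes plus the links that connect them.

```python
def build_path(topology_links, alarm_group):
    """Collect the topology nodes named by an alarm group and the links
    of the topology that connect those nodes to each other."""
    nodes = []
    for alarm in alarm_group:
        for node in alarm["nodes"]:
            if node not in nodes:  # keep first-seen order, no duplicates
                nodes.append(node)
    node_set = set(nodes)
    links = [(a, b) for a, b in topology_links if a in node_set and b in node_set]
    return nodes, links


# Hypothetical topology: vm 1 - host 1 - switch 1 - host 2 - vm 2.
topology_links = [
    ("vm 1", "host 1"),
    ("host 1", "switch 1"),
    ("switch 1", "host 2"),
    ("host 2", "vm 2"),
]
alarm_group = [
    {"code": "A", "nodes": ["vm 1", "host 1"]},
    {"code": "A", "nodes": ["host 1", "switch 1"]},
]
nodes, links = build_path(topology_links, alarm_group)
print(nodes)
print(links)
```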
In an embodiment, when the processor aggregates the plurality of network topology paths according to the correlation information between the network topology paths to obtain at least one target network topology path, the processor is configured to:
dividing a plurality of network topology paths into at least one network topology path group according to correlation information among the network topology paths, wherein each network topology path in the network topology path group comprises the same topology node;
and aggregating all the network topology paths in the network topology path group to obtain a target network topology path corresponding to the network topology path group.
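The grouping and aggregation of paths can be sketched as follows. The sketch assumes that sharing at least one topology node is the correlation criterion and that sharing is applied transitively (a path joins a group if it shares a node with any path already in it); the embodiment does not state whether grouping is transitive. Each target path keeps repeated nodes so that occurrence counts remain available for the later selection of suspected fault nodes.

```python
def aggregate_paths(paths):
    """Merge paths that share a topology node into target paths.

    Returns one target path per group, formed by concatenating the
    member paths so that repeated nodes are preserved.
    """
    groups = []  # each group is a list of member paths
    for path in paths:
        nodes = set(path)
        merged = [path]
        remaining = []
        for group in groups:
            group_nodes = {n for member in group for n in member}
            if nodes & group_nodes:
                merged = group + merged  # absorb the overlapping group
                nodes |= group_nodes
            else:
                remaining.append(group)
        remaining.append(merged)
        groups = remaining
    return [[n for member in group for n in member] for group in groups]


# Hypothetical paths: the first two share host 1, the third is unrelated.
paths = [
    ["host 1", "switch 1"],
    ["host 1", "switch 2"],
    ["host 3", "switch 3"],
]
targets = aggregate_paths(paths)
print(targets)
```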
In an embodiment, when acquiring the plurality of suspected fault nodes from at least one of the target network topology paths, the processor is configured to implement:
counting the repeated occurrence times of each topological node in at least one target network topological path;
and acquiring a plurality of suspected fault nodes from at least one target network topology path according to the repeated occurrence times of each topology node in at least one target network topology path.
In an embodiment, when the processor performs the fault location analysis on the plurality of suspected fault nodes to obtain a fault location analysis result, the processor is configured to:
determining one or more candidate failed nodes from the plurality of suspected failed nodes;
acquiring operation statistical data of the virtualization platform, and acquiring current alarm information and historical alarm information of the candidate fault node;
and carrying out fault location analysis on the operation statistical data, the current alarm information and the historical alarm information of the candidate fault node to obtain a fault location analysis result.
In an embodiment, the processor is further configured to:
acquiring operation statistical data of the virtualization platform;
processing the operation statistical data through a preset alarm detection model to obtain an alarm detection result of the operation statistical data;
and determining whether the operation statistical data is abnormal according to the alarm detection result, and generating alarm information when the operation statistical data is abnormal.
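The alarm generation steps above can be sketched as follows. The preset alarm detection model is not specified at this point in the text, so a simple z-score check against recent history stands in for it here; the threshold of 3.0, the alarm record fields, and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def is_abnormal(history, value, z_threshold=3.0):
    """Stand-in detection model: flag a value that lies more than
    z_threshold standard deviations from the mean of the history."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > z_threshold


def generate_alarm(node, metric, history, value):
    """Return an alarm record when the value is abnormal, otherwise None."""
    if is_abnormal(history, value):
        return {"node": node, "metric": metric, "value": value}
    return None


cpu_history = [0.50, 0.52, 0.48, 0.51, 0.49]
print(generate_alarm("host 1", "cpu_util", cpu_history, 0.95))
print(generate_alarm("host 1", "cpu_util", cpu_history, 0.51))
```

Any detector that maps operation statistics to an abnormal/normal result could replace `is_abnormal` without changing `generate_alarm`.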
It should be noted that, as will be clearly understood by those skilled in the art, for convenience and brevity of description, the specific working process of the computer device described above may refer to the corresponding process in the foregoing network fault detection method embodiment, and details are not described herein again.
Embodiments of the present invention also provide a storage medium for computer-readable storage, where the storage medium stores one or more programs, and the one or more programs are executable by one or more processors to implement any of the steps of the method for network fault detection provided in the description of the embodiments of the present invention.
The storage medium may be an internal storage unit of the computer device described in the foregoing embodiment, for example, a hard disk or a memory of the computer device. The storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, provided on the computer device.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware embodiment, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer.
In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
It should be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or system comprising the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments. While the invention has been described with reference to specific embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method for detecting network faults is characterized by comprising the following steps:
acquiring network topology information of a virtualization platform and alarm information of a plurality of network faults;
determining a network topology path corresponding to each network fault according to the network topology information and the alarm information of each network fault;
determining correlation information among the network topology paths, and aggregating the network topology paths according to the correlation information among the network topology paths to obtain at least one target network topology path;
and acquiring a plurality of suspected fault nodes from at least one target network topology path, and performing fault location analysis on the plurality of suspected fault nodes to obtain a fault location analysis result.
2. The method according to claim 1, wherein the determining a network topology path corresponding to each network fault according to the network topology information and the alarm information of each network fault comprises:
grouping a plurality of alarm information of the network faults according to the alarm codes in the alarm information of each network fault to obtain a plurality of alarm information groups;
and determining a network topology path of the network fault corresponding to the alarm code of each alarm information group according to the network topology information and the alarm information in each alarm information group.
3. The method according to claim 2, wherein the grouping the alarm information of the plurality of network faults according to the alarm code in the alarm information of each network fault to obtain a plurality of alarm information groups comprises:
acquiring the fault occurrence time in the alarm information of each network fault;
and dividing the alarm information with the same alarm code into an alarm information group, wherein the fault occurrence time is positioned in the same preset time window.
4. The method according to claim 2, wherein the determining, according to the network topology information and the alarm information in each alarm information group, a network topology path of the network fault to which each alarm information group corresponds an alarm code comprises:
acquiring a plurality of topology nodes corresponding to each network fault and a connection relation between the plurality of topology nodes from the network topology information according to the node parameter in the alarm information group of each network fault;
and generating a network topology path corresponding to each network fault according to the plurality of topology nodes corresponding to each network fault and the connection relationship between the plurality of topology nodes.
5. The method according to claim 1, wherein the aggregating a plurality of the network topology paths according to the correlation information between the network topology paths to obtain at least one target network topology path comprises:
dividing a plurality of network topology paths into at least one network topology path group according to correlation information among the network topology paths, wherein each network topology path in the network topology path group comprises the same topology node;
and aggregating all the network topology paths in the network topology path group to obtain a target network topology path corresponding to the network topology path group.
6. The method according to claim 1, wherein the obtaining a plurality of suspected-faulty nodes from at least one of the target network topology paths comprises:
counting the repeated occurrence times of each topological node in at least one target network topological path;
and acquiring a plurality of suspected fault nodes from at least one target network topology path according to the repeated occurrence times of each topology node in at least one target network topology path.
7. The method according to any one of claims 1 to 6, wherein the performing fault location analysis on the plurality of suspected fault nodes to obtain a fault location analysis result includes:
determining one or more candidate failed nodes from the plurality of suspected failed nodes;
acquiring operation statistical data of the virtualization platform, and acquiring current alarm information and historical alarm information of the candidate fault node;
and carrying out fault positioning analysis on the operation statistical data, the current alarm information and the historical alarm information of the candidate fault node to obtain a fault positioning analysis result.
8. The method of any of claims 1-6, wherein the method further comprises:
acquiring running statistical data of the virtualization platform;
processing the operation statistical data through a preset alarm detection model to obtain an alarm detection result of the operation statistical data;
and determining whether the operation statistical data is abnormal according to the alarm detection result, and generating alarm information when the operation statistical data is abnormal.
9. A computer arrangement comprising a processor, a memory, a computer program stored on the memory and executable by the processor, and a data bus for enabling connection communication between the processor and the memory, wherein the computer program, when executed by the processor, carries out the steps of the network fault detection method according to any of claims 1 to 8.
10. A storage medium for computer readable storage, wherein the storage medium stores one or more programs which are executable by one or more processors to implement the steps of the method of network fault detection of any of claims 1 to 8.
CN202111163538.0A 2021-09-30 2021-09-30 Network fault detection method, equipment and storage medium Pending CN115913911A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111163538.0A CN115913911A (en) 2021-09-30 2021-09-30 Network fault detection method, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115913911A true CN115913911A (en) 2023-04-04

Family

ID=86488521

Country Status (1)

Country Link
CN (1) CN115913911A (en)

Legal Events

Date Code Title Description
PB01 Publication