CN114710400A - Fault device positioning method, apparatus, electronic device, medium, and program product - Google Patents

Fault device positioning method, apparatus, electronic device, medium, and program product

Info

Publication number
CN114710400A
Authority
CN
China
Prior art keywords
link
fault
nodes
links
equipment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210432873.4A
Other languages
Chinese (zh)
Other versions
CN114710400B (en)
Inventor
杨飘飘
余学山
赵耀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202210432873.4A
Publication of CN114710400A
Application granted
Publication of CN114710400B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0811 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking connectivity
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present disclosure provides a method, an apparatus, an electronic device, a medium, and a computer program product for locating a faulty device based on a network topology, which can be used in the technical field of artificial intelligence. The method comprises: determining an abnormal device node according to a pre-constructed network topology; taking the abnormal device node as a root node; determining the m links in the network topology on which the root node is located, where m is an integer greater than or equal to 1; performing connectivity detection on the m links in descending order of node count, starting from the link with the most nodes; when a link fails connectivity detection, taking that link as the faulty link; and performing fault location on the faulty link by a minimum bisection method until the faulty device is located.

Description

Fault device positioning method, apparatus, electronic device, medium, and program product
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for locating a faulty device based on a network topology, an electronic device, a medium, and a computer program product.
Background
With the rapid development of internet technology, emerging services such as big data and artificial intelligence are growing rapidly, and financial institutions increasingly choose the high-performance RDMA (Remote Direct Memory Access) protocol for its high bandwidth, low latency, and low CPU utilization. As data center traffic grows, network scale expands as well. Existing RDMA networks generally adopt a tree structure: several devices with large forwarding capacity are placed at the core, and, to provide enough ports, multiple layers of devices are attached beneath them, so that tens or even hundreds of network devices are cascaded in multiple layers. To meet the growing traffic and service demands of the data center, network devices must be added continually, and once a fault occurs it is difficult to find the faulty device quickly among hundreds of devices.
In the prior art, when a network fails, fault feedback usually comes from the application side first: operation and maintenance personnel learn of the fault symptoms from application personnel, then check device logs and port traffic to find the lost packets or the unreachable IP address, and then check in turn the devices through which the faulty traffic passes until the faulty device is found. This approach has several drawbacks. First, application personnel can report only local, surface-level symptoms, which do not reflect the fault condition of the whole network. Second, monitoring device logs and port traffic often yields too little information; in extreme cases, or when a failing device outputs no log or alarm, the problem cannot be discovered in time. Third, when ping or Traceroute is used to trace lost packets or an unreachable IP address, every device through which the faulty traffic passes has to be checked in turn.
Disclosure of Invention
In view of the above, the present disclosure provides a fast and high-quality network topology-based faulty device positioning method, apparatus, electronic device, computer-readable storage medium, and computer program product.
One aspect of the present disclosure provides a method for locating a faulty device based on a network topology, including: determining an abnormal device node according to a pre-constructed network topology; taking the abnormal device node as a root node; determining the m links in the network topology on which the root node is located, where m is an integer greater than or equal to 1; performing connectivity detection on the m links in descending order of node count, starting from the link with the most nodes; when a link fails connectivity detection, taking that link as the faulty link; and performing fault location on the faulty link by a minimum bisection method until the faulty device is located.
According to the method of the embodiments of the present disclosure, a faulty link is determined first, and the faulty device is then located within that link by the minimum bisection method. Because connectivity detection is performed on the m links in descending order of node count, starting from the link with the most nodes, the number of troubleshooting checks can be reduced and the faulty link can be located quickly. When the minimum bisection method is used to check the connectivity of the faulty link, the quality of each segment of the end-to-end network path can be detected, so the faulty device is located accurately and the burden on operation and maintenance personnel is reduced.
In some embodiments, the connectivity detection is performed by a ping probe.
In some embodiments, performing connectivity detection on the m links in descending order of node count, starting from the link with the most nodes, specifically includes: sorting the m links in descending order of node count; and performing connectivity detection on the m links in the sorted order.
In some embodiments, performing fault location on the faulty link by the minimum bisection method until the faulty device is located specifically includes: step one: dividing the faulty link into two sub-links at an intermediate node; step two: detecting the connectivity of one of the two sub-links; step three: when that sub-link has a connectivity fault, determining that sub-link as the faulty link; and step four: repeating steps one to three until the faulty device is located.
In some embodiments, performing fault location on the faulty link by the minimum bisection method until the faulty device is located further includes: step five: when that sub-link has no connectivity fault, determining the other of the two sub-links as the faulty link; and step six: repeating steps one to five until the faulty device is located.
Another aspect of the present disclosure provides an apparatus for locating a faulty device based on a network topology, including: a first determining module, configured to determine an abnormal device node according to a pre-constructed network topology; a second determining module, configured to take the abnormal device node as a root node; a third determining module, configured to determine the m links in the network topology on which the root node is located, where m is an integer greater than or equal to 1; a detection module, configured to perform connectivity detection on the m links in descending order of node count, starting from the link with the most nodes; a fourth determining module, configured to take a link as the faulty link when that link fails connectivity detection; and a locating module, configured to perform fault location on the faulty link by a minimum bisection method until the faulty device is located.
Another aspect of the present disclosure provides an electronic device comprising one or more processors and one or more memories, wherein the memories are configured to store executable instructions that, when executed by the processors, implement the method as described above.
Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions for implementing the method as described above when executed.
Another aspect of the disclosure provides a computer program product comprising a computer program comprising computer executable instructions for implementing the method as described above when executed.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates an exemplary system architecture to which the method and apparatus may be applied, in accordance with an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a method for network topology based fault device location in accordance with an embodiment of the present disclosure;
FIG. 3 schematically illustrates a schematic diagram of a network topology according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow chart of performing connectivity detection on the m links in descending order of node count, starting from the link with the most nodes, according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow chart of performing fault location on a faulty link by the minimum bisection method until a faulty device is located, according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates a further flow chart of performing fault location on a faulty link by the minimum bisection method until a faulty device is located, according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates a block diagram of a network topology based fault device locating apparatus according to an embodiment of the present disclosure;
FIG. 8 schematically illustrates a flow chart of a method for network topology based fault device location in accordance with an embodiment of the present disclosure;
FIG. 9 schematically illustrates a block diagram of a network topology based fault device locating apparatus according to an embodiment of the present disclosure;
FIG. 10 schematically shows a block diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
In the technical solution of the present disclosure, the acquisition, storage, and application of the personal information of the users involved all comply with relevant laws and regulations, necessary security measures are taken, and public order and good morals are not violated. Likewise, the acquisition, collection, storage, use, processing, transmission, provision, disclosure, and application of data all comply with relevant laws and regulations, with necessary security measures taken and without violating public order and good morals.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
Where a convention analogous to "at least one of A, B, or C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B, or C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). The terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or as implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of the described features.
With the rapid development of internet technology, emerging services such as big data and artificial intelligence are growing rapidly, and financial institutions increasingly choose the high-performance RDMA (Remote Direct Memory Access) protocol for its high bandwidth, low latency, and low CPU utilization. As data center traffic grows, network scale expands as well. Existing RDMA networks generally adopt a tree structure: several devices with large forwarding capacity are placed at the core, and, to provide enough ports, multiple layers of devices are attached beneath them, so that tens or even hundreds of network devices are cascaded in multiple layers. To meet the growing traffic and service demands of the data center, network devices must be added continually, and once a fault occurs it is difficult to find the faulty device quickly among hundreds of devices.
In the prior art, when a network fails, fault feedback usually comes from the application side first: operation and maintenance personnel learn of the fault symptoms from application personnel, then check device logs and port traffic to find the lost packets or the unreachable IP address, and then check in turn the devices through which the faulty traffic passes until the faulty device is found. This approach has several drawbacks. First, application personnel can report only local, surface-level symptoms, which do not reflect the fault condition of the whole network. Second, monitoring device logs and port traffic often yields too little information; in extreme cases, or when a failing device outputs no log or alarm, the problem cannot be discovered in time. Third, when ping or Traceroute is used to trace lost packets or an unreachable IP address, every device through which the faulty traffic passes has to be checked in turn.
Because each layer of a tree network topology contains many devices, traffic is heavy and the locating process takes too long, which makes the faulty device harder to find. In addition, this approach cannot detect the quality of each segment of the end-to-end network path, so when network quality degrades it cannot determine which segment has the problem.
Embodiments of the present disclosure provide a method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product for locating a faulty device based on a network topology. The method includes: determining an abnormal device node according to a pre-constructed network topology; taking the abnormal device node as a root node; determining the m links in the network topology on which the root node is located, where m is an integer greater than or equal to 1; performing connectivity detection on the m links in descending order of node count, starting from the link with the most nodes; when a link fails connectivity detection, taking that link as the faulty link; and performing fault location on the faulty link by a minimum bisection method until the faulty device is located.
It should be noted that the method, the apparatus, the electronic device, the computer-readable storage medium, and the computer program product for locating a faulty device based on a network topology according to the present disclosure may be used in the field of artificial intelligence, and may also be used in any fields other than the field of artificial intelligence, such as the field of finance, and the field of the present disclosure is not limited herein.
Fig. 1 schematically illustrates an exemplary system architecture 100 to which network topology based fault device location methods, apparatus, electronic devices, computer readable storage media, and computer program products may be applied, according to embodiments of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may use terminal devices 101, 102, 103 to interact with a server 105 over a network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and perform other processing on the received data such as the user request, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that the network topology-based fault device location method provided by the embodiment of the present disclosure may be generally executed by the server 105. Accordingly, the network topology based fault device locating apparatus provided by the embodiments of the present disclosure may be generally disposed in the server 105. The network topology based fault device locating method provided by the embodiment of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Correspondingly, the network topology-based fault device locating apparatus provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
The network topology based fault device positioning method according to the embodiment of the present disclosure will be described in detail below with reference to fig. 2 to 6 based on the scenario described in fig. 1.
Fig. 2 schematically shows a flow chart of a method for network topology based fault device localization according to an embodiment of the present disclosure.
As shown in fig. 2, the method for locating a faulty device based on a network topology according to this embodiment includes operations S210 to S260.
In operation S210, an abnormal device node is determined according to a pre-constructed network topology. Note that the network topology may be a tree structure; when an abnormality is detected on a device, the faulty device causing it may be that abnormal device itself or any device on the branches that fan out from it.
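The candidate set described here (the abnormal device plus everything on the branches below it) can be sketched as a breadth-first walk. This is only an illustration: the `TOPOLOGY` map and the function name `candidate_devices` are hypothetical, and the tree is reconstructed from the five example links given for FIG. 3 in this description rather than taken from the figure itself.

```python
from collections import deque

# Hypothetical parent -> children map (an assumption), reconstructed from the
# example links A-B-E-J-P, A-B-E-K, A-B-F, A-B-G-L and A-B-G-M.
TOPOLOGY = {
    "A": ["B"],
    "B": ["E", "F", "G"],
    "E": ["J", "K"],
    "G": ["L", "M"],
    "J": ["P"],
}


def candidate_devices(topology, abnormal_node):
    """Return the abnormal node followed by every device on the branches below it."""
    candidates = []
    queue = deque([abnormal_node])
    while queue:
        node = queue.popleft()
        candidates.append(node)
        # Leaves have no entry in the map, so default to an empty child list.
        queue.extend(topology.get(node, []))
    return candidates


print(candidate_devices(TOPOLOGY, "B"))
```

With node B abnormal, every device under B remains a suspect, which is why the later operations narrow the search link by link.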
In operation S220, the abnormal device node is taken as a root node.
In operation S230, m links where a root node is located in a network topology are determined, where m is an integer greater than or equal to 1.
In operation S240, connectivity detection is performed on the m links in descending order of node count, starting from the link with the most nodes. Taking the network topology of FIG. 3 as an example, suppose node B is detected to be abnormal. Node B is then determined to be the abnormal device node and taken as the root node, and the 5 links that include node B are determined to be: A-B-E-J-P, A-B-E-K, A-B-F, A-B-G-L, and A-B-G-M.
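One way to picture how the links containing the root node are obtained is a depth-first enumeration of top-to-leaf paths, keeping only those that pass through the root node. The sketch below is an assumption-laden illustration: `TOPOLOGY`, `links_through`, and the extra C-H branch are hypothetical (the C-H branch is invented solely to show that links not containing node B are filtered out).

```python
# Hypothetical parent -> children map. The B branch is reconstructed from the
# five links listed above; the C-H branch is invented only to show filtering.
TOPOLOGY = {
    "A": ["B", "C"],
    "B": ["E", "F", "G"],
    "C": ["H"],
    "E": ["J", "K"],
    "G": ["L", "M"],
    "J": ["P"],
}


def links_through(topology, top_node, root_node):
    """List every top-to-leaf path of the tree that contains root_node."""
    links = []

    def walk(node, path):
        path = path + [node]
        children = topology.get(node, [])
        if not children:
            # A leaf closes one link; keep it only if it passes through root_node.
            if root_node in path:
                links.append(path)
            return
        for child in children:
            walk(child, path)

    walk(top_node, [])
    return links


for link in links_through(TOPOLOGY, "A", "B"):
    print("-".join(link))
```

Here m equals the number of leaves below the root node, which is 5 for node B in this example.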
As a possible implementation, as shown in FIG. 4, operation S240 of performing connectivity detection on the m links in descending order of node count, starting from the link with the most nodes, specifically includes operation S241 and operation S242.
In operation S241, the m links are sorted in descending order according to the number of nodes.
In operation S242, connectivity checks are sequentially performed on the m links according to the sorting order.
Sorting the above 5 links A-B-E-J-P, A-B-E-K, A-B-F, A-B-G-L, and A-B-G-M in descending order of node count gives A-B-E-J-P (5 nodes), A-B-E-K (4 nodes), A-B-G-L (4 nodes), A-B-G-M (4 nodes), and A-B-F (3 nodes). Connectivity detection can therefore be performed on A-B-E-J-P, A-B-E-K, A-B-G-L, A-B-G-M, and A-B-F in that order. Operations S241 and S242 thus make it straightforward to perform connectivity detection on the m links in descending order of node count, starting from the link with the most nodes.
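The ordering above can be reproduced with a stable sort on node count. This is a minimal sketch; the function name `sort_links_desc` and the list-of-lists representation are assumptions. The description does not say how ties between links of equal length are broken; the sketch simply relies on Python's stable sort, which keeps equal-length links in their original relative order.

```python
def sort_links_desc(links):
    """Sort links by node count, longest first; ties keep their input order."""
    return sorted(links, key=len, reverse=True)


LINKS = [
    ["A", "B", "E", "J", "P"],
    ["A", "B", "E", "K"],
    ["A", "B", "F"],
    ["A", "B", "G", "L"],
    ["A", "B", "G", "M"],
]

for link in sort_links_desc(LINKS):
    print("-".join(link), f"({len(link)} nodes)")
```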
In some specific examples, the connectivity detection is performed by a ping probe. It can be understood that the ping probe can detect whether node A can reach node P, node K, node L, node M, and node F, which makes it convenient to perform connectivity detection on the m links in sequence.
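A ping-based probe could look roughly like the sketch below, which shells out to the operating system's `ping` command and treats exit code 0 as reachable. Everything here is an assumption for illustration: the function names, the timeout, and the idea that the probe runs on the host at the near end of the link (node A in the example). The description does not specify how its ping probe is built. The count flag is `-c` on Linux and macOS and `-n` on Windows.

```python
import platform
import subprocess


def build_ping_command(host, count=1):
    """Build a ping command line that sends `count` echo requests to `host`."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    return ["ping", count_flag, str(count), host]


def is_reachable(host, timeout_s=5.0):
    """Return True if a single ping to `host` succeeds within the timeout."""
    try:
        result = subprocess.run(
            build_ping_command(host),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # No reply in time, or the ping binary is unavailable on this host.
        return False
    return result.returncode == 0
```

Calling `is_reachable` with the management address of the far-end node of a link (for instance, the address of node P for link A-B-E-J-P) would then give one connectivity result per link.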
In operation S250, when a link fails the connectivity detection, that link is taken as the faulty link. Continuing with the network topology of FIG. 3 as an example, suppose it is detected that node A cannot reach node M, i.e., link A-B-G-M has a connectivity fault; link A-B-G-M is then taken as the faulty link.
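Operations S240 and S250 together amount to probing the sorted links end to end and stopping at the first failure. The sketch below illustrates this with a simulated probe; `first_faulty_link`, `simulated_probe`, and the `UNREACHABLE` set are hypothetical names, and a real deployment would substitute an actual connectivity check.

```python
def first_faulty_link(ordered_links, is_connected):
    """Probe links in order; return (faulty_link, probes_used), or (None, n) if all pass."""
    for probes_used, link in enumerate(ordered_links, start=1):
        # A link is probed end to end: first node to last node.
        if not is_connected(link[0], link[-1]):
            return link, probes_used
    return None, len(ordered_links)


ORDERED_LINKS = [
    ["A", "B", "E", "J", "P"],
    ["A", "B", "E", "K"],
    ["A", "B", "G", "L"],
    ["A", "B", "G", "M"],
    ["A", "B", "F"],
]

# Simulated fault from the example: node A cannot reach node M.
UNREACHABLE = {("A", "M")}


def simulated_probe(src, dst):
    return (src, dst) not in UNREACHABLE


faulty_link, probes_used = first_faulty_link(ORDERED_LINKS, simulated_probe)
print("-".join(faulty_link), probes_used)
```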
In operation S260, fault location is performed on the faulty link by the minimum bisection method until the faulty device is located.
As a possible implementation, as shown in FIG. 5, operation S260 of performing fault location on the faulty link by the minimum bisection method until the faulty device is located specifically includes operations S261 to S264.
In operation S261, step one: the faulty link is divided into two sub-links at an intermediate node. Taking the faulty link A-B-G-M as an example, since it has 4 nodes, the intermediate node may be either the second node B or the third node G. Taking node B as the intermediate node, the two resulting sub-links are A-B and B-G-M. If the faulty link has an odd number of nodes, e.g., 5 nodes, there is only one intermediate node, namely the third node.
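Step one can be sketched as a slice at the intermediate node, with that node shared by both sub-links. The function name `split_at_intermediate` is hypothetical. For an even number of nodes the description allows either middle node; this sketch picks the earlier one (index `(n - 1) // 2`) to match the A-B / B-G-M example, which is a choice made here, not a requirement of the method.

```python
def split_at_intermediate(link):
    """Split a link into two sub-links that share the intermediate node."""
    if len(link) < 3:
        raise ValueError("a link needs at least 3 nodes to have an intermediate node")
    # Odd length: the single middle node. Even length: the earlier of the two.
    mid = (len(link) - 1) // 2
    return link[: mid + 1], link[mid:]


print(split_at_intermediate(["A", "B", "G", "M"]))
print(split_at_intermediate(["A", "B", "E", "J", "P"]))
```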
In operation S262, step two: the connectivity of one of the two sub-links is detected.
In operation S263, step three: when that sub-link has a connectivity fault, it is determined to be the faulty link. For example, if the sub-link B-G-M is detected to have a connectivity fault, B-G-M is determined to be the faulty link.
In operation S264, step four: steps one to three are repeated until the faulty device is located. Since the faulty link B-G-M has 3 nodes, the intermediate node is the second node G, and the two resulting sub-links are B-G and G-M. Connectivity detection is performed on the two sub-links B-G and G-M to locate the faulty device. Specifically, when B-G has no fault and G-M has a fault, the faulty device is located as M; when B-G has a fault and G-M has no fault, the faulty device is located as B; and when both B-G and G-M have faults, the faulty device is located as G.
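The three outcomes listed for the 3-node link B-G-M form a small decision table, sketched below. The function name `locate_in_three_node_link` and the probe signature are assumptions. The case where both sub-links test healthy is not covered by the description, so the sketch returns `None` for it.

```python
def locate_in_three_node_link(link, is_connected):
    """Apply the decision table for a 3-node link such as B-G-M."""
    first, middle, last = link
    near_ok = is_connected(first, middle)  # e.g. sub-link B-G
    far_ok = is_connected(middle, last)    # e.g. sub-link G-M
    if near_ok and not far_ok:
        return last      # only the far sub-link fails: far-end device
    if not near_ok and far_ok:
        return first     # only the near sub-link fails: near-end device
    if not near_ok and not far_ok:
        return middle    # both fail: the shared intermediate device
    return None          # both healthy: not covered by the description


# Simulated probes: B-G healthy, G-M broken.
broken = {("G", "M")}
print(locate_in_three_node_link(["B", "G", "M"], lambda a, b: (a, b) not in broken))
```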
Operations S261 to S264 make it convenient to perform fault location on the faulty link by the minimum bisection method until the faulty device is located.
As a possible implementation, as shown in FIG. 6, operation S260 of performing fault location on the faulty link by the minimum bisection method until the faulty device is located further includes operation S265 and operation S266.
In operation S265, step five: when that sub-link has no connectivity fault, the other of the two sub-links is determined to be the faulty link.
In operation S266, step six: steps one to five are repeated until the faulty device is located. Operations S261 to S266 make it convenient to perform fault location on the faulty link by the minimum bisection method until the faulty device is located.
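Steps one to six can be read as a loop that keeps halving the faulty link. The sketch below is one possible interpretation, not the definitive procedure: `locate_faulty_device` and `make_probe` are hypothetical names, the probe is simulated (a segment counts as connected only if it contains no faulty device), the earlier middle node is chosen for even-length links, and a link that shrinks to 2 nodes returns both endpoints as candidates because the description does not say how to resolve that case.

```python
def locate_faulty_device(link, segment_ok):
    """Narrow a faulty link by repeated halving; return a list of suspect devices."""
    while len(link) > 3:
        mid = (len(link) - 1) // 2               # step one: pick the intermediate node
        first, second = link[: mid + 1], link[mid:]
        if segment_ok(first):                    # step two: probe one sub-link
            link = second                        # step five: fault is in the other one
        else:
            link = first                         # step three: this sub-link is faulty
    if len(link) == 3:                           # final case, as in the B-G-M example
        near_ok = segment_ok(link[:2])
        far_ok = segment_ok(link[1:])
        if near_ok and not far_ok:
            return [link[2]]
        if not near_ok and far_ok:
            return [link[0]]
        if not near_ok and not far_ok:
            return [link[1]]
        return []                                # both healthy: nothing to report
    return list(link)                            # 2-node link: assumption, both suspects


def make_probe(faulty_devices):
    """Simulated probe: a segment is connected if it contains no faulty device."""
    return lambda segment: not (set(segment) & set(faulty_devices))


print(locate_faulty_device(["A", "B", "G", "M"], make_probe({"M"})))
print(locate_faulty_device(["A", "B", "E", "J", "P"], make_probe({"J"})))
```

Each pass through the loop discards roughly half of the remaining nodes, so a long link needs far fewer probes than checking every hop in turn.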
According to the method of the embodiments of the present disclosure, a faulty link is determined first, and the faulty device is then located within that link by the minimum bisection method. Because connectivity detection is performed on the m links in descending order of node count, starting from the link with the most nodes, the number of troubleshooting checks can be reduced and the faulty link can be located quickly. When the minimum bisection method is used to check the connectivity of the faulty link, the quality of each segment of the end-to-end network path can be detected, so the faulty device is located accurately and the burden on operation and maintenance personnel is reduced.
Based on the above method for locating a faulty device based on a network topology, the present disclosure also provides an apparatus 10 for locating a faulty device based on a network topology. The apparatus 10 will be described in detail below with reference to FIG. 7.
Fig. 7 schematically shows a block diagram of the network topology based fault device locating apparatus 10 according to an embodiment of the present disclosure.
The network topology based fault device locating apparatus 10 includes a first determining module 1, a second determining module 2, a third determining module 3, a detecting module 4, a fourth determining module 5 and a locating module 6.
A first determining module 1, configured to perform operation S210: determining an abnormal device node according to a pre-constructed network topology.
A second determining module 2, configured to perform operation S220: taking the abnormal device node as a root node.
A third determining module 3, configured to perform operation S230: determining the m links in the network topology on which the root node is located, where m is an integer greater than or equal to 1.
A detection module 4, configured to perform operation S240: performing connectivity detection on the m links in descending order of node count, starting from the link with the most nodes.
A fourth determining module 5, configured to perform operation S250: when a link fails connectivity detection, taking that link as the faulty link.
A locating module 6, configured to perform operation S260: performing fault location on the faulty link by the minimum bisection method until the faulty device is located.
According to the fault equipment positioning apparatus based on the network topology structure, the fault link is first determined, and the fault equipment is then located within the fault link by the minimum bisection method. Connectivity detection is performed on the m links in descending order of node count, starting from the link with the largest number of nodes, which reduces the number of troubleshooting checks and allows the fault link to be located quickly. When the minimum bisection method is used to check the connectivity of the fault link, the link quality of each link section from one end of the network to the other can be detected, so that the fault equipment is located accurately and the workload of operation and maintenance personnel is reduced.
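The bisection step can be illustrated with a short sketch. It assumes a single faulty device on the fault link and a hypothetical probe `is_connected(segment)` that returns False whenever the probed sub-link contains the faulty device; the function name `locate_fault` and the choice to split the node list into two disjoint halves are assumptions of this example, not details given in the disclosure.

```python
from typing import Callable, List, Sequence


def locate_fault(
    link: Sequence[str],
    is_connected: Callable[[Sequence[str]], bool],
) -> str:
    """Halve a fault link repeatedly until a single device remains."""
    if not link:
        raise ValueError("link must contain at least one node")
    segment: List[str] = list(link)
    while len(segment) > 1:
        mid = len(segment) // 2  # split at the intermediate node
        first, second = segment[:mid], segment[mid:]
        # Only one sub-link is probed per round. If it is healthy, the fault
        # must lie in the other sub-link (single-fault assumption).
        segment = first if not is_connected(first) else second
    return segment[0]
```

Because each round discards half of the remaining nodes, a link with n nodes needs at most about log2(n) probes under these assumptions, rather than one probe per node.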
In addition, according to the embodiment of the present disclosure, any plurality of the first determining module 1, the second determining module 2, the third determining module 3, the detecting module 4, the fourth determining module 5, and the positioning module 6 may be combined into one module to be implemented, or any one of them may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module.
According to an embodiment of the present disclosure, at least one of the first determining module 1, the second determining module 2, the third determining module 3, the detecting module 4, the fourth determining module 5 and the positioning module 6 may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, or an application specific integrated circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging circuits, or implemented in any one of the three implementation manners of software, hardware and firmware, or in a suitable combination of any of them.
Alternatively, at least one of the first determining module 1, the second determining module 2, the third determining module 3, the detecting module 4, the fourth determining module 5 and the positioning module 6 may be at least partly implemented as a computer program module, which when executed may perform a corresponding function.
A network topology based faulty device locating apparatus according to an embodiment of the present disclosure is described in detail below with reference to fig. 8 to 9. It is to be understood that the following description is illustrative only and is not intended to be in any way limiting of the present disclosure.
The disclosure mainly provides a fault equipment positioning method and apparatus based on a network topology structure. In the method, devices that may be faulty are first screened out through network monitoring; then, for the target fault devices, the minimum bisection method is used to determine the quality of each link section in as few checks as possible, and the fault link is finally determined. Fault location thus becomes faster and more efficient, and the workload of operation and maintenance personnel is reduced.
The method determines, through network monitoring equipment, k target fault devices in a network topology structure containing n network devices, and determines an edge root node and a longest path for the tree structure formed by the k target fault devices. The longest path is checked by ping for packet loss or an unreachable IP address, and the remaining path is then checked by bisection, so that the fault link is located quickly. Here n and k are integers greater than or equal to 3, and n is greater than or equal to k.
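A ping-based reachability check of the kind mentioned above might look like the following sketch. The function names, the default probe count and timeout, and the use of the Linux iputils `ping` flags `-c` and `-W` are assumptions of this example; other platforms use different flags. The command runner is injectable so the logic can be tried without sending packets.

```python
import subprocess
from typing import Callable, List


def ping_command(host: str, count: int = 3, timeout_s: int = 1) -> List[str]:
    """Build a ping command line (Linux iputils flags assumed)."""
    # -c: number of echo requests to send; -W: seconds to wait for a reply.
    return ["ping", "-c", str(count), "-W", str(timeout_s), host]


def is_reachable(
    host: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Return True if the ping command exits with status 0."""
    result = run(
        ping_command(host),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0
```

An exit status of 0 only indicates that a reply was received, so this sketch detects an unreachable address but not partial packet loss; measuring loss would require parsing the command output, which is omitted here.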
The present invention is explained in detail below with reference to fig. 8 and 9.
As shown in fig. 8, a flow chart of the method for locating a faulty device based on the network topology is as follows.
S101: For the network topology structure, find k devices raising abnormality alarms through the network monitoring equipment, and determine them as the target fault devices.
S102: Among the k target fault devices, determine the device with the highest level as the root node (generally the core device).
S103: After the root node is determined, automatically select, through a breadth-first search algorithm, the longest path among the k target fault devices, the path containing q devices (q ≤ k).
S104: Test the connectivity of the longest path through ping. If the communication is abnormal, the path is determined to be a fault path and step S105 is performed; otherwise, step S106 is performed.
S105: Repeat step S104 on the longest path using the minimum bisection method until the fault link is found.
S106: Repeat step S103 for the k-q devices outside the longest path until a fault link is determined.
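Step S103 can be sketched as a breadth-first search over the target fault devices. The adjacency-dictionary representation, the `allowed` set of target devices, and the function name `longest_path` are assumptions of this example, and the sketch assumes the target devices form a tree below the root, as the description suggests.

```python
from collections import deque
from typing import Dict, List, Optional, Set


def longest_path(adj: Dict[str, List[str]], root: str, allowed: Set[str]) -> List[str]:
    """BFS from root over allowed nodes; return the path to the deepest node."""
    parent: Dict[str, Optional[str]] = {root: None}
    queue = deque([root])
    last = root
    while queue:
        # BFS dequeues nodes in non-decreasing depth, so the final node
        # dequeued is one of the deepest nodes reachable from the root.
        last = queue.popleft()
        for nxt in adj.get(last, []):
            if nxt in allowed and nxt not in parent:
                parent[nxt] = last
                queue.append(nxt)
    # Walk the parent pointers back to the root to rebuild the path.
    path: List[str] = []
    node: Optional[str] = last
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]
```

For step S106, the same function could be called again with the devices of the previous longest path removed from `allowed`, apart from the root.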
As shown in fig. 9, the fault device locating apparatus based on the network topology structure includes a network monitoring apparatus, a root node and path locating apparatus, and a fault analyzing apparatus.
The network monitoring apparatus is used for acquiring the abnormal network devices and determining the network topology structure of the target fault devices.
The root node and path locating apparatus is used for determining the highest-level target fault device as the root node according to the topology structure of the target fault devices, and for locating the longest target fault path using a breadth-first search algorithm.
The fault analyzing apparatus is used for determining the fault link of the path based on the result of the root node and path locating apparatus.
The advantage of the method is that the root node and the longest path are determined using a breadth-first search algorithm, and the target fault devices are searched for IP addresses that lose packets or are unreachable, which reduces the number of troubleshooting checks and allows the fault link to be located quickly. The link quality of each link section from one end of the network to the other is detected, so that the fault link is located accurately and the workload of operation and maintenance personnel is reduced.
Fig. 10 schematically shows a block diagram of an electronic device adapted to implement the above method according to an embodiment of the present disclosure.
As shown in fig. 10, an electronic apparatus 900 according to an embodiment of the present disclosure includes a processor 901 which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage section 908 into a Random Access Memory (RAM) 903. The processor 901 may comprise, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an application specific integrated circuit (ASIC)), among others. The processor 901 may also include on-board memory for caching purposes. The processor 901 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present disclosure.
In the RAM 903, various programs and data necessary for the operation of the electronic apparatus 900 are stored. The processor 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904. The processor 901 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 902 and/or the RAM 903. Note that the programs may also be stored in one or more memories other than the ROM 902 and the RAM 903. The processor 901 may also perform various operations of the method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 900 may also include an input/output (I/O) interface 905, which is also connected to the bus 904. The electronic device 900 may also include one or more of the following components connected to the I/O interface 905: an input section 906 including a keyboard, a mouse, and the like; an output section 907 including components such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. A drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 910 as necessary, so that a computer program read out therefrom is installed into the storage section 908 as necessary.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 902 and/or RAM 903 described above and/or one or more memories other than the ROM 902 and RAM 903.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the method illustrated in the flow chart. The program code is for causing a computer system to perform the methods of the embodiments of the disclosure when the computer program product is run on the computer system.
The computer program performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure when executed by the processor 901. The systems, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
In one embodiment, the computer program may be hosted on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted, distributed in the form of a signal on a network medium, and downloaded and installed through the communication section 909 and/or installed from the removable medium 911. The computer program containing program code may be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 909 and/or installed from the removable medium 911. The computer program, when executed by the processor 901, performs the above-described functions defined in the system of the embodiment of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
In accordance with embodiments of the present disclosure, program code for executing computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages, and in particular, these computer programs may be implemented using high level procedural and/or object oriented programming languages, and/or assembly/machine languages. The programming languages include, but are not limited to, Java, C++, Python, the "C" language, or the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure can be combined and/or associated in various ways, even if such combinations or associations are not expressly recited in the present disclosure. In particular, such combinations and/or associations may be made without departing from the spirit or teaching of the present disclosure. All such combinations and/or associations are within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the disclosure, and these alternatives and modifications are intended to fall within the scope of the disclosure.

Claims (9)

1. A fault equipment positioning method based on a network topology structure is characterized by comprising the following steps:
determining abnormal equipment nodes according to a pre-constructed network topological structure;
taking the abnormal equipment node as a root node;
determining m links where the root node is located in the network topology structure, wherein m is an integer greater than or equal to 1;
performing connectivity detection on the m links in sequence, in descending order of the number of nodes, starting from the link with the largest number of nodes;
when the detected link connectivity fails, taking the link as a fault link; and
performing fault location on the fault link through a minimum bisection method until the fault equipment is located.
2. The method of claim 1, wherein the connectivity detection is performed by ping probing.
3. The method according to claim 1, wherein the performing connectivity detection on the m links in sequence, in descending order of the number of nodes, starting from the link with the largest number of nodes, specifically comprises:
sorting the m links in descending order of the number of nodes; and
performing connectivity detection on the m links in sequence according to the sorted order.
4. The method according to claim 1, wherein the performing fault location on the fault link through a minimum bisection method until the fault equipment is located specifically comprises:
step one: dividing the fault link into two sub-links at an intermediate node;
step two: detecting the connectivity of one of the two sub-links;
step three: when the sub-link has a communication fault, determining the sub-link as the fault link; and
step four: repeating step one to step three until the fault equipment is located.
5. The method according to claim 4, wherein the performing fault location on the fault link through a minimum bisection method until the fault equipment is located further comprises:
step five: when the sub-link has no communication fault, determining the other of the two sub-links as the fault link; and
step six: repeating step one to step five until the fault equipment is located.
6. A network topology based fault device locating apparatus, comprising:
a first determining module, configured to determine abnormal equipment nodes according to a pre-constructed network topology structure;
a second determining module, configured to take the abnormal equipment node as a root node;
a third determining module, configured to determine m links where the root node is located in the network topology structure, wherein m is an integer greater than or equal to 1;
a detection module, configured to perform connectivity detection on the m links in sequence, in descending order of the number of nodes, starting from the link with the largest number of nodes;
a fourth determining module, configured to take the link as a fault link when the detected link connectivity fails; and
a positioning module, configured to perform fault location on the fault link through a minimum bisection method until the fault equipment is located.
7. An electronic device, comprising:
one or more processors;
one or more memories for storing executable instructions that, when executed by the processor, implement the method of any of claims 1-5.
8. A computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, implement a method according to any one of claims 1 to 5.
9. A computer program product comprising a computer program comprising one or more executable instructions which, when executed by a processor, implement a method according to any one of claims 1 to 5.
CN202210432873.4A 2022-04-22 2022-04-22 Fault equipment positioning method, device, electronic equipment and medium Active CN114710400B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210432873.4A CN114710400B (en) 2022-04-22 2022-04-22 Fault equipment positioning method, device, electronic equipment and medium

Publications (2)

Publication Number Publication Date
CN114710400A true CN114710400A (en) 2022-07-05
CN114710400B CN114710400B (en) 2023-11-07

Family

ID=82175470

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210432873.4A Active CN114710400B (en) 2022-04-22 2022-04-22 Fault equipment positioning method, device, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN114710400B (en)

Also Published As

Publication number Publication date
CN114710400B (en) 2023-11-07


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant