CN107332709B - Fault positioning method and device - Google Patents


Publication number
CN107332709B
Authority
CN
China
Prior art keywords
node
detection
message
messages
detected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710632889.9A
Other languages
Chinese (zh)
Other versions
CN107332709A (en
Inventor
费志军
谢亮
华锦芝
何朔
尹亚伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Unionpay Co Ltd
Original Assignee
China Unionpay Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Unionpay Co Ltd
Priority to CN201710632889.9A
Publication of CN107332709A
Application granted
Publication of CN107332709B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The embodiments of the invention relate to the technical field of data processing, and in particular to a fault positioning method and device for quickly and accurately locating a fault in a server cluster system. In the embodiments, a central node determines M detection messages according to a transaction to be detected and sends the M detection messages to the nodes to be detected; the transmission paths of the M detection messages cover all nodes to be detected. For each of the M detection messages, the central node determines, according to the feedback message corresponding to that detection message, whether an abnormal node exists on its path; the nodes to be detected on the transmission path of a detection message for which an abnormal node exists are primary screening nodes. The central node then determines N detection messages according to the primary screening nodes and sends them to the primary screening nodes; the transmission paths of the N detection messages cover all transmission paths of the primary screening nodes. For each of the N detection messages, the central node determines the abnormal node according to the corresponding feedback message.

Description

Fault positioning method and device
Technical Field
The invention relates to the technical field of data processing, in particular to a fault positioning method and device.
Background
Fault location technology has been studied extensively. Several typical fault location techniques are:
1. Manual testing method
When a communication network fails, an experienced worker uses test instruments to determine the specific position of the fault from the fault alarm signals raised by the communication network.
2. Channel correlation analysis method
This method exploits the correlation of the service path. When the communication network fails, alarm information is transmitted to the upstream network element of the service path through a forward alarm signal (FDI) and a backward alarm signal (BDI), and the upstream network element arbitrates on the alarm information to determine the specific position of the fault.
3. Central control network element analysis method
In this method a main control network element is set up in the communication network. When the communication network fails, the alarm indication signals of all other network elements are transmitted to the main control network element, which determines the specific position of the fault from the alarm indication signals and from prestored service and topology information about the communication network.
4. Case analysis method
In this method, faults that have already occurred are first compiled into cases, and a fault location information base is built from them. When the communication network fails, the fault is located by searching the fault location information base for a fault identification that matches the fault alarm signals.
Each of these techniques has limitations. The manual testing method is not suitable for a large-scale computer cluster system, because it cannot protect the damaged service in real time or remove the fault in time. The channel correlation analysis method and the central control network element analysis method both require a large number of fault alarm signals to be transmitted between network elements as the basis for the judgment made by the upstream network element or the main control network element; this prolongs fault positioning, and the complicated calculation involved makes both methods difficult to apply to a communication network under distributed control. The case analysis method is accurate, but it cannot locate fault types that are not in the case base, so it cannot be used in some special situations. A method that can locate faults quickly and accurately is therefore needed.
Disclosure of Invention
The application provides a fault positioning method and device, which are used for quickly and accurately positioning a fault of a server cluster system.
The fault positioning method provided by the embodiment of the invention comprises the following steps:
the central node determines M detection messages according to the transaction to be detected and sends the M detection messages to the nodes to be detected, where the transmission paths of the M detection messages cover all nodes to be detected;
for each of the M detection messages, the central node determines, according to the feedback message corresponding to that detection message, the detection messages for which an abnormal node exists; the feedback message is obtained from the operation result of the nodes to be detected corresponding to the detection message; the nodes to be detected on the transmission path of a detection message for which an abnormal node exists are primary screening nodes;
the central node determines N detection messages according to the primary screening nodes and sends the N detection messages to the primary screening nodes, where the transmission paths of the N detection messages cover all transmission paths of the primary screening nodes;
for each of the N detection messages, the central node determines the abnormal node according to the feedback message corresponding to that detection message.
Optionally, before the central node determines M detection messages according to the transaction to be detected, the method further includes:
the central node divides all nodes to be detected into different clusters based on different transaction types, where the nodes to be detected in the same cluster process messages of the same transaction type;
the central node sending the M detection messages to the nodes to be detected includes:
for one cluster, the central node sends the M detection messages to the nodes to be detected in the cluster;
the central node sending the N detection messages to the primary screening nodes includes:
for one cluster, the central node sends the N detection messages to the primary screening nodes in the cluster.
Optionally, after the central node determines the abnormal node according to the feedback message corresponding to the detection message, the method further includes:
setting the state of the abnormal node to an isolation state.
Optionally, after the abnormal node is isolated, the method further includes:
the central node sends a detection message to the abnormal node and, if it determines from the feedback message corresponding to the detection message that the abnormal node has returned to normal, releases the isolation state of the abnormal node.
Optionally, the central node determining the detection messages for which an abnormal node exists according to the feedback message corresponding to the detection message includes:
for any of the M detection messages, if the central node does not receive the feedback message corresponding to the detection message within the threshold time, or the feedback message corresponding to the detection message is an error message, determining that an abnormal node exists.
Optionally, the central node determining the abnormal node according to the feedback message corresponding to the detection message includes:
for the feedback messages corresponding to all detection messages passing through a first node to be detected, if the central node does not receive the feedback messages within the threshold time, or the feedback messages are all error messages, determining that the first node to be detected is an abnormal node.
A fault positioning device, comprising:
a first sending module, configured to determine M detection messages according to the transaction to be detected and send the M detection messages to the nodes to be detected, where the transmission paths of the M detection messages cover all nodes to be detected;
a determining module, configured to determine, for each of the M detection messages, the detection messages for which an abnormal node exists according to the feedback message corresponding to that detection message; the feedback message is obtained from the operation result of the nodes to be detected corresponding to the detection message; the nodes to be detected on the transmission path of a detection message for which an abnormal node exists are primary screening nodes;
a second sending module, configured to determine N detection messages according to the primary screening nodes and send the N detection messages to the primary screening nodes, where the transmission paths of the N detection messages cover all transmission paths of the primary screening nodes;
a positioning module, configured to determine, for each of the N detection messages, the abnormal node according to the feedback message corresponding to that detection message.
Optionally, the device further includes a dividing module, configured to divide all nodes to be detected into different clusters based on different transaction types, where the nodes to be detected in the same cluster process messages of the same transaction type;
the first sending module is specifically configured to send, for one cluster, the M detection messages to the nodes to be detected in the cluster;
the second sending module is specifically configured to send, for one cluster, the N detection messages to the primary screening nodes in the cluster.
Optionally, the device further includes a processing module, configured to:
set the state of the abnormal node to an isolation state.
Optionally, the processing module is further configured to:
send a detection message to the abnormal node and, if it is determined from the feedback message corresponding to the detection message that the abnormal node has returned to normal, release the isolation state of the abnormal node.
Optionally, the determining module is specifically configured to:
for any of the M detection messages, determine that an abnormal node exists if the feedback message corresponding to the detection message is not received within the threshold time or the feedback message corresponding to the detection message is an error message.
Optionally, the positioning module is specifically configured to:
determine that a first node to be detected is an abnormal node if the feedback messages corresponding to all detection messages passing through the first node to be detected are not received within the threshold time or are all error messages.
In the embodiment of the invention, when one of the nodes to be detected fails, or when all nodes to be detected need maintenance screening, the central node determines M detection messages according to the transaction to be detected and sends them to the nodes to be detected. The transmission paths of the M detection messages cover all nodes to be detected. After sending the detection messages, the central node receives the feedback message corresponding to each detection message; the feedback messages are returned, according to the operation results, by the nodes to be detected that received the detection messages. From the received feedback messages the central node determines the detection messages for which an abnormal node exists, and takes the nodes to be detected on the transmission paths of those detection messages as primary screening nodes; an abnormal node is thereby known to exist among the primary screening nodes. The central node then determines N detection messages according to the primary screening nodes and sends them to the primary screening nodes; the transmission paths of the N detection messages cover all transmission paths of the primary screening nodes. Finally, the central node determines the abnormal node according to the feedback messages corresponding to the N detection messages. By combining full coverage of the nodes to be detected with full coverage of the transmission paths, the embodiment of the invention locates faults in the server cluster system quickly and accurately.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a diagram illustrating a system architecture suitable for use with an embodiment of the present invention;
fig. 2 is a schematic flow chart of a fault positioning method according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a layered system of nodes to be detected according to an embodiment of the present invention;
fig. 4 is a schematic flow chart of a specific embodiment of the fault positioning method according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a fault positioning device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, a system architecture to which the embodiment of the present invention is applicable includes a central node and nodes to be detected. The central node and the nodes to be detected may be network devices such as computers, and the nodes to be detected may form a system composed of a plurality of servers or processes. Preferably, the central node and the nodes to be detected use cloud computing technology for information processing.
The central node calculates the sending paths of the detection messages, sends the detection messages to the nodes to be detected, judges the state of each node to be detected by recording and analyzing the transaction results fed back by the nodes to be detected, sends an isolation instruction to an abnormal node, and sends an isolation removal instruction to an abnormal node that has returned to normal. The nodes to be detected receive the detection messages sent by the central node, forward the messages along the sending path, and feed the operation result back to the central node, so that the central node can determine the state of each node to be detected from the operation result.
The central node may communicate with the nodes to be detected over the Internet, or over a mobile communication system such as the Global System for Mobile Communications (GSM) or Long Term Evolution (LTE). The nodes to be detected may communicate with each other in the same ways.
Fig. 2 schematically illustrates a flow chart of a fault location method according to an embodiment of the present invention. As shown in fig. 2, the fault locating method provided in the embodiment of the present invention includes the following steps:
Step 201, the central node determines M detection messages according to the transaction to be detected and sends the M detection messages to the nodes to be detected; the transmission paths of the M detection messages cover all nodes to be detected.
Step 202, for each of the M detection messages, the central node determines, according to the feedback message corresponding to that detection message, the detection messages for which an abnormal node exists; the feedback message is obtained from the operation result of the nodes to be detected corresponding to the detection message; the nodes to be detected on the transmission path of a detection message for which an abnormal node exists are primary screening nodes.
Step 203, the central node determines N detection messages according to the primary screening nodes and sends the N detection messages to the primary screening nodes; the transmission paths of the N detection messages cover all transmission paths of the primary screening nodes.
Step 204, for each of the N detection messages, the central node determines the abnormal node according to the feedback message corresponding to that detection message.
In the embodiment of the invention, when one of the nodes to be detected fails, or when all nodes to be detected need maintenance screening, the central node determines M detection messages according to the transaction to be detected and sends them to the nodes to be detected. The transmission paths of the M detection messages cover all nodes to be detected. After sending the detection messages, the central node receives the feedback message corresponding to each detection message; the feedback messages are returned, according to the operation results, by the nodes to be detected that received the detection messages. From the received feedback messages the central node determines the detection messages for which an abnormal node exists, and takes the nodes to be detected on the transmission paths of those detection messages as primary screening nodes; an abnormal node is thereby known to exist among the primary screening nodes. The central node then determines N detection messages according to the primary screening nodes and sends them to the primary screening nodes; the transmission paths of the N detection messages cover all transmission paths of the primary screening nodes. Finally, the central node determines the abnormal node according to the feedback messages corresponding to the N detection messages. By combining full coverage of the nodes to be detected with full coverage of the transmission paths, the embodiment of the invention locates faults in the server cluster system quickly and accurately.
The detection message in the embodiment of the invention is obtained by simulating a normal transaction message. Normal transaction messages themselves are not suitable for the fault detection described here: their sending paths are random, so the accuracy and timeliness of fault detection cannot be guaranteed, and their success rate must be protected. The content of a simulated detection message can be identical to that of a normal transaction message, so it can stand in for the normal transaction message when locating faults in the node system to be detected.
Generally, a transaction message is processed cooperatively by a plurality of server nodes: each server node completes one link of the transaction message in turn, and several server nodes may be able to process the same link. According to the order in which the transaction message is processed, the server nodes can therefore be divided into layers, with the server nodes of each layer completing one link of the transaction message. For example, the node system to be detected comprises three layers A, B and C, as shown in fig. 3. Layer A has 3 nodes to be detected, A1, A2 and A3; layer B has 3 nodes to be detected, B1, B2 and B3; and layer C has 2 nodes to be detected, C1 and C2. Any node to be detected in layer A can process the first link of the transaction message; that is, any of A1, A2 and A3 receives the detection message sent by the central node, processes it, and sends the processed detection message to a node to be detected in layer B. Any of B1, B2 and B3 receives the detection message from layer A, processes it, and sends the processed detection message to a node to be detected in layer C. After C1 or C2 receives and successfully processes the detection message, it feeds the result back to the central node. Once the central node has received the feedback message normally, it can determine from the feedback message that the 3 nodes to be detected on the sending path of the detection message are all in a normal state.
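The layered arrangement described above can be sketched as a small data structure. This is only an illustrative sketch: the names `LAYERS`, `is_valid_path` and `nodes_confirmed_normal` are hypothetical and do not appear in the source; only the node names and the layer order come from the fig. 3 example.

```python
# Hypothetical model of the layered node system of the fig. 3 example.
LAYERS = [
    ["A1", "A2", "A3"],  # layer A: first link of the transaction message
    ["B1", "B2", "B3"],  # layer B: second link
    ["C1", "C2"],        # layer C: last link, feeds the result back
]

def is_valid_path(path, layers=LAYERS):
    """A transmission path visits exactly one node per layer, in layer order."""
    return len(path) == len(layers) and all(
        node in layer for node, layer in zip(path, layers)
    )

def nodes_confirmed_normal(path, feedback_ok):
    """Normal feedback means every node on the sending path is normal."""
    return set(path) if feedback_ok else set()
```

For instance, `is_valid_path(["A1", "B2", "C1"])` holds, whereas a path that skips or reorders a layer does not.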
In the embodiment of the invention, the central node sends the detection message to the node system to be detected in two stages. The first-stage detection message is sent in a mode of covering all nodes to be detected, and whether fault nodes exist in a node system to be detected can be determined according to the feedback of the nodes to be detected. And if the fault node exists in the node system to be detected, the central node sends the detection message of the second stage. And the sending mode of the detection message at the second stage is to cover all transmission paths, and the specific position of the fault node can be determined according to the feedback. The following describes two ways of sending the detection message in detail by taking fig. 3 as an example.
First, to cover all nodes to be detected in fig. 3, the central node needs to send 3 detection messages in the first stage. The transmission path of the first detection message is A1-B1-C1, that of the second is A2-B2-C2, and that of the third is A3-B3-C1. Together the transmission paths of the 3 detection messages cover all nodes to be detected, so M in step 201 is 3. The central node may send the 3 detection messages simultaneously or sequentially.
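One way to obtain such a covering set of paths is sketched below. The wrap-around rule (path k takes node k modulo the layer size) is an assumption made for illustration; the source gives the three resulting paths but does not state how they are chosen. The function name `covering_paths` is hypothetical.

```python
def covering_paths(layers):
    """Return M = max(layer size) paths that together visit every node.

    Path k takes node (k mod layer size) from each layer. This selection
    rule is an illustrative assumption, not taken from the source.
    """
    m = max(len(layer) for layer in layers)
    return [[layer[k % len(layer)] for layer in layers] for k in range(m)]

layers = [["A1", "A2", "A3"], ["B1", "B2", "B3"], ["C1", "C2"]]
paths = covering_paths(layers)
for p in paths:
    print("-".join(p))
```

With the fig. 3 layers this yields the three paths given in the text, so M equals the size of the largest layer.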
The central node receives the feedback messages corresponding to the 3 detection messages, and determines the detection messages with abnormal nodes according to the receiving condition of the feedback messages.
The central node determining the detection messages for which an abnormal node exists according to the feedback message corresponding to the detection message includes:
for any of the M detection messages, if the central node does not receive the feedback message corresponding to the detection message within the threshold time, or the feedback message corresponding to the detection message is an error message, determining that an abnormal node exists.
For the 3 detection messages, if the central node does not receive the feedback message corresponding to any one of them within the threshold time, or the feedback message corresponding to any one of them is an error message, it determines that an abnormal node exists in the node system to be detected. For example, if the feedback message corresponding to the second detection message is not received within the threshold time, or the feedback message corresponding to the second detection message is an error message, it is determined that an abnormal node exists among the nodes to be detected and that it lies on the transmission path of the second detection message; the nodes to be detected A2, B2 and C2 on that transmission path are then taken as the primary screening nodes.
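The first-stage judgment can be sketched as follows. The `Feedback` record, its field names and the threshold value are hypothetical; the source only specifies the two abnormal conditions (no feedback within the threshold time, or an error message).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Feedback:
    received_after_s: Optional[float]  # None: no feedback arrived at all
    is_error: bool = False             # True: feedback is an error message

def message_is_abnormal(fb, threshold_s):
    """Abnormal if feedback is missing, late, or an error message."""
    late = fb.received_after_s is None or fb.received_after_s > threshold_s
    return late or fb.is_error

def primary_screening_nodes(paths, feedbacks, threshold_s):
    """Collect the nodes on every path whose detection message was abnormal."""
    suspects = []
    for path, fb in zip(paths, feedbacks):
        if message_is_abnormal(fb, threshold_s):
            suspects.extend(n for n in path if n not in suspects)
    return suspects

paths = [["A1", "B1", "C1"], ["A2", "B2", "C2"], ["A3", "B3", "C1"]]
# Hypothetical outcome: the second detection message gets no feedback.
feedbacks = [Feedback(0.2), Feedback(None), Feedback(0.3)]
suspects = primary_screening_nodes(paths, feedbacks, threshold_s=1.0)
```

Here `suspects` contains A2, B2 and C2, matching the example in the text.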
Thereafter, the central node determines the number of second-stage detection messages and their transmission paths according to the three primary screening nodes A2, B2 and C2. The transmission paths of the second-stage detection messages need to cover all transmission paths of the primary screening nodes. For the primary screening node A2, possible transmission paths are A2-B1-C1, A2-B2-C1, A2-B3-C1, A2-B1-C2, A2-B2-C2 and A2-B3-C2. For the primary screening node B2, possible transmission paths are A1-B2-C1, A3-B2-C1, A1-B2-C2, and A3-B2-C2. For the primary screening node C2, possible transmission paths are A1-B1-C2, A2-B1-C2, A3-B1-C2, A1-B2-C2, A3-B2-C2, A1-B3-C2, A2-B3-C2 and A3-B3-C2. Therefore, the number of second-stage detection messages is the product of the numbers of nodes in each layer, 3 × 3 × 2 = 18; that is, N in step 203 is 18, and the central node needs to send 18 detection messages to the node system to be detected in total.
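The path count can be checked by enumerating the Cartesian product of the layers. This is a minimal sketch using the fig. 3 layers; `paths_through` is a hypothetical helper name.

```python
from itertools import product

layers = [["A1", "A2", "A3"], ["B1", "B2", "B3"], ["C1", "C2"]]

# Every full transmission path picks one node from each layer.
all_paths = list(product(*layers))

def paths_through(node):
    """All full paths that pass through the given node."""
    return [p for p in all_paths if node in p]

print(len(all_paths))            # number of second-stage detection messages
print(len(paths_through("A2")))  # paths that involve A2
```

The enumeration gives 3 × 3 × 2 = 18 paths in total, of which 6 pass through A2, 6 through B2 and 9 through C2.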
For the feedback messages of the second-stage detection messages, if none of the feedback messages for the transmission paths through a given node to be detected is normal, that node to be detected is determined to be an abnormal node.
The central node determining the abnormal node according to the feedback message corresponding to the detection message includes:
for the feedback messages corresponding to all detection messages passing through a first node to be detected, if the central node does not receive the feedback messages within the threshold time, or the feedback messages are all error messages, determining that the first node to be detected is an abnormal node.
In the first stage, a single abnormal feedback message is enough to determine that an abnormal node exists in the node system to be detected. In the second stage, an abnormal node prevents every detection message whose path passes through it from being processed normally, so several feedback messages are abnormal at once. The central node can therefore identify the detection messages that correspond to the abnormal feedback messages and then locate the abnormal node precisely from the nodes on their transmission paths. For example, if the feedback messages corresponding to the detection messages on transmission paths A1-B2-C1, A2-B2-C1, A3-B2-C1, A1-B2-C2, A2-B2-C2 and A3-B2-C2 are all abnormal, the central node can determine that the node to be detected B2 is an abnormal node.
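The second-stage rule (a node is abnormal when every detection message through it gives abnormal feedback) can be sketched as below. The function name and the simulated fault at B2 are illustrative assumptions; the feedback is reduced to a boolean per path.

```python
from itertools import product

def locate_abnormal_nodes(results):
    """results maps each path (tuple of nodes) to True if feedback was normal.

    A node is reported as abnormal when no path through it succeeded.
    """
    any_success = {}
    for path, ok in results.items():
        for node in path:
            any_success[node] = any_success.get(node, False) or ok
    return sorted(n for n, ok in any_success.items() if not ok)

layers = [["A1", "A2", "A3"], ["B1", "B2", "B3"], ["C1", "C2"]]
# Simulated second stage: every path through B2 fails, all others succeed.
results = {p: "B2" not in p for p in product(*layers)}
print(locate_abnormal_nodes(results))
```

With this simulated fault, the six failing paths all share B2 and every other node has at least one successful path, so only B2 is reported.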
It should be noted that, in step 203, a single round of full path detection may not be enough to confirm an abnormal node, because the system tolerates partial packet loss and has an abnormal retry mechanism. Once an abnormal node is suspected, multiple rounds of full path detection are therefore required, and the state of each node in each round is recorded; a node's state in a round is recorded as normal only if a successful transaction passes through that node in that round. The node state is determined after the rounds are complete. For example, after 5 rounds of full path detection each node has 5 state records, and the final node state is determined from them according to a given rule. Suppose the records of node A are a mix of normal and abnormal, with more abnormal records than normal ones. If the rule is that a node is abnormal when its abnormal count exceeds its normal count, node A is judged abnormal and is isolated; if the rule is that a node is abnormal only when all of its records are abnormal, node A is judged normal. This mechanism makes the judgment more robust and the system more stable; without it, system nodes would frequently be misjudged and isolated.
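The two judgment rules mentioned above can be sketched as follows. The rule names `majority` and `unanimous`, the function name and the sample records are hypothetical; the source does not list the individual records of node A.

```python
def judge_node(records, rule="majority"):
    """Decide a node's final state from its per-round state records.

    'majority':  abnormal if abnormal rounds outnumber normal rounds.
    'unanimous': abnormal only if every round was abnormal.
    """
    abnormal = records.count("abnormal")
    normal = records.count("normal")
    if rule == "majority":
        return "abnormal" if abnormal > normal else "normal"
    if rule == "unanimous":
        return "abnormal" if abnormal > 0 and normal == 0 else "normal"
    raise ValueError(f"unknown rule: {rule}")

# Hypothetical records of node A after 5 rounds of full path detection.
records_a = ["normal", "abnormal", "abnormal", "normal", "abnormal"]
print(judge_node(records_a, "majority"))
print(judge_node(records_a, "unanimous"))
```

The stricter `unanimous` rule isolates fewer nodes and so tolerates more packet loss, at the cost of reacting more slowly to an intermittent fault.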
To reduce the consumption of system resources and make fault detection more targeted, the embodiment of the invention classifies all nodes to be detected by transaction type. Before step 201, the method further includes:
the central node divides all nodes to be detected into different clusters based on different transaction types, where the nodes to be detected in the same cluster process messages of the same transaction type.
The central node sending the M detection messages to the nodes to be detected includes:
for one cluster, the central node sends the M detection messages to the nodes to be detected in the cluster;
the central node sending the N detection messages to the primary screening nodes includes:
for one cluster, the central node sends the N detection messages to the primary screening nodes in the cluster.
Normal transaction messages can be classified by type into query messages, consumption messages, operation messages and so on. Dividing the nodes to be detected into clusters by transaction type and detecting each cluster separately makes the detection strategy more flexible, keeps the number of detection messages as small as possible, and reduces the consumption of system resources. Taking the nodes to be detected in fig. 3 as an example again, suppose the node system to be detected can process two kinds of detection messages, query and consumption. The 3 nodes to be detected in layer A and the 2 nodes to be detected in layer C can process both kinds, whereas in layer B the node to be detected B1 processes only query detection messages, and B2 and B3 process only consumption detection messages. For query detection messages, all nodes to be detected in layers A and C and the node to be detected B1 in layer B therefore need to be detected; for consumption detection messages, all nodes to be detected in layers A and C and the nodes to be detected B2 and B3 in layer B need to be detected. The nodes to be detected are thus divided into two clusters by transaction type: the cluster corresponding to query detection messages comprises the nodes to be detected A1, A2, A3, B1, C1 and C2, and the cluster corresponding to consumption detection messages comprises the nodes to be detected A1, A2, A3, B2, B3, C1 and C2. The embodiment of the invention sends detection messages to the two clusters separately to carry out fault detection and positioning.
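The division into clusters can be sketched as a grouping by transaction type. The mapping `handles` and the function `build_clusters` are hypothetical names; the node capabilities follow the example in the text, using the eight nodes of the fig. 3 example.

```python
def build_clusters(handles):
    """Group nodes by the transaction types they can process.

    handles maps node name -> tuple of transaction types it handles.
    """
    clusters = {}
    for node, types in handles.items():
        for t in types:
            clusters.setdefault(t, []).append(node)
    return clusters

both = ("query", "consumption")
handles = {
    "A1": both, "A2": both, "A3": both,
    "B1": ("query",),
    "B2": ("consumption",), "B3": ("consumption",),
    "C1": both, "C2": both,
}
clusters = build_clusters(handles)
print(clusters["query"])
print(clusters["consumption"])
```

The query cluster then has a single node in layer B, so it needs fewer detection messages than detecting the whole system at once.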
In the embodiment of the invention, once an abnormal node has been located, it is processed. After step 204, in which the central node determines an abnormal node according to the feedback message corresponding to the detection message, the method further includes:
setting the state of the abnormal node to an isolated state.
In addition, once an isolated abnormal node returns to normal, the isolation can be released. That is, after the abnormal node is isolated, the method further includes:
the central node sends a detection message to the abnormal node and, if it determines from the feedback message corresponding to the detection message that the abnormal node has recovered, releases the isolated state of the abnormal node.
The embodiment of the invention thus locates faults in a server cluster system quickly and accurately and isolates them promptly; when an isolated node recovers its normal function, the recovery is discovered quickly and the isolation is released automatically.
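The isolate-and-release behaviour can be pictured as a small state table kept by the central node. The sketch below is a minimal illustration under stated assumptions: `NodeState`, `NodeRegistry` and the `probe` callable are hypothetical names, and `probe(node)` stands in for sending a detection message to the node and checking whether the feedback message is normal.

```python
from enum import Enum


class NodeState(Enum):
    NORMAL = "normal"
    ISOLATED = "isolated"


class NodeRegistry:
    """Tracks node states on the central node (hypothetical helper)."""

    def __init__(self, nodes):
        self.states = {node: NodeState.NORMAL for node in nodes}

    def isolate(self, node):
        """Set the state of an abnormal node to the isolated state."""
        self.states[node] = NodeState.ISOLATED

    def recheck_isolated(self, probe):
        """Re-probe every isolated node and release those that have recovered.

        `probe(node)` is an assumed callable that returns True when the
        feedback message for a detection message sent to `node` is normal.
        """
        released = []
        for node, state in self.states.items():
            if state is NodeState.ISOLATED and probe(node):
                self.states[node] = NodeState.NORMAL
                released.append(node)
        return released

    def available(self):
        """Nodes that are not isolated."""
        return [n for n, s in self.states.items() if s is NodeState.NORMAL]


registry = NodeRegistry(["A1", "A2", "B1"])
registry.isolate("A2")
```

Calling `registry.recheck_isolated(probe)` periodically releases a node only after a detection message sent to it has come back normal.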
To make the invention clearer, the above process is described in detail below using a specific embodiment. The specific steps are shown in fig. 4 and include:
Step 401, the central node divides the nodes to be detected into clusters based on transaction type, and generates a network topology graph of the clusters.
For detection messages with transaction type x, the corresponding network topology is as follows:
$G_x = \{T_1\{A_1, \ldots, A_{m_A}\},\ T_2\{B_1, \ldots, B_{m_B}\},\ \ldots,\ T_n\{C_1, \ldots, C_{m_C}\}\}$ ...... Equation 1
where $T = \{T_1, T_2, \ldots, T_n\}$ is the set of steps that need to be performed to process the detection message, and when i > j, Ti is performed prior to Tj; $\{C_1, \ldots, C_{m_C}\}$ is the set of nodes to be detected that can execute step $T_n$. The network topology graph $G_x$ represents the largest set of clusters covered by transaction x.
Step 402, the central node generates a detection message transmission plan covering all the nodes to be detected. The transmission plan covering all nodes is as follows:
[Formula 2: image RE-GDA0001417687060000121, not reproduced in this text]
where $P_i$ is a transmission plan of the detection message Id covering all nodes, and $M = \max(m_A, m_B, \ldots, m_C)$; that is, M is the largest number of nodes to be detected in any step of processing the detection message. In formula 2, when $i > m_x$, that is, when the value of i is larger than the number of nodes in that set of nodes to be detected, the subscript i of the node to be detected is replaced by $i \bmod m_x + 1$.
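Since formula 2 itself is not reproduced in the text, the following is only a sketch of a plan that is consistent with the description: M paths, one node per processing step, with the quoted wrap-around rule applied when i exceeds the size of a step's node set. The function name `coverage_plan` and the list-of-lists representation of the topology are assumptions made for illustration.

```python
def coverage_plan(topology):
    """Build M paths that together visit every node at least once.

    `topology` is an ordered list of layers, one per processing step,
    each layer being a list of node names. M is the largest layer size.
    """
    m = max(len(layer) for layer in topology)
    plan = []
    for i in range(1, m + 1):  # 1-based index, as in the text
        path = []
        for layer in topology:
            m_x = len(layer)
            # Wrap-around rule quoted in the text: when i > m_x,
            # the subscript i is replaced by i mod m_x + 1.
            idx = i if i <= m_x else i % m_x + 1
            path.append(layer[idx - 1])
        plan.append(path)
    return plan


# Query cluster from the fig. 3 example: layers A, B (only B1) and C.
query_topology = [["A1", "A2", "A3"], ["B1"], ["C1", "C2", "C3"]]
query_plan = coverage_plan(query_topology)

# Consumption cluster: layer B contains B2 and B3.
consumption_topology = [["A1", "A2", "A3"], ["B2", "B3"], ["C1", "C2", "C3"]]
consumption_plan = coverage_plan(consumption_topology)
```

Every index up to m_x is used directly before any wrap-around occurs, so each node in each layer lies on at least one of the M paths.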
Step 403, the central node sends the detection messages to the nodes to be detected according to the transmission plan covering all nodes, and receives the feedback results.
Step 404, the central node judges from the feedback results whether an abnormal node exists among the nodes to be detected; if so, step 405 is executed; otherwise, detection ends.
Step 405, the central node determines the nodes to be detected on the transmission path of each detection message for which an abnormal node exists as primary screening nodes, and generates a detection message transmission plan covering all transmission paths of the primary screening nodes. The transmission plan covering all paths is as follows:
[Formula 3: image RE-GDA0001417687060000131, not reproduced in this text]
where Q is a transmission plan of the detection message Id covering all paths, and $N = m_A \cdot m_B \cdot \ldots \cdot m_C$.
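A plan with $N = m_A \cdot m_B \cdot \ldots \cdot m_C$ paths corresponds to enumerating one node per step in every possible combination. The sketch below illustrates that reading with `itertools.product`. Formula 3 is not reproduced in the text, so the function names and the choice of which nodes are placed in each layer are assumptions; the layers are therefore passed in as input rather than derived.

```python
from itertools import product


def prescreening_nodes(failed_paths):
    """Nodes lying on the path of any detection message that failed."""
    return {node for path in failed_paths for node in path}


def full_path_plan(layers):
    """Enumerate every path through the layers, taking one node per layer."""
    return [list(path) for path in product(*layers)]


# Hypothetical example using the consumption cluster layers.
consumption_layers = [["A1", "A2", "A3"], ["B2", "B3"], ["C1", "C2", "C3"]]
full_plan = full_path_plan(consumption_layers)
n = len(full_plan)  # 3 * 2 * 3 paths
```

The full-path plan grows multiplicatively with the layer sizes, which is why the method first narrows the candidates with the M-message coverage round.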
Step 406, the central node sends the detection messages to the nodes to be detected according to the transmission plan covering all paths, and receives the feedback results.
Step 407, the central node determines the position of the abnormal node according to the feedback results of step 406.
Step 408, the central node isolates the abnormal node.
Fig. 5 schematically illustrates a structural diagram of a fault location device according to an embodiment of the present invention.
As shown in fig. 5, a fault locating device provided in an embodiment of the present invention includes:
a first sending module 501, configured to determine M detection messages according to a transaction to be detected, and send the M detection messages to a node to be detected; the transmission paths of the M detection messages cover all nodes to be detected;
a determining module 502, configured to determine, for each detection message in the M detection messages, a detection message with an abnormal node according to a feedback message corresponding to the detection message; the feedback message is obtained according to the operation result of the to-be-detected node corresponding to the detection message; the nodes to be detected in the transmission path of the detection message with the abnormal nodes are primary screening nodes;
a second sending module 503, configured to determine N detection messages according to the primary screening nodes, and send the N detection messages to the primary screening nodes; the transmission paths of the N detection messages cover all the transmission paths of the primary screening nodes;
a positioning module 504, configured to determine, for each detection message in the N detection messages, an abnormal node according to a feedback message corresponding to the detection message.
Optionally, the device further includes a dividing module 505, configured to divide all the nodes to be detected into different clusters based on transaction type, where the nodes to be detected in the same cluster process messages of the same transaction type;
the first sending module 501 is specifically configured to send, for a given cluster, the M detection messages to the nodes to be detected in that cluster;
the second sending module 503 is specifically configured to send, for a given cluster, the N detection messages to the primary screening nodes in that cluster.
Optionally, the device further includes a processing module 506, configured to:
set the state of the abnormal node to an isolated state.
Optionally, the processing module 506 is further configured to:
send a detection message to the abnormal node and, if it is determined from the feedback message corresponding to the detection message that the abnormal node has recovered, release the isolated state of the abnormal node.
Optionally, the determining module 502 is specifically configured to:
for any detection message among the M detection messages, determine that an abnormal node exists if the feedback message corresponding to the detection message is not received within the threshold time or is an error message.
Optionally, the positioning module 504 is specifically configured to:
determine that a first node to be detected is an abnormal node if the feedback messages corresponding to all the detection messages passing through the first node to be detected are not received within the threshold time or are all error messages.
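The two decision rules used by the determining module and the positioning module can be sketched as follows. This is an illustrative sketch only: the feedback message is assumed to be a dictionary with a boolean `"error"` field, and the function names and the `(path, failed)` result format are hypothetical.

```python
def message_failed(feedback, elapsed, threshold):
    """A detection message fails on timeout or on an error feedback message.

    `feedback` is None when nothing was received; otherwise it is assumed
    to be a dict with a boolean field "error".
    """
    if feedback is None or elapsed > threshold:
        return True
    return bool(feedback.get("error"))


def locate_abnormal_nodes(results):
    """Return the nodes for which every detection message through them failed.

    `results` is a list of (path, failed) pairs from the full-path round.
    """
    seen, has_success = set(), set()
    for path, failed in results:
        seen.update(path)
        if not failed:
            # One successful message clears every node on its path.
            has_success.update(path)
    return sorted(seen - has_success)


# Hypothetical two-layer example in which B2 is the faulty node.
results = [
    (["A1", "B1"], False),
    (["A1", "B2"], True),
    (["A2", "B1"], False),
    (["A2", "B2"], True),
]
abnormal = locate_abnormal_nodes(results)
```

A node is cleared as soon as one detection message through it succeeds, so only nodes whose every path failed remain in the result.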
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A method of fault location, comprising:
the central node determines M detection messages according to the transaction to be detected and sends the M detection messages to the node to be detected; the transmission paths of the M detection messages cover all nodes to be detected;
for each of the M detection messages, the central node determines, according to the feedback message corresponding to the detection message, the detection messages for which an abnormal node exists; the feedback message is obtained according to the operation result of the nodes to be detected corresponding to the detection message; the nodes to be detected on the transmission path of a detection message for which an abnormal node exists are primary screening nodes;
the central node determines N detection messages according to the primary screening nodes and sends the N detection messages to the primary screening nodes; the transmission paths of the N detection messages cover all the transmission paths of the primary screening nodes;
for each of the N detection messages, the central node determines an abnormal node according to the feedback message corresponding to the detection message;
before the central node determines the M detection messages according to the transaction to be detected, the method further includes:
the central node divides all nodes to be detected into different clusters based on transaction type, and the nodes to be detected in the same cluster process messages of the same transaction type;
the sending, by the central node, of the M detection messages to the nodes to be detected includes:
for a given cluster, the central node sends the M detection messages to the nodes to be detected in that cluster;
the sending, by the central node, of the N detection messages to the primary screening nodes includes:
for a given cluster, the central node sends the N detection messages to the primary screening nodes in that cluster.
2. The method of claim 1, wherein after the central node determines an abnormal node according to the feedback message corresponding to the detection message, the method further comprises:
setting the state of the abnormal node to an isolated state.
3. The method of claim 2, wherein after isolating the abnormal node, the method further comprises:
the central node sends a detection message to the abnormal node and, if it determines from the feedback message corresponding to the detection message that the abnormal node has recovered, releases the isolated state of the abnormal node.
4. The method according to any one of claims 1 to 3, wherein the determining, by the central node, of the detection messages for which an abnormal node exists according to the feedback message corresponding to the detection message includes:
for any detection message among the M detection messages, if the central node does not receive the feedback message corresponding to the detection message within the threshold time, or the feedback message corresponding to the detection message is an error message, determining that an abnormal node exists.
5. The method according to any one of claims 1 to 3, wherein the determining, by the central node, of an abnormal node according to the feedback message corresponding to the detection message comprises:
for the feedback messages corresponding to all the detection messages passing through a first node to be detected, if the central node does not receive the feedback messages within the threshold time or the feedback messages are all error messages, determining that the first node to be detected is an abnormal node.
6. A fault locating device, comprising:
the first sending module is used for determining M detection messages according to the transaction to be detected and sending the M detection messages to the node to be detected; the transmission paths of the M detection messages cover all nodes to be detected;
a determining module, configured to determine, for each detection message in the M detection messages, a detection message in which an abnormal node exists according to a feedback message corresponding to the detection message; the feedback message is obtained according to the operation result of the to-be-detected node corresponding to the detection message; the nodes to be detected in the transmission path of the detection message with the abnormal nodes are primary screening nodes;
a second sending module, configured to determine N detection messages according to the primary screening node, and send the N detection messages to the primary screening node; the transmission paths of the N detection messages cover all the transmission paths of the primary screening nodes;
a positioning module, configured to determine, for each of the N detection messages, an abnormal node according to a feedback message corresponding to the detection message;
the device also comprises a dividing module, wherein the dividing module is used for dividing all the nodes to be detected into different clusters based on different transaction types, and the nodes to be detected in the same cluster process messages of the same transaction type;
the first sending module is specifically configured to send, for a cluster, the M detection messages to nodes to be detected in the cluster;
the second sending module is specifically configured to send, for a cluster, the N detection messages to the primary screening node in the cluster.
7. The apparatus of claim 6, further comprising a processing module to:
set the state of the abnormal node to an isolated state.
8. The apparatus of claim 7, wherein the processing module is further configured to:
send a detection message to the abnormal node and, after determining from the feedback message corresponding to the detection message that the abnormal node has recovered, release the isolated state of the abnormal node.
9. The apparatus according to any one of claims 6 to 8, wherein the determining module is specifically configured to:
for any detection message among the M detection messages, determine that an abnormal node exists if the feedback message corresponding to the detection message is not received within the threshold time or is an error message.
10. The apparatus according to any one of claims 6 to 8, wherein the positioning module is specifically configured to:
for the feedback messages corresponding to all the detection messages passing through a first node to be detected, determine that the first node to be detected is an abnormal node if the feedback messages are not received within the threshold time or are all error messages.
CN201710632889.9A 2017-07-28 2017-07-28 Fault positioning method and device Active CN107332709B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710632889.9A CN107332709B (en) 2017-07-28 2017-07-28 Fault positioning method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710632889.9A CN107332709B (en) 2017-07-28 2017-07-28 Fault positioning method and device

Publications (2)

Publication Number Publication Date
CN107332709A CN107332709A (en) 2017-11-07
CN107332709B true CN107332709B (en) 2020-08-11

Family

ID=60199507

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710632889.9A Active CN107332709B (en) 2017-07-28 2017-07-28 Fault positioning method and device

Country Status (1)

Country Link
CN (1) CN107332709B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108235800B (en) * 2017-12-19 2021-08-03 达闼机器人有限公司 Network fault detection method, control center equipment and computer storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104993960A (en) * 2015-07-01 2015-10-21 广东工业大学 Location method of network node fault

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100516655B1 (en) * 2003-02-05 2005-09-22 삼성전자주식회사 Optical channel path supervisory and correction apparatus and method for transparent optical cross-connect

Also Published As

Publication number Publication date
CN107332709A (en) 2017-11-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant