CN106301853B - Fault detection method and device for nodes in a cluster system - Google Patents

Fault detection method and device for nodes in a cluster system

Info

Publication number
CN106301853B
Authority
CN
China
Prior art keywords
node
heartbeat message
neighbor
neighbor nodes
receive
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510306800.0A
Other languages
Chinese (zh)
Other versions
CN106301853A (en)
Inventor
胡琳
伍湘平
彭佩星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN201510306800.0A (CN106301853B)
Priority to PCT/CN2016/073606 (WO2016192408A1)
Publication of CN106301853A
Application granted
Publication of CN106301853B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/10 Active monitoring, e.g. heartbeat, ping or trace-route

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Cardiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Hardware Redundancy (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

Embodiments of the present invention provide a fault detection method and device for nodes in a cluster system. The method comprises: a first node determines whether it has received, within a preset time, a first heartbeat message sent by a second node, where the first node is a neighbor node of the second node and the first heartbeat message is a heartbeat message that the second node sends concurrently to each of its neighbor nodes; if the first node has not received the heartbeat message sent by the second node, the first node sends a request message to the other neighbor nodes, that is, all neighbor nodes of the second node other than the first node; the first node receives response messages, each carrying a reception state, sent by the other neighbor nodes; and if the first node determines from the reception states that none of the other neighbor nodes has received the heartbeat message, the first node determines that the second node has failed. The fault detection method and device provided in the embodiments of the present invention can improve the efficiency of node fault detection.

Description

Fault detection method and device for nodes in a cluster system
Technical field
Embodiments of the present invention relate to communication technologies, and in particular to a fault detection method and device for nodes in a cluster system.
Background
A distributed cluster system generally includes one central node and multiple ordinary nodes. A failure of the central node or of an ordinary node greatly affects the reliability of the distributed cluster system, so detecting node faults effectively is very important.
Fig. 1 is a schematic diagram of a node fault detection method in the prior art. As shown in Fig. 1, the ordinary nodes (B, C, D, E) send heartbeat messages to the central node (M) once per heartbeat period, and the central node (M) detects whether an ordinary node has failed based on the consecutive heartbeat messages it receives during a detection period, where one detection period may include multiple heartbeat periods. The central node (M) also periodically sends heartbeat messages to the ordinary nodes (B, C, D, E) to inform them of the role it holds and of whether it is operating normally. If an ordinary node (B, C, D, E) receives no heartbeat message from the central node (M) within a detection period, it determines that the central node (M) has failed. The ordinary nodes can then initiate an election of a new central node; if the election succeeds, the ordinary nodes learn of the new central node and send their heartbeat messages to it, and the cluster resumes fault detection.
However, in the prior art, a node fault is detected by checking whether heartbeat messages are received within a detection period. For a cluster of fixed size, the heartbeat period cannot be changed, and therefore the length of the detection period cannot be changed either. A node fault can thus be detected only after multiple heartbeat periods have elapsed, which makes fault detection slow and inefficient.
Summary of the invention
Embodiments of the present invention provide a fault detection method and device for nodes in a cluster system, to solve the prior-art problem that a node fault can be detected only after multiple heartbeat periods, which makes detection slow, and thereby to improve the efficiency of node fault detection.
According to a first aspect, an embodiment of the present invention provides a fault detection method for nodes in a cluster system, comprising:
A first node determines whether it has received, within a preset time, a first heartbeat message sent by a second node; the first node is a neighbor node of the second node, the first heartbeat message is a heartbeat message that the second node sends concurrently to each neighbor node of the second node, and the second node has two or more neighbor nodes; the preset time is greater than or equal to one heartbeat period and less than two heartbeat periods;
If the first node has not received the first heartbeat message sent by the second node, the first node sends a request message to the other neighbor nodes, that is, all neighbor nodes of the second node other than the first node, the request message being used to ask whether the other neighbor nodes have received the first heartbeat message;
The first node receives response messages, each carrying a reception state, sent by the other neighbor nodes, the reception state indicating whether the first heartbeat message was received;
If the first node determines, from the reception state carried in the response message sent by each of the other neighbor nodes, that none of the other neighbor nodes has received the first heartbeat message, the first node determines that the second node has failed.
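The decision made by the first node in the steps above can be illustrated with a minimal sketch. It is not taken from the patent: the function name `second_node_failed` and the representation of the reception states as a dictionary from a hypothetical neighbor identifier to a boolean are assumptions made only for illustration.

```python
from typing import Dict


def second_node_failed(received_locally: bool,
                       peer_reception_states: Dict[str, bool]) -> bool:
    """Return True only when the second node is judged to have failed.

    received_locally: whether the first node received the first heartbeat
        message within the preset time.
    peer_reception_states: hypothetical mapping from each other neighbor's
        identifier to the reception state carried in its response message.
    """
    if received_locally:
        # The heartbeat arrived, so no query is sent and no fault is declared.
        return False
    if not peer_reception_states:
        # The method assumes at least one other neighbor answers the query.
        raise ValueError("at least one other neighbor node is required")
    # The second node is judged faulty only if no other neighbor heard it.
    return not any(peer_reception_states.values())
```

For instance, under these assumptions `second_node_failed(False, {"B": False, "C": False})` evaluates to `True`, whereas a single neighbor reporting reception makes the result `False`.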
With reference to the first aspect, in a first possible implementation of the first aspect, after the first node determines that the second node has failed, the method further comprises:
The first node generates first vote information and receives second vote information sent by each of the other neighbor nodes; the first vote information includes the node identifier of the node that the first node votes for, and the second vote information includes the node identifier of the node voted for by the neighbor node that sent the second vote information;
The first node counts, from the node identifier in the first vote information and the node identifiers in the second vote information sent by each of the other neighbor nodes, the number of votes obtained by each node that received a vote, and takes the node with the most votes as a third node; the third node is the node that replaces the second node and concurrently sends heartbeat messages to all neighbor nodes of the third node; all neighbor nodes of the third node include the third node's own neighbor nodes and the neighbor nodes of the second node.
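The vote counting described above could be sketched as follows. The function name `elect_third_node` is hypothetical, and the tie-breaking rule (the smallest node identifier wins) is an assumption added here for determinism; the text does not say how ties are resolved.

```python
from collections import Counter
from typing import Iterable


def elect_third_node(first_vote: str, second_votes: Iterable[str]) -> str:
    """Tally the node identifiers in all vote information and pick a winner.

    first_vote: node identifier carried in the first vote information.
    second_votes: node identifiers carried in the second vote information
        received from each of the other neighbor nodes.
    """
    tally = Counter([first_vote, *second_votes])
    most_votes = max(tally.values())
    # Assumption: ties are broken by the smallest identifier.
    return min(node for node, count in tally.items() if count == most_votes)
```

With votes "A" from the first node and "A", "D", "C" from the other neighbors, the sketch returns "A" as the third node.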
With reference to the first aspect or the first possible implementation of the first aspect, in a second possible implementation of the first aspect, the method further comprises:
If the first node determines, from the reception state carried in the response message sent by each of the other neighbor nodes, that at least one of the other neighbor nodes has received the first heartbeat message, the first node determines that the link between the second node and each node that did not receive the first heartbeat message has failed; the nodes that did not receive the first heartbeat message include the first node and those of the other neighbor nodes that did not receive the first heartbeat message.
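The link-fault case can be sketched in the same style. The function name `nodes_with_faulty_link` and the dictionary of reception states are illustrative assumptions; the sketch also assumes that the first node itself did not receive the first heartbeat message, since it only queries its peers in that case.

```python
from typing import Dict, List


def nodes_with_faulty_link(first_node: str,
                           peer_reception_states: Dict[str, bool]) -> List[str]:
    """List the nodes whose link to the second node is judged faulty.

    Assumes the first node did not receive the first heartbeat message.
    Returns an empty list when no peer received it either, because that
    case is treated as a fault of the second node rather than of a link.
    """
    if not any(peer_reception_states.values()):
        return []
    missing = [first_node]
    missing += [node for node, received in peer_reception_states.items()
                if not received]
    return sorted(missing)
```

For example, if the first node is "A" and only "D" among the peers also missed the heartbeat, the result is `["A", "D"]`.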
With reference to the first aspect or either of the first and second possible implementations of the first aspect, in a third possible implementation of the first aspect, the method further comprises:
The first node redetermines its own neighbor nodes according to the neighbor nodes of the third node and the nodes among the other neighbor nodes other than the third node.
According to a second aspect, an embodiment of the present invention provides a fault detection method for nodes in a cluster system, the method comprising:
A second node concurrently sends a first heartbeat message to a first node and to other neighbor nodes; the first node is a neighbor node of the second node, the other neighbor nodes are the neighbor nodes of the second node other than the first node, and there are one or more other neighbor nodes;
The first node determines whether it has received the first heartbeat message within a preset time; the preset time is greater than or equal to one heartbeat period and less than two heartbeat periods;
If the first node has not received the first heartbeat message, the first node sends a request message to each of the other neighbor nodes, the request message being used to ask whether that neighbor node has received the first heartbeat message;
The first node receives a response message carrying a reception state from each of the other neighbor nodes, the reception state indicating whether the first heartbeat message was received;
If the first node determines, from the reception states carried in the received response messages, that none of the other neighbor nodes has received the first heartbeat message, the first node determines that the second node has failed.
With reference to the second aspect, in a first possible implementation of the second aspect, after the first node determines that the second node has failed, the method further comprises:
The first node generates first vote information and receives second vote information sent by each of the other neighbor nodes; the first vote information includes the node identifier of the node that the first node votes for, and the second vote information includes the node identifier of the node voted for by the neighbor node that sent the second vote information;
The first node counts, from the node identifier in the first vote information and the node identifiers in the second vote information sent by each of the other neighbor nodes, the number of votes obtained by each node that received a vote, and takes the node with the most votes as a third node; the third node is the node that replaces the second node and concurrently sends heartbeat messages to all neighbor nodes of the third node; all neighbor nodes of the third node include the third node's own neighbor nodes and the neighbor nodes of the second node.
With reference to the second aspect or the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the method further comprises:
If the first node determines, from the reception state carried in the response message sent by each of the other neighbor nodes, that at least one of the other neighbor nodes has received the first heartbeat message, the first node determines that the link between the second node and each node that did not receive the first heartbeat message has failed; the nodes that did not receive the first heartbeat message include the first node and those of the other neighbor nodes that did not receive the first heartbeat message.
With reference to the second aspect or either of the first and second possible implementations of the second aspect, in a third possible implementation of the second aspect, the method further comprises:
The first node redetermines its own neighbor nodes according to the neighbor nodes of the third node and the nodes among the other neighbor nodes other than the third node.
According to a third aspect, an embodiment of the present invention provides a fault detection device for nodes in a cluster system, comprising:
A judgment module, configured to determine whether a first heartbeat message sent by a second node has been received within a preset time; the first node is a neighbor node of the second node, the first heartbeat message is a heartbeat message that the second node sends concurrently to each neighbor node of the second node, and the second node has two or more neighbor nodes; the preset time is greater than or equal to one heartbeat period and less than two heartbeat periods;
A sending module, configured to send, if the judgment module determines that a receiving module has not received the first heartbeat message sent by the second node, a request message to the other neighbor nodes, that is, all neighbor nodes of the second node other than the first node, the request message being used to ask whether the other neighbor nodes have received the first heartbeat message;
The receiving module is further configured to receive response messages, each carrying a reception state, sent by the other neighbor nodes, the reception state indicating whether the first heartbeat message was received;
A determining module, configured to determine, from the reception state carried in the response message that the receiving module receives from each of the other neighbor nodes, whether none of the other neighbor nodes has received the first heartbeat message;
If the determining module determines that none of the other neighbor nodes has received the first heartbeat message, the determining module is further configured to determine that the second node has failed.
With reference to the third aspect, in a first possible implementation of the third aspect, the device further comprises the following for use after the determining module determines that the second node has failed:
A generation module, configured to generate first vote information, the first vote information including the node identifier of the node that the first node votes for;
The receiving module is further configured to receive second vote information sent by each of the other neighbor nodes, the second vote information including the node identifier of the node voted for by the neighbor node that sent the second vote information;
The determining module is further configured to count, from the node identifier in the first vote information and the node identifiers in the second vote information sent by each of the other neighbor nodes, the number of votes obtained by each node that received a vote, and to take the node with the most votes as a third node; the third node is the node that replaces the second node and concurrently sends heartbeat messages to all neighbor nodes of the third node; all neighbor nodes of the third node include the third node's own neighbor nodes and the neighbor nodes of the second node.
With reference to the third aspect or the first possible implementation of the third aspect, in a second possible implementation of the third aspect,
If the determining module determines, from the reception state carried in the response message that the receiving module receives from each of the other neighbor nodes, that at least one of the other neighbor nodes has received the first heartbeat message,
The determining module is further configured to determine that the link between the second node and each node that did not receive the first heartbeat message has failed; the nodes that did not receive the first heartbeat message include the first node and those of the other neighbor nodes that did not receive the first heartbeat message.
With reference to the third aspect or either of the first and second possible implementations of the third aspect, in a third possible implementation of the third aspect,
The determining module is further configured to redetermine the neighbor nodes of the first node according to the neighbor nodes of the third node and the nodes among the other neighbor nodes other than the third node.
According to a fourth aspect, an embodiment of the present invention provides a fault detection system for nodes in a cluster system, including a first node, a second node, and other neighbor nodes, where the first node is a neighbor node of the second node, the other neighbor nodes are the neighbor nodes of the second node other than the first node, and there are one or more other neighbor nodes, wherein:
The second node is configured to concurrently send a first heartbeat message to the first node and the other neighbor nodes;
The first node is configured to determine whether it has received the first heartbeat message within a preset time; the preset time is greater than or equal to one heartbeat period and less than two heartbeat periods;
If the first node has not received the first heartbeat message, the first node is further configured to send a request message to each of the other neighbor nodes, the request message being used to ask whether that neighbor node has received the first heartbeat message; the first node is further configured to receive a response message carrying a reception state from each of the other neighbor nodes, the reception state indicating whether the first heartbeat message was received;
If the first node determines, from the reception state carried in the response message sent by each of the other neighbor nodes, that none of the other neighbor nodes has received the first heartbeat message, the first node is further configured to determine that the second node has failed.
With reference to the fourth aspect, in a first possible implementation of the fourth aspect, after the first node determines that the second node has failed,
The first node is further configured to:
Generate first vote information and receive second vote information sent by each of the other neighbor nodes, the first vote information including the node identifier of the node that the first node votes for, and the second vote information including the node identifier of the node voted for by the neighbor node that sent the second vote information;
And count, from the node identifier in the first vote information and the node identifiers in the second vote information sent by each of the other neighbor nodes, the number of votes obtained by each node that received a vote, and take the node with the most votes as a third node; the third node is the node that replaces the second node and concurrently sends heartbeat messages to all neighbor nodes of the third node; all neighbor nodes of the third node include the third node's own neighbor nodes and the neighbor nodes of the second node.
With reference to the fourth aspect or the first possible implementation of the fourth aspect, in a second possible implementation of the fourth aspect,
If the first node determines, from the reception state carried in the response message sent by each of the other neighbor nodes, that at least one of the other neighbor nodes has received the first heartbeat message,
The first node is further configured to determine that the link between the second node and each node that did not receive the first heartbeat message has failed; the nodes that did not receive the first heartbeat message include the first node and those of the other neighbor nodes that did not receive the first heartbeat message.
With reference to the fourth aspect or either of the first and second possible implementations of the fourth aspect, in a third possible implementation of the fourth aspect,
The first node is further configured to redetermine its own neighbor nodes according to the neighbor nodes of the third node and the nodes among the other neighbor nodes other than the third node.
In the fault detection method and device for nodes in a cluster system provided in the embodiments of the present invention, a first node determines whether it has received, within a preset time, a first heartbeat message sent by a second node, where the first node is a neighbor node of the second node, the first heartbeat message is a heartbeat message that the second node sends concurrently to each of its neighbor nodes, the second node has two or more neighbor nodes, and the preset time is greater than or equal to one heartbeat period and less than two heartbeat periods. If the first node itself has not received the first heartbeat message, it asks the other neighbor nodes of the second node whether they have received it, and if it determines that none of them has, it determines that the second node has failed. Because the preset time is greater than or equal to one heartbeat period and less than two heartbeat periods, fault detection with the technical solution provided by the present invention does not need to wait for multiple heartbeat periods, as the prior art does, before it can detect whether a node has failed. This shortens the fault detection period and improves the efficiency of node fault detection.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a fault detection method for nodes in a cluster system in the prior art;
Fig. 2 is a schematic flowchart of Embodiment 1 of the fault detection method for nodes in a cluster system provided by the present invention;
Fig. 3 is a first schematic diagram of the neighbor relationships between nodes in a cluster system;
Fig. 4 is a second schematic diagram of the neighbor relationships between nodes in a cluster system;
Fig. 5 is a schematic flowchart of Embodiment 2 of the fault detection method for nodes in a cluster system provided by the present invention;
Fig. 6A is a schematic diagram of the neighbor relationships between nodes in a cluster system before a node fault is detected;
Fig. 6B is a schematic diagram of the neighbor relationships between nodes in a cluster system as redetermined after a node fault is detected;
Fig. 7 is a schematic flowchart of Embodiment 3 of the fault detection method for nodes in a cluster system provided by the present invention;
Fig. 8 is a schematic flowchart of Embodiment 4 of the fault detection method for nodes in a cluster system provided by the present invention;
Fig. 9 is a schematic structural diagram of Embodiment 1 of the fault detection device for nodes in a cluster system according to the present invention;
Fig. 10 is a schematic structural diagram of Embodiment 1 of the fault detection system for nodes in a cluster system according to the present invention;
Fig. 11 is a schematic structural diagram of Embodiment 1 of a node according to the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The embodiments of the present invention apply to cluster systems, and in particular to fault detection for nodes in a distributed cluster system. The distributed cluster system includes at least two nodes; a node may be, for example, a computer. Optionally, the cluster system in this embodiment differs from existing cluster systems in that all nodes are given the same functions, that is, every node has the same ability to receive and send heartbeat messages. The cluster system in this embodiment therefore does not distinguish between a central node and ordinary nodes, and needs no central node to manage ordinary nodes. Optionally, the technical solutions of the following embodiments are described with a computer as the executing entity.
Fig. 2 is a schematic flowchart of Embodiment 1 of the fault detection method for nodes in a cluster system provided by the present invention. The method in this embodiment applies to a distributed cluster system and is described with a computer as the executing entity. As shown in Fig. 2, the method in this embodiment may include:
Step 201: A first node determines whether it has received, within a preset time, a first heartbeat message sent by a second node; the first node is a neighbor node of the second node, the first heartbeat message is a heartbeat message that the second node sends concurrently to each neighbor node of the second node, and the second node has two or more neighbor nodes; the preset time is greater than or equal to one heartbeat period and less than two heartbeat periods.
In this embodiment, the second node determines the first node from the information of all nodes in the cluster system according to a rule preset in the cluster system. The first node is any one of the neighbor nodes of the second node, and a neighbor node of the second node is a node that has an association with the second node. Fig. 3 is a first schematic diagram of the neighbor relationships between nodes in a cluster system. As shown in Fig. 3, node E can determine, from the information of all nodes and according to the preset rule, that it has four neighbor nodes: A, B, C, and D. The first node may be any one of nodes A, B, C, and D. The first node detects whether the second node has failed by determining whether it receives, within the preset time, the first heartbeat message sent by the second node. Note that the second node sends heartbeat messages to all of its neighbor nodes concurrently, so the first heartbeat message is the heartbeat message that the second node sends concurrently, at the same moment, to each of its neighbor nodes. In addition, the second node can send the first heartbeat message concurrently to all of its neighbor nodes once per heartbeat period, so the first node can determine whether it receives the first heartbeat message within a time that is greater than or equal to one heartbeat period and less than two heartbeat periods. For example, assume the heartbeat period is 5 s, that is, the second node concurrently sends one heartbeat message to all of its neighbor nodes every 5 s. For the first heartbeat message that the second node sends at the 5th second, the first node determines whether it receives that message within a time that is greater than or equal to 5 s and less than 10 s. The heartbeat period can be set based on experience or actual conditions; this embodiment places no restriction on its specific value.
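The constraint on the preset time can be expressed as a short check. This is a sketch under stated assumptions: the function names `is_valid_preset_time` and `heartbeat_overdue` are hypothetical, and measuring the preset time from the moment the heartbeat was expected is one possible reading of the 5 s example, not something the text fixes.

```python
def is_valid_preset_time(preset_s: float, heartbeat_period_s: float) -> bool:
    """Check that one period <= preset time < two periods."""
    return heartbeat_period_s <= preset_s < 2 * heartbeat_period_s


def heartbeat_overdue(expected_at_s: float, now_s: float,
                      preset_s: float, received: bool) -> bool:
    """Return True once the preset time has elapsed without a heartbeat.

    Assumption: the preset time is counted from the moment the heartbeat
    was expected to be sent (expected_at_s).
    """
    return (not received) and (now_s - expected_at_s >= preset_s)
```

With a 5 s heartbeat period, preset times of 5 s or 9.9 s satisfy the constraint, while 10 s or 4.9 s do not.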
In addition, the second node may periodically send the first heartbeat message to the first node over a single physical network. However, when fault detection is based on a single physical network and that network fails (for example, the management-plane network fails while the service-plane network is normal), it is often impossible to tell whether the second node has failed, the link between the second node and the first node has failed, or both the second node and the first node have failed, which makes the detection result inaccurate. To solve this problem, the first heartbeat message is preferably sent over at least two networks in this embodiment. For example, it can be sent over two planes, such as a management plane and a service plane, or over three planes, such as a management plane, a service plane, and a signaling plane. Sending the first heartbeat message over multiple physical networks improves the accuracy of fault detection. Note that if at least two physical networks are used, they are isolated from one another. This avoids the situation in which the networks share some equipment and a failure of the shared equipment prevents the nodes from communicating normally, and so helps improve detection accuracy.
Step 202, if the first node does not receive the first heartbeat message sent by the second node, the first node sends a request message to the other neighbor nodes, that is, to all neighbor nodes of the second node other than the first node. The request message inquires whether the other neighbor nodes have received the first heartbeat message.
In the prior art, when the heartbeat period at which ordinary nodes send heartbeats to the central node is fixed, the performance limits of the central node prevent the cluster system from adding ordinary nodes indefinitely, which impairs the scalability of the cluster system. To address this problem, in this embodiment of the present invention, if the first node does not receive the first heartbeat message sent by the second node within the preset time, it can make a preliminary determination that the second node may have failed. Because the second node sends the first heartbeat message to all of its neighbor nodes in parallel, the first node sends a request message to the neighbor nodes of the second node other than itself, to inquire whether those other neighbor nodes have received the first heartbeat message sent by the second node. It can be seen that, when the first node does not receive the first heartbeat message, it sends the request message only to the other neighbor nodes of the second node, and nodes that are not neighbors of the second node no longer send heartbeat messages to the second node. This reduces the number of heartbeat messages that the second node has to process and lightens its load, so the cluster system scales better.
For example, Fig. 4 is a second schematic diagram of the neighbor relationships between nodes in the cluster system. As shown in Fig. 4, the neighbor nodes of node E are nodes X, A, D, C and G, and node E sends a heartbeat message to all of these neighbor nodes in each heartbeat period. Assume that node E is the second node and node A is the first node. If, in some heartbeat period, first node A does not receive the first heartbeat message sent by second node E, first node A sends a request message to the other neighbor nodes X, D, C and G to inquire whether they have received the first heartbeat message.
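The selection of query targets in the Fig. 4 example can be sketched as follows. The helper name `other_neighbors` and the representation of nodes as strings are assumptions made here for illustration; the sketch only shows that the first node excludes itself from the second node's neighbor list.

```python
from typing import List


def other_neighbors(neighbors_of_second: List[str], first: str) -> List[str]:
    """Neighbors of the second node that the first node should query.

    Hypothetical helper: returns every neighbor except the first node itself.
    """
    return [n for n in neighbors_of_second if n != first]


# Fig. 4 example: node E is the second node, node A is the first node.
targets = other_neighbors(["X", "A", "D", "C", "G"], "A")
print(targets)  # ['X', 'D', 'C', 'G']
```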
Step 203, the first node receives the response messages carrying a reception state that the other neighbor nodes send. The reception state indicates whether the first heartbeat message was received.
In this embodiment, after receiving the request message sent by the first node, each of the other neighbor nodes carries, in a response message, the reception state indicating whether it has received the first heartbeat message, and sends the response message to the first node.
Step 204, if the first node determines, according to the reception state carried in the response message received from each of the other neighbor nodes, that none of the other neighbor nodes has received the first heartbeat message, the first node determines that the second node has failed.
In this embodiment, after receiving the request message sent by the first node, each of the other neighbor nodes returns a response message carrying its reception state to the first node. Based on the response messages received from the other neighbor nodes, the first node judges whether the other neighbor nodes have received the first heartbeat message. If it judges that none of them has received the first heartbeat message sent by the second node, it can determine that the second node has failed.
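The decision in step 204 can be sketched as a small predicate. This is a hedged illustration only: the name `second_node_failed`, the dictionary mapping a neighbor identifier to a boolean reception state, and the handling of an empty response set are assumptions that the text does not specify.

```python
from typing import Dict


def second_node_failed(reception_states: Dict[str, bool]) -> bool:
    """Decide whether the second node has failed.

    reception_states maps each other neighbor's identifier (hypothetical
    representation) to True if that neighbor received the first heartbeat.
    """
    # Assumption: with no responses at all, no decision is made.
    if not reception_states:
        return False
    # The second node is deemed failed only if no other neighbor
    # received the first heartbeat message either.
    return not any(reception_states.values())
```

For instance, if nodes X, D, C and G all report that they did not receive the heartbeat, the predicate returns `True`; if even one of them did receive it, the predicate returns `False`.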
It should be noted that the neighbor relationships between nodes are bidirectional, that is, nodes that are neighbors of each other send heartbeat messages to each other. Therefore, every neighbor node of the second node independently performs steps 201 to 204.
In the fault detection method for a node in a cluster system provided by this embodiment of the present invention, the first node judges whether it receives, within a preset time, the first heartbeat message sent by the second node. The first node is a neighbor node of the second node, the first heartbeat message is the heartbeat message that the second node sends in parallel to each of its neighbor nodes, and the second node has two or more neighbor nodes. The preset time is greater than or equal to one heartbeat period and less than two heartbeat periods. If the first node itself does not receive the first heartbeat message, it inquires whether the other neighbor nodes of the second node have received it, and if it determines that none of the other neighbor nodes has received the first heartbeat message either, it determines that the second node has failed. Because the preset time is greater than or equal to one heartbeat period and less than two heartbeat periods, fault detection using the technical solution provided by the present invention avoids the prior-art need to wait for multiple heartbeat periods before a node failure can be detected. This shortens the fault detection period and thereby improves the efficiency of node fault detection.
Fig. 5 is a schematic flowchart of Embodiment 2 of the fault detection method for a node in a cluster system provided by the present invention. On the basis of the embodiment shown in Fig. 2, this embodiment describes in detail how each node redetermines its neighbor nodes after the first node determines that the second node has failed. As shown in Fig. 5, the method of this embodiment may include:
Step 501, the first node generates first voting information and receives the second voting information sent by each of the other neighbor nodes. The first voting information includes the node identifier of the node elected by the first node; the second voting information includes the node identifier of the node elected by the neighbor node that sends that second voting information.
In this embodiment, after the neighbor nodes of the second node determine that the second node has failed, all of these neighbor nodes need to recalculate their respective neighbor nodes. For ease of description, any one neighbor node of the second node may be taken as the first node. The first node needs to generate first voting information, which contains the node identifier of the node elected by the first node and the voting basis. The first node also receives the second voting information sent by each of the other neighbor nodes; the second voting information contains the node identifier of the node elected by the sending neighbor node and the voting basis. In practical applications, the voting basis depends on many factors, such as load, node number, how fresh the node's cache is, and the node's network bandwidth. For example, the first node may judge which node carries the lowest load, carry the node identifier of that node in the first voting information, and send it to the other neighbor nodes. Likewise, the other neighbor nodes may send their second voting information to the first node in a similar way.
Step 502, according to the node identifier in the first voting information and the node identifiers in the second voting information sent by each of the other neighbor nodes, the first node counts the number of votes obtained by each elected node and takes the node with the most votes as the third node. The third node is the node that replaces the second node and sends heartbeat messages in parallel to all neighbor nodes of the third node. All neighbor nodes of the third node include the third node's own neighbor nodes and the neighbor nodes of the second node.
In this embodiment, after receiving the second voting information sent by each of the other neighbor nodes, the first node can determine the third node according to the node identifier in the first voting information that it generated and the node identifiers in the received second voting information. In a specific implementation, the first node may count, according to the node identifiers carried in the first voting information and the second voting information, the number of votes obtained by each elected node, and take the node that obtains the most votes as the third node. The third node takes over the neighbor nodes of the failed second node, that is, it takes over the associations between the second node and other nodes. The third node therefore replaces the second node and sends heartbeat messages in parallel to all neighbor nodes of the third node, where all neighbor nodes of the third node include, in addition to the third node's own neighbor nodes, the neighbor nodes of the second node.
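The vote counting in step 502 can be sketched with a simple tally. The function name `elect_third_node` and the string node identifiers are assumptions introduced here. The text does not say how a tie is resolved, so the tie-break used below (lowest identifier wins) is purely an illustrative assumption.

```python
from collections import Counter
from typing import List


def elect_third_node(first_vote: str, second_votes: List[str]) -> str:
    """Return the node identifier that obtained the most votes.

    first_vote is the identifier elected by the first node; second_votes
    are the identifiers elected by the other neighbor nodes.
    """
    tally = Counter([first_vote, *second_votes])
    top = max(tally.values())
    # Assumption (not in the source): ties go to the smallest identifier.
    return min(node for node, count in tally.items() if count == top)
```

Because every neighbor node of the second node tallies the same set of votes, a deterministic rule of this kind would let all of them arrive at the same third node.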
Step 503, the first node redetermines its own neighbor nodes according to the neighbor nodes of the third node and the nodes among the other neighbor nodes other than the third node.
In this embodiment, after all neighbor nodes of the second node have determined the third node by voting, if the first node is the third node, the first node takes over the neighbor relationships of the second node, and the other neighbor nodes recalculate their respective neighbor nodes according to the neighbor relationships that result from the first node taking over the neighbor nodes of the second node. If the first node is not the third node, the first node waits until the third node has redetermined its neighbor relationships, and then redetermines its own neighbor nodes according to the neighbor nodes of the third node and the nodes among the other neighbor nodes other than the third node.
For example, Fig. 6A is a schematic diagram of the neighbor relationships between nodes in the cluster system before a node failure is detected, and Fig. 6B is a schematic diagram of the neighbor relationships redetermined between nodes after a node failure is detected. As shown in Fig. 6A, assume that node E is the second node and node A is the first node. After first node A determines that second node E has failed, first node A generates first voting information and receives the second voting information sent by nodes X, D, C and G respectively. First node A determines the third node according to the node identifier in the first voting information and the node identifiers in the second voting information, so that the third node replaces the second node and sends heartbeat messages in parallel to all neighbor nodes of the third node. As shown in Fig. 6B, if the vote determines that first node A is the third node, first node A replaces the second node and sends heartbeat messages in parallel to all neighbor nodes of first node A. In this case, first node A needs to redetermine its own neighbor nodes with reference to the other neighbor nodes X, D, C and G, and nodes X, D, C and G wait until first node A has determined its own neighbor nodes and then redetermine their respective neighbor nodes according to the neighbor nodes determined by first node A.
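The takeover of neighbor relationships by the third node can be sketched as a set union. This is a rough illustration under several assumptions: the name `takeover_neighbors` is hypothetical, neighbor lists are modelled as sets of identifiers, and removing the failed second node and the third node itself from the result is an assumption that the text does not state. Node B below is likewise a made-up neighbor of node A used only to exercise the function; the actual neighbors in Fig. 6A and Fig. 6B are not reproduced here.

```python
from typing import Set


def takeover_neighbors(third: str, third_neighbors: Set[str],
                       second: str, second_neighbors: Set[str]) -> Set[str]:
    """Neighbor set of the third node after it replaces the second node.

    Combines the third node's own neighbors with the second node's
    neighbors, as the text describes.
    """
    combined = set(third_neighbors) | set(second_neighbors)
    # Assumption: the failed node and the third node itself are excluded.
    return combined - {third, second}


# Hypothetical data: A (third node) previously had neighbors E and B.
print(sorted(takeover_neighbors("A", {"E", "B"}, "E", {"X", "A", "D", "C", "G"})))
# ['B', 'C', 'D', 'G', 'X']
```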
In the fault detection method for a node in a cluster system provided by this embodiment of the present invention, the first node judges whether it receives, within a preset time, the first heartbeat message sent by the second node. The first node is a neighbor node of the second node, the first heartbeat message is the heartbeat message that the second node sends in parallel to each of its neighbor nodes, and the second node has two or more neighbor nodes. The preset time is greater than or equal to one heartbeat period and less than two heartbeat periods. If the first node itself does not receive the first heartbeat message, it inquires whether the other neighbor nodes of the second node have received it, and if it determines that none of the other neighbor nodes has received the first heartbeat message either, it determines that the second node has failed. Because the preset time is greater than or equal to one heartbeat period and less than two heartbeat periods, fault detection using the technical solution provided by the present invention avoids the prior-art need to wait for multiple heartbeat periods before a node failure can be detected. This shortens the fault detection period and thereby improves the efficiency of node fault detection. In addition, after it is determined that the second node has failed, each node redetermines its own neighbor nodes and fault detection then continues, which improves the accuracy of fault detection.
Optionally, if the first node determines, according to the reception state carried in the response message received from each of the other neighbor nodes, that at least one other neighbor node has received the first heartbeat message, the first node determines that the link between the second node and each node that did not receive the first heartbeat message has failed.
Specifically, the first node, having not received the first heartbeat message sent by the second node, sends a request message to each of the other neighbor nodes to inquire whether they have received the first heartbeat message. If the first node determines from the response messages sent by the other neighbor nodes that at least one other neighbor node has received the first heartbeat message, the first node can determine that the second node is normal and that the fault may lie in the links between the second node and the nodes that did not receive the first heartbeat message. The nodes that did not receive the first heartbeat message include the first node and those of the other neighbor nodes that did not receive the first heartbeat message.
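The optional link-fault determination can be sketched as follows. The name `nodes_with_faulty_link` and the dictionary of reception states are assumptions made for illustration; the sketch returns the nodes whose link to the second node is suspected, and returns an empty set when no neighbor received the heartbeat, since that case is treated in the text as a failure of the second node itself.

```python
from typing import Dict, Set


def nodes_with_faulty_link(first: str,
                           reception_states: Dict[str, bool]) -> Set[str]:
    """Nodes whose link to the second node is suspected to have failed.

    reception_states maps each other neighbor (hypothetical identifiers)
    to True if it received the first heartbeat message.
    """
    # If nobody received the heartbeat, this is the node-failure case.
    if not any(reception_states.values()):
        return set()
    # Otherwise the second node is alive; suspect the links of the first
    # node and of every other neighbor that missed the heartbeat.
    missed = {node for node, got in reception_states.items() if not got}
    return {first} | missed


print(sorted(nodes_with_faulty_link("A", {"X": True, "D": False, "C": True, "G": True})))
# ['A', 'D']
```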
In the fault detection method for a node in a cluster system provided by this embodiment of the present invention, when the first node determines that at least one other neighbor node has received the first heartbeat message, the first node determines that the link between the second node and each node that did not receive the first heartbeat message has failed, which makes fault detection more comprehensive.
Fig. 7 is a schematic flowchart of Embodiment 3 of the fault detection method for a node in a cluster system provided by the present invention. The method of this embodiment of the present invention is applicable to a distributed cluster system. This embodiment is again described using a computer as the executing entity. As shown in Fig. 7, the method of this embodiment may include:
Step 701, the second node sends the first heartbeat message in parallel to the first node and the other neighbor nodes. The first node is a neighbor node of the second node; the other neighbor nodes are all neighbor nodes of the second node other than the first node, and there are one or more such other neighbor nodes.
In this embodiment, the second node determines all of its own neighbor nodes according to the information of the nodes contained in the cluster system and a rule preset in the cluster system. The first node is any one of the neighbor nodes of the second node, and the neighbor nodes of the second node are the nodes associated with the second node. After determining all of its neighbor nodes, the second node can send the first heartbeat message in parallel to the first node and the other neighbor nodes.
Step 702, the first node judges whether it receives the first heartbeat message within a preset time. The preset time is greater than or equal to one heartbeat period and less than two heartbeat periods.
In this embodiment, the second node may send the first heartbeat message to all of its neighbor nodes in parallel once per heartbeat period, so the first node can judge whether it receives the first heartbeat message sent by the second node within a time that is greater than or equal to one heartbeat period and less than two heartbeat periods. For example, assume that the heartbeat period is 5 s, that is, the second node sends a heartbeat message to its neighbor nodes in parallel every 5 s. For the first heartbeat message that the second node sends at the 5th second, the first node judges whether it receives that message within a time that is greater than or equal to 5 s and less than 10 s. The heartbeat period may be configured based on experience or actual conditions; this embodiment does not limit its specific value.
In addition, the second node may periodically send the first heartbeat message to the first node over a single physical network. However, when fault detection relies on a single physical network and that network fails (for example, the management plane network fails while the service plane network is normal), it is often impossible to tell whether the second node has failed, whether the link between the second node and the first node has failed, or whether both have failed at the same time, so the fault detection result is inaccurate. To solve this problem, the first heartbeat message in this embodiment is preferably sent over at least two networks. For example, it may be sent over two planes, such as the management plane and the service plane, or over three planes, such as the management plane, the service plane and the signaling plane. Sending the first heartbeat message over multiple physical networks to detect whether a node has failed improves detection accuracy. It should be noted that, when there are at least two physical networks, these networks are isolated from one another. This avoids the situation in which the networks share some device and a failure of that shared device prevents normal communication between nodes, which helps improve detection accuracy.
Step 703, if the first node does not receive the first heartbeat message, the first node sends a request message to each of the other neighbor nodes. The request message inquires whether each of the other neighbor nodes has received the first heartbeat message.
In this embodiment, if the first node does not receive the first heartbeat message sent by the second node within the preset time, it can make a preliminary determination that the second node may have failed. Because the second node sends the first heartbeat message to all of its neighbor nodes in parallel, the first node sends a request message to the neighbor nodes of the second node other than itself, to inquire whether those other neighbor nodes have received the first heartbeat message sent by the second node.
Step 704, the first node receives the response messages carrying a reception state that each of the other neighbor nodes sends. The reception state indicates whether the first heartbeat message was received.
In this embodiment, after receiving the request message sent by the first node, each of the other neighbor nodes carries, in a response message, the reception state indicating whether it has received the first heartbeat message, and sends the response message to the first node.
Step 705, if the first node determines, according to the reception state carried in the received response messages, that none of the other neighbor nodes has received the first heartbeat message, the first node determines that the second node has failed.
In this embodiment, after receiving the request message sent by the first node, each of the other neighbor nodes returns a response message carrying its reception state to the first node. Based on the response messages received from the other neighbor nodes, the first node judges whether the other neighbor nodes have received the first heartbeat message. If it judges that none of them has received the first heartbeat message sent by the second node, it can determine that the second node has failed.
In the fault detection method for a node in a cluster system provided by this embodiment of the present invention, the second node sends the first heartbeat message in parallel to the first node and the other neighbor nodes, and the first node judges whether it receives, within a preset time, the first heartbeat message sent by the second node. The first node is a neighbor node of the second node, the first heartbeat message is the heartbeat message that the second node sends in parallel to each of its neighbor nodes, and the second node has two or more neighbor nodes. The preset time is greater than or equal to one heartbeat period and less than two heartbeat periods. If the first node itself does not receive the first heartbeat message, it inquires whether the other neighbor nodes of the second node have received it, and if it determines that none of the other neighbor nodes has received the first heartbeat message either, it determines that the second node has failed. Because the preset time is greater than or equal to one heartbeat period and less than two heartbeat periods, fault detection using the technical solution provided by the present invention avoids the prior-art need to wait for multiple heartbeat periods before a node failure can be detected. This shortens the fault detection period and thereby improves the efficiency of node fault detection.
Fig. 8 is a schematic flowchart of Embodiment 4 of the fault detection method for a node in a cluster system provided by the present invention. On the basis of the embodiment shown in Fig. 7, this embodiment describes in detail how each node redetermines its neighbor nodes after the first node determines that the second node has failed. As shown in Fig. 8, the method of this embodiment may include:
Step 801, the first node generates first voting information and receives the second voting information sent by each of the other neighbor nodes. The first voting information includes the node identifier of the node elected by the first node; the second voting information includes the node identifier of the node elected by the neighbor node that sends that second voting information.
In this embodiment, after the neighbor nodes of the second node determine that the second node has failed, all of these neighbor nodes need to recalculate their respective neighbor nodes. For ease of description, any one neighbor node of the second node may be taken as the first node. The first node needs to generate first voting information, which contains the node identifier of the node elected by the first node and the voting basis. The first node also receives the second voting information sent by each of the other neighbor nodes; the second voting information contains the node identifier of the node elected by the sending neighbor node and the voting basis. In practical applications, the voting basis depends on many factors, such as load, node number, how fresh the node's cache is, and the node's network bandwidth. For example, the first node may judge which node carries the lowest load, carry the node identifier of that node in the first voting information, and send it to the other neighbor nodes. Likewise, the other neighbor nodes may send their second voting information to the first node in a similar way.
Step 802, according to the node identifier in the first voting information and the node identifiers in the second voting information sent by each of the other neighbor nodes, the first node counts the number of votes obtained by each elected node and takes the node with the most votes as the third node. The third node is the node that replaces the second node and sends heartbeat messages in parallel to all neighbor nodes of the third node. All neighbor nodes of the third node include the third node's own neighbor nodes and the neighbor nodes of the second node.
In this embodiment, after receiving the second voting information sent by each of the other neighbor nodes, the first node can determine the third node according to the node identifier in the first voting information that it generated and the node identifiers in the received second voting information. In a specific implementation, the first node may count, according to the node identifiers carried in the first voting information and the second voting information, the number of votes obtained by each elected node, and take the node that obtains the most votes as the third node. The third node takes over the neighbor nodes of the failed second node, that is, it takes over the associations between the second node and other nodes. The third node therefore replaces the second node and sends heartbeat messages in parallel to all neighbor nodes of the third node, where all neighbor nodes of the third node include, in addition to the third node's own neighbor nodes, the neighbor nodes of the second node.
Step 803, the first node redetermines its own neighbor nodes according to the neighbor nodes of the third node and the nodes among the other neighbor nodes other than the third node.
In this embodiment, after all neighbor nodes of the second node have determined the third node by voting, if the first node is the third node, the first node takes over the neighbor relationships of the second node, and the other neighbor nodes recalculate their respective neighbor nodes according to the neighbor relationships that result from the first node taking over the neighbor nodes of the second node. If the first node is not the third node, the first node waits until the third node has redetermined its neighbor relationships, and then redetermines its own neighbor nodes according to the neighbor nodes of the third node and the nodes among the other neighbor nodes other than the third node.
In the fault detection method for a node in a cluster system provided by this embodiment of the present invention, the second node sends the first heartbeat message in parallel to the first node and the other neighbor nodes, and the first node judges whether it receives, within a preset time, the first heartbeat message sent by the second node. The first node is a neighbor node of the second node, the first heartbeat message is the heartbeat message that the second node sends in parallel to each of its neighbor nodes, and the second node has two or more neighbor nodes. The preset time is greater than or equal to one heartbeat period and less than two heartbeat periods. If the first node itself does not receive the first heartbeat message, it inquires whether the other neighbor nodes of the second node have received it, and if it determines that none of the other neighbor nodes has received the first heartbeat message either, it determines that the second node has failed. Because the preset time is greater than or equal to one heartbeat period and less than two heartbeat periods, fault detection using the technical solution provided by the present invention avoids the prior-art need to wait for multiple heartbeat periods before a node failure can be detected. This shortens the fault detection period and thereby improves the efficiency of node fault detection. In addition, after it is determined that the second node has failed, each node redetermines its own neighbor nodes and fault detection then continues, which improves the accuracy of fault detection.
Optionally, if the first node determines, according to the reception state carried in the response message received from each of the other neighbor nodes, that at least one other neighbor node has received the first heartbeat message, the first node determines that the link between the second node and each node that did not receive the first heartbeat message has failed.
Specifically, the first node, having not received the first heartbeat message sent by the second node, sends a request message to each of the other neighbor nodes to inquire whether they have received the first heartbeat message. If the first node determines from the response messages sent by the other neighbor nodes that at least one other neighbor node has received the first heartbeat message, the first node can determine that the second node is normal and that the fault may lie in the links between the second node and the nodes that did not receive the first heartbeat message. The nodes that did not receive the first heartbeat message include the first node and those of the other neighbor nodes that did not receive the first heartbeat message.
Optionally, the first node redetermines its own neighbor nodes according to the neighbor nodes of the third node and the nodes among the other neighbor nodes other than the third node.
In the fault detection method for a node in a cluster system provided by this embodiment of the present invention, when the first node determines that at least one other neighbor node has received the first heartbeat message, the first node determines that the link between the second node and each node that did not receive the first heartbeat message has failed, which makes fault detection more comprehensive.
Fig. 9 is a schematic structural diagram of Embodiment 1 of the fault detection apparatus for a node in a cluster system according to the present invention. As shown in Fig. 9, the fault detection apparatus 10 for a node in a cluster system provided by this embodiment of the present invention includes a judgment module 11, a sending module 12, a receiving module 13, a determining module 14 and a generation module 15.
The judgment module 11 is configured to judge whether the receiving module 13 receives, within a preset time, the first heartbeat message sent by the second node. The first node is a neighbor node of the second node, the first heartbeat message is the heartbeat message that the second node sends in parallel to each of its neighbor nodes, and the second node has two or more neighbor nodes. The preset time is greater than or equal to one heartbeat period and less than two heartbeat periods. When the judgment module 11 judges that the receiving module 13 has not received the first heartbeat message sent by the second node, the sending module 12 is configured to send a request message to the other neighbor nodes, that is, to all neighbor nodes of the second node other than the first node. The request message inquires whether the other neighbor nodes have received the first heartbeat message. The receiving module 13 is further configured to receive the response messages carrying a reception state that the other neighbor nodes send, where the reception state indicates whether the first heartbeat message was received. The determining module 14 is configured to determine, according to the reception state carried in the response message from each of the other neighbor nodes received by the receiving module 13, whether none of the other neighbor nodes has received the first heartbeat message. When the determining module 14 determines that none of the other neighbor nodes has received the first heartbeat message, the determining module 14 is further configured to determine that the second node has failed.
In the fault detection apparatus for a node in a cluster system provided by this embodiment of the present invention, the judgment module judges whether the receiving module receives, within the preset time, the first heartbeat message sent by the second node. The first heartbeat message is a heartbeat message that the second node sends in parallel to each of its neighbor nodes, and the second node has two or more neighbor nodes. The preset time is greater than or equal to one heartbeat period and less than two heartbeat periods. When the receiving module has not received the first heartbeat message, the sending module sends a request message to the other neighbor nodes of the second node to query whether they have received the first heartbeat message; when the determining module determines that none of the other neighbor nodes of the second node has received the first heartbeat message either, it determines that the second node has failed. Because the preset time is greater than or equal to one heartbeat period and less than two heartbeat periods, fault detection using the technical solution provided by the present invention avoids the prior-art need to wait several heartbeat periods before it can be detected whether a node has failed. This shortens the fault detection period and thereby improves the efficiency of node fault detection.
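As an illustration only, the timing constraint and the failure determination described above may be sketched in Python as follows. The function names, the string node identifiers, and the dictionary of reception states are hypothetical and do not appear in the embodiments; the sketch also assumes that the response messages have already been collected.

```python
from typing import Dict


def preset_time_is_valid(preset_time: float, heartbeat_period: float) -> bool:
    """Check that the preset time is at least one heartbeat period and less than two."""
    return heartbeat_period <= preset_time < 2 * heartbeat_period


def second_node_failed(first_node_received: bool,
                       reception_states: Dict[str, bool]) -> bool:
    """Decide whether the second node is considered failed.

    reception_states maps each other neighbor node (hypothetical string id)
    to the reception state carried in its response message.
    """
    if first_node_received:
        # The first heartbeat message arrived in time, so nothing is queried.
        return False
    if not reception_states:
        # The embodiment requires at least one other neighbor node to ask.
        raise ValueError("at least one other neighbor node is required")
    # Failure is declared only when no other neighbor received the heartbeat.
    return not any(reception_states.values())
```

With a heartbeat period of 1 second, a preset time of 1.5 seconds satisfies the constraint, whereas 2 seconds does not.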
Optionally, the generation module 15 is configured to generate first vote information, where the first vote information includes the node identifier of the node that the first node votes for;
the receiving module 13 is further configured to receive second vote information sent by each of the other neighbor nodes, where the second vote information includes the node identifier of the node voted for by the neighbor node that sends the second vote information;
the determining module 14 is further configured to count, according to the node identifier in the first vote information and the node identifiers in the second vote information sent by each of the other neighbor nodes, the number of votes obtained by each of the nodes voted for, and to take the node with the most votes as a third node. The third node is the node that replaces the second node and sends heartbeat messages in parallel to all neighbor nodes of the third node. All neighbor nodes of the third node include the third node's own neighbor nodes and the neighbor nodes of the second node.
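The vote counting performed by the determining module 14 may be sketched, purely by way of example, as follows. The function name and the string node identifiers are hypothetical. The embodiment does not state how a tie between nodes with the same number of votes is resolved; choosing the smallest identifier is an assumption made only so that the sketch is deterministic.

```python
from collections import Counter
from typing import Iterable


def elect_third_node(first_vote: str, second_votes: Iterable[str]) -> str:
    """Return the identifier of the node that obtained the most votes.

    first_vote is the node identifier in the first vote information;
    second_votes are the identifiers in the second vote information
    received from the other neighbor nodes.
    """
    tally = Counter([first_vote, *second_votes])
    most_votes = max(tally.values())
    # Assumption: ties are broken by the smallest identifier (not in the source).
    return min(node for node, count in tally.items() if count == most_votes)
```

For example, if the first node votes for "n3" and the other neighbor nodes vote for "n3" and "n4", node "n3" becomes the third node.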
Optionally, when the determining module 14 determines, according to the reception state carried in the response message from each of the other neighbor nodes received by the receiving module 13, that at least one of the other neighbor nodes has received the first heartbeat message, the determining module 14 is further configured to determine that the link between the second node and each node that has not received the first heartbeat message is faulty. The nodes that have not received the first heartbeat message include the first node and those of the other neighbor nodes that have not received the first heartbeat message.
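A minimal sketch of the link fault determination is given below for illustration. The function name and the data layout are hypothetical. The sketch is assumed to be called only after the first node itself has missed the first heartbeat message, and it returns an empty list when no other neighbor node received the message, because that case is the node failure case rather than a link fault.

```python
from typing import Dict, List


def nodes_with_faulty_link(first_node_id: str,
                           reception_states: Dict[str, bool]) -> List[str]:
    """List the nodes whose link to the second node is considered faulty.

    reception_states maps each other neighbor node to the reception state
    carried in its response message.
    """
    if not any(reception_states.values()):
        # Nobody received the heartbeat: this is not a link fault.
        return []
    # The first node missed the heartbeat, so its own link is included.
    faulty = [first_node_id]
    faulty += [node for node, received in reception_states.items() if not received]
    return faulty
```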
Optionally, the determining module 14 is further configured to re-determine the neighbor nodes of the first node according to the neighbor nodes of the third node and the nodes among the other neighbor nodes other than the third node.
The fault detection apparatus for a node in a cluster system of this embodiment can be used to execute the technical solution of the fault detection method for a node in a cluster system provided by any embodiment of the present invention. Its implementation principle and technical effect are similar and are not described again here.
Fig. 10 is a schematic structural diagram of Embodiment 1 of the fault detection system for a node in a cluster system according to the present invention. As shown in Fig. 10, the fault detection system 20 for a node in a cluster system provided by this embodiment of the present invention includes a first node 21, a second node 22, and other neighbor nodes 23. The first node 21 is a neighbor node of the second node 22; the other neighbor nodes 23 are the nodes among all neighbor nodes of the second node 22 except the first node 21; and there is at least one other neighbor node 23.
The second node 22 is configured to send a first heartbeat message in parallel to the first node and the other neighbor nodes. The first node 21 is configured to judge whether the first heartbeat message is received within a preset time, where the preset time is greater than or equal to one heartbeat period and less than two heartbeat periods. When the first node has not received the first heartbeat message, the first node 21 is further configured to send a request message to each of the other neighbor nodes, where the request message is used to query whether each of the other neighbor nodes has received the first heartbeat message. The first node 21 is further configured to receive response messages that are sent by each of the other neighbor nodes and carry a reception state, where the reception state indicates whether the first heartbeat message was received. When the first node determines, according to the reception state carried in the response message sent by each of the other neighbor nodes, that none of the other neighbor nodes has received the first heartbeat message, the first node 21 is further configured to determine that the second node has failed.
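For illustration, the parallel sending of the first heartbeat message by the second node 22 may be sketched as follows. The embodiment does not specify a transport or a concurrency mechanism; the use of a thread pool and the `send` callback, which stands in for whatever actually transmits a heartbeat message to one neighbor, are assumptions of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def send_heartbeat_in_parallel(neighbors: Iterable[str],
                               send: Callable[[str], None]) -> None:
    """Send one heartbeat message to every neighbor node concurrently.

    send is a hypothetical callback that transmits a heartbeat message
    to the neighbor node with the given identifier.
    """
    targets = list(neighbors)
    if len(targets) < 2:
        # The second node is required to have two or more neighbor nodes.
        raise ValueError("the sending node needs two or more neighbor nodes")
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        # Consuming the iterator waits for all sends and surfaces any exception.
        list(pool.map(send, targets))
```

Each neighbor receives its heartbeat message from a separate worker thread, so a slow link to one neighbor does not delay delivery to the others.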
In the fault detection system for a node in a cluster system provided by this embodiment of the present invention, the judgment module judges whether the receiving module receives, within the preset time, the first heartbeat message sent by the second node. The first heartbeat message is a heartbeat message that the second node sends in parallel to each of its neighbor nodes, and the second node has two or more neighbor nodes. The preset time is greater than or equal to one heartbeat period and less than two heartbeat periods. When the receiving module has not received the first heartbeat message, the sending module sends a request message to the other neighbor nodes of the second node to query whether they have received the first heartbeat message; when the determining module determines that none of the other neighbor nodes of the second node has received the first heartbeat message either, it determines that the second node has failed. Because the preset time is greater than or equal to one heartbeat period and less than two heartbeat periods, fault detection using the technical solution provided by the present invention avoids the prior-art need to wait several heartbeat periods before it can be detected whether a node has failed. This shortens the fault detection period and thereby improves the efficiency of node fault detection.
In the foregoing embodiment, after the first node 21 determines that the second node has failed, the first node 21 is further configured to:
generate first vote information and receive second vote information sent by each of the other neighbor nodes, where the first vote information includes the node identifier of the node that the first node votes for, and the second vote information includes the node identifier of the node voted for by the neighbor node that sends the second vote information; and
count, according to the node identifier in the first vote information and the node identifiers in the second vote information sent by each of the other neighbor nodes, the number of votes obtained by each of the nodes voted for, and take the node with the most votes as a third node. The third node is the node that replaces the second node and sends heartbeat messages in parallel to all neighbor nodes of the third node. All neighbor nodes of the third node include the third node's own neighbor nodes and the neighbor nodes of the second node.
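The composition of the third node's neighbor set may be sketched as follows, by way of example only. The function name and parameters are hypothetical. The embodiment states only that the set includes the third node's own neighbor nodes and the neighbor nodes of the second node; excluding the third node itself and the failed second node from the result is an assumption of this sketch.

```python
from typing import Iterable, Set


def third_node_neighbors(third_node: str,
                         second_node: str,
                         own_neighbors: Iterable[str],
                         second_node_neighbors: Iterable[str]) -> Set[str]:
    """Merge the third node's own neighbors with the second node's neighbors."""
    merged = set(own_neighbors) | set(second_node_neighbors)
    # Assumption: a node is not its own neighbor.
    merged.discard(third_node)
    # Assumption: the failed second node no longer receives heartbeat messages.
    merged.discard(second_node)
    return merged
```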
In the foregoing embodiment, when the first node determines, according to the reception state carried in the response message sent by each of the other neighbor nodes, that at least one of the other neighbor nodes has received the first heartbeat message, the first node 21 is further configured to determine that the link between the second node and each node that has not received the first heartbeat message is faulty. The nodes that have not received the first heartbeat message include the first node and those of the other neighbor nodes that have not received the first heartbeat message.
In the foregoing embodiment, the first node 21 is further configured to re-determine the neighbor nodes of the first node according to the neighbor nodes of the third node and the nodes among the other neighbor nodes other than the third node.
The foregoing system embodiment can correspondingly be used to execute the technical solution of the method embodiments. Its implementation principle and technical effect are similar and are not described again here.
Fig. 11 is a schematic structural diagram of Embodiment 1 of the node according to the present invention. As shown in Fig. 11, the node 600 of this embodiment includes a processor 601, a user interface 603, a network interface 604, a memory 605, a transmitter 606, and a receiver 607. The memory 605 may include an operating system 6051, an application program 6052, and the like. The processor 601 may be a central processing unit (CPU). The memory 605 is configured to store executable instructions, and the processor 601 can execute the executable instructions stored in the memory 605. The receiver 607 is configured to receive a first heartbeat message sent by a second node. The processor 601 is configured to judge whether the receiver 607 receives, within a preset time, the first heartbeat message sent by the second node. The first heartbeat message is a heartbeat message that the second node sends in parallel to each of its neighbor nodes, and the second node has two or more neighbor nodes. The preset time is greater than or equal to one heartbeat period and less than two heartbeat periods. When the processor 601 judges that the receiver 607 has not received the first heartbeat message sent by the second node, the transmitter 606 is configured to send a request message to the other neighbor nodes, that is, all neighbor nodes of the second node except the first node. The request message is used to query whether the other neighbor nodes have received the first heartbeat message, and the first node is a neighbor node of the second node. The receiver 607 is further configured to receive response messages that are sent by the other neighbor nodes and carry a reception state, where the reception state indicates whether the first heartbeat message was received. The processor 601 is configured to determine, according to the reception state carried in the response message from each of the other neighbor nodes received by the receiver 607, whether none of the other neighbor nodes has received the first heartbeat message. When the processor 601 determines that none of the other neighbor nodes has received the first heartbeat message, the processor 601 is further configured to determine that the second node has failed.
The node provided by this embodiment can be used to execute the technical solution of the fault detection method for a node in a cluster system provided by any embodiment of the present invention. Its implementation principle and technical effect are similar and are not described again here.
Optionally, the processor 601 is further configured to generate first vote information, where the first vote information includes the node identifier of the node that the first node votes for;
the receiver 607 is further configured to receive second vote information sent by each of the other neighbor nodes, where the second vote information includes the node identifier of the node voted for by the neighbor node that sends the second vote information;
the processor 601 is further configured to count, according to the node identifier in the first vote information and the node identifiers in the second vote information sent by each of the other neighbor nodes, the number of votes obtained by each of the nodes voted for, and to take the node with the most votes as a third node. The third node is the node that replaces the second node and sends heartbeat messages in parallel to all neighbor nodes of the third node. All neighbor nodes of the third node include the third node's own neighbor nodes and the neighbor nodes of the second node.
Optionally, when the processor 601 determines, according to the reception state carried in the response message from each of the other neighbor nodes received by the receiver 607, that at least one of the other neighbor nodes has received the first heartbeat message, the processor 601 is further configured to determine that the link between the second node and each node that has not received the first heartbeat message is faulty. The nodes that have not received the first heartbeat message include the first node and those of the other neighbor nodes that have not received the first heartbeat message.
Optionally, the processor 601 is further configured to re-determine the neighbor nodes of the first node according to the neighbor nodes of the third node and the nodes among the other neighbor nodes other than the third node.
A person of ordinary skill in the art will understand that all or some of the steps of the foregoing method embodiments may be implemented by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the foregoing method embodiments. The storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the foregoing embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (12)

1. A fault detection method for a node in a cluster system, characterized by comprising:
judging, by a first node, whether a first heartbeat message sent by a second node is received within a preset time, wherein the first node is a neighbor node of the second node, the first heartbeat message is a heartbeat message that the second node sends in parallel to each neighbor node of the second node, the second node has two or more neighbor nodes, and the preset time is greater than or equal to one heartbeat period and less than two heartbeat periods;
when the first node has not received the first heartbeat message sent by the second node, sending, by the first node, a request message to the other neighbor nodes among all neighbor nodes of the second node except the first node, wherein the request message is used to query whether the other neighbor nodes have received the first heartbeat message;
receiving, by the first node, response messages that are sent by the other neighbor nodes and carry a reception state, wherein the reception state indicates whether the first heartbeat message was received; and
when the first node determines, according to the reception state carried in the response message sent by each of the other neighbor nodes, that none of the other neighbor nodes has received the first heartbeat message, determining, by the first node, that the second node has failed.
2. The method according to claim 1, characterized in that, after the first node determines that the second node has failed, the method further comprises:
generating, by the first node, first vote information, and receiving second vote information sent by each of the other neighbor nodes, wherein the first vote information includes the node identifier of the node that the first node votes for, and the second vote information includes the node identifier of the node voted for by the neighbor node that sends the second vote information; and
counting, by the first node according to the node identifier in the first vote information and the node identifiers in the second vote information sent by each of the other neighbor nodes, the number of votes obtained by each of the nodes voted for, and taking the node with the most votes as a third node, wherein the third node is the node that replaces the second node and sends heartbeat messages in parallel to all neighbor nodes of the third node, and all neighbor nodes of the third node include the third node's own neighbor nodes and the neighbor nodes of the second node.
3. The method according to claim 1 or 2, characterized by further comprising:
when the first node determines, according to the reception state carried in the response message sent by each of the other neighbor nodes, that at least one of the other neighbor nodes has received the first heartbeat message, determining, by the first node, that the link between the second node and each node that has not received the first heartbeat message is faulty, wherein the nodes that have not received the first heartbeat message include the first node and those of the other neighbor nodes that have not received the first heartbeat message.
4. A fault detection method for a node in a cluster system, characterized in that the method comprises:
sending, by a second node, a first heartbeat message in parallel to a first node and other neighbor nodes, wherein the first node is a neighbor node of the second node, the other neighbor nodes are the nodes among all neighbor nodes of the second node except the first node, and there is at least one other neighbor node;
judging, by the first node, whether the first heartbeat message is received within a preset time, wherein the preset time is greater than or equal to one heartbeat period and less than two heartbeat periods;
when the first node has not received the first heartbeat message, sending, by the first node, a request message to each of the other neighbor nodes, wherein the request message is used to query whether each of the other neighbor nodes has received the first heartbeat message;
receiving, by the first node, response messages that are sent by each of the other neighbor nodes and carry a reception state, wherein the reception state indicates whether the first heartbeat message was received; and
when the first node determines, according to the reception state carried in the received response messages, that none of the other neighbor nodes has received the first heartbeat message, determining, by the first node, that the second node has failed.
5. The method according to claim 4, characterized in that, after the first node determines that the second node has failed, the method further comprises:
generating, by the first node, first vote information, and receiving second vote information sent by each of the other neighbor nodes, wherein the first vote information includes the node identifier of the node that the first node votes for, and the second vote information includes the node identifier of the node voted for by the neighbor node that sends the second vote information; and
counting, by the first node according to the node identifier in the first vote information and the node identifiers in the second vote information sent by each of the other neighbor nodes, the number of votes obtained by each of the nodes voted for, and taking the node with the most votes as a third node, wherein the third node is the node that replaces the second node and sends heartbeat messages in parallel to all neighbor nodes of the third node, and all neighbor nodes of the third node include the third node's own neighbor nodes and the neighbor nodes of the second node.
6. The method according to claim 4 or 5, characterized by further comprising:
when the first node determines, according to the reception state carried in the response message sent by each of the other neighbor nodes, that at least one of the other neighbor nodes has received the first heartbeat message, determining, by the first node, that the link between the second node and each node that has not received the first heartbeat message is faulty, wherein the nodes that have not received the first heartbeat message include the first node and those of the other neighbor nodes that have not received the first heartbeat message.
7. A fault detection apparatus for a node in a cluster system, characterized by comprising:
a judgment module, configured to judge whether a receiving module receives, within a preset time, a first heartbeat message sent by a second node, wherein the fault detection apparatus is a neighbor node of the second node, the first heartbeat message is a heartbeat message that the second node sends in parallel to each neighbor node of the second node, the second node has two or more neighbor nodes, and the preset time is greater than or equal to one heartbeat period and less than two heartbeat periods;
a sending module, configured to, when the judgment module judges that the receiving module has not received the first heartbeat message sent by the second node,
send a request message to the other neighbor nodes among all neighbor nodes of the second node except the fault detection apparatus, wherein the request message is used to query whether the other neighbor nodes have received the first heartbeat message;
the receiving module, further configured to receive response messages that are sent by the other neighbor nodes and carry a reception state, wherein the reception state indicates whether the first heartbeat message was received; and
a determining module, configured to determine, according to the reception state carried in the response message from each of the other neighbor nodes received by the receiving module, whether none of the other neighbor nodes has received the first heartbeat message;
wherein, when the determining module determines that none of the other neighbor nodes has received the first heartbeat message, the determining module is further configured to determine that the second node has failed.
8. The apparatus according to claim 7, characterized in that, for use after the determining module determines that the second node has failed, the apparatus further comprises:
a generation module, configured to generate first vote information, wherein the first vote information includes the node identifier of the node that the fault detection apparatus votes for;
wherein the receiving module is further configured to receive second vote information sent by each of the other neighbor nodes, the second vote information including the node identifier of the node voted for by the neighbor node that sends the second vote information; and
the determining module is further configured to count, according to the node identifier in the first vote information and the node identifiers in the second vote information sent by each of the other neighbor nodes, the number of votes obtained by each of the nodes voted for, and to take the node with the most votes as a third node, wherein the third node is the node that replaces the second node and sends heartbeat messages in parallel to all neighbor nodes of the third node, and all neighbor nodes of the third node include the third node's own neighbor nodes and the neighbor nodes of the second node.
9. The apparatus according to claim 7 or 8, characterized in that:
when the determining module determines, according to the reception state carried in the response message from each of the other neighbor nodes received by the receiving module, that at least one of the other neighbor nodes has received the first heartbeat message,
the determining module is further configured to determine that the link between the second node and each node that has not received the first heartbeat message is faulty, wherein the nodes that have not received the first heartbeat message include the fault detection apparatus and those of the other neighbor nodes that have not received the first heartbeat message.
10. A fault detection system for a node in a cluster system, characterized by comprising a first node, a second node, and other neighbor nodes, wherein the first node is a neighbor node of the second node, the other neighbor nodes are the nodes among all neighbor nodes of the second node except the first node, and there is at least one other neighbor node, wherein:
the second node is configured to send a first heartbeat message in parallel to the first node and the other neighbor nodes;
the first node is configured to judge whether the first heartbeat message is received within a preset time, wherein the preset time is greater than or equal to one heartbeat period and less than two heartbeat periods;
when the first node has not received the first heartbeat message, the first node is further configured to send a request message to each of the other neighbor nodes, wherein the request message is used to query whether each of the other neighbor nodes has received the first heartbeat message; and the first node is further configured to receive response messages that are sent by each of the other neighbor nodes and carry a reception state, wherein the reception state indicates whether the first heartbeat message was received; and
when the first node determines, according to the reception state carried in the response message sent by each of the other neighbor nodes, that none of the other neighbor nodes has received the first heartbeat message, the first node is further configured to determine that the second node has failed.
11. The system according to claim 10, characterized in that, after the first node determines that the second node has failed,
the first node is further configured to:
generate first vote information and receive second vote information sent by each of the other neighbor nodes, wherein the first vote information includes the node identifier of the node that the first node votes for, and the second vote information includes the node identifier of the node voted for by the neighbor node that sends the second vote information; and
count, according to the node identifier in the first vote information and the node identifiers in the second vote information sent by each of the other neighbor nodes, the number of votes obtained by each of the nodes voted for, and take the node with the most votes as a third node, wherein the third node is the node that replaces the second node and sends heartbeat messages in parallel to all neighbor nodes of the third node, and all neighbor nodes of the third node include the third node's own neighbor nodes and the neighbor nodes of the second node.
12. The system according to claim 10 or 11, characterized in that:
when the first node determines, according to the reception state carried in the response message sent by each of the other neighbor nodes, that at least one of the other neighbor nodes has received the first heartbeat message,
the first node is further configured to determine that the link between the second node and each node that has not received the first heartbeat message is faulty, wherein the nodes that have not received the first heartbeat message include the first node and those of the other neighbor nodes that have not received the first heartbeat message.
CN201510306800.0A 2015-06-05 2015-06-05 Fault detection method and device for a node in a cluster system Active CN106301853B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201510306800.0A CN106301853B (en) 2015-06-05 2015-06-05 The fault detection method and device of group system interior joint
PCT/CN2016/073606 WO2016192408A1 (en) 2015-06-05 2016-02-05 Fault detection method and apparatus for node in cluster system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510306800.0A CN106301853B (en) 2015-06-05 2015-06-05 Fault detection method and device for a node in a cluster system

Publications (2)

Publication Number Publication Date
CN106301853A CN106301853A (en) 2017-01-04
CN106301853B true CN106301853B (en) 2019-06-18

Family

ID=57440098

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510306800.0A Active CN106301853B (en) 2015-06-05 2015-06-05 Fault detection method and device for a node in a cluster system

Country Status (2)

Country Link
CN (1) CN106301853B (en)
WO (1) WO2016192408A1 (en)

Families Citing this family (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108337274A (en) * 2017-01-19 2018-07-27 贵州白山云科技有限公司 Message distribution method and system
CN112468372B (en) * 2017-04-10 2023-10-13 华为技术有限公司 Method and device for detecting equipment state in power line communication network
WO2018214106A1 (en) * 2017-05-25 2018-11-29 深圳市伊特利网络科技有限公司 Update method and system for network connection list
WO2019000954A1 (en) * 2017-06-30 2019-01-03 中兴通讯股份有限公司 Method, device and system for monitoring node survival state
CN109428740B (en) * 2017-08-21 2020-09-08 华为技术有限公司 Method and device for recovering equipment failure
US10547499B2 (en) 2017-09-04 2020-01-28 International Business Machines Corporation Software defined failure detection of many nodes
CN109525408B (en) * 2017-09-18 2021-12-21 杭州海康威视系统技术有限公司 Equipment exception handling method and device and cloud storage system
CN107566219B (en) * 2017-09-27 2020-09-18 华为技术有限公司 Fault diagnosis method applied to cluster system, node equipment and computer equipment
CN107967291B (en) * 2017-10-12 2019-08-13 腾讯科技(深圳)有限公司 Journal entries clone method, device, computer equipment and storage medium
CN109714183A (en) * 2017-10-26 2019-05-03 阿里巴巴集团控股有限公司 Data processing method and device in a cluster
CN107864486A (en) * 2017-12-26 2018-03-30 杭州迪普科技股份有限公司 Offline AP detection method and device
CN108092857A (en) * 2018-01-15 2018-05-29 郑州云海信息技术有限公司 Distributed system heartbeat detection method and related apparatus
CN110324166B (en) * 2018-03-31 2020-12-15 华为技术有限公司 Method, device and system for synchronizing target information in multiple nodes
CN108683561B (en) * 2018-05-16 2020-10-02 杭州迪普科技股份有限公司 Site state detection method and device
CN109302445B (en) * 2018-08-14 2021-10-12 新华三云计算技术有限公司 Host node state determination method and device, host node and storage medium
CN109218141A (en) * 2018-11-20 2019-01-15 郑州云海信息技术有限公司 Faulty node detection method and related apparatus
CN109873719B (en) * 2019-02-03 2019-12-31 华为技术有限公司 Fault detection method and device
CN112219373B (en) 2019-04-29 2022-08-30 华海通信技术有限公司 Submarine cable fault judgment method and device
CN110380934B (en) * 2019-07-23 2021-11-02 南京航空航天大学 Distributed redundancy system heartbeat detection method
CN111181763A (en) * 2019-11-28 2020-05-19 泰康保险集团股份有限公司 Network fault reporting method and device
CN112911520B (en) * 2019-12-04 2022-05-31 哈尔滨海能达科技有限公司 Method, device and storage medium for determining master node in ad hoc network
CN111586110B (en) * 2020-04-22 2021-03-19 广州锦行网络科技有限公司 Optimization processing method for raft in point-to-point fault
CN112398905B (en) * 2020-09-28 2022-05-31 联想(北京)有限公司 Node and information synchronization method
CN112988463B (en) * 2021-02-23 2022-08-30 新华三大数据技术有限公司 Fault node isolation method and device
CN113542052A (en) * 2021-06-07 2021-10-22 新华三信息技术有限公司 Node fault determination method and device and server
CN113783735A (en) * 2021-09-24 2021-12-10 小红书科技有限公司 Method, device, equipment and medium for identifying fault node in Redis cluster
CN113923105B (en) * 2021-12-13 2022-04-22 中机联科技(广东)有限公司 Internet of things equipment fault monitoring method and system based on block chain
CN115102886A (en) * 2022-06-21 2022-09-23 上海驻云信息科技有限公司 Task scheduling method and device for multiple acquisition clients
CN116260705B (en) * 2022-12-21 2023-09-15 广西壮族自治区自然资源信息中心 Geographic information distributed cluster fault processing method, device, medium and equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101159536A (en) * 2007-10-30 2008-04-09 中兴通讯股份有限公司 Media gateway node condition synchronizing method in dual-home network
CN102204169A (en) * 2011-05-12 2011-09-28 华为技术有限公司 Fault detection method, route node and system
CN102612110A (en) * 2012-03-02 2012-07-25 浙江大学 Distributive self-organized routing method in electric carrier wave illumination control system
CN102821011A (en) * 2012-08-28 2012-12-12 北京星网锐捷网络技术有限公司 Opposite terminal state detection method, device and equipment
CN103297396A (en) * 2012-02-28 2013-09-11 国际商业机器公司 Management failure transferring device and method in cluster system
CN103916275A (en) * 2014-03-31 2014-07-09 杭州华三通信技术有限公司 BFD detection device and method
CN104283711A (en) * 2014-09-29 2015-01-14 中国联合网络通信集团有限公司 Fault detection method based on BFD, nodes and system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070294596A1 (en) * 2006-05-22 2007-12-20 Gissel Thomas R Inter-tier failure detection using central aggregation point
CN102752143B (en) * 2012-07-05 2015-08-19 杭州华三通信技术有限公司 BFD detection method for MPLS TE bidirectional tunnel, and routing device
CN104104570B (en) * 2013-04-07 2018-09-04 新华三技术有限公司 Aggregation processing method and device in IRF system

Also Published As

Publication number Publication date
CN106301853A (en) 2017-01-04
WO2016192408A1 (en) 2016-12-08

Similar Documents

Publication Publication Date Title
CN106301853B (en) Fault detection method and device for nodes in a cluster system
CN106170971B (en) Arbitration processing method, arbitration storage device and system after cluster split-brain
US9015362B2 (en) Monitoring network performance and detecting network faults using round trip transmission times
CN101257388B (en) Illegal external connection detection method, apparatus and system
US9009305B1 (en) Network host inference system
CN109976935A (en) Microservice architecture, microservice node, and circuit-breaking recovery method and device thereof
CN104301140B (en) Service request response method, device and system
CN111200526B (en) Monitoring system and method of network equipment
CN105379201B (en) Path switching method, controller and failover switch
CN106658381B (en) Landslide early warning method based on wireless sensor network
CN111181800B (en) Test data processing method and device, electronic equipment and storage medium
CN109728981A (en) Cloud platform fault monitoring method and device
CN109039795A (en) Cloud server resource monitoring method and system
CN109921925A (en) Dial testing method and device
CN107294743B (en) Network path detection method, controller and network equipment
CN110196780B (en) Method, device, storage medium and electronic device for determining server state
CN107426051B (en) Method, apparatus and system for monitoring the working state of nodes in a distributed cluster system
CN109542627A (en) Node switching method, device, supervisor, node device and distributed system
CN109657005A (en) Data caching method, device and equipment for distributed cluster system
CN107231344B (en) Flow cleaning method and device
CN104125590A (en) Link fault diagnosis device and method thereof
CN108153654A (en) Log collection method and device
CN114422412B (en) Equipment detection method and device and communication equipment
CN109614289A (en) Storage node monitoring method, system, equipment and computer storage medium
CN110673973B (en) Abnormality determination method and device for application programming interface API

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant