CN107566219A - Fault diagnosis method applied to a cluster system, node device, and computer device


Info

Publication number: CN107566219A
Authority: CN (China)
Legal status: Granted
Application number: CN201710890513.8A
Other languages: Chinese (zh)
Other versions: CN107566219B (en)
Inventors: 胡琳, 伍湘平, 黄凯耀
Current assignee: Huawei Technologies Co Ltd
Original assignee: Huawei Technologies Co Ltd
Events: application CN201710890513.8A filed by Huawei Technologies Co Ltd; published as CN107566219A; application granted and published as CN107566219B; current legal status: active.


Abstract

This application provides a fault diagnosis method applied to a cluster system, a node device, and a computer device, which can improve the accuracy of node device fault diagnosis. The method includes: determining that a first node device has not received, within a first duration starting from a first moment, a heartbeat packet sent by a second node device; querying the number of packets sent by the second node device that at least one node device received within a second duration starting from a second moment, where the packets include heartbeat packets and business packets; and diagnosing, according to the queried number of packets sent by the second node device and received within the second duration starting from the second moment, whether the fault is an Agent fault of the second node device, where the Agent is a program configured to instruct the second node device to send the heartbeat packet.

Description

Fault diagnosis method applied to a cluster system, node device, and computer device
Technical field
This application relates to the field of virtual machines, and more specifically, to a fault diagnosis method applied to a cluster system, a node device, and a computer device.
Background
Under the Network Function Virtualization (NFV) architecture, future cluster systems will grow exponentially in scale, the cluster management problems faced by the service layer will become more prominent, and managing cluster systems will become more challenging.
In a large-scale cluster, if a node device suffers a fault such as a crash or a system hang (a hard or soft lockup of the operating system kernel), the applications (APPs) inside the node device fail directly, and APP failures in turn affect services. How to accurately detect and diagnose a virtual machine node fault when the node fails, so as to minimize the impact on service availability, is a problem that urgently needs to be solved.
Summary
This application provides a fault diagnosis method applied to a cluster system, a node device, and a computer device, which can improve the accuracy of node device fault diagnosis.
According to a first aspect, an embodiment of this application provides a fault diagnosis method applied to a cluster system. The method includes:
determining that a first node device in the cluster system has not received, within a first duration starting from a first moment, a heartbeat packet sent by a second node device in the cluster system, where the heartbeat packet is a packet without business information that is sent periodically;
querying the number of packets sent by the second node device that at least one node device in the cluster system received within a second duration starting from a second moment, where the packets include the heartbeat packet and a business packet, and the business packet is a packet carrying business information that is sent when business communication is needed; and
diagnosing, according to the queried number of packets sent by the second node device and received within the second duration starting from the second moment, whether the fault is an Agent fault of the second node device, where the Agent is a program configured to instruct the second node device to send the heartbeat packet.
Optionally, the at least one node device includes the first node device.
Optionally, the cluster system includes one master node (Master) and multiple slave nodes (Slave).
Optionally, the fault diagnosis method applied to the cluster system is performed by the Master node.
Therefore, in the fault diagnosis method applied to a cluster system in the embodiments of this application, when it is determined that the first node device in the cluster system has not received a heartbeat packet from the second node device within the first duration starting from the first moment, the number of packets (heartbeat packets and business packets) sent by the second node device and received within the second duration starting from the second moment is queried. This avoids the misdiagnosis that results from judging a fault of the second node device from heartbeat packets alone, so whether the fault is an Agent fault of the second node device can be diagnosed accurately.
Optionally, in an implementation of the first aspect, the method further includes:
recording, by a kernel component of the local operating system, the moment at which each packet is received and the identifier of the node device that sent the packet;
and the querying the number of packets sent by the second node device that at least one node device in the cluster system received within the second duration starting from the second moment includes:
querying the locally recorded number of packets sent by the second node device and received within the second duration starting from the second moment.
Therefore, in the fault diagnosis method applied to a cluster system in the embodiments of this application, the kernel component of the local operating system records the moment at which each packet is received and the identifier of the node device that sent it. The locally recorded number of packets (heartbeat packets and business packets) sent by the second node device within the second duration starting from the second moment can therefore be queried, and whether the fault is an Agent fault of the second node device can be diagnosed accurately.
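The record-and-query behaviour above can be modelled in user space as follows. This is a minimal sketch under stated assumptions: the text describes a kernel component, whereas `ReceiveLog`, `record` and `count` are hypothetical names, timestamps are plain floating-point seconds, and the window is taken as closed at both ends.

```python
from bisect import bisect_left, bisect_right, insort
from collections import defaultdict


class ReceiveLog:
    """Hypothetical user-space model of the kernel component's receive record."""

    def __init__(self):
        # sender node identifier -> sorted list of receive moments (seconds)
        self._times = defaultdict(list)

    def record(self, sender_id, t):
        # Called for every received packet, heartbeat or business alike.
        insort(self._times[sender_id], t)

    def count(self, sender_id, start, duration):
        # Packets from sender_id received in the closed window [start, start + duration].
        times = self._times.get(sender_id, [])
        return bisect_right(times, start + duration) - bisect_left(times, start)


log = ReceiveLog()
for t in (100.0, 101.5, 103.0):
    log.record("node-2", t)
print(log.count("node-2", 101.0, 5.0))  # 2
```

Keeping each sender's moments sorted turns the window query into two binary searches; a real kernel module would also need to bound or age out this history, which the text does not address.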
Optionally, in an implementation of the first aspect, the querying the number of packets sent by the second node device that at least one node device in the cluster system received within the second duration starting from the second moment includes:
sending a first instruction message to a third node device, where the third node device belongs to the at least one node device, the first instruction message includes the identifier of the second node device, and the first instruction message instructs the third node device to query its locally recorded number of packets sent by the second node device and received within the second duration starting from the second moment; and
receiving a first response message fed back by the third node device, where the first response message includes the number of packets sent by the second node device that the third node device received within the second duration starting from the second moment.
It should be understood that the third node device records, through the kernel component of its local operating system, the moment at which each packet is received and the identifier of the node device that sent the packet.
Optionally, the third node device may be the first node device.
Therefore, in the fault diagnosis method applied to a cluster system in the embodiments of this application, the third node device records, through the kernel component of its local operating system, the moment at which each packet is received and the identifier of the node device that sent it. The number of packets sent by the second node device and received within the second duration starting from the second moment can therefore be queried from the third node device, and whether the fault is an Agent fault of the second node device can be diagnosed accurately.
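The exchange of the first instruction message and the first response message can be sketched as below. The message fields follow the text (the second node device's identifier plus the query window), but the class names, the `query_third_nodes` helper and the callable standing in for the network transport are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class FirstInstruction:
    target_id: str   # identifier of the second node device
    start: float     # second moment
    duration: float  # second duration


@dataclass(frozen=True)
class FirstResponse:
    responder_id: str
    target_id: str
    count: int       # packets from target_id received in the window


Peer = Callable[[FirstInstruction], FirstResponse]


def query_third_nodes(target_id: str, start: float, duration: float,
                      peers: Dict[str, Peer]) -> List[FirstResponse]:
    """Send one FirstInstruction to each third node device and collect replies.

    Each value in `peers` stands in for a network round trip (hypothetical
    transport); a real system would also need timeouts for silent peers.
    """
    msg = FirstInstruction(target_id, start, duration)
    return [send(msg) for send in peers.values()]


peers = {"node-3": lambda m: FirstResponse("node-3", m.target_id, 4)}
replies = query_third_nodes("node-2", 100.0, 30.0, peers)
print(replies[0].count)  # 4
```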
Optionally, in an implementation of the first aspect, the diagnosing, according to the queried number of packets sent by the second node device and received within the second duration starting from the second moment, whether the fault is an Agent fault of the second node device includes:
when the queried number of packets sent by the second node device and received within the second duration starting from the second moment is zero, diagnosing that the fault is not an Agent fault of the second node device; or
when the queried number of packets sent by the second node device and received within the second duration starting from the second moment is greater than zero, diagnosing that the fault is an Agent fault of the second node device.
Optionally, in an implementation of the first aspect, when the fault is not an Agent fault of the second node device, the second node device can send neither the heartbeat packet nor the business packet; when the fault is an Agent fault of the second node device, the second node device sends the business packet normally but cannot send the heartbeat packet.
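The zero/non-zero rule above reduces to a single comparison on the queried count. The sketch below is illustrative; the `Diagnosis` enum and `diagnose` function are hypothetical names.

```python
from enum import Enum


class Diagnosis(Enum):
    NOT_AGENT_FAULT = "node failed as a whole: no heartbeat or business packets"
    AGENT_FAULT = "Agent failed: business packets still arrive, heartbeats do not"


def diagnose(packet_count: int) -> Diagnosis:
    """Map the queried heartbeat + business packet count to a diagnosis."""
    if packet_count < 0:
        raise ValueError("packet count cannot be negative")
    if packet_count == 0:
        return Diagnosis.NOT_AGENT_FAULT
    return Diagnosis.AGENT_FAULT


print(diagnose(0).name)  # NOT_AGENT_FAULT
print(diagnose(3).name)  # AGENT_FAULT
```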
Optionally, in an implementation of the first aspect, the determining that the first node device in the cluster system has not received, within the first duration starting from the first moment, a heartbeat packet sent by the second node device in the cluster system includes:
receiving a first message sent by the first node device, where the first message indicates that the first node device has not received the heartbeat packet sent by the second node device within the first duration starting from the first moment; and
determining, according to the first message, that the first node device has not received the heartbeat packet sent by the second node device within the first duration starting from the first moment.
Optionally, in an implementation of the first aspect, the method further includes:
sending a second message to the first node device, where the second message indicates that the fault of the second node device is not an Agent fault of the second node device.
Optionally, in an implementation of the first aspect, the second moment is later than or equal to the first moment, and the end moment of the second duration is later than or equal to the end moment of the first duration.
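The timing constraint can be checked as follows. One plausible reading, which the text does not state, is that a query window starting before the first moment could count packets sent before the fault and so mask a whole-node failure. The function name and parameter names are hypothetical.

```python
def valid_query_window(t1: float, d1: float, t2: float, d2: float) -> bool:
    """True if [t2, t2 + d2] starts and ends no earlier than [t1, t1 + d1]."""
    return t2 >= t1 and t2 + d2 >= t1 + d1


print(valid_query_window(100.0, 5.0, 102.0, 10.0))  # True
print(valid_query_window(100.0, 5.0, 95.0, 20.0))   # False: starts too early
```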
Optionally, in an implementation of the first aspect, the method further includes:
reporting the diagnosed fault of the second node device to a service controller in the cluster system, so that the service controller controls the services of the second node device.
Therefore, in the fault diagnosis method applied to a cluster system in the embodiments of this application, the diagnosed fault result of the second node device is reported to the service controller in the cluster system, so that the service controller controls the services of the second node device, achieving effective management of the cluster system.
According to a second aspect, an embodiment of this application provides a fault diagnosis method applied to a cluster system. The method includes:
receiving a first instruction message sent by a first node device in the cluster system, where the first instruction message includes the identifier of a second node device in the cluster system and instructs querying the locally recorded number of packets sent by the second node device and received within a first duration starting from a first moment, where the packets include heartbeat packets and business packets, the heartbeat packet is a packet without business information that is sent periodically, and the business packet is a packet carrying business information that is sent when business communication is needed; and
querying, according to the first instruction message, the locally recorded number of packets sent by the second node device and received within the first duration starting from the first moment, and feeding the queried number of packets back to the first node device, so that the first node device determines the fault of the second node device.
Optionally, in an implementation of the second aspect, before the querying, according to the first instruction message, the locally recorded number of packets sent by the second node device and received within the first duration starting from the first moment, the method further includes:
recording, by a kernel component of the local operating system, the moment at which each packet is received and the identifier of the node device that sent the packet.
Therefore, in the fault diagnosis method applied to a cluster system in the embodiments of this application, the kernel component of the local operating system records the moment at which each packet is received and the identifier of the node device that sent it. After the first instruction message from the first node device is received, the locally recorded number of packets sent by the second node device within the first duration starting from the first moment is queried, so that the first node device can determine the fault of the second node device. In this way, the fault of a node device in the cluster system can be diagnosed accurately.
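On the receiving side of the first instruction message, the query is a filtered count over the local record. This sketch assumes the record is a flat list of `(sender_id, receive_moment)` pairs; that layout and the function name are hypothetical, since the text only says the moment and sender identifier are recorded.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class FirstInstruction:
    target_id: str   # identifier of the second node device
    start: float     # first moment (from the responder's point of view)
    duration: float  # first duration


def handle_first_instruction(msg: FirstInstruction,
                             receive_log: Iterable[Tuple[str, float]]) -> int:
    """Count locally recorded packets from msg.target_id in [start, start + duration]."""
    end = msg.start + msg.duration
    return sum(1 for sender, t in receive_log
               if sender == msg.target_id and msg.start <= t <= end)


log = [("node-2", 10.0), ("node-4", 11.0), ("node-2", 12.5), ("node-2", 30.0)]
print(handle_first_instruction(FirstInstruction("node-2", 10.0, 5.0), log))  # 2
```

The returned count is what the responder would feed back to the first node device.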
According to a third aspect, an embodiment of this application provides a fault diagnosis method applied to a cluster system. The method includes:
determining that a heartbeat packet sent by a first node device in the cluster system has not been received within a first duration starting from a first moment, where the heartbeat packet is a packet without business information that is sent periodically;
sending a first message to a second node device in the cluster system, where the first message indicates that the heartbeat packet sent by the first node device has not been received within the first duration starting from the first moment;
receiving a second message sent by the second node device, where the second message indicates that the fault of the first node device is either not an Agent fault of the first node device or an Agent fault of the first node device, and the Agent is a program configured to instruct the first node device to send the heartbeat packet; and
determining, according to the second message, whether to send a business packet to the first node device, where the business packet is a packet carrying business information that is sent when business communication is needed.
Therefore, in the fault diagnosis method applied to a cluster system in the embodiments of this application, when it is determined that the heartbeat packet sent by the first node device has not been received within the first duration starting from the first moment, this is reported to the second node device so that the second node device determines the fault of the first node device, and whether to send business packets to the first node device is then decided according to the message fed back by the second node device. In this way, the fault of a node device in the cluster system can be diagnosed accurately and the services of that node device can be controlled.
Optionally, in an implementation of the third aspect, the determining, according to the second message, whether to send the business packet to the first node device includes:
when the second message indicates that the fault of the first node device is not an Agent fault of the first node device, stopping sending the business packet to the first node device; or
when the second message indicates that the fault of the first node device is an Agent fault of the first node device, continuing to send the business packet to the first node device normally.
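The third-aspect decision can be sketched as a sender that gates business traffic on the second message. `SecondMessage`, `should_send_business` and `BusinessSender` are hypothetical names; the text specifies only the decision itself, not how sending is suspended.

```python
from enum import Enum
from typing import List


class SecondMessage(Enum):
    NOT_AGENT_FAULT = 1  # first node device is down as a whole
    AGENT_FAULT = 2      # only its heartbeat Agent has failed


def should_send_business(msg: SecondMessage) -> bool:
    """Stop business traffic only when the whole peer is diagnosed as down."""
    return msg is SecondMessage.AGENT_FAULT


class BusinessSender:
    """Hypothetical sender that gates business packets on the second message."""

    def __init__(self) -> None:
        self.enabled = True
        self.sent: List[str] = []

    def on_second_message(self, msg: SecondMessage) -> None:
        self.enabled = should_send_business(msg)

    def send(self, payload: str) -> bool:
        if not self.enabled:
            return False  # peer diagnosed as down: do not send
        self.sent.append(payload)
        return True


sender = BusinessSender()
sender.on_second_message(SecondMessage.NOT_AGENT_FAULT)
print(sender.send("order #1"))  # False
```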
According to a fourth aspect, an embodiment of this application provides a node device, including modules or units capable of performing the method in the first aspect or any optional implementation of the first aspect.
According to a fifth aspect, an embodiment of this application provides a node device, including modules or units capable of performing the method in the second aspect or any optional implementation of the second aspect.
According to a sixth aspect, an embodiment of this application provides a node device, including modules or units capable of performing the method in the third aspect or any optional implementation of the third aspect.
According to a seventh aspect, a computer device is provided, including a memory, a transceiver, and a processor. The memory stores program code that can be used to instruct execution of the first aspect or any optional implementation thereof; the transceiver performs specific signal transmission and reception under the control of the processor; and when the code is executed, the processor can implement each operation performed by the device in the method.
According to an eighth aspect, a computer device is provided, including a memory, a transceiver, and a processor. The memory stores program code that can be used to instruct execution of the second aspect or any optional implementation thereof; the transceiver performs specific signal transmission and reception under the control of the processor; and when the code is executed, the processor can implement each operation performed by the device in the method.
According to a ninth aspect, a computer device is provided, including a memory, a transceiver, and a processor. The memory stores program code that can be used to instruct execution of the third aspect or any optional implementation thereof; the transceiver performs specific signal transmission and reception under the control of the processor; and when the code is executed, the processor can implement each operation performed by the device in the method.
According to a tenth aspect, a cluster system is provided, including the node devices described in the foregoing aspects.
According to an eleventh aspect, a computer program product including instructions is provided. When the computer program product runs on a computer, the computer performs the methods described in the foregoing aspects.
According to a twelfth aspect, a computer storage medium is provided. The computer storage medium stores program code, and the program code can be used to instruct a computer to perform the methods described in the foregoing aspects.
Brief description of the drawings
Fig. 1 is a schematic diagram of a Netfilter framework to which a fault diagnosis method applied to a cluster system according to this application is applied.
Fig. 2 is a schematic diagram of Netfilter intercepting packets.
Fig. 3 is a schematic diagram of a fault diagnosis method applied to a cluster system according to an embodiment of this application.
Fig. 4 is a schematic diagram of determining a first moment and a first duration according to an embodiment of this application.
Fig. 5 is a schematic diagram of the moment and duration used to query packets according to an embodiment of this application.
Fig. 6 is a schematic diagram of a proc file according to an embodiment of this application.
Fig. 7 is a schematic diagram of a fault diagnosis method applied to a cluster system according to another embodiment of this application.
Fig. 8 is a schematic diagram of a fault diagnosis method applied to a cluster system according to still another embodiment of this application.
Fig. 9 is a schematic diagram of a cluster system according to an embodiment of this application.
Fig. 10 is a schematic diagram of another cluster system according to an embodiment of this application.
Fig. 11 is a schematic block diagram of a node device according to an embodiment of this application.
Fig. 12 is a schematic block diagram of another node device according to an embodiment of this application.
Fig. 13 is a schematic block diagram of still another node device according to an embodiment of this application.
Fig. 14 is a schematic block diagram of a node device according to an embodiment of this application.
Description of embodiments
The following describes the technical solutions in this application with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of a network filter (Netfilter) framework to which a fault diagnosis method applied to a cluster system according to this application is applied. As shown in Fig. 1, the Netfilter framework 100 includes the following stages: pre-routing (PRE_ROUTING) 110, local input (LOCAL_IN) 120, forwarding (FORWARD) 130, local output (LOCAL_OUT) 140, post-routing (POST_ROUTING) 150, routing decision 160, and routing decision 170. The Netfilter framework 100 is connected to the kernel protocol stack.
PRE_ROUTING 110 performs checks related to type, length, version, and the like (on received packets).
LOCAL_IN 120 matches and filters packets against the INPUT rule chain, implementing functions such as a firewall (on received packets destined for the local machine).
FORWARD 130 matches and filters packets against the FORWARD rule chain, for example, performing related processing during multicast (on received packets not destined for the local machine).
LOCAL_OUT 140 matches and filters packets against the OUTPUT rule chain, for example, performing error detection and related processing during multicast (on packets sent by the local host).
POST_ROUTING 150 performs operations such as network caching (on all packets).
Routing decision 160 determines whether a packet is destined for the local machine or is to be forwarded.
Routing decision 170 determines through which interface a packet is sent out.
As shown in Fig. 1, the packet receive flow is as follows: a packet enters the Netfilter framework 100 at PRE_ROUTING 110 and passes routing decision 160. If the packet is destined for the local machine, it is delivered to the kernel protocol stack through LOCAL_IN 120; if it is not destined for the local machine, it leaves the Netfilter framework 100 through FORWARD 130 and POST_ROUTING 150.
It should be understood that every node device in the cluster system goes through the foregoing flow when receiving or forwarding a packet.
As shown in Fig. 1, the packet send flow is as follows: the kernel protocol stack sends a packet, which first passes routing decision 170 to determine the interface through which it is sent, and then leaves the Netfilter framework 100 through LOCAL_OUT 140 and POST_ROUTING 150.
It should be understood that every node device in the cluster system goes through the foregoing flow when sending a packet.
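The receive and send flows above can be traced with a toy model. This is not kernel code: the stage labels are taken from Fig. 1, the functions are hypothetical, and in the Linux kernel these stages correspond to the `NF_INET_PRE_ROUTING`, `NF_INET_LOCAL_IN`, `NF_INET_FORWARD`, `NF_INET_LOCAL_OUT` and `NF_INET_POST_ROUTING` hooks.

```python
from typing import List, Set


def receive_path(dst_ip: str, local_ips: Set[str]) -> List[str]:
    """Stages an incoming packet passes through in framework 100 (Fig. 1)."""
    path = ["PRE_ROUTING 110", "routing decision 160"]
    if dst_ip in local_ips:
        # Destined for this machine: hand it to the kernel protocol stack.
        path += ["LOCAL_IN 120", "kernel protocol stack"]
    else:
        # Not for this machine: forward it back out.
        path += ["FORWARD 130", "POST_ROUTING 150"]
    return path


def send_path() -> List[str]:
    """Stages a locally generated packet passes through in framework 100."""
    return ["kernel protocol stack", "routing decision 170",
            "LOCAL_OUT 140", "POST_ROUTING 150"]


print(receive_path("10.0.0.2", {"10.0.0.2"}))
print(send_path())
```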
It should be understood that the Netfilter framework 100 is the main framework in the Linux kernel for implementing operations such as packet filtering, connection tracking, and address translation.
Optionally, as shown in Fig. 2, the Netfilter framework 100 intercepts all packets delivered from the network interface card to the kernel protocol stack and, keyed by virtual machine number (node device identifier), dynamically refreshes the latest moment at which a packet was received from each virtual machine. The Agent shown in Fig. 2 sends and receives heartbeat packets and reports when heartbeat sending times out.
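The "latest moment per virtual machine number" table can be modelled in user space as below, storing seconds and microseconds separately as in the proc file of Fig. 6. The function names are hypothetical. Note one consequence (an inference, not stated in the text): if only the newest moment is kept, a querier can still decide whether the count in a window running from t up to the present is zero by checking whether the latest moment is at or after t, which is all the zero/non-zero rule needs.

```python
import time
from typing import Dict, Optional, Tuple

LatestTable = Dict[int, Tuple[int, int]]  # VM number -> (value_sec, value_usec)


def on_packet_received(latest: LatestTable, vm_id: int,
                       now: Optional[float] = None) -> Tuple[int, int]:
    """Refresh the newest receive moment for the sending VM (user-space model)."""
    if now is None:
        now = time.time()
    # Integer microsecond split avoids a rounded usec value of 1_000_000.
    sec, usec = divmod(int(round(now * 1_000_000)), 1_000_000)
    latest[vm_id] = (sec, usec)
    return latest[vm_id]


def received_since(latest: LatestTable, vm_id: int, t: float) -> bool:
    """True if any packet from vm_id arrived at or after moment t."""
    if vm_id not in latest:
        return False
    sec, usec = latest[vm_id]
    return sec + usec / 1_000_000 >= t


table: LatestTable = {}
on_packet_received(table, 7, 1700000000.25)
print(table[7])                                # (1700000000, 250000)
print(received_since(table, 7, 1700000000.0))  # True
```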
Optionally, the packet may be a heartbeat packet or a business packet. The heartbeat packet is a packet without business information that is sent periodically, and the business packet is a packet carrying business information that is sent when business communication is needed.
It should be understood that in the cluster system of the embodiments of this application, communication can take place between any two node devices in the cluster system, and in this application communication takes the form of packet transmission.
It should also be understood that the cluster system in the embodiments of this application includes one master (Master) node device and at least one slave (Slave) node device. The Master node device may be produced by an election algorithm such as Paxos or Raft, or may be directly specified by the cluster system; this application imposes no limitation on this.
Optionally, a node device in the cluster system may be a physical machine or a virtual machine.
Optionally, each Slave node device in the cluster system sends heartbeat packets to the Master node device.
Optionally, Slave node devices in the cluster system also send heartbeat packets to each other.
Optionally, when the Master node device in the cluster system fails, a new Master node device may be elected or specified in the foregoing manner.
Fig. 3 is a schematic diagram of a fault diagnosis method 200 applied to a cluster system according to an embodiment of this application. The method 200 may be performed by the Master node device in the cluster system. As shown in Fig. 3, the method 200 includes the following steps.
210: Determine that a first node device in the cluster system has not received, within a first duration starting from a first moment, a heartbeat packet sent by a second node device in the cluster system, where the heartbeat packet is a packet without business information that is sent periodically.
It should be understood that the second node device may be any Slave node device in the cluster system.
Optionally, the first node device may be the Master node device in the cluster system or a Slave node device in the cluster system, or may be multiple Slave node devices in the cluster system.
Optionally, the first duration may be the duration over which the heartbeat packet is missed several consecutive times (for example, 5 to 7 times).
Optionally, the first moment may be determined from the moment at which the first node device finds that it has missed the heartbeat packet 5 to 7 consecutive times and from the duration of those consecutive misses.
For example, as shown in Fig. 4, the first node device determines at the current moment that it has missed the heartbeat packet 5 consecutive times. The first node device then takes the duration of the 5 consecutive misses as the first duration and calculates the first moment from the current moment and the first duration.
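The Fig. 4 calculation can be sketched as a per-peer miss counter, assuming a fixed heartbeat period so that the first duration is the number of missed periods times the period and the first moment is the current moment minus the first duration. The class name, the fixed-period assumption and the default threshold of 5 are illustrative choices.

```python
from typing import Optional, Tuple


class HeartbeatMonitor:
    """Tracks consecutive missed heartbeats from one peer (illustrative sketch)."""

    def __init__(self, interval: float, threshold: int = 5):
        if interval <= 0 or threshold <= 0:
            raise ValueError("interval and threshold must be positive")
        self.interval = interval    # heartbeat period in seconds (assumed fixed)
        self.threshold = threshold  # consecutive misses that count as a timeout
        self.missed = 0

    def on_period(self, heartbeat_received: bool,
                  now: float) -> Optional[Tuple[float, float]]:
        """Call once per period; returns (first_moment, first_duration) on timeout."""
        if heartbeat_received:
            self.missed = 0
            return None
        self.missed += 1
        if self.missed == self.threshold:
            first_duration = self.missed * self.interval
            return now - first_duration, first_duration
        return None


m = HeartbeatMonitor(interval=1.0, threshold=5)
results = [m.on_period(False, 996.0 + i) for i in range(5)]
print(results[-1])  # (995.0, 5.0)
```

The window is reported once, when the miss count first reaches the threshold, so a long outage does not produce a fresh report every period.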
Optionally, when the first node device is not the Master node device, it may be determined in the following manner that the first node device has not received, within the first duration starting from the first moment, the heartbeat packet sent by the second node device:
receiving a first message sent by the first node device, where the first message indicates that the first node device has not received the heartbeat packet sent by the second node device within the first duration starting from the first moment; and
determining, according to the first message, that the first node device has not received the heartbeat packet sent by the second node device within the first duration starting from the first moment.
220: Query the number of packets sent by the second node device that at least one node device in the cluster system received within a second duration starting from a second moment.
Optionally, the packets include the heartbeat packet and a business packet, where the business packet is a packet carrying business information that is sent when business communication is needed.
Optionally, the second moment is later than or equal to the first moment, and the end moment of the second duration is later than or equal to the end moment of the first duration.
For example, as shown in Fig. 5, the second moment is later than the first moment, the end moment of the second duration is later than the end moment of the first duration, and the second duration is longer than the first duration. A query over the second duration starting from the second moment finds that a certain node device received 3 packets sent by the second node device.
Optionally, the at least one node device may be the Master node device or a Slave node device. Optionally, the at least one node device includes the first node device.
Optionally, the number of packets sent by the second node device that the at least one node device in the cluster system received within the second duration starting from the second moment may be queried in either of the following two manners.
Manner 1: when the at least one node device is the Master node device, the Master node device queries its own locally recorded number of packets sent by the second node device and received within the second duration starting from the second moment. In this case, the Master node device records, through the kernel component of its local operating system, the moment at which each packet is received and the identifier of the node device that sent the packet.
Manner 2: when the at least one node device is a Slave node device, the Master node device sends a first instruction message to a third node device, where the third node device belongs to the at least one node device, the first instruction message includes the identifier of the second node device, and the first instruction message instructs the third node device to query its locally recorded number of packets sent by the second node device and received within the second duration starting from the second moment. The Master node device then receives a first response message fed back by the third node device, which includes the number of packets sent by the second node device that the third node device received within the second duration starting from the second moment. In this case, the third node device records, through the kernel component of its local operating system, the moment at which each packet is received and the identifier of the node device that sent the packet.
Optionally, manner 1 and manner 2 may be used together. For example, the Master node device queries its own locally recorded number of packets sent by the second node device and received within the second duration starting from the second moment and, at the same time, sends the first instruction message to the third node device and receives the first response message fed back by the third node device, as described in manner 2.
Optionally, the Master node device combines the results queried in manner 1 and manner 2 to diagnose whether the fault is an Agent fault of the second node device.
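The text does not say how the two query results are combined. The sketch below assumes summation: any packet from the second node device seen by any queried node within the window counts as evidence that the node is alive. This could matter, for example, if only the link between the Master and the second node device is broken, so that the Master's own record shows zero while a third node device still sees packets. The function name is hypothetical.

```python
from typing import Iterable, Tuple


def combined_diagnosis(local_count: int,
                       remote_counts: Iterable[int]) -> Tuple[str, int]:
    """Combine manner 1 (local record) and manner 2 (third-node replies).

    Assumption: counts are summed, so a single sighting anywhere suffices
    to rule out a whole-node failure.
    """
    total = local_count + sum(remote_counts)
    verdict = "Agent fault" if total > 0 else "not an Agent fault"
    return verdict, total


print(combined_diagnosis(0, [0, 2]))  # ('Agent fault', 2)
print(combined_diagnosis(0, [0, 0]))  # ('not an Agent fault', 0)
```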
Optionally, there may be multiple third node devices, each of which is a Slave node device. Optionally, the third node device may be the first node device.
Optionally, the kernel device may be a kernel module (Kernel Module, KM).
Optionally, the kernel device may use a proc file to record the moment at which each packet is received and the identifier of the node device that sent the packet.
Optionally, the receive moment recorded in the proc file may be accurate to the second, the microsecond, or the nanosecond. This application imposes no limitation on the precision.
For example, as shown in Fig. 6, each line of the proc file records the number of a node device and the arrival time of a heartbeat packet or service packet from that node device. Here, key is the node device number (identifier), value_sec is the seconds part of the moment the packet was received, and value_usec is the microseconds part of that moment.
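The following minimal sketch shows how such records could be parsed and counted over a query window. The on-disk text layout is an assumption: each line is taken to hold the three Fig. 6 fields `key value_sec value_usec` separated by whitespace, preceded by an optional header line. The actual proc file path and format are not given here, so the sample lines are hypothetical.

```python
from typing import Iterable, List, Tuple

Record = Tuple[int, int, int]  # (key, value_sec, value_usec)


def parse_records(lines: Iterable[str]) -> List[Record]:
    """Parse lines of the assumed form 'key value_sec value_usec'."""
    records: List[Record] = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3 or not all(p.isdigit() for p in parts):
            continue  # skip header or malformed lines
        key, sec, usec = (int(p) for p in parts)
        records.append((key, sec, usec))
    return records


def count_in_window(records: List[Record], node_id: int,
                    start: float, length: float) -> int:
    """Count packets from node_id whose arrival time t satisfies
    start <= t < start + length."""
    end = start + length
    n = 0
    for key, sec, usec in records:
        t = sec + usec / 1_000_000  # combine seconds and microseconds
        if key == node_id and start <= t < end:
            n += 1
    return n


sample = [
    "key value_sec value_usec",  # header, ignored
    "0 1000 250000",
    "2 1000 500000",
    "0 1003 0",
    "0 1006 1",                  # outside the 5-second window
]
recs = parse_records(sample)
print(count_in_window(recs, node_id=0, start=1000.0, length=5.0))  # 2
```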
230. Based on the queried number of packets sent by the second node device that were received within the second duration starting from the second moment, diagnose whether the agent program of the second node device has failed. The agent program is the program configured to instruct the second node device to send the heartbeat packet.
Optionally, the agent of the second node is a virtual module inside the second node, or it may be a segment of program code.
Optionally, when the failure is not an agent failure of the second node device, that is, when the second node device as a whole has failed, the second node device can send neither heartbeat packets nor service packets. When the agent program of the second node device has failed, the second node device still sends service packets normally but cannot send heartbeat packets.
Optionally, when the queried number of packets sent by the second node device that were received within the second duration starting from the second moment is zero, the failure is diagnosed as not being an agent failure of the second node device.
Optionally, when the queried number of packets sent by the second node device that were received within the second duration starting from the second moment is greater than zero, the failure is diagnosed as an agent failure of the second node device.
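The zero versus greater-than-zero rule can be expressed as a small function. This is a minimal sketch: the enum names and the rejection of negative counts are illustrative choices, not part of the described method.

```python
from enum import Enum


class Diagnosis(Enum):
    AGENT_FAILURE = "agent program failure"
    NODE_FAILURE = "failure other than an agent failure (whole node)"


def diagnose(packet_count: int) -> Diagnosis:
    """Apply the zero / greater-than-zero rule to the queried packet count."""
    if packet_count < 0:
        raise ValueError("packet count cannot be negative")
    if packet_count > 0:
        # Packets still arrive from the node, so only its heartbeat sender is down.
        return Diagnosis.AGENT_FAILURE
    # No heartbeat or service packets arrived: treat the whole node as failed.
    return Diagnosis.NODE_FAILURE


print(diagnose(0).name)  # NODE_FAILURE
print(diagnose(4).name)  # AGENT_FAILURE
```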
Optionally, when the first node device is not the Master node device and the second node device is determined to have a failure other than an agent failure, a second message is sent to the first node device. The second message indicates that the failure of the second node device is not an agent failure of the second node device.
It should be understood that, in this case, the second node device cannot send service packets.
Optionally, in this case, the second node device may have a hardware failure.
Optionally, the diagnosed failure result of the second node device is reported to a service controller in the cluster system, so that the service controller can control the services of the second node device.
For example, when the second node device is diagnosed with a failure other than an agent failure, the service controller uses the report to cancel the service interactions associated with the second node device.
Therefore, in the fault diagnosis method applied to a cluster system in this embodiment of the application, the Master node device may determine that the first node device in the cluster system has not received a heartbeat packet from the second node device within the first duration starting from the first moment. In that case, it queries the number of packets sent by the second node device that were received within the second duration starting from the second moment. It can then accurately diagnose whether the second node device as a whole has failed or only the agent of the second node device has failed, enabling effective management of the cluster system.
Further, the Master node device can use a kernel device of its local operating system to record, for every received packet, the moment of receipt and the identifier of the sending node device. It can therefore query its local record for the number of packets sent by the second node device that were received within the second duration starting from the second moment.
Further, the third node device uses a kernel device of its local operating system to record, for every received packet, the moment of receipt and the identifier of the sending node device. The Master node device can therefore query the third node device for the number of packets sent by the second node device that were received within the second duration starting from the second moment.
Fig. 7 is a schematic diagram of a fault diagnosis method 300 applied to a cluster system according to another embodiment of this application. Method 300 may be executed by a Slave node device in the cluster system. As shown in Fig. 7, method 300 includes the following steps:
310. Receive a first indication message sent by a first node device in the cluster system. The first indication message includes the identifier of a second node device in the cluster system. It instructs the receiver to query its local record for the number of packets sent by the second node device that were received within a first duration starting from a first moment.
It should be understood that the first node device is the Master node device.
It should also be understood that the second node device is a Slave node device.
It should also be understood that the Slave node device locally records, for every received packet, the moment of receipt and the identifier of the node device that sent it.
Optionally, receiving the first indication message from the first node device means that the first node device suspects the second node device has failed.
Optionally, the packets include heartbeat packets and service packets. A heartbeat packet is a periodically sent packet that carries no service information. A service packet is a packet carrying service information that is sent when service interaction is needed.
Optionally, the Slave node device uses a kernel device of its local operating system to record, for every received packet, the moment of receipt and the identifier of the node device that sent it.
320. According to the first indication message, query the local record for the number of packets sent by the second node device that were received within the first duration starting from the first moment. Feed the queried number back to the first node device so that the first node device can diagnose the failure of the second node device.
Optionally, when the queried number of packets sent by the second node device that were received within the first duration starting from the first moment equals zero, the first node device may diagnose a failure other than an agent failure of the second node device. In that case, the second node device can send neither heartbeat packets nor service packets. The agent program is the program configured to instruct the second node device to send the heartbeat packet.
Optionally, when that number is greater than zero, the first node device may diagnose an agent failure of the second node device. In that case, the second node device cannot send heartbeat packets but can still send service packets normally. The agent program is the program configured to instruct the second node device to send the heartbeat packet.
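The Slave-side behaviour of steps 310 and 320 can be sketched as follows. This is a minimal, hypothetical model. An in-memory list stands in for the kernel-device record, the `Indication` fields are assumed, and the reply is returned rather than sent over the network.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Indication:
    """Hypothetical first indication message sent by the Master."""
    suspect_id: int
    start: float    # the first moment, in seconds
    length: float   # the first duration, in seconds


@dataclass
class SlaveNode:
    node_id: int
    # (sender id, arrival time); stands in for the kernel-device record
    log: List[Tuple[int, float]] = field(default_factory=list)

    def on_packet(self, sender_id: int, arrival: float) -> None:
        """Record every received packet, heartbeat or service alike."""
        self.log.append((sender_id, arrival))

    def handle_indication(self, msg: Indication) -> int:
        """Count packets from the suspect inside [start, start + length)
        and return the number that would be fed back to the Master."""
        end = msg.start + msg.length
        return sum(
            1 for sender, t in self.log
            if sender == msg.suspect_id and msg.start <= t < end
        )


node2 = SlaveNode(2)
node2.on_packet(0, 10.5)   # service packet from node 0
node2.on_packet(3, 11.0)   # packet from another node, ignored
node2.on_packet(0, 12.0)   # service packet from node 0
print(node2.handle_indication(Indication(0, 10.0, 5.0)))  # 2
```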
Therefore, in the fault diagnosis method applied to a cluster system in this embodiment of the application, the Slave node device uses a kernel device of its local operating system to record, for every received packet, the moment of receipt and the identifier of the sending node device. After receiving the first indication message from the first node device, it queries its local record for the number of packets sent by the second node device that were received within the first duration starting from the first moment. The first node device can then diagnose the failure of the second node device. In this way, the failure of a particular node device in the cluster system can be identified accurately, enabling effective management of the cluster system.
Fig. 8 is a schematic diagram of a fault diagnosis method 400 applied to a cluster system according to yet another embodiment of this application. Method 400 may be executed by a Slave node device in the cluster system. As shown in Fig. 8, method 400 includes the following steps:
410. Determine that no heartbeat packet sent by a first node device in the cluster system has been received within a first duration starting from a first moment. A heartbeat packet is a periodically sent packet that carries no service information.
It should be understood that, under normal conditions, the executing entity of method 400 receives the heartbeat packets that the first node device sends periodically.
Optionally, the first duration may be the period during which several consecutive heartbeat packets from the first node device (for example, 5 to 7) have not been received.
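The timeout condition above can be sketched as a small monitor that counts missed heartbeat periods. The 1-second period and the threshold of 5 missed heartbeats are assumptions taken from the example ranges in this description. The monitor also reports the first moment, that is, the moment at which the first heartbeat was missed.

```python
from typing import Dict, Optional


class HeartbeatMonitor:
    """Declares a heartbeat timeout after max_missed consecutive periods
    pass with no heartbeat from a node (period and threshold assumed)."""

    def __init__(self, period: float = 1.0, max_missed: int = 5) -> None:
        self.period = period
        self.max_missed = max_missed
        self.last_seen: Dict[int, float] = {}

    def on_heartbeat(self, node_id: int, t: float) -> None:
        self.last_seen[node_id] = t

    def first_moment(self, node_id: int) -> Optional[float]:
        """Moment at which the first heartbeat was missed."""
        last = self.last_seen.get(node_id)
        return None if last is None else last + self.period

    def timed_out(self, node_id: int, now: float) -> bool:
        last = self.last_seen.get(node_id)
        if last is None:
            return False  # never heard from this node; nothing to compare
        missed = int((now - last) // self.period)
        return missed >= self.max_missed


mon = HeartbeatMonitor(period=1.0, max_missed=5)
mon.on_heartbeat(0, 100.0)
print(mon.timed_out(0, 104.5), mon.timed_out(0, 105.0))  # False True
print(mon.first_moment(0))  # 101.0
```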
Optionally, the first node device is a Slave node device in the cluster system.
Optionally, when the executing entity of method 400 determines that no heartbeat packet from the first node device in the cluster system has been received within the first duration starting from the first moment, the first node device is suspected of having failed. To verify this suspicion, the Master node device in the cluster system must further determine whether the first node device can still send service packets.
420. Send a first message to a second node device in the cluster system. The first message indicates that no heartbeat packet from the first node device has been received within the first duration starting from the first moment.
It should be understood that the second node device is the Master node device in the cluster system.
Optionally, the first message is sent to the second node device so that the second node device can judge whether the first node device as a whole has failed or only the agent of the first node device has failed.
430. Receive a second message sent by the second node device. The second message indicates whether the failure of the first node device is a failure other than an agent failure of the first node device or an agent failure of the first node device. The agent program is the program configured to instruct the first node device to send the heartbeat packet.
440. Based on the second message, determine whether to send service packets to the first node device. A service packet is a packet carrying service information that is sent when service interaction is needed.
Optionally, when the second message indicates that the failure of the first node device is not an agent failure of the first node device, stop sending service packets to the first node device.
Optionally, when the second message indicates that the failure of the first node device is an agent failure of the first node device, continue sending service packets to the first node device normally.
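Step 440 can be sketched as a sender that keeps a set of peers to which service packets are suspended. The class and enum names are hypothetical. The sketch also assumes that a later agent-failure verdict for the same peer lifts an earlier block, which this description does not state.

```python
from enum import Enum
from typing import Set


class FaultKind(Enum):
    AGENT = "agent program failure"
    NODE = "failure other than an agent failure"


class ServiceSender:
    """Tracks which peers may still receive service packets."""

    def __init__(self) -> None:
        self.blocked: Set[int] = set()

    def on_second_message(self, peer_id: int, kind: FaultKind) -> None:
        if kind is FaultKind.NODE:
            self.blocked.add(peer_id)      # stop sending service packets
        else:
            self.blocked.discard(peer_id)  # agent-only fault: keep sending

    def may_send(self, peer_id: int) -> bool:
        return peer_id not in self.blocked


sender = ServiceSender()
sender.on_second_message(0, FaultKind.AGENT)
print(sender.may_send(0))  # True
sender.on_second_message(0, FaultKind.NODE)
print(sender.may_send(0))  # False
```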
Therefore, in the fault diagnosis method applied to a cluster system in this embodiment of the application, the Slave node device may determine that no heartbeat packet from the first node device has been received within the first duration starting from the first moment. It then reports this to the second node device so that the second node device can diagnose the failure of the first node device. Based on the message fed back by the second node device, it decides whether to send service packets to the first node device. In this way, the failure of a particular node device in the cluster system can be diagnosed accurately and the services of that node device can be controlled accordingly.
As one embodiment, as shown in Fig. 9, the cluster system includes four node devices, denoted node device-0, node device-1, node device-2, and node device-3. Node device-1 is the Master node device, and node device-0, node device-2, and node device-3 are Slave node devices. Each node device in the cluster system uses the kernel module (KM) of its local operating system (Operating System, OS) to record, for every received packet, the moment of receipt and the identifier of the node device that sent it. Under normal conditions, agent AGENT-0 in node device-0 periodically (for example, every 1 second) sends heartbeat packets to agent AGENT-1 in node device-1. Likewise, AGENT-2 in node device-2 and AGENT-3 in node device-3 periodically send heartbeat packets to AGENT-1 in node device-1. In addition, services APP 0-1 and APP 0-2 in node device-0 exchange service messages, that is, send service packets to each other, with services APP 2-1 and APP 2-2 in node device-2 or services APP 3-1 and APP 3-2 in node device-3.
Step 1: AGENT-1 in node device-1 may determine that AGENT-0 in node device-0 has a heartbeat timeout (for example, receiving no heartbeat packet for 5 to 7 seconds may be judged a heartbeat timeout). In that case, AGENT-1 asks AGENT-2 and AGENT-3 for the number of packets (service packets and heartbeat packets) sent by node device-0 that they received within the first duration starting from the first moment. The first moment is the moment at which a heartbeat packet was first missed.
Step 2: AGENT-2 and AGENT-3 each query the kernel-state device (KM) of their local operating system for the number of packets (service packets and heartbeat packets) sent by node device-0 that were received within the first duration starting from the first moment, and return the results to AGENT-1.
Step 3: Finally, AGENT-1 uses the packet counts returned by AGENT-2 and AGENT-3 to diagnose whether the fault in node device-0 is a failure of AGENT-0 or a failure of node device-0 as a whole, which improves diagnostic accuracy. If the queried packet count is 0, node device-0 as a whole is diagnosed as failed. If the queried packet count is greater than 0, AGENT-0 is diagnosed as failed.
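The three steps of the Fig. 9 example can be simulated in a few lines. This is a hypothetical model: each agent's KM record is represented by an in-memory list of (sender id, arrival time) pairs, and the query to AGENT-2 and AGENT-3 is a direct function call instead of a network message.

```python
from typing import Dict, List, Tuple

Log = List[Tuple[int, float]]  # (sender id, arrival time) as recorded by a KM


def count_from(log: Log, sender: int, start: float, length: float) -> int:
    """Step 2: count packets from sender inside [start, start + length)."""
    end = start + length
    return sum(1 for s, t in log if s == sender and start <= t < end)


def agent1_diagnose(logs: Dict[str, Log], suspect: int,
                    start: float, length: float) -> str:
    """Steps 1 and 3: ask AGENT-2 and AGENT-3 for counts, then decide."""
    total = sum(count_from(logs[a], suspect, start, length)
                for a in ("AGENT-2", "AGENT-3"))
    if total > 0:
        return f"AGENT-{suspect} failure"
    return f"node device-{suspect} failure"


# AGENT-0 stopped heartbeating at t=100, but APP 0-1 still talks to node device-2.
agent_down = {"AGENT-2": [(0, 101.0), (0, 103.0)], "AGENT-3": []}
node_down = {"AGENT-2": [], "AGENT-3": []}
print(agent1_diagnose(agent_down, 0, 100.0, 5.0))  # AGENT-0 failure
print(agent1_diagnose(node_down, 0, 100.0, 5.0))   # node device-0 failure
```

In the Fig. 10 variant, the same decision is made from AGENT-1's own local count instead of the counts returned by AGENT-2 and AGENT-3.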
As another embodiment, as shown in Fig. 10, the cluster system includes four node devices, denoted node device-0, node device-1, node device-2, and node device-3. Node device-1 is the Master node device, and node device-0, node device-2, and node device-3 are Slave node devices. Each node device in the cluster system uses the kernel module (KM) of its local operating system (OS) to record, for every received packet, the moment of receipt and the identifier of the node device that sent it. Under normal conditions, agent AGENT-0 in node device-0 periodically (for example, every 1 second) sends heartbeat packets to agent AGENT-1 in node device-1. Likewise, AGENT-2 in node device-2 and AGENT-3 in node device-3 periodically send heartbeat packets to AGENT-1 in node device-1. In addition, services APP 0-1 and APP 0-2 in node device-0 exchange service messages, that is, send service packets to each other, with services APP 1-1 and APP 1-2 in node device-1.
Step 1: AGENT-1 in node device-1 may determine that AGENT-0 in node device-0 has a heartbeat timeout (for example, receiving no heartbeat packet for 5 to 7 seconds may be judged a heartbeat timeout). In that case, AGENT-1 queries the kernel-state device (KM) of its local operating system (OS) for the number of packets (service packets and heartbeat packets) sent by node device-0 that were received within the first duration starting from the first moment. The first moment is the moment at which a heartbeat packet was first missed.
Step 2: AGENT-1 uses the locally queried packet count to diagnose whether the fault in node device-0 is a failure of AGENT-0 or a failure of node device-0 as a whole, which improves diagnostic accuracy. If the queried packet count is 0, node device-0 as a whole is diagnosed as failed. If the queried packet count is greater than 0, AGENT-0 is diagnosed as failed.
Fig. 11 is a schematic block diagram of a node device 500 according to an embodiment of this application. As shown in Fig. 11, node device 500 includes:
Determining unit 510, configured to determine that a first node device in a cluster system has not received, within a first duration starting from a first moment, a heartbeat packet sent by a second node device in the cluster system. The heartbeat packet is a periodically sent packet that carries no service information.
Query unit 520, configured to query the number of packets sent by the second node device that were received by at least one node device in the cluster system within a second duration starting from a second moment. The packets include the heartbeat packets and service packets, and a service packet is a packet carrying service information that is sent when service interaction is needed.
Diagnosis unit 530, configured to diagnose, based on the queried number of packets sent by the second node device that were received within the second duration starting from the second moment, whether the agent program of the second node device has failed. The agent program is the program configured to instruct the second node device to send the heartbeat packet.
Optionally, the query unit 520 is specifically configured to:
record, by using a kernel device of the local operating system, the moment at which each packet is received and the identifier of the node device that sent the packet; and
query the local record for the number of packets sent by the second node device that were received within the second duration starting from the second moment.
Optionally, the query unit 520 is specifically configured to:
send a first indication message to a third node device that belongs to the at least one node device, where the first indication message includes the identifier of the second node device and instructs the third node device to query its local record for the number of packets sent by the second node device that were received within the second duration starting from the second moment; and
receive a first response message fed back by the third node device, where the first response message includes the number of packets sent by the second node device that the third node device received within the second duration starting from the second moment.
It should be understood that the third node device uses a kernel device of its local operating system to record, for every received packet, the moment of receipt and the identifier of the node device that sent it.
Optionally, the diagnosis unit 530 is further configured to:
diagnose a failure other than an agent failure of the second node device when the queried number of packets sent by the second node device that were received within the second duration starting from the second moment is zero; or
diagnose an agent failure of the second node device when the queried number of packets sent by the second node device that were received within the second duration starting from the second moment is greater than zero.
Optionally, when the failure is not an agent failure of the second node device, the second node device can send neither heartbeat packets nor service packets. When the agent program of the second node device has failed, the second node device sends service packets normally but cannot send heartbeat packets.
Optionally, the determining unit 510 is specifically configured to:
receive a first message sent by the first node device, where the first message indicates that the first node device has not received a heartbeat packet from the second node device within the first duration starting from the first moment; and
determine, based on the first message, that the first node device has not received a heartbeat packet from the second node device within the first duration starting from the first moment.
Optionally, node device 500 further includes:
Sending unit 540, configured to send a second message to the first node device, where the second message indicates that the failure of the second node device is not an agent failure of the second node device.
Optionally, the second moment is later than or equal to the first moment, and the end of the second duration is later than or equal to the end of the first duration.
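The optional timing constraint can be checked with a one-line predicate. This is a minimal sketch, and the interpretation is an assumption: on one reading, the constraint means the query window starts no earlier than the silent window and extends at least as far, so it covers the tail of the period in which heartbeats were missing.

```python
def valid_query_window(first_moment: float, first_duration: float,
                       second_moment: float, second_duration: float) -> bool:
    """Check the optional constraint on the second (query) window."""
    starts_no_earlier = second_moment >= first_moment
    ends_no_earlier = (second_moment + second_duration
                       >= first_moment + first_duration)
    return starts_no_earlier and ends_no_earlier


print(valid_query_window(100.0, 5.0, 100.0, 5.0))  # True: identical windows
print(valid_query_window(100.0, 5.0, 101.0, 3.0))  # False: ends at 104 < 105
```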
Optionally, the sending unit 540 is configured to report the diagnosed failure of the second node device to a service controller in the cluster system, so that the service controller can control the services of the second node device.
It should be understood that the foregoing and other operations and/or functions of the units in node device 500 according to this embodiment of the application are respectively intended to implement the corresponding procedures of the Master node device in method 200 in Fig. 3. For brevity, details are not repeated here.
Fig. 12 is a schematic block diagram of another node device 600 according to an embodiment of this application. As shown in Fig. 12, node device 600 includes:
Receiving unit 610, configured to receive a first indication message sent by a first node device in the cluster system. The first indication message includes the identifier of a second node device in the cluster system. It instructs the receiver to query its local record for the number of packets sent by the second node device that were received within a first duration starting from a first moment. The packets include heartbeat packets and service packets: a heartbeat packet is a periodically sent packet that carries no service information, and a service packet is a packet carrying service information that is sent when service interaction is needed.
Query unit 620, configured to query, according to the first indication message, the local record for the number of packets sent by the second node device that were received within the first duration starting from the first moment. It then feeds the queried number back to the first node device so that the first node device can diagnose the failure of the second node device.
Optionally, node device 600 further includes the following unit, which operates before the query unit 620 queries, according to the first indication message, the local record for the number of packets sent by the second node device that were received within the first duration starting from the first moment:
Recording unit 630, configured to record, by using a kernel device of the local operating system, the moment at which each packet is received and the identifier of the node device that sent the packet.
It should be understood that the foregoing and other operations and/or functions of the units in node device 600 according to this embodiment of the application are respectively intended to implement the corresponding procedures of the Slave node device in method 300 in Fig. 7. For brevity, details are not repeated here.
Fig. 13 is a schematic block diagram of another node device 700 according to an embodiment of this application. As shown in Fig. 13, node device 700 includes:
Determining unit 710, configured to determine that no heartbeat packet sent by a first node device in the cluster system has been received within a first duration starting from a first moment. A heartbeat packet is a periodically sent packet that carries no service information.
Sending unit 720, configured to send a first message to a second node device in the cluster system. The first message indicates that no heartbeat packet from the first node device has been received within the first duration starting from the first moment.
Receiving unit 730, configured to receive a second message sent by the second node device. The second message indicates whether the failure of the first node device is a failure other than an agent failure of the first node device or an agent failure of the first node device. The agent program is the program configured to instruct the first node device to send the heartbeat packet.
The determining unit 710 is further configured to determine, based on the second message, whether to send service packets to the first node device. A service packet is a packet carrying service information that is sent when service interaction is needed.
Optionally, the determining unit 710 is specifically configured to:
stop sending service packets to the first node device when the second message indicates that the failure of the first node device is not an agent failure of the first node device; or
continue sending service packets to the first node device normally when the second message indicates that the failure of the first node device is an agent failure of the first node device.
It should be understood that the foregoing and other operations and/or functions of the units in node device 700 according to this embodiment of the application are respectively intended to implement the corresponding procedures of the Slave node device in method 400 in Fig. 8. For brevity, details are not repeated here.
Fig. 14 is a schematic block diagram of a computer device 800 according to an embodiment of this application. Computer device 800 includes:
Memory 810, configured to store a program that includes code;
Transceiver 820, configured to communicate with other node devices; and
Processor 830, configured to execute the program code in the memory 810.
Optionally, when the code is executed, the processor 830 can implement the operations performed by at least one of the following node devices: the node device in method 200 in Fig. 3, the node device in method 300 in Fig. 7, and the node device in method 400 in Fig. 8. For brevity, details are not repeated here. The transceiver 820 sends and receives specific packets under the control of the processor 830.
It should be understood that, in this embodiment of the application, the processor 830 may be a central processing unit (Central Processing Unit, CPU). The processor 830 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 810 may include a read-only memory and a random access memory, and provides instructions and data to the processor 830. Part of the memory 810 may also include a non-volatile random access memory. For example, the memory 810 may also store device type information.
The transceiver 820 may be used to implement signal sending and receiving functions, such as modulation and demodulation, also called up-conversion and down-conversion.
During implementation, at least one step of the foregoing methods may be completed by an integrated logic circuit of hardware in the processor 830, or by that integrated logic circuit when driven by instructions in the form of software. Therefore, computer device 800 may be a physical machine or a virtual machine. The steps of the methods disclosed with reference to the embodiments of this application may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor 830 reads information from the memory and completes the steps of the foregoing methods in combination with its hardware. To avoid repetition, details are not described here again.
Optionally, an embodiment of this application provides a computer device that includes at least one of the following node devices: the node device in method 200 in Fig. 3, the node device in method 300 in Fig. 7, the node device in method 400 in Fig. 8, and the node device in Fig. 14.
Those of ordinary skill in the art are it is to be appreciated that the list of each example described with reference to the embodiments described herein Member and algorithm steps, it can be realized with the combination of electronic hardware or computer software and electronic hardware.These functions are actually Performed with hardware or software mode, application-specific and design constraint depending on technical scheme.Professional and technical personnel Described function can be realized using distinct methods to each specific application, but this realization is it is not considered that exceed Scope of the present application.
It is apparent to those skilled in the art that for convenience and simplicity of description, the system of foregoing description, The specific work process of device and unit, the corresponding process in preceding method embodiment is may be referred to, will not be repeated here.
In several embodiments provided herein, it should be understood that disclosed systems, devices and methods, can be with Realize by another way.For example, device embodiment described above is only schematical, for example, the unit Division, only a kind of division of logic function, can there is other dividing mode, such as multiple units or component when actually realizing Another system can be combined or be desirably integrated into, or some features can be ignored, or do not perform.It is another, it is shown or The mutual coupling discussed or direct-coupling or communication connection can be the indirect couplings by some interfaces, device or unit Close or communicate to connect, can be electrical, mechanical or other forms.
The unit illustrated as separating component can be or may not be physically separate, show as unit The part shown can be or may not be physical location, you can with positioned at a place, or can also be distributed to multiple On NE.Some or all of unit therein can be selected to realize the scheme of the present embodiment according to the actual needs.
In addition, each functional unit in each embodiment of the application can be integrated in a processing unit, can also That unit is individually physically present, can also two or more units it is integrated in a unit.
In the above-described embodiments, can come wholly or partly by software, hardware, firmware or its any combination real It is existing.When implemented in software, can realize in the form of a computer program product whole or in part.The computer program Product includes one or more computer instructions.When loading on computers and performing the computer program instructions, all or Partly produce according to the flow or function described in the embodiment of the present invention.The computer can be all-purpose computer, special meter Calculation machine, computer network or other programmable devices.The computer instruction can be stored in computer-readable recording medium In, or the transmission from a computer-readable recording medium to another computer-readable recording medium, for example, the computer Instruction can from a web-site, computer, server or data center by wired (for example, coaxial cable, optical fiber, number Word user line (DSL)) or wireless (for example, infrared, wireless, microwave etc.) mode to another web-site, computer, server Or data center is transmitted.The computer-readable recording medium can be any usable medium that computer can access or Person is the data storage devices such as server, the data center integrated comprising one or more usable mediums.The usable medium can To be magnetic medium, (for example, floppy disk, hard disk, tape), optical medium (for example, DVD) or semiconductor medium are (for example, solid-state Hard disk Solid State Disk (SSD)) etc..
The foregoing descriptions are merely specific embodiments of this application and are not intended to limit its protection scope. Any variation or replacement readily conceived by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (27)

1. A fault diagnosis method applied to a cluster system, wherein the method comprises:
determining that a first node device in the cluster system has not received, within a first duration starting from a first moment, a heartbeat packet sent by a second node device in the cluster system, the heartbeat packet being a periodically sent data packet that carries no service information;
querying the quantity of data packets sent by the second node device and received by at least one node device in the cluster system within a second duration starting from a second moment, the data packets comprising the heartbeat packet and a service packet, the service packet being a data packet that carries service information and is sent when service communication is required; and
diagnosing, based on the queried quantity of data packets sent by the second node device and received within the second duration starting from the second moment, whether the fault is an agent program fault of the second node device, the agent program being a program configured to instruct the second node device to send the heartbeat packet.
2. The method according to claim 1, wherein the method further comprises:
recording, by the kernel of the local operating system, the moment at which each data packet is received and the identifier of the node device that sent the data packet; and
the querying the quantity of data packets sent by the second node device and received by at least one node device in the cluster system within the second duration starting from the second moment comprises:
querying the locally recorded quantity of data packets sent by the second node device and received within the second duration starting from the second moment.
3. The method according to claim 1 or 2, wherein the querying the quantity of data packets sent by the second node device and received by at least one node device in the cluster system within the second duration starting from the second moment comprises:
sending a first indication message to a third node device, the third node device belonging to the at least one node device, the first indication message comprising an identifier of the second node device and being used to instruct the third node device to query its locally recorded quantity of data packets sent by the second node device and received within the second duration starting from the second moment; and
receiving a first response message fed back by the third node device, the first response message comprising the quantity of data packets sent by the second node device and received by the third node device within the second duration starting from the second moment.
4. The method according to any one of claims 1 to 3, wherein the diagnosing, based on the queried quantity of data packets sent by the second node device and received within the second duration starting from the second moment, whether the fault is an agent program fault of the second node device comprises:
when the queried quantity of data packets sent by the second node device and received within the second duration starting from the second moment is zero, diagnosing that the fault is not an agent program fault of the second node device; or
when the queried quantity of data packets sent by the second node device and received within the second duration starting from the second moment is greater than zero, diagnosing that the fault is an agent program fault of the second node device.
5. The method according to claim 4, wherein, when the fault is not an agent program fault of the second node device, the second node device can send neither the heartbeat packet nor the service packet; and when the agent program of the second node device is faulty, the second node device sends the service packet normally but cannot send the heartbeat packet.
6. The method according to claim 4 or 5, wherein the determining that the first node device in the cluster system has not received, within the first duration starting from the first moment, the heartbeat packet sent by the second node device in the cluster system comprises:
receiving a first message sent by the first node device, the first message indicating that the first node device has not received the heartbeat packet sent by the second node device within the first duration starting from the first moment; and
determining, according to the first message, that the first node device has not received the heartbeat packet sent by the second node device within the first duration starting from the first moment.
7. The method according to claim 6, wherein the method further comprises:
sending a second message to the first node device, the second message indicating that the fault of the second node device is not an agent program fault of the second node device.
8. The method according to any one of claims 1 to 7, wherein the second moment is later than or equal to the first moment, and the end moment of the second duration is later than or equal to the end moment of the first duration.
9. The method according to any one of claims 1 to 8, wherein the method further comprises:
reporting the diagnosed fault of the second node device to a service controller in the cluster system, so that the service controller controls the services of the second node device.
10. A fault diagnosis method applied to a cluster system, wherein the method comprises:
receiving a first indication message sent by a first node device in the cluster system, the first indication message comprising an identifier of a second node device in the cluster system and being used to instruct querying of the locally recorded quantity of data packets sent by the second node device and received within a first duration starting from a first moment, wherein the data packets comprise heartbeat packets and service packets, a heartbeat packet being a periodically sent data packet that carries no service information, and a service packet being a data packet that carries service information and is sent when service communication is required; and
querying, according to the first indication message, the locally recorded quantity of data packets sent by the second node device and received within the first duration starting from the first moment, and feeding the queried quantity back to the first node device, so that the first node device determines the fault of the second node device.
11. The method according to claim 10, wherein, before the querying, according to the first indication message, of the locally recorded quantity of data packets sent by the second node device and received within the first duration starting from the first moment, the method further comprises:
recording, by the kernel of the local operating system, the moment at which each data packet is received and the identifier of the node device that sent the data packet.
12. A fault diagnosis method applied to a cluster system, wherein the method comprises:
determining that a heartbeat packet sent by a first node device in the cluster system has not been received within a first duration starting from a first moment, the heartbeat packet being a periodically sent data packet that carries no service information;
sending a first message to a second node device in the cluster system, the first message indicating that the heartbeat packet sent by the first node device has not been received within the first duration starting from the first moment;
receiving a second message sent by the second node device, the second message indicating either that the fault of the first node device is not an agent program fault of the first node device or that it is an agent program fault of the first node device, the agent program being a program configured to instruct the first node device to send the heartbeat packet; and
determining, according to the second message, whether to send a service packet to the first node device, the service packet being a data packet that carries service information and is sent when service communication is required.
13. The method according to claim 12, wherein the determining, according to the second message, whether to send the service packet to the first node device comprises:
when the second message indicates that the fault of the first node device is not an agent program fault of the first node device, stopping sending the service packet to the first node device; or
when the second message indicates that the fault of the first node device is an agent program fault of the first node device, continuing to send the service packet to the first node device normally.
14. A node device, wherein the node device comprises:
a determining unit, configured to determine that a first node device in a cluster system has not received, within a first duration starting from a first moment, a heartbeat packet sent by a second node device in the cluster system, the heartbeat packet being a periodically sent data packet that carries no service information;
a query unit, configured to query the quantity of data packets sent by the second node device and received by at least one node device in the cluster system within a second duration starting from a second moment, the data packets comprising the heartbeat packet and a service packet, the service packet being a data packet that carries service information and is sent when service communication is required; and
a diagnosis unit, configured to diagnose, based on the queried quantity of data packets sent by the second node device and received within the second duration starting from the second moment, whether the fault is an agent program fault of the second node device, the agent program being a program configured to instruct the second node device to send the heartbeat packet.
15. The node device according to claim 14, wherein the query unit is specifically configured to:
record, through the kernel of the local operating system, the moment at which each data packet is received and the identifier of the node device that sent the data packet; and
query the locally recorded quantity of data packets sent by the second node device and received within the second duration starting from the second moment.
16. The node device according to claim 14, wherein the query unit is specifically configured to:
send a first indication message to a third node device, the third node device belonging to the at least one node device, the first indication message comprising an identifier of the second node device and being used to instruct the third node device to query its locally recorded quantity of data packets sent by the second node device and received within the second duration starting from the second moment; and
receive a first response message fed back by the third node device, the first response message comprising the quantity of data packets sent by the second node device and received by the third node device within the second duration starting from the second moment.
17. The node device according to any one of claims 14 to 16, wherein the diagnosis unit is further configured to:
when the queried quantity of data packets sent by the second node device and received within the second duration starting from the second moment is zero, diagnose that the fault is not an agent program fault of the second node device; or
when the queried quantity of data packets sent by the second node device and received within the second duration starting from the second moment is greater than zero, diagnose that the fault is an agent program fault of the second node device.
18. The node device according to claim 17, wherein, when the fault is not an agent program fault of the second node device, the second node device can send neither the heartbeat packet nor the service packet; and when the agent program of the second node device is faulty, the second node device sends the service packet normally but cannot send the heartbeat packet.
19. The node device according to claim 17 or 18, wherein the determining unit is specifically configured to:
receive a first message sent by the first node device, the first message indicating that the first node device has not received the heartbeat packet sent by the second node device within the first duration starting from the first moment; and
determine, according to the first message, that the first node device has not received the heartbeat packet sent by the second node device within the first duration starting from the first moment.
20. The node device according to claim 19, wherein the node device further comprises:
a sending unit, configured to send a second message to the first node device, the second message indicating that the fault of the second node device is not an agent program fault of the second node device.
21. The node device according to any one of claims 14 to 20, wherein the second moment is later than or equal to the first moment, and the end moment of the second duration is later than or equal to the end moment of the first duration.
22. The node device according to any one of claims 14 to 21, wherein the node device further comprises:
a sending unit, configured to report the diagnosed fault of the second node device to a service controller in the cluster system, so that the service controller controls the services of the second node device.
23. A node device, wherein the node device comprises:
a receiving unit, configured to receive a first indication message sent by a first node device in a cluster system, the first indication message comprising an identifier of a second node device in the cluster system and being used to instruct querying of the locally recorded quantity of data packets sent by the second node device and received within a first duration starting from a first moment, wherein the data packets comprise heartbeat packets and service packets, a heartbeat packet being a periodically sent data packet that carries no service information, and a service packet being a data packet that carries service information and is sent when service communication is required; and
a query unit, configured to query, according to the first indication message, the locally recorded quantity of data packets sent by the second node device and received within the first duration starting from the first moment, and to feed the queried quantity back to the first node device, so that the first node device determines the fault of the second node device.
24. The node device according to claim 23, wherein the node device further comprises:
a recording unit, configured to record, through the kernel of the local operating system, the moment at which each data packet is received and the identifier of the node device that sent the data packet, before the query unit queries, according to the first indication message, the locally recorded quantity of data packets sent by the second node device and received within the first duration starting from the first moment.
25. A node device, wherein the node device comprises:
a determining unit, configured to determine that a heartbeat packet sent by a first node device in a cluster system has not been received within a first duration starting from a first moment, the heartbeat packet being a periodically sent data packet that carries no service information;
a sending unit, configured to send a first message to a second node device in the cluster system, the first message indicating that the heartbeat packet sent by the first node device has not been received within the first duration starting from the first moment; and
a receiving unit, configured to receive a second message sent by the second node device, the second message indicating either that the fault of the first node device is not an agent program fault of the first node device or that it is an agent program fault of the first node device, the agent program being a program configured to instruct the first node device to send the heartbeat packet;
wherein the determining unit is further configured to determine, according to the second message, whether to send a service packet to the first node device, the service packet being a data packet that carries service information and is sent when service communication is required.
26. The node device according to claim 25, wherein the determining unit is specifically configured to:
when the second message indicates that the fault of the first node device is not an agent program fault of the first node device, determine to stop sending the service packet to the first node device; or
when the second message indicates that the fault of the first node device is an agent program fault of the first node device, determine to continue sending the service packet to the first node device normally.
27. A computer device, comprising the node device according to any one of claims 14 to 26.
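Claim 1 is triggered by a missed-heartbeat condition: no heartbeat from the second node device reaches the first node device within a first duration starting from a first moment. The sketch below shows one way to evaluate that condition, assuming an in-memory list of heartbeat arrival times measured in seconds; `HeartbeatMonitor` and its methods are hypothetical names that do not come from the patent.

```python
from collections import defaultdict


class HeartbeatMonitor:
    """Hypothetical helper: records heartbeat arrivals and checks for a missed window."""

    def __init__(self, first_duration: float) -> None:
        self.first_duration = first_duration  # the "first duration", assumed in seconds
        self.arrivals = defaultdict(list)     # node id -> heartbeat arrival times

    def on_heartbeat(self, node_id: str, t: float) -> None:
        # Remember when a heartbeat from node_id arrived.
        self.arrivals[node_id].append(t)

    def missed(self, node_id: str, first_moment: float, now: float) -> bool:
        """True once [first_moment, first_moment + first_duration) has fully
        elapsed without any heartbeat from node_id."""
        end = first_moment + self.first_duration
        if now < end:
            return False  # window still open: too early to declare a miss
        return not any(first_moment <= t < end for t in self.arrivals[node_id])
```

The half-open window is an assumed convention; it keeps a heartbeat that lands exactly on a boundary from being counted in two consecutive windows.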
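Claims 2 and 11 have the kernel of the local operating system record, for every received data packet, the moment of receipt and the sender's identifier, so that the count for any window can be answered later. The sketch below imitates that record in user space with a sorted list of times per sender; `PacketLog`, `record`, and `count` are hypothetical names, and a real implementation would sit in the kernel's receive path rather than in Python.

```python
import bisect
from collections import defaultdict


class PacketLog:
    """Hypothetical user-space stand-in for the per-packet record kept by the kernel."""

    def __init__(self) -> None:
        # sender id -> arrival times kept in ascending order
        self._times = defaultdict(list)

    def record(self, sender_id: str, t: float) -> None:
        # Called for every received packet, heartbeat and service packets alike.
        bisect.insort(self._times[sender_id], t)

    def count(self, sender_id: str, start: float, duration: float) -> int:
        """Number of packets from sender_id received in [start, start + duration)."""
        times = self._times.get(sender_id, [])
        lo = bisect.bisect_left(times, start)
        hi = bisect.bisect_left(times, start + duration)
        return hi - lo
```

Keeping the times sorted lets each window query run in logarithmic time with two binary searches instead of a scan.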
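Claims 3 and 16 let the diagnosing node obtain a third node device's count by sending a first indication message that carries the second node device's identifier and reading the count from the first response message. A minimal sketch follows, assuming the transport is abstracted as one callable per peer and that counts from several peers are simply added together; the message classes, field names, and summing rule are illustrative assumptions, since the claims do not define a message format.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class FirstIndication:
    """Hypothetical first indication message."""
    target_id: str   # identifier of the second node device
    start: float     # the second moment
    duration: float  # the second duration


@dataclass(frozen=True)
class FirstResponse:
    """Hypothetical first response message."""
    responder_id: str
    count: int       # packets from target_id seen by the responder in the window


def query_peers(indication: FirstIndication,
                peers: Iterable[Callable[[FirstIndication], FirstResponse]]) -> int:
    """Send the indication to each third node device and total the reported counts."""
    return sum(send(indication).count for send in peers)
```

In a deployment each callable would wrap a network request; passing plain functions keeps the sketch testable without any transport.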
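The decision rule of claims 4 and 5 rests on one observation: a node whose agent program has failed stops sending heartbeats but still sends service packets, whereas a node that is otherwise faulty sends nothing. A non-zero packet count in the second window therefore points to an agent program fault. The sketch below encodes that rule; `Diagnosis` and `diagnose` are hypothetical names.

```python
from enum import Enum


class Diagnosis(Enum):
    # Not an agent program fault: per claim 5, the node sends neither heartbeats nor service packets.
    NOT_AGENT_FAULT = "not an agent program fault"
    # Agent program fault: service packets still flow, only heartbeats stop.
    AGENT_FAULT = "agent program fault"


def diagnose(packet_count: int) -> Diagnosis:
    """Apply the claim 4 rule to the packet count queried for the second window."""
    if packet_count < 0:
        raise ValueError("packet count cannot be negative")
    return Diagnosis.AGENT_FAULT if packet_count > 0 else Diagnosis.NOT_AGENT_FAULT
```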
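Claims 8 and 21 constrain the query window: the second moment is no earlier than the first moment, and the second window ends no earlier than the first window. One plausible reading is that this keeps the query focused on traffic during and after the heartbeat outage rather than on packets sent before it began. The check can be written as follows; the function name is hypothetical.

```python
def second_window_is_valid(first_moment: float, first_duration: float,
                           second_moment: float, second_duration: float) -> bool:
    """Claim 8: the second window starts no earlier, and ends no earlier, than the first."""
    starts_ok = second_moment >= first_moment
    ends_ok = second_moment + second_duration >= first_moment + first_duration
    return starts_ok and ends_ok
```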
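Claims 10 and 11 describe the responder side: a node that records every packet it receives, answers a first indication message by counting packets from the named second node device within the requested window, and returns that count to the first node device. The sketch below assumes JSON-encoded messages with hypothetical fields `target`, `start`, `duration`, `responder`, and `count`; the patent does not specify an encoding.

```python
import bisect
import json


class ResponderNode:
    """Hypothetical node device answering first indication messages."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self._arrivals = {}  # sender id -> sorted arrival times (the claim 11 record)

    def on_packet(self, sender_id: str, t: float) -> None:
        # Record the receipt moment and the sender for every packet.
        bisect.insort(self._arrivals.setdefault(sender_id, []), t)

    def handle_indication(self, raw: str) -> str:
        """Parse a JSON indication, count matching packets locally, and reply in JSON."""
        msg = json.loads(raw)
        times = self._arrivals.get(msg["target"], [])
        start = msg["start"]
        end = start + msg["duration"]
        count = bisect.bisect_left(times, end) - bisect.bisect_left(times, start)
        return json.dumps({"responder": self.node_id, "target": msg["target"], "count": count})
```

A node that has never received anything from the named target simply reports a count of zero, which the requester treats like any other answer.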
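Claims 13 and 26 turn the diagnosis into a traffic decision on the node that reported the missed heartbeat: stop sending service packets to a peer whose fault is not an agent program fault, and keep sending normally when only the agent program has failed. A minimal sketch, assuming the second message has already been decoded into a `FaultKind` value; both `FaultKind` and `ServiceSender` are hypothetical names.

```python
from enum import Enum


class FaultKind(Enum):
    NOT_AGENT_FAULT = "not an agent program fault"
    AGENT_FAULT = "agent program fault"


class ServiceSender:
    """Hypothetical gate deciding whether service packets may go to a peer."""

    def __init__(self) -> None:
        self._stopped = set()  # peers we have stopped sending service packets to

    def on_second_message(self, node_id: str, fault: FaultKind) -> None:
        if fault is FaultKind.NOT_AGENT_FAULT:
            # Per claim 5 the peer sends nothing at all: stop sending service packets.
            self._stopped.add(node_id)
        else:
            # Only the heartbeat agent failed: keep sending service packets normally.
            self._stopped.discard(node_id)

    def may_send(self, node_id: str) -> bool:
        return node_id not in self._stopped
```

This is where the approach pays off: without the packet-count check, a node with only a failed agent program would be cut off from service traffic even though it is still serving requests.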
CN201710890513.8A 2017-09-27 2017-09-27 Fault diagnosis method applied to cluster system, node equipment and computer equipment Active CN107566219B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710890513.8A CN107566219B (en) 2017-09-27 2017-09-27 Fault diagnosis method applied to cluster system, node equipment and computer equipment

Publications (2)

Publication Number Publication Date
CN107566219A (en) 2018-01-09
CN107566219B (en) 2020-09-18

Family

ID=60981904

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710890513.8A Active CN107566219B (en) 2017-09-27 2017-09-27 Fault diagnosis method applied to cluster system, node equipment and computer equipment

Country Status (1)

Country Link
CN (1) CN107566219B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110380934A (en) * 2019-07-23 2019-10-25 南京航空航天大学 A kind of distribution redundant system heartbeat detecting method
CN113760592A (en) * 2021-07-30 2021-12-07 郑州云海信息技术有限公司 Node kernel detection method and related device

Citations (5)

Publication number Priority date Publication date Assignee Title
US20100115338A1 (en) * 2003-08-27 2010-05-06 Rao Sudhir G Reliable Fault Resolution In A Cluster
CN102594596A (en) * 2012-02-15 2012-07-18 华为技术有限公司 Method and device for recognizing available partitions, and clustering network system
CN106170782A (en) * 2013-04-26 2016-11-30 华为技术有限公司 The system and method for highly scalable high availability cluster is created in the MPP cluster of machine in a network
CN106301853A (en) * 2015-06-05 2017-01-04 华为技术有限公司 The fault detection method of group system interior joint and device
CN106656682A (en) * 2017-02-27 2017-05-10 网宿科技股份有限公司 Method, system and device for detecting cluster heartbeat

Cited By (3)

Publication number Priority date Publication date Assignee Title
CN110380934A (en) * 2019-07-23 2019-10-25 南京航空航天大学 A kind of distribution redundant system heartbeat detecting method
CN113760592A (en) * 2021-07-30 2021-12-07 郑州云海信息技术有限公司 Node kernel detection method and related device
CN113760592B (en) * 2021-07-30 2024-02-27 郑州云海信息技术有限公司 Node kernel detection method and related device

Also Published As

Publication number Publication date
CN107566219B (en) 2020-09-18

Similar Documents

Publication Publication Date Title
US5568471A (en) System and method for a workstation monitoring and control of multiple networks having different protocols
US6529954B1 (en) Knowledge based expert analysis system
CN101313280B (en) Pool-based network diagnostic systems and methods
JP5033856B2 (en) Devices and systems for network configuration assumptions
US6363384B1 (en) Expert system process flow
US7024508B2 (en) Bus station with integrated bus monitor function
US6526044B1 (en) Real-time analysis through capture buffer with real-time historical data correlation
US20090161556A1 (en) Methods and Apparatus for Fault Identification in Border Gateway Protocol Networks
CN107659423A (en) Method for processing business and device
US7818283B1 (en) Service assurance automation access diagnostics
CN102325036B (en) The method for diagnosing faults of a kind of network system, system and device
WO2016095718A1 (en) Method for detecting communication link, base station, network manager, system and storage medium
US7657623B2 (en) Method and apparatus for collecting management information on a communication network
CN107870832A (en) Multipath storage device based on various dimensions Gernral Check-up method
US20110141914A1 (en) Systems and Methods for Providing Ethernet Service Circuit Management
CN111800354B (en) Message processing method and device, message processing equipment and storage medium
CN101027872A (en) Communication network management system for automatic fault repair
EP3316520B1 (en) Bfd method and apparatus
CN107925590B (en) The method and apparatus for analyzing network performance related with one or more parts of network
CN106452952B (en) A kind of method and gateway cluster detecting group system communications status
CN107566219A (en) Method for diagnosing faults, node device and computer equipment applied to group system
CN110519122A (en) A kind of network quality automatic monitoring device and method based on Mtr
Appleby et al. Yemanja-a layered event correlation engine for multi-domain server farms
Steinder et al. Non-deterministic diagnosis of end-to-end service failures in a multi-layer communication system
JP3569827B2 (en) Network system status diagnosis / monitoring device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant