CN107566219A - Fault diagnosis method, node device and computer device applied to a cluster system - Google Patents
- Publication number: CN107566219A
- Application number: CN201710890513.8A
- Authority: CN (China)
- Legal status: Granted (as listed by Google Patents; the status is an assumption and not a legal conclusion)
Abstract
This application provides a fault diagnosis method, a node device and a computer device applied to a cluster system, which can improve the accuracy of node device fault diagnosis. The method includes: determining that a first node device has not received, within a first duration starting from a first moment, a heartbeat packet sent by a second node device; querying the number of data packets sent by the second node device and received by at least one node device within a second duration starting from a second moment, where the data packets include heartbeat packets and service packets; and diagnosing, according to the queried number of data packets sent by the second node device and received within the second duration starting from the second moment, whether the fault is an Agent fault of the second node device, where the Agent is a program configured to instruct the second node device to send the heartbeat packet.
Description
Technical field
The present application relates to the field of virtual machines, and more specifically, to a fault diagnosis method, a node device and a computer device applied to a cluster system.
Background
Under the Network Function Virtualization (NFV) architecture, the scale of future cluster systems will grow exponentially, the cluster management problems faced by the service layer will become more prominent, and managing cluster systems will become more challenging.
In a large-scale cluster, if a node device suffers a fault such as a crash or a system hang (a hard lockup or soft lockup of the operating system kernel), the applications (Application, APP) inside the node device fail directly, and the APP failures in turn affect services. When a virtual machine node device fails, accurately detecting and diagnosing the fault of the virtual machine node, so that service availability time is not affected, is a problem that urgently needs to be solved.
Summary
This application provides a fault diagnosis method, a node device and a computer device applied to a cluster system, which can improve the accuracy of node device fault diagnosis.
According to a first aspect, an embodiment of this application provides a fault diagnosis method applied to a cluster system. The method includes:
determining that a first node device in the cluster system has not received, within a first duration starting from a first moment, a heartbeat packet sent by a second node device in the cluster system, where the heartbeat packet is a periodically sent data packet that carries no service information;
querying the number of data packets sent by the second node device and received by at least one node device in the cluster system within a second duration starting from a second moment, where the data packets include the heartbeat packet and a service packet, and the service packet is a data packet carrying service information that is sent when service interaction is required; and
diagnosing, according to the queried number of data packets sent by the second node device and received within the second duration starting from the second moment, whether the fault is an Agent fault of the second node device, where the Agent is a program configured to instruct the second node device to send the heartbeat packet.
Optionally, the at least one node device includes the first node device.
Optionally, the cluster system includes one master node (Master) and multiple slave nodes (Slave).
Optionally, the fault diagnosis method applied to the cluster system is performed by the Master node.
Therefore, in the fault diagnosis method applied to a cluster system in the embodiments of this application, when it is determined that the first node device in the cluster system has not received, within the first duration starting from the first moment, the heartbeat packet sent by the second node device, the number of data packets (heartbeat packets and service packets) sent by the second node device and received within the second duration starting from the second moment is queried. This avoids the misdiagnosis caused by judging a fault of the second node device from heartbeat packets alone, so that whether the fault is an Agent fault of the second node device can be diagnosed accurately.
Optionally, in an implementation of the first aspect, the method further includes:
recording, by a kernel component of the local operating system, the moment at which each data packet is received and the identifier of the node device that sent the data packet.
The querying the number of data packets sent by the second node device and received by at least one node device in the cluster system within the second duration starting from the second moment includes:
querying the locally recorded number of data packets sent by the second node device and received within the second duration starting from the second moment.
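The recording and querying steps above can be modelled in a few lines. The sketch below is a hypothetical user-space stand-in for the kernel component, not the patent's implementation: the class and method names are illustrative, moments are plain floating-point seconds, and the query window is assumed to be half-open, [start, start + duration).

```python
from bisect import bisect_left, insort
from collections import defaultdict


class ReceiveLog:
    """Hypothetical model of the per-sender receive records.

    For every received data packet (heartbeat or service), the receive
    moment is stored under the identifier of the sending node device.
    """

    def __init__(self) -> None:
        # sender identifier -> sorted list of receive moments (seconds)
        self._times = defaultdict(list)

    def record(self, sender_id, moment):
        # insort keeps each list sorted even if packets are logged out of order.
        insort(self._times[sender_id], moment)

    def count(self, sender_id, start, duration):
        """Packets from sender_id received in the assumed window [start, start + duration)."""
        times = self._times.get(sender_id, [])  # .get avoids creating empty entries
        return bisect_left(times, start + duration) - bisect_left(times, start)
```

Keeping the moments sorted lets each query use two binary searches instead of scanning every record.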
Therefore, in the fault diagnosis method applied to a cluster system in the embodiments of this application, the kernel component of the local operating system records the moment at which each data packet is received and the identifier of the node device that sent it. The locally recorded number of data packets (heartbeat packets and service packets) sent by the second node device and received within the second duration starting from the second moment can therefore be queried, and whether the fault is an Agent fault of the second node device can be diagnosed accurately.
Optionally, in an implementation of the first aspect, the querying the number of data packets sent by the second node device and received by at least one node device in the cluster system within the second duration starting from the second moment includes:
sending a first indication message to a third node device, where the third node device belongs to the at least one node device, the first indication message includes the identifier of the second node device, and the first indication message is used to instruct the third node device to query the locally recorded number of data packets sent by the second node device and received within the second duration starting from the second moment; and
receiving a first response message fed back by the third node device, where the first response message includes the number of data packets sent by the second node device and received by the third node device within the second duration starting from the second moment.
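The text does not specify a wire format for the first indication message and the first response message. The sketch below models only their content as Python dataclasses and treats the transport as a plain function call; all class, field and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FirstIndication:
    """First indication message (hypothetical field names)."""
    second_node_id: str  # identifier of the node whose packets are counted
    start: float         # second moment
    duration: float      # second duration


@dataclass(frozen=True)
class FirstResponse:
    """First response message fed back by the third node device."""
    second_node_id: str
    count: int


def handle_indication(msg: FirstIndication,
                      local_count: Callable[[str, float, float], int]) -> FirstResponse:
    """Runs on the third node device: answer from its own local receive records."""
    n = local_count(msg.second_node_id, msg.start, msg.duration)
    return FirstResponse(msg.second_node_id, n)


def query_remote(third_node: Callable[[FirstIndication], FirstResponse],
                 second_node_id: str, start: float, duration: float) -> int:
    """Runs on the querying node: send the indication and read back the count."""
    response = third_node(FirstIndication(second_node_id, start, duration))
    return response.count
```

In a real deployment `third_node` would wrap a network request; passing it as a callable keeps the message logic testable without one.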
It should be understood that the third node device records, by using the kernel component of its local operating system, the moment at which each data packet is received and the identifier of the node device that sent the data packet.
Optionally, the third node device may be the first node device.
Therefore, in the fault diagnosis method applied to a cluster system in the embodiments of this application, the third node device records, by using the kernel component of its local operating system, the moment at which each data packet is received and the identifier of the node device that sent it. The number of data packets sent by the second node device and received within the second duration starting from the second moment can therefore be queried from the third node device, and whether the fault is an Agent fault of the second node device can be diagnosed accurately.
Optionally, in an implementation of the first aspect, the diagnosing, according to the queried number of data packets sent by the second node device and received within the second duration starting from the second moment, whether the fault is an Agent fault of the second node device includes:
when the queried number of data packets sent by the second node device and received within the second duration starting from the second moment is zero, diagnosing the fault as a non-Agent fault of the second node device; or
when the queried number of data packets sent by the second node device and received within the second duration starting from the second moment is greater than zero, diagnosing the fault as an Agent fault of the second node device.
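The decision rule above reduces to a comparison against zero. A minimal sketch, with illustrative (hypothetical) enum and function names:

```python
from enum import Enum


class FaultType(Enum):
    # The node failed as a whole: neither heartbeat nor service packets arrive.
    NON_AGENT_FAULT = "non-Agent fault"
    # Only the heartbeat Agent failed: service packets still arrive.
    AGENT_FAULT = "Agent fault"


def diagnose(packet_count: int) -> FaultType:
    """Classify a missed-heartbeat event from the queried packet count.

    packet_count is the number of data packets (heartbeat and service)
    from the second node device received within the second duration.
    """
    if packet_count < 0:
        raise ValueError("packet_count must be non-negative")
    if packet_count == 0:
        return FaultType.NON_AGENT_FAULT
    return FaultType.AGENT_FAULT
```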
Optionally, in an implementation of the first aspect, in the case of a non-Agent fault of the second node device, the second node device can send neither the heartbeat packet nor the service packet; in the case of an Agent fault of the second node device, the second node device sends the service packet normally but cannot send the heartbeat packet.
Optionally, in an implementation of the first aspect, the determining that the first node device in the cluster system has not received, within the first duration starting from the first moment, the heartbeat packet sent by the second node device in the cluster system includes:
receiving a first message sent by the first node device, where the first message indicates that the first node device has not received, within the first duration starting from the first moment, the heartbeat packet sent by the second node device; and
determining, according to the first message, that the first node device has not received, within the first duration starting from the first moment, the heartbeat packet sent by the second node device.
Optionally, in an implementation of the first aspect, the method further includes:
sending a second message to the first node device, where the second message indicates that the fault of the second node device is a non-Agent fault of the second node device.
Optionally, in an implementation of the first aspect, the second moment is later than or equal to the first moment, and the end moment of the second duration is later than or equal to the end moment of the first duration.
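The constraint above relates the start and end of the two observation windows. One plausible reading, which is an interpretation and not stated in the text, is that the second window covers at least the tail of the silence observed by the first node device, so service packets that are still arriving get counted. A minimal check, with a hypothetical function name:

```python
def windows_valid(first_start: float, first_duration: float,
                  second_start: float, second_duration: float) -> bool:
    """Check the optional constraint on the two observation windows.

    The second window must start no earlier than the first window and
    must end no earlier than the first window ends.
    """
    first_end = first_start + first_duration
    second_end = second_start + second_duration
    return second_start >= first_start and second_end >= first_end
```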
Optionally, in an implementation of the first aspect, the method further includes:
reporting the diagnosed fault of the second node device to a service controller in the cluster system, so that the service controller controls the services of the second node device.
Therefore, in the fault diagnosis method applied to a cluster system in the embodiments of this application, the diagnosed fault result of the second node device is reported to the service controller in the cluster system, so that the service controller controls the services of the second node device, thereby achieving effective management of the cluster system.
According to a second aspect, an embodiment of this application provides a fault diagnosis method applied to a cluster system. The method includes:
receiving a first indication message sent by a first node device in the cluster system, where the first indication message includes the identifier of a second node device in the cluster system, and the first indication message is used to instruct querying the locally recorded number of data packets sent by the second node device and received within a first duration starting from a first moment, where the data packets include heartbeat packets and service packets, the heartbeat packet is a periodically sent data packet that carries no service information, and the service packet is a data packet carrying service information that is sent when service interaction is required; and
querying, according to the first indication message, the locally recorded number of data packets sent by the second node device and received within the first duration starting from the first moment, and feeding back the queried number of data packets to the first node device, so that the first node device determines the fault of the second node device.
Optionally, in an implementation of the second aspect, before the querying, according to the first indication message, the locally recorded number of data packets sent by the second node device and received within the first duration starting from the first moment, the method further includes:
recording, by the kernel component of the local operating system, the moment at which each data packet is received and the identifier of the node device that sent the data packet.
Therefore, in the fault diagnosis method applied to a cluster system in the embodiments of this application, the kernel component of the local operating system records the moment at which each data packet is received and the identifier of the node device that sent it. After the first indication message sent by the first node device is received, the locally recorded number of data packets sent by the second node device and received within the first duration starting from the first moment is queried, so that the first node device determines the fault of the second node device. The fault of a node device in the cluster system can thus be diagnosed accurately.
According to a third aspect, an embodiment of this application provides a fault diagnosis method applied to a cluster system. The method includes:
determining that a heartbeat packet sent by a first node device in the cluster system has not been received within a first duration starting from a first moment, where the heartbeat packet is a periodically sent data packet that carries no service information;
sending a first message to a second node device in the cluster system, where the first message indicates that the heartbeat packet sent by the first node device has not been received within the first duration starting from the first moment;
receiving a second message sent by the second node device, where the second message indicates that the fault of the first node device is a non-Agent fault of the first node device or an Agent fault of the first node device, and the Agent is a program configured to instruct the first node device to send the heartbeat packet; and
determining, according to the second message, whether to send a service packet to the first node device, where the service packet is a data packet carrying service information that is sent when service interaction is required.
Therefore, in the fault diagnosis method applied to a cluster system in the embodiments of this application, when it is determined that the heartbeat packet sent by the first node device has not been received within the first duration starting from the first moment, this is reported to the second node device so that the second node device determines the fault of the first node device. Whether to send service packets to the first node device is then determined according to the message fed back by the second node device. In this way, the fault of a node device in the cluster system can be diagnosed accurately, and the services of that node device can be controlled.
Optionally, in an implementation of the third aspect, the determining, according to the second message, whether to send the service packet to the first node device includes:
when the second message indicates that the fault of the first node device is a non-Agent fault of the first node device, stopping sending the service packet to the first node device; or
when the second message indicates that the fault of the first node device is an Agent fault of the first node device, continuing to send the service packet to the first node device normally.
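The third-aspect decision above maps each diagnosis carried in the second message to a send or stop decision. A minimal sketch, assuming the second message is reduced to a two-valued enum (the names are hypothetical):

```python
from enum import Enum


class Diagnosis(Enum):
    NON_AGENT_FAULT = 0  # the first node device failed as a whole
    AGENT_FAULT = 1      # only its heartbeat Agent failed


def should_send_service_packets(second_message: Diagnosis) -> bool:
    """Decide, from the second message, whether to keep sending service packets.

    With a non-Agent fault the first node device can send neither heartbeat
    nor service packets, so sending stops; with an Agent fault its service
    path still works, so service packets continue to be sent normally.
    """
    if second_message is Diagnosis.NON_AGENT_FAULT:
        return False
    if second_message is Diagnosis.AGENT_FAULT:
        return True
    raise ValueError(f"unexpected diagnosis: {second_message!r}")
```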
According to a fourth aspect, an embodiment of this application provides a node device, including modules or units that can perform the method in the first aspect or any optional implementation of the first aspect.
According to a fifth aspect, an embodiment of this application provides a node device, including modules or units that can perform the method in the second aspect or any optional implementation of the second aspect.
According to a sixth aspect, an embodiment of this application provides a node device, including modules or units that can perform the method in the third aspect or any optional implementation of the third aspect.
According to a seventh aspect, a computer device is provided, including a memory, a transceiver and a processor. The memory stores program code that may be used to instruct performing the first aspect or any optional implementation thereof; the transceiver is configured to perform specific signal transmission and reception under the control of the processor; and when the code is executed, the processor may implement each operation performed by the device in the method.
According to an eighth aspect, a computer device is provided, including a memory, a transceiver and a processor. The memory stores program code that may be used to instruct performing the second aspect or any optional implementation thereof; the transceiver is configured to perform specific signal transmission and reception under the control of the processor; and when the code is executed, the processor may implement each operation performed by the device in the method.
According to a ninth aspect, a computer device is provided, including a memory, a transceiver and a processor. The memory stores program code that may be used to instruct performing the third aspect or any optional implementation thereof; the transceiver is configured to perform specific signal transmission and reception under the control of the processor; and when the code is executed, the processor may implement each operation performed by the device in the method.
According to a tenth aspect, a cluster system is provided, including the node devices described in the foregoing aspects.
According to an eleventh aspect, a computer program product including instructions is provided; when the computer program product runs on a computer, the computer is caused to perform the methods described in the foregoing aspects.
According to a twelfth aspect, a computer storage medium is provided. The computer storage medium stores program code, and the program code may be used to instruct a computer to perform the methods described in the foregoing aspects.
Brief description of the drawings
Fig. 1 is a schematic diagram of a Netfilter framework to which the fault diagnosis method applied to a cluster system of this application is applied.
Fig. 2 is a schematic diagram of Netfilter intercepting data packets.
Fig. 3 is a schematic diagram of a fault diagnosis method applied to a cluster system according to an embodiment of this application.
Fig. 4 is a schematic diagram of determining the first moment and the first duration according to an embodiment of this application.
Fig. 5 is a schematic diagram of the moment and duration for querying data packets according to an embodiment of this application.
Fig. 6 is a schematic diagram of a proc file according to an embodiment of this application.
Fig. 7 is a schematic diagram of a fault diagnosis method applied to a cluster system according to another embodiment of this application.
Fig. 8 is a schematic diagram of a fault diagnosis method applied to a cluster system according to still another embodiment of this application.
Fig. 9 is a schematic diagram of a cluster system according to an embodiment of this application.
Fig. 10 is a schematic diagram of another cluster system according to an embodiment of this application.
Fig. 11 is a schematic block diagram of a node device according to an embodiment of this application.
Fig. 12 is a schematic block diagram of another node device according to an embodiment of this application.
Fig. 13 is a schematic block diagram of still another node device according to an embodiment of this application.
Fig. 14 is a schematic block diagram of a node device provided in an embodiment of this application.
Detailed Description of Embodiments
The following describes the technical solutions in this application with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of a network filter (Netfilter) framework to which the fault diagnosis method applied to a cluster system of this application is applied. As shown in Fig. 1, the Netfilter framework 100 includes the following stages: pre-routing (PRE_ROUTING) 110, local input (LOCAL_IN) 120, forwarding (FORWARD) 130, local output (LOCAL_OUT) 140, post-routing (POST_ROUTING) 150, routing decision 160 and routing decision 170. The Netfilter framework 100 is connected to the kernel protocol stack.
PRE_ROUTING 110 performs checks related to type, length, version and the like (on received data packets).
LOCAL_IN 120 performs rule-matching filtering on data packets according to the INPUT rule chain, implementing functions such as a firewall (for received data packets destined for the local machine).
FORWARD 130 performs rule-matching filtering on data packets according to the FORWARD rule chain, for example, related processing during multicast (for received data packets not destined for the local machine).
LOCAL_OUT 140 performs rule-matching filtering on data packets according to the OUTPUT rule chain, for example, error detection and related processing during multicast (for data packets sent by the local host).
POST_ROUTING 150 performs operations such as network caching (for all data packets).
Routing decision 160 determines whether a data packet is destined for the local machine or is to be forwarded.
Routing decision 170 determines through which interface a data packet is sent out.
As shown in Fig. 1, the flow for receiving a data packet is as follows: the data packet enters the Netfilter framework 100 at PRE_ROUTING 110 and passes routing decision 160. If it is destined for the local machine, it is delivered to the kernel protocol stack through LOCAL_IN 120; if its destination is not the local machine, it passes through FORWARD 130 and POST_ROUTING 150 and leaves the Netfilter framework 100.
It should be understood that each node device in the cluster system goes through the foregoing flow when receiving or forwarding a data packet.
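The receive flow described above branches only at routing decision 160. The toy model below lists the stages a received packet passes through; it is an illustration of the figure, not of kernel code, and it reduces the routing decision to a hypothetical "is the destination one of my addresses" check.

```python
from __future__ import annotations


def receive_path(dst_ip: str, local_ips: set[str]) -> list[str]:
    """List the Fig. 1 stages a received packet passes through (toy model)."""
    stages = ["PRE_ROUTING 110", "routing decision 160"]
    if dst_ip in local_ips:
        # Destined for the local machine: delivered up to the protocol stack.
        stages += ["LOCAL_IN 120", "kernel protocol stack"]
    else:
        # Not for this machine: forwarded and sent back out of the framework.
        stages += ["FORWARD 130", "POST_ROUTING 150", "leaves framework 100"]
    return stages


print(receive_path("10.0.0.5", {"10.0.0.5"}))
```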
As shown in Fig. 1, the flow for sending a data packet is as follows: the kernel protocol stack sends the data packet, which first passes routing decision 170 to determine through which interface it is sent out, and then passes through LOCAL_OUT 140 and POST_ROUTING 150 to leave the Netfilter framework 100.
It should be understood that each node device in the cluster system goes through the foregoing flow when sending a data packet.
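In the send flow described above, routing decision 170 picks the outgoing interface. The sketch below simplifies that choice to a longest-prefix match over a hypothetical route list; real Linux routing also weighs metrics, policy rules and more, so this illustrates the idea only. Interface names and routes are made up.

```python
from __future__ import annotations

import ipaddress


def choose_interface(dst_ip: str, routes: list[tuple[str, str]]) -> str:
    """Routing decision 170, simplified to a longest-prefix match.

    routes is a list of (network prefix, interface name) pairs.
    """
    dst = ipaddress.ip_address(dst_ip)
    best_net, best_iface = None, None
    for prefix, iface in routes:
        net = ipaddress.ip_network(prefix)
        # A more specific (longer) matching prefix wins.
        if dst in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_iface = net, iface
    if best_iface is None:
        raise LookupError(f"no route to {dst_ip}")
    return best_iface


def send_path(dst_ip: str, routes: list[tuple[str, str]]) -> list[str]:
    """List the Fig. 1 stages a locally generated packet passes through."""
    iface = choose_interface(dst_ip, routes)
    return ["routing decision 170", "LOCAL_OUT 140", "POST_ROUTING 150", f"out via {iface}"]


ROUTES = [("0.0.0.0/0", "eth0"), ("10.0.0.0/8", "eth1"), ("10.1.0.0/16", "eth2")]
print(send_path("10.1.2.3", ROUTES))
```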
It should be understood that the Netfilter framework 100 is the main framework in the Linux kernel for operations such as packet filtering, connection tracking and address translation.
Optionally, as shown in Fig. 2, the Netfilter framework 100 intercepts all data packets delivered from the network interface card to the kernel protocol stack, and dynamically refreshes, per virtual machine number (node device identifier), the latest moment at which a data packet was received. The Agent shown in Fig. 2 sends and receives heartbeat packets and reports when a heartbeat packet times out.
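The per-VM table described above keeps only the latest receive moment for each virtual machine number. The sketch below is a hypothetical user-space model of that table: `on_packet()` stands in for the in-kernel interception hook, and the injectable clock exists only to make the model testable.

```python
import time


class LatestReceiveTable:
    """Hypothetical model of the 'latest receive moment per VM number' table."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._latest = {}  # VM number (node device identifier) -> latest receive moment

    def on_packet(self, vm_number):
        # Heartbeat and service packets alike refresh the entry.
        self._latest[vm_number] = self._clock()

    def latest(self, vm_number):
        return self._latest.get(vm_number)

    def received_since(self, vm_number, moment):
        """True if any packet from vm_number arrived at or after `moment`."""
        t = self._latest.get(vm_number)
        return t is not None and t >= moment
```

Because only the latest moment is kept, such a table can answer "has anything arrived since moment X?", which is enough to tell a zero count from a non-zero count when the query window runs up to the present, but it cannot give an exact count.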
Optionally, the data packet may be a heartbeat packet or a service packet. The heartbeat packet is a periodically sent data packet that carries no service information, and the service packet is a data packet carrying service information that is sent when service interaction is required.
It should be understood that in the cluster system of the embodiments of this application, communication may occur between any two node devices in the cluster system, and in this application communication takes the form of data packet transmission.
It should also be understood that the cluster system in the embodiments of this application includes one master (Master) node device and at least one slave (Slave) node device. The Master node device may be produced by an election algorithm, for example, Paxos or Raft, or may be directly specified by the cluster system; this application imposes no limitation on this.
Optionally, a node device in the cluster system may be a physical machine or a virtual machine.
Optionally, each Slave node device in the cluster system sends heartbeat packets to the Master node device.
Optionally, Slave node devices in the cluster system send heartbeat packets to each other.
Optionally, when the Master node device in the cluster system fails, a new Master node device may be elected or specified in the foregoing manner.
Fig. 3 is a schematic diagram of a fault diagnosis method 200 applied to a cluster system according to an embodiment of this application. The method 200 may be performed by the Master node device in the cluster system. As shown in Fig. 3, the method 200 includes:
210: Determine that a first node device in the cluster system has not received, within a first duration starting from a first moment, a heartbeat packet sent by a second node device in the cluster system, where the heartbeat packet is a periodically sent data packet that carries no service information.
It should be understood that the second node device may be any Slave node device in the cluster system.
Optionally, the first node device may be the Master node device in the cluster system, a Slave node device in the cluster system, or multiple Slave node devices in the cluster system.
Optionally, the first duration may be the period during which the heartbeat packet has been missed several consecutive times (for example, 5 to 7 times).
Optionally, the first moment may be determined from the moment at which the first node device has missed the heartbeat packet 5 to 7 consecutive times and the duration over which those 5 to 7 consecutive heartbeat packets were missed.
For example, as shown in Fig. 4, the first node device determines at the current moment that the heartbeat packet has been missed 5 consecutive times. The first node device then takes the duration of these 5 consecutive missed heartbeat packets as the first duration, and calculates the first moment from the current moment and the first duration.
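The Fig. 4 example above can be written as simple arithmetic. The sketch assumes, beyond what the text states, that heartbeats are expected at a fixed period, so the first duration is the number of consecutive misses times that period and the first moment is the current moment minus the first duration. The function name is hypothetical.

```python
def first_window(now: float, heartbeat_period: float, missed: int = 5):
    """Return (first_moment, first_duration) after `missed` consecutive misses.

    Assumes a fixed heartbeat period, so the silence covering `missed`
    expected heartbeats lasts missed * heartbeat_period seconds.
    """
    if heartbeat_period <= 0 or missed <= 0:
        raise ValueError("heartbeat_period and missed must be positive")
    first_duration = missed * heartbeat_period
    first_moment = now - first_duration
    return first_moment, first_duration
```

For instance, with a 1-second period and 5 misses detected at moment 1000, the first duration is 5 seconds and the first moment is 995.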
Optionally, when the first node device is not the Master node device, it may be determined in the following manner that the first node device has not received, within the first duration starting from the first moment, the heartbeat packet sent by the second node device:
receiving a first message sent by the first node device, where the first message indicates that the first node device has not received, within the first duration starting from the first moment, the heartbeat packet sent by the second node device; and
determining, according to the first message, that the first node device has not received, within the first duration starting from the first moment, the heartbeat packet sent by the second node device.
220: Query the number of data packets sent by the second node device and received by at least one node device in the cluster system within a second duration starting from a second moment.
Optionally, the data packets include the heartbeat packet and a service packet, where the service packet is a data packet carrying service information that is sent when service interaction is required.
Optionally, the second moment is later than or equal to the first moment, and the end moment of the second duration is later than or equal to the end moment of the first duration.
For example, as shown in Fig. 5, the second moment is later than the first moment, the end moment of the second duration is later than the end moment of the first duration, and the second duration is longer than the first duration. Within the second duration starting from the second moment, the query finds that a node device received 3 data packets sent by the second node device.
Optionally, the at least one node device may be the Master node device or a Slave node device.
Optionally, the at least one node device includes the first node device.
Optionally, the number of data packets sent by the second node device and received by at least one node device in the cluster system within the second duration starting from the second moment may be queried in either of the following two manners.
Manner 1: When the at least one node device is the Master node device, the Master node device queries its own locally recorded number of data packets sent by the second node device and received within the second duration starting from the second moment. In this case, the Master node device records, by using the kernel component of its local operating system, the moment at which each data packet is received and the identifier of the node device that sent the data packet.
Manner 2: When the at least one node device is a Slave node device, the Master node device sends a first indication message to a third node device, where the third node device belongs to the at least one node device, the first indication message includes the identifier of the second node device, and the first indication message is used to instruct the third node device to query the locally recorded number of data packets sent by the second node device and received within the second duration starting from the second moment. The Master node device then receives a first response message fed back by the third node device, where the first response message includes the number of data packets sent by the second node device and received by the third node device within the second duration starting from the second moment. In this case, the third node device records, by using the kernel component of its local operating system, the moment at which each data packet is received and the identifier of the node device that sent the data packet.
Optionally, manner 1 and manner 2 may be used together. For example, the Master node device queries its own locally recorded number of data packets sent by the second node device and received within the second duration starting from the second moment. At the same time, the Master node device sends a first indication message to the third node device, where the first indication message includes the identifier of the second node device and is used to instruct the third node device to query the locally recorded number of data packets sent by the second node device and received within the second duration starting from the second moment; and the Master node device receives a first response message fed back by the third node device, where the first response message includes the number of data packets sent by the second node device and received by the third node device within the second duration starting from the second moment.
Optionally, the Master node device diagnoses, based on the combined results queried in manner 1 and manner 2, whether the fault is an Agent fault of the second node device.
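The text does not say how the results of manner 1 and manner 2 are combined. One plausible rule, assumed here, is that the second node device is still sending service traffic if any source saw at least one of its packets in the second duration. The function name and the mapping of third-node identifiers to counts are hypothetical.

```python
from collections.abc import Mapping


def diagnose_combined(local_count: int, remote_counts: Mapping[str, int]) -> str:
    """Combine manner 1 (Master's own records) with manner 2 (third node replies).

    Assumed rule: Agent fault if any source counted at least one packet
    from the second node device; otherwise a non-Agent fault.
    """
    counts = [local_count, *remote_counts.values()]
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    return "Agent fault" if any(c > 0 for c in counts) else "non-Agent fault"
```

Because counts are non-negative, "any count above zero" is equivalent to "sum above zero"; checking each source separately also makes it easy to report which node still saw traffic.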
Optionally, the third node device may be multiple Slave node devices. Optionally, the third node device may be the first node device.
Optionally, the kernel component may be a kernel module (Kernel Module, KM).
Optionally, the kernel component may record, in a proc file, the moment at which each data packet is received and the identifier of the node device that sent the data packet.
Optionally, the receive moment of a data packet recorded in the proc file may have a precision of seconds, microseconds or nanoseconds; this application imposes no limitation on this.
For example, as shown in Fig. 6, each line of the proc file records the number of a node device and the arrival moment of a heartbeat packet or service packet from that node device, where key is the node device number (identifier), value_sec is the seconds part of the moment at which the data packet was received, and value_usec is the microseconds part of that moment.
230. Diagnose, according to the queried number of data packets sent by the second node device and received within the second duration starting from the second moment, whether the fault is an Agent failure of the second node device, where the Agent is the program configured to instruct the second node device to send the heartbeat packet.
Alternatively, the Agent of the second node device is a virtual module inside the second node device; it may also be a piece of program code.
Alternatively, in a non-Agent failure of the second node device, that is, when the second node device fails as a whole, the second node device can send neither heartbeat packets nor service packets. In an Agent failure of the second node device, the second node device still sends service packets normally but cannot send heartbeat packets.
Alternatively, when the queried number of data packets sent by the second node device and received within the second duration starting from the second moment is zero, a non-Agent failure of the second node device is diagnosed.
Alternatively, when the queried number of data packets sent by the second node device and received within the second duration starting from the second moment is greater than zero, an Agent failure of the second node device is diagnosed.
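The diagnosis rule of step 230 reduces to a single comparison on the queried count. The sketch below is illustrative only; the `Fault` enumeration and the `diagnose` function name are hypothetical.

```python
from enum import Enum


class Fault(Enum):
    NON_AGENT = "non-Agent failure"  # node down as a whole: no heartbeat or service packets
    AGENT = "Agent failure"          # only the Agent is down: service packets still arrive


def diagnose(packet_count: int) -> Fault:
    """Map the queried count (heartbeat plus service packets) to a fault type."""
    if packet_count < 0:
        raise ValueError("packet count cannot be negative")
    return Fault.AGENT if packet_count > 0 else Fault.NON_AGENT


print(diagnose(0).value)  # prints non-Agent failure
print(diagnose(3).value)  # prints Agent failure
```

The rule works because heartbeat packets stopped arriving by assumption, so any packet counted in the window must be a service packet, which proves the node itself is still running.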
Alternatively, when the first node device is not the Master node device and the second node device is determined to have a non-Agent failure, a second message is sent to the first node device, indicating that the fault of the second node device is a non-Agent failure.
It should be understood that, in this case, the second node device cannot transmit service packets.
Alternatively, in this case, the second node device may have suffered a hardware fault.
Alternatively, the diagnosed fault of the second node device is reported to the service controller in the cluster system, so that the service controller can control the services of the second node device.
For example, when the second node device is diagnosed with a non-Agent failure, the service controller, based on the report, cancels the service interactions associated with the second node device.
Therefore, in the fault diagnosis method applied to a cluster system according to the embodiments of this application, when the Master node device determines that the first node device in the cluster system has not received a heartbeat packet from the second node device within the first duration starting from the first moment, it queries the number of data packets sent by the second node device that were received within the second duration starting from the second moment. It can thus accurately diagnose whether the second node device itself has failed or only its agent has failed, enabling effective management of the cluster system.
Further, the Master node device can use the kernel device of its local operating system to record, for every data packet received, the moment of reception and the identifier of the sending node device, and can therefore query its local record for the number of data packets sent by the second node device that were received within the second duration starting from the second moment.
Further, the third node device uses the kernel device of its local operating system to record, for every data packet received, the moment of reception and the identifier of the sending node device, so the Master node device can ask the third node device for the number of data packets sent by the second node device that were received within the second duration starting from the second moment.
Fig. 7 is a schematic diagram of a fault diagnosis method 300 applied to a cluster system according to another embodiment of this application. The method 300 may be executed by a Slave node device in the cluster system. As shown in Fig. 7, the method 300 includes the following steps.
310. Receive a first instruction message sent by a first node device in the cluster system. The first instruction message includes the identifier of a second node device in the cluster system and instructs the receiver to query its local record for the number of data packets sent by the second node device that were received within a first duration starting from a first moment.
It should be understood that the first node device here is the Master node device.
It should also be understood that the second node device here is a Slave node device.
It should also be understood that the Slave node device locally records, for every data packet received, the moment of reception and the identifier of the node device that sent the packet.
Alternatively, receipt of the first instruction message from the first node device indicates that the first node device suspects that the second node device has failed.
Alternatively, the data packets include heartbeat packets and service packets. A heartbeat packet is a periodically sent packet that carries no service information; a service packet carries service information and is sent when a service interaction is needed.
Alternatively, the Slave node device uses the kernel device of its local operating system to record, for every data packet received, the moment of reception and the identifier of the node device that sent the packet.
320. According to the first instruction message, query the local record for the number of data packets sent by the second node device that were received within the first duration starting from the first moment, and feed the queried count back to the first node device, so that the first node device can determine the fault of the second node device.
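Step 320 on the Slave side can be sketched as follows, with an in-memory receive log standing in for the kernel device's proc file. The message classes and field names (`FirstInstruction`, `FirstResponse`, `start_usec`, `duration_usec`) are hypothetical. The text says only that the first instruction message carries the identifier of the second node device, so carrying the window boundaries in the message is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FirstInstruction:
    target_id: int      # identifier of the second node device
    start_usec: int     # first moment (assumed to be carried in the message)
    duration_usec: int  # first duration (assumed to be carried in the message)


@dataclass(frozen=True)
class FirstResponse:
    target_id: int
    packet_count: int


class SlaveNode:
    def __init__(self) -> None:
        # (sender identifier, reception moment in microseconds); stands in for
        # the record kept by the kernel device of the local operating system.
        self.rx_log: List[Tuple[int, int]] = []

    def on_packet(self, sender_id: int, t_usec: int) -> None:
        # Heartbeat and service packets are logged alike.
        self.rx_log.append((sender_id, t_usec))

    def handle_instruction(self, msg: FirstInstruction) -> FirstResponse:
        # Step 320: count the target's packets inside the window and reply.
        end_usec = msg.start_usec + msg.duration_usec
        count = sum(1 for sender, t in self.rx_log
                    if sender == msg.target_id and msg.start_usec <= t < end_usec)
        return FirstResponse(msg.target_id, count)


slave = SlaveNode()
slave.on_packet(0, 1_000_000)  # e.g. a service packet from node 0
slave.on_packet(3, 2_000_000)  # traffic from another node is ignored
slave.on_packet(0, 3_000_000)
reply = slave.handle_instruction(FirstInstruction(0, 0, 5_000_000))
print(reply.packet_count)  # prints 2
```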
Alternatively, when the queried number of data packets sent by the second node device and received within the first duration starting from the first moment, according to the local record, is zero, the first node device may diagnose a non-Agent failure of the second node device: the second node device can send neither heartbeat packets nor service packets. The Agent is the program configured to instruct the second node device to send the heartbeat packet.
Alternatively, when that number is greater than zero, the first node device may diagnose an Agent failure of the second node device: the second node device cannot send heartbeat packets but can still send service packets normally. The Agent is the program configured to instruct the second node device to send the heartbeat packet.
Therefore, in the fault diagnosis method applied to a cluster system according to the embodiments of this application, the Slave node device uses the kernel device of its local operating system to record, for every data packet received, the moment of reception and the identifier of the sending node device. After receiving the first instruction message from the first node device, it queries the local record for the number of data packets sent by the second node device that were received within the first duration starting from the first moment, so that the first node device can determine the fault of the second node device. As a result, a fault of any node device in the cluster system can be identified accurately, enabling effective management of the cluster system.
Fig. 8 is a schematic diagram of a fault diagnosis method 400 applied to a cluster system according to yet another embodiment of this application. The method 400 may be executed by a Slave node device in the cluster system. As shown in Fig. 8, the method 400 includes the following steps.
410. Determine that no heartbeat packet sent by a first node device in the cluster system has been received within a first duration starting from a first moment, the heartbeat packet being a periodically sent data packet carrying no service information.
It should be understood that, under normal circumstances, the executing entity of the method 400 receives the heartbeat packets periodically sent by the first node device.
Alternatively, the first duration may be the period over which heartbeat packets from the first node device are missed several consecutive times (for example, 5 to 7 times).
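One way to express "several consecutive missed heartbeats" in code is to compare the time since the last heartbeat with a multiple of the heartbeat period. In this sketch the 1-second period (taken from the Fig. 9 example) and the threshold of 5 misses are illustrative defaults, and the `HeartbeatMonitor` class is hypothetical. The first moment is taken to be the time at which the first expected heartbeat failed to arrive, matching the Fig. 9 example.

```python
from typing import Optional


class HeartbeatMonitor:
    """Declare a heartbeat timeout after `max_misses` consecutive missed periods."""

    def __init__(self, period_s: float = 1.0, max_misses: int = 5) -> None:
        self.period_s = period_s      # illustrative: 1 s, as in the Fig. 9 example
        self.max_misses = max_misses  # illustrative: the text suggests 5 to 7
        self.last_heartbeat: Optional[float] = None

    def on_heartbeat(self, now: float) -> None:
        self.last_heartbeat = now

    def first_moment(self) -> Optional[float]:
        # When the first expected heartbeat failed to arrive.
        if self.last_heartbeat is None:
            return None
        return self.last_heartbeat + self.period_s

    def timed_out(self, now: float) -> bool:
        if self.last_heartbeat is None:
            return False  # nothing heard yet, so nothing to time out
        missed = int((now - self.last_heartbeat) // self.period_s)
        return missed >= self.max_misses


mon = HeartbeatMonitor()
mon.on_heartbeat(10.0)
print(mon.timed_out(14.5), mon.timed_out(15.0), mon.first_moment())  # prints False True 11.0
```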
Alternatively, the first node device is a Slave node device in the cluster system.
Alternatively, when the executing entity of the method 400 determines that it has not received a heartbeat packet from the first node device in the cluster system within the first duration starting from the first moment, it suspects that the first node device has failed. To verify this suspicion, the Master node device in the cluster system needs to further determine whether the first node device can still send service packets.
420. Send a first message to a second node device in the cluster system, the first message indicating that no heartbeat packet from the first node device was received within the first duration starting from the first moment.
It should be understood that the second node device here is the Master node device of the cluster system.
Alternatively, the first message is sent to the second node device so that the second node device determines whether the first node device itself has failed or only the agent of the first node device has failed.
430. Receive a second message sent by the second node device, the second message indicating that the fault of the first node device is either a non-Agent failure or an Agent failure of the first node device, where the Agent is the program configured to instruct the first node device to send the heartbeat packet.
440. Determine, according to the second message, whether to send service packets to the first node device, a service packet being a data packet carrying service information that is sent when a service interaction is needed.
Alternatively, when the second message indicates that the fault of the first node device is a non-Agent failure, stop sending service packets to the first node device.
Alternatively, when the second message indicates that the fault of the first node device is an Agent failure of the first node device, continue sending service packets to the first node device normally.
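Step 440 amounts to a small decision on the node that reported the missing heartbeats. The sketch assumes the second message carries one of two fault types; the `Fault` enumeration and the function name are hypothetical.

```python
from enum import Enum


class Fault(Enum):
    NON_AGENT = "non-Agent failure"  # the first node device as a whole has failed
    AGENT = "Agent failure"          # only its heartbeat Agent has failed


def should_send_service_packets(reported: Fault) -> bool:
    """Step 440: decide from the second message whether to keep sending service packets."""
    # A whole-node failure means service packets would be lost, so stop;
    # an Agent failure leaves the service path working, so carry on.
    return reported is Fault.AGENT


print(should_send_service_packets(Fault.NON_AGENT))  # prints False
print(should_send_service_packets(Fault.AGENT))      # prints True
```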
Therefore, in the fault diagnosis method applied to a cluster system according to the embodiments of this application, upon determining that no heartbeat packet from the first node device has been received within the first duration starting from the first moment, the node device reports this to the second node device so that the second node device can determine the fault of the first node device, and then decides, according to the message fed back by the second node device, whether to send service packets to the first node device. In this way, a fault of any node device in the cluster system can be diagnosed accurately and the services of that node device can be controlled.
Alternatively, in one embodiment, as shown in Fig. 9, the cluster system includes four node devices, denoted node device-0, node device-1, node device-2, and node device-3. Node device-1 is the Master node device, and node device-0, node device-2, and node device-3 are Slave node devices. Each node device in the cluster system uses the kernel device (KM) of its local operating system (Operating System, OS) to record, for every data packet received, the moment of reception and the identifier of the node device that sent it. Under normal circumstances, agent AGENT-0 in node device-0 periodically (for example, every 1 second) sends heartbeat packets to agent AGENT-1 in node device-1; likewise, agent AGENT-2 in node device-2 and agent AGENT-3 in node device-3 periodically send heartbeat packets to AGENT-1. In addition, services APP 0-1 and APP 0-2 in node device-0 exchange service messages, that is, send service packets to each other, with services APP 2-1 and APP 2-2 in node device-2 or with services APP 3-1 and APP 3-2 in node device-3.
Step 1: If AGENT-1 in node device-1 determines that AGENT-0 in node device-0 has a heartbeat timeout (for example, a heartbeat timeout may be declared when no heartbeat packet has been received for 5 to 7 seconds), AGENT-1 asks AGENT-2 and AGENT-3 for the number of data packets (service packets and heartbeat packets) sent by node device-0 that were received within the first duration starting from the first moment, the first moment being the moment at which a heartbeat packet was first missed.
Step 2: AGENT-2 and AGENT-3 each query the kernel-mode device (KM) of their local operating system for the number of data packets (service packets and heartbeat packets) sent by node device-0 that were received within the first duration starting from the first moment, and return the result to AGENT-1.
Step 3: Finally, AGENT-1 uses the packet counts returned by AGENT-2 and AGENT-3 to diagnose whether node device-0 has an AGENT-0 failure or has failed as a whole, which improves diagnostic accuracy. That is, if the queried packet count is 0, node device-0 as a whole is diagnosed as faulty; if the queried packet count is greater than 0, an AGENT-0 failure is diagnosed.
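The three steps of the Fig. 9 example can be simulated end to end in a few lines. The receive logs below are made-up sample data standing in for the KM records on node device-2 and node device-3, and the function names are hypothetical; a real deployment would exchange messages between agents rather than read the logs directly.

```python
from typing import Dict, List, Tuple

# Made-up KM receive logs on the queried Slave nodes:
# node name -> list of (sender, reception moment in seconds).
Logs = Dict[str, List[Tuple[str, float]]]


def km_count(log: List[Tuple[str, float]], sender: str,
             start: float, duration: float) -> int:
    """Step 2 on one Slave: count packets from `sender` in [start, start + duration)."""
    return sum(1 for s, t in log if s == sender and start <= t < start + duration)


def agent1_diagnose(logs: Logs, suspect: str, start: float, duration: float) -> str:
    """Steps 1 to 3 as run by AGENT-1 on the Master node device-1."""
    # Steps 1 and 2: each queried agent returns its KM count for the suspect.
    counts = [km_count(log, suspect, start, duration) for log in logs.values()]
    # Step 3: any packet at all means the node is alive and only its Agent failed.
    return "Agent failure" if sum(counts) > 0 else "whole-node failure"


# AGENT-0 stopped sending heartbeats after t = 100 s (first moment = 101 s),
# but APP 0-1 kept exchanging service packets with node device-2.
logs = {
    "node-2": [("node-0", 101.2), ("node-0", 102.7), ("node-3", 101.5)],
    "node-3": [("node-2", 101.9)],
}
print(agent1_diagnose(logs, "node-0", start=101.0, duration=6.0))  # prints Agent failure
```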
Alternatively, in another embodiment, as shown in Fig. 10, the cluster system includes four node devices, denoted node device-0, node device-1, node device-2, and node device-3. Node device-1 is the Master node device, and node device-0, node device-2, and node device-3 are Slave node devices. Each node device in the cluster system uses the kernel device (KM) of its local operating system (OS) to record, for every data packet received, the moment of reception and the identifier of the node device that sent it. Under normal circumstances, agent AGENT-0 in node device-0 periodically (for example, every 1 second) sends heartbeat packets to agent AGENT-1 in node device-1; likewise, agent AGENT-2 in node device-2 and agent AGENT-3 in node device-3 periodically send heartbeat packets to AGENT-1. In addition, services APP 0-1 and APP 0-2 in node device-0 exchange service messages, that is, send service packets to each other, with services APP 1-1 and APP 1-2 in node device-1.
Step 1: If AGENT-1 in node device-1 determines that AGENT-0 in node device-0 has a heartbeat timeout (for example, a heartbeat timeout may be declared when no heartbeat packet has been received for 5 to 7 seconds), AGENT-1 queries the kernel-mode device (KM) of its local operating system (OS) for the number of data packets (service packets and heartbeat packets) sent by node device-0 that were received within the first duration starting from the first moment, the first moment being the moment at which a heartbeat packet was first missed.
Step 2: AGENT-1 uses the locally queried packet count to diagnose whether node device-0 has an AGENT-0 failure or has failed as a whole, which improves diagnostic accuracy. That is, if the queried packet count is 0, node device-0 as a whole is diagnosed as faulty; if the queried packet count is greater than 0, an AGENT-0 failure is diagnosed.
Fig. 11 is a schematic block diagram of a node device 500 according to an embodiment of this application. As shown in Fig. 11, the node device 500 includes:
Determining unit 510, configured to determine that a first node device in a cluster system has not received, within a first duration starting from a first moment, a heartbeat packet sent by a second node device in the cluster system, the heartbeat packet being a periodically sent data packet carrying no service information;
Query unit 520, configured to query the number of data packets sent by the second node device that were received by at least one node device in the cluster system within a second duration starting from a second moment, the data packets including the heartbeat packet and service packets, a service packet being a data packet carrying service information that is sent when a service interaction is needed;
Diagnosis unit 530, configured to diagnose, according to the queried number of data packets sent by the second node device and received within the second duration starting from the second moment, whether the fault is an Agent failure of the second node device, the Agent being the program configured to instruct the second node device to send the heartbeat packet.
Alternatively, the query unit 520 is specifically configured to:
record, by using the kernel device of the local operating system, the moment at which each data packet is received and the identifier of the node device that sent it; and
query the local record for the number of data packets sent by the second node device that were received within the second duration starting from the second moment.
Alternatively, the query unit 520 is specifically configured to:
send a first instruction message to a third node device, the third node device belonging to the at least one node device, where the first instruction message includes the identifier of the second node device and instructs the third node device to query its local record for the number of data packets sent by the second node device that were received within the second duration starting from the second moment; and
receive a first response message fed back by the third node device, the first response message including the number of data packets sent by the second node device that the third node device received within the second duration starting from the second moment.
It should be understood that the third node device uses the kernel device of its local operating system to record, for every data packet received, the moment of reception and the identifier of the node device that sent it.
Alternatively, the diagnosis unit 530 is further configured to:
diagnose a non-Agent failure of the second node device when the queried number of data packets sent by the second node device and received within the second duration starting from the second moment is zero; or
diagnose an Agent failure of the second node device when the queried number of data packets sent by the second node device and received within the second duration starting from the second moment is greater than zero.
Alternatively, in a non-Agent failure of the second node device, the second node device can send neither heartbeat packets nor service packets; in an Agent failure of the second node device, the second node device still sends service packets normally but cannot send heartbeat packets.
Alternatively, the determining unit 510 is specifically configured to:
receive a first message sent by the first node device, the first message indicating that the first node device has not received a heartbeat packet from the second node device within the first duration starting from the first moment; and
determine, according to the first message, that the first node device has not received the heartbeat packet sent by the second node device within the first duration starting from the first moment.
Alternatively, the node device 500 further includes:
Sending unit 540, configured to send a second message to the first node device, the second message indicating that the fault of the second node device is a non-Agent failure.
Alternatively, the second moment is later than or equal to the first moment, and the end of the second duration is later than or equal to the end of the first duration.
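The optional constraint on the two windows can be written as a simple check. A plausible reading, not spelled out in the text, is that it keeps the query window from starting before heartbeats went missing and makes it extend at least to the point where the timeout was declared. The function name is hypothetical.

```python
def windows_consistent(first_start: float, first_duration: float,
                       second_start: float, second_duration: float) -> bool:
    """True if the query window satisfies the optional constraint:
    second moment >= first moment, and end of second >= end of first."""
    first_end = first_start + first_duration
    second_end = second_start + second_duration
    return second_start >= first_start and second_end >= first_end


print(windows_consistent(101.0, 5.0, 101.0, 6.0))  # prints True
print(windows_consistent(101.0, 5.0, 99.0, 6.0))   # prints False: starts too early
```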
Alternatively, the sending unit 540 is configured to report the diagnosed fault of the second node device to the service controller in the cluster system, so that the service controller controls the services of the second node device.
It should be understood that the above and other operations and/or functions of the units in the node device 500 of the embodiments of this application are respectively intended to implement the corresponding procedures of the Master node device in the method 200 of Fig. 3. For brevity, details are not repeated here.
Fig. 12 is a schematic block diagram of another node device 600 according to an embodiment of this application. As shown in Fig. 12, the node device 600 includes:
Receiving unit 610, configured to receive a first instruction message sent by a first node device in the cluster system, the first instruction message including the identifier of a second node device in the cluster system and instructing the node device 600 to query its local record for the number of data packets sent by the second node device that were received within a first duration starting from a first moment, where the data packets include heartbeat packets and service packets, a heartbeat packet being a periodically sent data packet carrying no service information, and a service packet being a data packet carrying service information that is sent when a service interaction is needed;
Query unit 620, configured to query, according to the first instruction message, the local record for the number of data packets sent by the second node device that were received within the first duration starting from the first moment, and to feed the queried count back to the first node device, so that the first node device can determine the fault of the second node device.
Alternatively, the node device 600 further includes:
Recording unit 630, configured to record, by using the kernel device of the local operating system and before the query unit 620 queries the local record according to the first instruction message, the moment at which each data packet is received and the identifier of the node device that sent it.
It should be understood that the above and other operations and/or functions of the units in the node device 600 of the embodiments of this application are respectively intended to implement the corresponding procedures of the Slave node device in the method 300 of Fig. 7. For brevity, details are not repeated here.
Fig. 13 is a schematic block diagram of another node device 700 according to an embodiment of this application. As shown in Fig. 13, the node device 700 includes:
Determining unit 710, configured to determine that no heartbeat packet sent by a first node device in the cluster system has been received within a first duration starting from a first moment, the heartbeat packet being a periodically sent data packet carrying no service information;
Sending unit 720, configured to send a first message to a second node device in the cluster system, the first message indicating that no heartbeat packet from the first node device was received within the first duration starting from the first moment;
Receiving unit 730, configured to receive a second message sent by the second node device, the second message indicating that the fault of the first node device is either a non-Agent failure or an Agent failure of the first node device, the Agent being the program configured to instruct the first node device to send the heartbeat packet;
The determining unit 710 is further configured to determine, according to the second message, whether to send service packets to the first node device, a service packet being a data packet carrying service information that is sent when a service interaction is needed.
Alternatively, the determining unit 710 is specifically configured to:
stop sending service packets to the first node device when the second message indicates that the fault of the first node device is a non-Agent failure; or
continue sending service packets to the first node device normally when the second message indicates that the fault of the first node device is an Agent failure of the first node device.
It should be understood that the above and other operations and/or functions of the units in the node device 700 of the embodiments of this application are respectively intended to implement the corresponding procedures of the Slave node device in the method 400 of Fig. 8. For brevity, details are not repeated here.
Fig. 14 shows a schematic block diagram of a computer device 800 according to an embodiment of this application. The computer device 800 includes:
Memory 810, configured to store a program, the program including code;
Transceiver 820, configured to communicate with other node devices;
Processor 830, configured to execute the program code in the memory 810.
Alternatively, when the code is executed, the processor 830 can implement the operations performed by at least one of the following node devices: the node device in the method 200 of Fig. 3, the node device in the method 300 of Fig. 7, and the node device in the method 400 of Fig. 8. For brevity, details are not repeated here. The transceiver 820 is configured to send and receive specific data packets under the control of the processor 830.
It should be understood that, in the embodiments of this application, the processor 830 may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 810 may include read-only memory and random access memory, and provides instructions and data to the processor 830. Part of the memory 810 may also include non-volatile random access memory. For example, the memory 810 may also store device type information.
The transceiver 820 may be used to implement signal transmission and reception functions, such as modulation and demodulation, also referred to as up-conversion and down-conversion.
During implementation, at least one step of the above methods may be completed by an integrated logic circuit of hardware in the processor 830, or by that integrated logic circuit driven by instructions in software form. Accordingly, the computer device 800 may be a physical machine or a virtual machine. The steps of the methods disclosed with reference to the embodiments of this application may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software module may reside in a storage medium well established in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory; the processor 830 reads information from the memory and completes the steps of the above methods in combination with its hardware. To avoid repetition, details are not described here again.
Alternatively, an embodiment of this application provides a computer device including at least one of the following node devices: the node device in the method 200 of Fig. 3, the node device in the method 300 of Fig. 7, the node device in the method 400 of Fig. 8, and the node device of Fig. 14.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the particular application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementation should not be considered to go beyond the scope of this application.
It may be clearly understood by a person skilled in the art that, for convenience and brevity of description, for the specific working processes of the system, apparatus, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not described here again.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is merely a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit.
All or some of the above embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wirelessly (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state disk (Solid State Disk, SSD)), or the like.
The foregoing descriptions are merely specific embodiments of this application, but the protection scope of this application is not limited thereto. Any variation or replacement that a person skilled in the art can readily conceive of within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.
Claims (27)
1. A fault diagnosis method applied to a cluster system, wherein the method comprises:
determining that a first node device in the cluster system has not received, within a first duration starting from a first moment, a heartbeat packet sent by a second node device in the cluster system, wherein the heartbeat packet is a packet that is sent periodically and carries no business information;
querying the number of packets sent by the second node device and received by at least one node device in the cluster system within a second duration starting from a second moment, wherein the packets comprise the heartbeat packet and a business packet, and the business packet is a packet that carries business information and is sent when business communication needs to be performed; and
diagnosing, according to the queried number of packets sent by the second node device and received within the second duration starting from the second moment, whether the fault is an agent program fault of the second node device, wherein the agent program is configured to instruct the second node device to send the heartbeat packet.
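The three steps of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not the applicant's implementation. The following are all hypothetical:

- the `Packet` record;
- the half-open time window [t, t + d);
- the `count_packets` hook, which stands in for querying at least one node device.

The zero/non-zero decision rule used at the end is the one spelled out in claim 4.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Packet:
    sender: str         # identifier of the sending node device
    is_heartbeat: bool  # True: periodic heartbeat; False: business packet
    time: float         # arrival time at the receiving node


def heartbeat_missing(received: Iterable[Packet], peer: str,
                      t1: float, d1: float) -> bool:
    """Step 1: True if no heartbeat from `peer` arrived in [t1, t1 + d1)."""
    return not any(
        p.sender == peer and p.is_heartbeat and t1 <= p.time < t1 + d1
        for p in received
    )


def diagnose_agent_fault(
    received: Iterable[Packet],
    peer: str,
    t1: float, d1: float,
    t2: float, d2: float,
    count_packets: Callable[[str, float, float], int],
) -> Optional[bool]:
    """Return None if heartbeats arrived; otherwise True for an agent program
    fault and False for a fault that is not an agent program fault.

    `count_packets(peer, t2, d2)` is a hypothetical hook that asks at least one
    node device how many heartbeat + business packets it received from `peer`
    in [t2, t2 + d2).
    """
    if not heartbeat_missing(received, peer, t1, d1):
        return None  # heartbeat seen: nothing to diagnose
    # Steps 2 and 3: query the packet count, then apply the claim 4 rule.
    return count_packets(peer, t2, d2) > 0
```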
2. The method according to claim 1, wherein the method further comprises:
recording, by a kernel of a local operating system, each time a packet is received, the moment at which the packet is received and an identifier of the node device that sent the packet; and
the querying the number of packets sent by the second node device and received by at least one node device in the cluster system within the second duration starting from the second moment comprises:
querying the locally recorded number of packets sent by the second node device and received within the second duration starting from the second moment.
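In claim 2, the kernel logs an arrival time and a sender identifier for every received packet. The later query is then just a count over a time window.

The user-space Python sketch below assumes a hypothetical `PacketLog` class. It keeps one sorted list of arrival times per sender, so `bisect` can locate each window's boundaries with binary search. A real kernel-side recorder would look quite different; this only illustrates the record-then-count idea.

```python
import bisect
from collections import defaultdict


class PacketLog:
    """Hypothetical per-node record of (arrival time, sender id) for every packet."""

    def __init__(self) -> None:
        self._times = defaultdict(list)  # sender id -> sorted arrival times

    def record(self, sender_id: str, arrival_time: float) -> None:
        # Called once per received packet, heartbeat and business alike.
        bisect.insort(self._times[sender_id], arrival_time)

    def count_from(self, sender_id: str, start: float, duration: float) -> int:
        # Number of packets from sender_id that arrived in [start, start + duration).
        times = self._times.get(sender_id, [])
        lo = bisect.bisect_left(times, start)
        hi = bisect.bisect_left(times, start + duration)
        return hi - lo
```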
3. The method according to claim 1 or 2, wherein the querying the number of packets sent by the second node device and received by at least one node device in the cluster system within the second duration starting from the second moment comprises:
sending a first instruction message to a third node device, wherein the third node device belongs to the at least one node device, and the first instruction message comprises an identifier of the second node device and is used to instruct the third node device to query its locally recorded number of packets sent by the second node device and received within the second duration starting from the second moment; and
receiving a first response message fed back by the third node device, wherein the first response message comprises the number of packets sent by the second node device and received by the third node device within the second duration starting from the second moment.
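The exchange in claim 3 can be pictured as a request/response pair. In this sketch, the following are all assumptions:

- the message classes;
- their JSON encoding;
- the use of plain callables in place of a network transport.

Summing the counts when more than one node device is queried is also an illustrative choice, since the claim only requires that at least one node device be queried.

```python
import json
from dataclasses import dataclass, asdict
from typing import Callable, Dict, Iterable, List


@dataclass
class FirstInstruction:
    target_id: str   # identifier of the second node device
    start: float     # second moment
    duration: float  # second duration


@dataclass
class FirstResponse:
    target_id: str
    count: int       # packets received from target_id in the window


def encode(msg) -> bytes:
    return json.dumps(asdict(msg)).encode("utf-8")


def third_node_handle(raw: bytes, arrivals: Dict[str, List[float]]) -> bytes:
    """Third node device: decode the instruction, count local records, reply."""
    req = FirstInstruction(**json.loads(raw))
    end = req.start + req.duration
    count = sum(1 for t in arrivals.get(req.target_id, []) if req.start <= t < end)
    return encode(FirstResponse(req.target_id, count))


def query_nodes(target_id: str, start: float, duration: float,
                nodes: Iterable[Callable[[bytes], bytes]]) -> int:
    """Send the instruction to each node (callables stand in for the network)."""
    request = encode(FirstInstruction(target_id, start, duration))
    total = 0
    for send in nodes:
        reply = FirstResponse(**json.loads(send(request)))
        total += reply.count
    return total
```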
4. The method according to any one of claims 1 to 3, wherein the diagnosing, according to the queried number of packets sent by the second node device and received within the second duration starting from the second moment, whether the fault is an agent program fault of the second node device comprises:
when the queried number of packets sent by the second node device and received within the second duration starting from the second moment is zero, diagnosing that the fault is not an agent program fault of the second node device; or
when the queried number of packets sent by the second node device and received within the second duration starting from the second moment is greater than zero, diagnosing that the fault is an agent program fault of the second node device.
5. The method according to claim 4, wherein, when the fault is not an agent program fault of the second node device, the second node device can send neither the heartbeat packet nor the business packet; and when the agent program of the second node device is faulty, the second node device sends the business packet normally but cannot send the heartbeat packet.
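Claims 4 and 5 together explain why a simple zero/non-zero test separates the two cases:

- If the second node device itself has failed, peers receive nothing from it.
- If only its agent program has failed, its business packets still arrive.

The toy model below encodes both claims. The function names and the `node_up`/`agent_ok` flags are hypothetical.

```python
from enum import Enum


class Diagnosis(Enum):
    AGENT_FAULT = "agent program fault"
    NOT_AGENT_FAULT = "not an agent program fault"


def packets_observed(node_up: bool, agent_ok: bool,
                     business: int, heartbeats: int) -> int:
    """Claim 5 model: packets peers receive from the second node in the window."""
    if not node_up:
        return 0  # a failed node sends neither kind of packet
    # With a failed agent program, business packets still flow but heartbeats stop.
    return business + (heartbeats if agent_ok else 0)


def diagnose(count: int) -> Diagnosis:
    """Claim 4 rule: zero packets -> not an agent fault; more than zero -> agent fault."""
    if count < 0:
        raise ValueError("packet count cannot be negative")
    return Diagnosis.AGENT_FAULT if count > 0 else Diagnosis.NOT_AGENT_FAULT
```

The rule as stated has one consequence: a node whose agent program has failed, but which sends no business packets during the second duration, also yields a count of zero. Such a node is therefore classified as not having an agent program fault.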
6. The method according to claim 4 or 5, wherein the determining that the first node device in the cluster system has not received, within the first duration starting from the first moment, the heartbeat packet sent by the second node device in the cluster system comprises:
receiving a first message sent by the first node device, wherein the first message indicates that the first node device has not received, within the first duration starting from the first moment, the heartbeat packet sent by the second node device; and
determining, according to the first message, that the first node device has not received, within the first duration starting from the first moment, the heartbeat packet sent by the second node device.
7. The method according to claim 6, wherein the method further comprises:
sending a second message to the first node device, wherein the second message indicates that the fault of the second node device is not an agent program fault of the second node device.
8. The method according to any one of claims 1 to 7, wherein the second moment is later than or equal to the first moment, and the end moment of the second duration is later than or equal to the end moment of the first duration.
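The timing constraint in claim 8 reduces to two inequalities. The symbols are chosen here for illustration:

- the first window starts at t1 and lasts d1;
- the second window starts at t2 and lasts d2.

The query window must not start before the heartbeat silence begins, and it must extend at least to the end of that silence. For instance, t1 = 0 s and d1 = 10 s allow t2 = 0 s and d2 = 10 s. They do not allow t2 = 5 s and d2 = 3 s, because that window ends at 8 s. A minimal check might look like this:

```python
def valid_second_window(t1: float, d1: float, t2: float, d2: float) -> bool:
    """Claim 8: the second window starts no earlier and ends no earlier than the first."""
    starts_ok = t2 >= t1            # second moment not earlier than first moment
    ends_ok = t2 + d2 >= t1 + d1    # second duration ends no earlier than the first
    return starts_ok and ends_ok
```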
9. The method according to any one of claims 1 to 8, wherein the method further comprises:
reporting the diagnosed fault of the second node device to a service controller in the cluster system, so that the service controller controls the services of the second node device.
10. A fault diagnosis method applied to a cluster system, wherein the method comprises:
receiving a first instruction message sent by a first node device in the cluster system, wherein the first instruction message comprises an identifier of a second node device in the cluster system and is used to instruct querying of the locally recorded number of packets sent by the second node device and received within a first duration starting from a first moment, wherein the packets comprise a heartbeat packet and a business packet, the heartbeat packet is a packet that is sent periodically and carries no business information, and the business packet is a packet that carries business information and is sent when business communication needs to be performed; and
querying, according to the first instruction message, the locally recorded number of packets sent by the second node device and received within the first duration starting from the first moment, and feeding back the queried number of packets to the first node device, so that the first node device determines the fault of the second node device.
11. The method according to claim 10, wherein, before the locally recorded number of packets sent by the second node device and received within the first duration starting from the first moment is queried according to the first instruction message, the method further comprises:
recording, by a kernel of a local operating system, each time a packet is received, the moment at which the packet is received and an identifier of the node device that sent the packet.
12. A fault diagnosis method applied to a cluster system, wherein the method comprises:
determining that no heartbeat packet sent by a first node device in the cluster system has been received within a first duration starting from a first moment, wherein the heartbeat packet is a packet that is sent periodically and carries no business information;
sending a first message to a second node device in the cluster system, wherein the first message indicates that the heartbeat packet sent by the first node device has not been received within the first duration starting from the first moment;
receiving a second message sent by the second node device, wherein the second message indicates either that the fault of the first node device is not an agent program fault of the first node device or that it is an agent program fault of the first node device, and the agent program is configured to instruct the first node device to send the heartbeat packet; and
determining, according to the second message, whether to send a business packet to the first node device, wherein the business packet is a packet that carries business information and is sent when business communication needs to be performed.
13. The method according to claim 12, wherein the determining, according to the second message, whether to send the business packet to the first node device comprises:
when the second message indicates that the fault of the first node device is not an agent program fault of the first node device, stopping sending the business packet to the first node device; or
when the second message indicates that the fault of the first node device is an agent program fault of the first node device, continuing to send the business packet to the first node device normally.
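From the perspective of the node executing claims 12 and 13, the round trip can be sketched as below. Here the first node device is the suspect, and the second node device performs the diagnosis.

The following names are hypothetical:

- the class and message names;
- the stand-in `diagnose` function representing the second node device, whose rule mirrors claim 4.

Only the branching comes from the claims: stop sending business packets when the fault is not an agent program fault, and keep sending them otherwise.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Set


class Fault(Enum):
    AGENT = "agent program fault"
    NOT_AGENT = "not an agent program fault"


@dataclass
class FirstMessage:
    suspect: str  # node whose heartbeats stopped arriving


@dataclass
class SecondMessage:
    suspect: str
    fault: Fault


def diagnose(msg: FirstMessage, packet_count: int) -> SecondMessage:
    """Stand-in for the second node device, using the claim 4 rule."""
    fault = Fault.AGENT if packet_count > 0 else Fault.NOT_AGENT
    return SecondMessage(msg.suspect, fault)


class DetectingNode:
    """Hypothetical node performing claims 12 and 13."""

    def __init__(self) -> None:
        self.blocked: Set[str] = set()  # peers that no longer get business packets

    def report_missing_heartbeat(self, suspect: str) -> FirstMessage:
        return FirstMessage(suspect)

    def on_second_message(self, msg: SecondMessage) -> None:
        if msg.fault is Fault.NOT_AGENT:
            self.blocked.add(msg.suspect)      # stop sending business packets
        else:
            self.blocked.discard(msg.suspect)  # agent fault only: send normally

    def may_send_business(self, peer: str) -> bool:
        return peer not in self.blocked
```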
14. A node device, wherein the node device comprises:
a determining unit, configured to determine that a first node device in a cluster system has not received, within a first duration starting from a first moment, a heartbeat packet sent by a second node device in the cluster system, wherein the heartbeat packet is a packet that is sent periodically and carries no business information;
a query unit, configured to query the number of packets sent by the second node device and received by at least one node device in the cluster system within a second duration starting from a second moment, wherein the packets comprise the heartbeat packet and a business packet, and the business packet is a packet that carries business information and is sent when business communication needs to be performed; and
a diagnosis unit, configured to diagnose, according to the queried number of packets sent by the second node device and received within the second duration starting from the second moment, whether the fault is an agent program fault of the second node device, wherein the agent program is configured to instruct the second node device to send the heartbeat packet.
15. The node device according to claim 14, wherein the query unit is specifically configured to:
record, by using a kernel of a local operating system, each time a packet is received, the moment at which the packet is received and an identifier of the node device that sent the packet; and
query the locally recorded number of packets sent by the second node device and received within the second duration starting from the second moment.
16. The node device according to claim 14, wherein the query unit is specifically configured to:
send a first instruction message to a third node device, wherein the third node device belongs to the at least one node device, the first instruction message comprises an identifier of the second node device, and the first instruction message is used to instruct the third node device to query its locally recorded number of packets sent by the second node device and received within the second duration starting from the second moment; and
receive a first response message fed back by the third node device, wherein the first response message comprises the number of packets sent by the second node device and received by the third node device within the second duration starting from the second moment.
17. The node device according to any one of claims 14 to 16, wherein the diagnosis unit is further configured to:
when the queried number of packets sent by the second node device and received within the second duration starting from the second moment is zero, diagnose that the fault is not an agent program fault of the second node device; or
when the queried number of packets sent by the second node device and received within the second duration starting from the second moment is greater than zero, diagnose that the fault is an agent program fault of the second node device.
18. The node device according to claim 17, wherein, when the fault is not an agent program fault of the second node device, the second node device can send neither the heartbeat packet nor the business packet; and when the agent program of the second node device is faulty, the second node device sends the business packet normally but cannot send the heartbeat packet.
19. The node device according to claim 17 or 18, wherein the determining unit is specifically configured to:
receive a first message sent by the first node device, wherein the first message indicates that the first node device has not received, within the first duration starting from the first moment, the heartbeat packet sent by the second node device; and
determine, according to the first message, that the first node device has not received, within the first duration starting from the first moment, the heartbeat packet sent by the second node device.
20. The node device according to claim 19, wherein the node device further comprises:
a sending unit, configured to send a second message to the first node device, wherein the second message indicates that the fault of the second node device is not an agent program fault of the second node device.
21. The node device according to any one of claims 14 to 20, wherein the second moment is later than or equal to the first moment, and the end moment of the second duration is later than or equal to the end moment of the first duration.
22. The node device according to any one of claims 14 to 21, wherein the node device further comprises:
a sending unit, configured to report the diagnosed fault of the second node device to a service controller in the cluster system, so that the service controller controls the services of the second node device.
23. A node device, wherein the node device comprises:
a receiving unit, configured to receive a first instruction message sent by a first node device in a cluster system, wherein the first instruction message comprises an identifier of a second node device in the cluster system and is used to instruct querying of the locally recorded number of packets sent by the second node device and received within a first duration starting from a first moment, wherein the packets comprise a heartbeat packet and a business packet, the heartbeat packet is a packet that is sent periodically and carries no business information, and the business packet is a packet that carries business information and is sent when business communication needs to be performed; and
a query unit, configured to query, according to the first instruction message, the locally recorded number of packets sent by the second node device and received within the first duration starting from the first moment, and to feed back the queried number of packets to the first node device, so that the first node device determines the fault of the second node device.
24. The node device according to claim 23, wherein the node device further comprises:
a recording unit, configured to record, by using a kernel of a local operating system, each time a packet is received, the moment at which the packet is received and an identifier of the node device that sent the packet, before the query unit queries, according to the first instruction message, the locally recorded number of packets sent by the second node device and received within the first duration starting from the first moment.
25. A node device, wherein the node device comprises:
a determining unit, configured to determine that no heartbeat packet sent by a first node device in a cluster system has been received within a first duration starting from a first moment, wherein the heartbeat packet is a packet that is sent periodically and carries no business information;
a sending unit, configured to send a first message to a second node device in the cluster system, wherein the first message indicates that the heartbeat packet sent by the first node device has not been received within the first duration starting from the first moment; and
a receiving unit, configured to receive a second message sent by the second node device, wherein the second message indicates either that the fault of the first node device is not an agent program fault of the first node device or that it is an agent program fault of the first node device, and the agent program is configured to instruct the first node device to send the heartbeat packet;
wherein the determining unit is further configured to determine, according to the second message, whether to send a business packet to the first node device, and the business packet is a packet that carries business information and is sent when business communication needs to be performed.
26. The node device according to claim 25, wherein the determining unit is specifically configured to:
when the second message indicates that the fault of the first node device is not an agent program fault of the first node device, determine to stop sending the business packet to the first node device; or
when the second message indicates that the fault of the first node device is an agent program fault of the first node device, determine to continue sending the business packet to the first node device normally.
27. A computer device, comprising the node device according to any one of claims 14 to 26.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710890513.8A CN107566219B (en) | 2017-09-27 | 2017-09-27 | Fault diagnosis method applied to cluster system, node equipment and computer equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107566219A | 2018-01-09 |
CN107566219B | 2020-09-18 |
Family ID: 60981904
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100115338A1 (en) * | 2003-08-27 | 2010-05-06 | Rao Sudhir G | Reliable Fault Resolution In A Cluster |
CN102594596A (en) * | 2012-02-15 | 2012-07-18 | 华为技术有限公司 | Method and device for recognizing available partitions, and clustering network system |
CN106170782A (en) * | 2013-04-26 | 2016-11-30 | 华为技术有限公司 | The system and method for highly scalable high availability cluster is created in the MPP cluster of machine in a network |
CN106301853A (en) * | 2015-06-05 | 2017-01-04 | 华为技术有限公司 | The fault detection method of group system interior joint and device |
CN106656682A (en) * | 2017-02-27 | 2017-05-10 | 网宿科技股份有限公司 | Method, system and device for detecting cluster heartbeat |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110380934A (en) * | 2019-07-23 | 2019-10-25 | 南京航空航天大学 | A kind of distribution redundant system heartbeat detecting method |
CN113760592A (en) * | 2021-07-30 | 2021-12-07 | 郑州云海信息技术有限公司 | Node kernel detection method and related device |
CN113760592B (en) * | 2021-07-30 | 2024-02-27 | 郑州云海信息技术有限公司 | Node kernel detection method and related device |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |