CN104133744A - Arbitration system and method oriented to critical applications - Google Patents

Arbitration system and method oriented to critical applications

Info

Publication number
CN104133744A
Authority
CN
China
Prior art keywords
arbitration
node
algorithm
detection module
failure message
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410377840.XA
Other languages
Chinese (zh)
Inventor
周恒钊
刘璧怡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Beijing Electronic Information Industry Co Ltd
Original Assignee
Inspur Beijing Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Beijing Electronic Information Industry Co Ltd filed Critical Inspur Beijing Electronic Information Industry Co Ltd
Priority to CN201410377840.XA priority Critical patent/CN104133744A/en
Publication of CN104133744A publication Critical patent/CN104133744A/en
Pending legal-status Critical Current

Landscapes

  • Hardware Redundancy (AREA)
  • Multi Processors (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses an arbitration system and method oriented to critical applications, and belongs to the technical field of system arbitration. The arbitration system comprises at least a self-detection module, a heartbeat detection module, and an arbitration management module. The self-detection module uses the system's built-in detection mechanism and a fault detection algorithm to detect faults of the local node's host system, and sends the detected fault information to the arbitration management module. The heartbeat detection module detects the state information of the peer node and sends the detected fault information of other nodes to the arbitration management module. The arbitration management module performs the final vote on the dual-machine nodes through its own arbitration algorithm, according to the fault information sent by the self-detection module and the heartbeat detection module. The invention further discloses an arbitration method oriented to critical applications. The arbitration system and method achieve high fault coverage and meet the requirement of high system availability.

Description

Arbitration system and method oriented to critical applications
Technical field
The invention belongs to the technical field of system arbitration and relates to an arbitration system and method oriented to critical applications.
Background
Fault tolerance is an important means of improving computer system availability: it means that a computer can still execute its assigned algorithm correctly when a fault occurs inside the system. A fault-tolerant computer system is built on redundancy; when a node in the system fails, the system can quickly find the faulty node and switch its services to other nodes. In such a system, how well the arbitration mechanism discovers faults, diagnoses them, and reconfigures the system directly affects availability. In other words, once a computer fails, whether the system can detect the failure in time, diagnose it correctly, and take the corresponding action is the key factor in system availability. Traditional fault-tolerant computers generally use a heartbeat mechanism to detect the peer's state: if the local machine does not receive the peer's heartbeat within the agreed time, it assumes that the peer has failed and takes over the peer's services. This one-to-one arbitration mechanism is simple to implement, but when the heartbeat itself fails while both machines are healthy, each side concludes that the other has failed. The system then becomes confused, and unnecessary switchovers may occur, which increases switching overhead and reduces system availability. Studying the arbitration mechanism of fault-tolerant computer systems is therefore particularly important.
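The weakness of one-to-one heartbeat detection described in the background can be illustrated with a minimal Python sketch. It is not part of the patent: the function names and the timeout value `TIMEOUT_S` are hypothetical, and the model assumes both hosts apply the same timeout rule to each other.

```python
from typing import Tuple

TIMEOUT_S = 3.0  # assumed agreed heartbeat timeout, in seconds (hypothetical value)


def peer_considered_failed(last_heartbeat_at: float, now: float,
                           timeout_s: float = TIMEOUT_S) -> bool:
    """Return True if no heartbeat from the peer arrived within the timeout."""
    return (now - last_heartbeat_at) > timeout_s


def simulate_link_failure(link_down_at: float, now: float) -> Tuple[bool, bool]:
    """Both hosts are healthy, but the heartbeat link broke at link_down_at.

    Each host last heard from the other at link_down_at, so each host
    applies the same timeout rule to its peer.
    """
    a_blames_b = peer_considered_failed(link_down_at, now)
    b_blames_a = peer_considered_failed(link_down_at, now)
    return a_blames_b, b_blames_a
```

Once the timeout has elapsed, both hosts report the other as failed even though neither is faulty, which is the situation a separate arbiter is meant to resolve.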
Summary of the invention
The technical problem to be solved by the invention is to provide an arbitration system and method oriented to critical applications, so as to address the low fault coverage and low fault-diagnosis success rate of conventional arbitration mechanisms.
To solve the above technical problem, the invention discloses an arbitration system oriented to critical applications, comprising at least a self-detection module, a heartbeat detection module, and an arbitration management module, wherein:
The self-detection module uses the system's built-in detection mechanism and a fault detection algorithm to detect faults of the local node's host system, and sends the detected fault information to the arbitration management module;
The heartbeat detection module detects the state information of the peer node and sends the detected fault information of other nodes to the arbitration management module;
The arbitration management module performs the final vote on the dual-machine nodes through its own arbitration algorithm, according to the fault information sent by the self-detection module and the heartbeat detection module.
Optionally, in the above arbitration system, the arbitration management module comprises an arbitration processing unit running on the node host and an arbitration subsystem running on the arbitration board, wherein:
The arbitration processing unit receives the arbitration request initiated by the node host and forwards it to the arbitration subsystem; receives the fault information sent by the self-detection module and the heartbeat detection module and forwards it to the arbitration subsystem; and, as instructed by the arbitration subsystem, calls the corresponding arbitration algorithm, performs the computation on the fault information, and forwards the result to the arbitration subsystem;
The arbitration subsystem, after receiving the arbitration request, selects the corresponding arbitration algorithm from the arbitration algorithm library according to the received fault information, instructs the arbitration processing unit to perform the arbitration computation with the selected algorithm, and determines the final voting result for the dual-machine nodes according to the result sent by the arbitration processing unit.
Optionally, in the above arbitration system, the arbitration processing unit comprises:
An arbitration processor driver, which handles communication between the node host and the arbitration board, receives the arbitration request initiated by the node host and forwards it to the arbitration subsystem, and receives the fault information sent by the self-detection module and the heartbeat detection module and forwards it to the arbitration subsystem;
An arbitration control submodule, which, as instructed by the arbitration subsystem, calls the corresponding arbitration algorithm, performs the computation on the fault information, and forwards the result to the arbitration subsystem.
Optionally, in the above arbitration system, the arbitration control submodule uses a triple-modular voting method.
Optionally, in the above arbitration system, the self-detection module detecting faults of the local node's host system means:
Detecting hardware faults and service-process faults of the local node's host system.
The invention also discloses an arbitration method oriented to critical applications, the method comprising:
Using the system's built-in detection mechanism and a fault detection algorithm to detect faults of the local node's host system;
Detecting the fault information of the peer node;
Performing the final vote on the dual-machine nodes through an arbitration algorithm, according to the detected faults of the local node's host system and the fault information of the peer node.
Optionally, in the above method, performing the final vote on the dual-machine nodes through an arbitration algorithm, according to the detected faults of the local node's host system and the fault information of the peer node, means:
When an arbitration request initiated by the node host is received, selecting the corresponding arbitration algorithm from the arbitration algorithm library according to the detected faults of the local node's host system and the fault information of the peer node;
Calling the selected arbitration algorithm, performing the computation on the detected faults of the local node's host system and the fault information of the peer node, and determining the final voting result for the dual-machine nodes according to the result.
Optionally, in the above method, after the computation on the detected faults of the local node's host system and the fault information of the peer node, triple-modular voting is performed again to determine the final voting result for the dual-machine nodes.
Optionally, in the above method, detecting faults of the local node's host system means:
Detecting hardware faults and service-process faults of the local node's host system.
The arbitration system design for critical-application hosts proposed in this technical solution uses a layered structure for the whole arbitration software system, achieves high fault coverage, and meets the requirement of high system availability. The arbitration system based on the arbitration processor effectively overcomes the tendency of traditional fault-tolerant computers to have both machines accuse each other of failure; when the system fails, the arbitration board can quickly locate the faulty machine, which greatly improves the availability of the fault-tolerant computer system.
Brief description of the drawings
Fig. 1 is the overall structure diagram of the arbitration system software of the invention;
Fig. 2 is the workflow diagram of the self-detection module of the invention;
Fig. 3 is the structure diagram of the heartbeat detection module of the invention;
Fig. 4 is the structure diagram of the arbitration processing module of the invention;
Fig. 5 is the interaction diagram of the arbitration subsystem and the arbitration processing module of the invention.
Detailed description
To make the objects, technical solutions, and advantages of the invention clearer, the technical solution is described in further detail below with reference to the accompanying drawings. It should be noted that, provided they do not conflict, the embodiments of this application and the features in the embodiments may be combined with one another in any way.
Embodiment 1
To solve the problem that traditional fault-tolerant computer systems locate the faulty machine inaccurately when a fault occurs, this embodiment provides an arbitration system oriented to critical applications, comprising at least a self-detection module, a heartbeat detection module, and an arbitration management module. Its architecture is shown in Fig. 1.
Self-detection module. To improve the system's fault detection coverage while keeping detection overhead as low as possible, a self-detection module is introduced. Using the system's built-in detection mechanism and a fault detection algorithm, this module detects the system's own faults and sends the detected fault information to the arbitration management module, as part of the basis for system diagnosis and switchover.
The faults that the self-detection module can detect include faults of the power supply, CPU, fan, CPU board, and arbitration system hardware. Specifically, in this embodiment the self-detection module can read the fault state of this hardware directly. For system service processes, the self-detection module uses a detection algorithm to check each process's working state, including start, running, end, and abnormal termination.
Specifically, the detection flow of the self-detection module mainly comprises hardware system detection and service process detection. As shown in the workflow diagram of Fig. 2, if a hardware fault is found, the switchover management flow is called under dual-machine cooperative management, and all services running on the local machine are switched to the peer host. At the same time, the administrator is notified of the error information for the specific hardware device so that the fault can be cleared as soon as possible. The service detection module periodically checks the service processes running on the host; when it finds that a service has failed or terminated abnormally, it calls the service management flow to restart the service, and if the restart fails, switchover is carried out through dual-machine cooperation.
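The self-detection flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `self_detection_cycle`, the action strings, and the decision to stop after a hardware fault are hypothetical and are not taken from the patent, which specifies the flow only in Fig. 2.

```python
from typing import Callable, Dict, List


def self_detection_cycle(
    hardware_faults: Dict[str, bool],
    service_running: Dict[str, bool],
    restart_service: Callable[[str], bool],
) -> List[str]:
    """Run one self-detection pass and return the actions taken, in order."""
    actions: List[str] = []

    # Step 1: hardware check. Any hardware fault triggers a full switchover
    # and a notification naming the failed device.
    failed_hw = sorted(name for name, faulty in hardware_faults.items() if faulty)
    if failed_hw:
        actions.append("switch_all_services_to_peer")
        actions.extend(f"notify_admin:{name}" for name in failed_hw)
        return actions  # assumption: no service check once everything has moved

    # Step 2: service-process check. Try a restart first; switch over if it fails.
    for name in sorted(service_running):
        if service_running[name]:
            continue
        if restart_service(name):
            actions.append(f"restarted:{name}")
        else:
            actions.append(f"switch_service_to_peer:{name}")
    return actions
```

In a real system the two dictionaries would be filled from hardware status registers and a process monitor; here they are plain inputs so the control flow can be followed in isolation.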
Heartbeat detection module. Heartbeat detection is commonly used in cluster systems because it is simple to implement, has low system overhead, and is highly reliable. Drawing on Heartbeat, the heartbeat detection mechanism of the open-source Linux-HA project, this embodiment designs a heartbeat detection mechanism that meets the requirements of a fault-tolerant computer system. Heartbeat detection yields the fault information of other nodes, which is sent to the arbitration management module as a basis for system diagnosis and switchover.
Specifically, as shown in the structure diagram of the heartbeat detection module in Fig. 3, the system detects the peer's running state by periodically sending heartbeat messages. Each heartbeat message carries the running state of the local machine, including a timestamp, system state information, application service state information, network state information, and hardware state information.
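One possible shape for such a heartbeat message is sketched below. The patent lists only the five kinds of information carried; the class name `HeartbeatMessage`, the string-valued status fields, the `"ok"` convention, and the JSON encoding are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class HeartbeatMessage:
    """Local running state carried in one heartbeat (field layout is assumed)."""
    timestamp: float
    system_status: str
    application_service_status: str
    network_status: str
    hardware_status: str

    def encode(self) -> bytes:
        """Serialize the message for transmission to the peer."""
        return json.dumps(asdict(self)).encode("utf-8")

    @staticmethod
    def decode(raw: bytes) -> "HeartbeatMessage":
        """Rebuild a message from bytes received from the peer."""
        return HeartbeatMessage(**json.loads(raw.decode("utf-8")))


def peer_faults(msg: HeartbeatMessage) -> List[str]:
    """Return the names of status fields that the peer did not report as 'ok'."""
    return [name for name, value in asdict(msg).items()
            if name.endswith("_status") and value != "ok"]
```

The list returned by `peer_faults` stands in for the fault information that the heartbeat detection module would pass to the arbitration management module.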
Arbitration management module. According to the fault information sent by the self-detection module and the heartbeat detection module, it performs the final vote on the dual-machine nodes through its own arbitration algorithm.
In this embodiment, the arbitration management module is further divided into two parts: the arbitration processing unit running on the host and the arbitration subsystem running on the arbitration board.
Arbitration processing unit. Its main role is to work with the arbitration subsystem to check the system's condition and locate the fault precisely. Fig. 4 shows the basic structure of the arbitration processing unit, which is divided into an arbitration processor driver and an arbitration control submodule. The arbitration processor driver is responsible for communication between the host and the arbitration board: it receives the arbitration request initiated by the node host and forwards it to the arbitration subsystem, and it receives the fault information sent by the self-detection module and the heartbeat detection module and forwards it to the arbitration subsystem. The arbitration control submodule triggers arbitration requests to the arbitration subsystem and, as instructed by the arbitration subsystem, calls the corresponding arbitration algorithm, performs the computation on the fault information, and forwards the result to the arbitration subsystem.
After the arbitration subsystem starts an arbitration, it instructs the arbitration control submodule to call the corresponding arbitration algorithm in the arbitration algorithm library (that is, the algorithm selected according to the type of the fault information) and to return the result to the arbitration subsystem. If a single arbitration cannot locate the faulty machine, further arbitrations are run with different arbitration algorithms. The arbitration opinion of the arbitration subsystem passes through the triple-modular voting of the arbitration control submodule to produce the final vote on the dual-machine nodes.
The arbitration algorithm library is a set of diagnostic programs that run on the host. Algorithm selection should follow the principle of low time overhead and high fault coverage, so this embodiment selects checks of the communication line, processes, memory, and CPU. Each of these diagnostic programs corresponds to a unique arbitration number (that is, to one arbitration algorithm).
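A numbered algorithm library with repeated arbitration might look like the sketch below. The numbering, the dictionary-based host state, and the rule that an arbitration succeeds when exactly one host fails a diagnostic are assumptions for illustration; the patent does not define the diagnostics themselves.

```python
from typing import Callable, Dict, Iterable, Optional

# A diagnostic takes a host's state and returns True if that host passes.
Diagnostic = Callable[[dict], bool]

# Hypothetical arbitration numbers mapped to the four kinds of check named above.
ARBITRATION_LIBRARY: Dict[int, Diagnostic] = {
    1: lambda host: host.get("link_ok", True),     # communication line check
    2: lambda host: host.get("process_ok", True),  # service process check
    3: lambda host: host.get("memory_ok", True),   # memory check
    4: lambda host: host.get("cpu_ok", True),      # CPU check
}


def locate_faulty_host(host_a: dict, host_b: dict,
                       algorithm_numbers: Iterable[int]) -> Optional[str]:
    """Run algorithms in order until exactly one host fails a diagnostic."""
    for number in algorithm_numbers:
        diagnostic = ARBITRATION_LIBRARY[number]
        a_ok, b_ok = diagnostic(host_a), diagnostic(host_b)
        if a_ok != b_ok:
            return "B" if a_ok else "A"
    return None  # no algorithm could single out one host
```

Ordering `algorithm_numbers` from cheapest to most expensive check is one way to respect the low-time-overhead principle while still falling back to broader diagnostics.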
In addition, in a preferred scheme, after triple-modular voting the arbitration control submodule can also send the final voting result to a remote end, which takes the corresponding action according to the arbitration result.
The arbitration subsystem is a program running on the microcontroller of the arbitration board. It responds to arbitration requests raised by the hosts, specifies the corresponding arbitration algorithm according to the error information sent by the two hosts, judges the algorithm's results, and submits the final arbitration opinion to the two hosts.
Specifically, after the arbitration board is powered on and completes initialization and self-test, the arbitration subsystem enters the ready state. When the arbitration subsystem receives an arbitration request from a host, it responds to the request, and the three arbitration nodes vote synchronously. Once at least two nodes have received the arbitration request, that is, once the vote succeeds, the subsystem enters the arbitration started state and notifies the machines that need to be arbitrated. The arbitration subsystem then returns an arbitration algorithm number (the identifier of the arbitration algorithm selected from the arbitration algorithm library) to the host and requires it to run the specified algorithm. Based on the algorithm results returned by the host to the arbitration nodes, the arbitration subsystem performs triple-modular voting again, returns the voting result to the two hosts, and enters the arbitration finished state. The system then resets and isolates the faulty host; the healthy host switches the shared peripherals over to itself, takes over the system services, and sends an alarm signal to the system so that the faulty host can be repaired. This process is shown in Fig. 5.
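The state transitions and the two rounds of two-out-of-three voting in this flow can be sketched as a small state machine. The class and state names are hypothetical, and the verdict strings are placeholders; the sketch only shows the ready, started, and finished states and the majority rule.

```python
from collections import Counter
from enum import Enum
from typing import Optional, Sequence


class ArbState(Enum):
    READY = "ready"
    STARTED = "started"
    FINISHED = "finished"


def majority_vote(votes: Sequence[str]) -> Optional[str]:
    """Return the value chosen by at least two of three voters, else None."""
    if len(votes) != 3:
        raise ValueError("triple-modular voting needs exactly three votes")
    value, count = Counter(votes).most_common(1)[0]
    return value if count >= 2 else None


class ArbitrationSubsystem:
    """Hypothetical model of the arbitration board's request handling."""

    def __init__(self) -> None:
        self.state = ArbState.READY

    def on_request(self, received_by_node: Sequence[bool]) -> bool:
        """Start arbitration if at least two of three nodes saw the request."""
        if self.state is ArbState.READY and sum(received_by_node) >= 2:
            self.state = ArbState.STARTED
            return True
        return False

    def on_results(self, node_verdicts: Sequence[str]) -> Optional[str]:
        """Vote on the three nodes' verdicts and finish the arbitration."""
        if self.state is not ArbState.STARTED:
            raise RuntimeError("arbitration has not been started")
        verdict = majority_vote(node_verdicts)
        self.state = ArbState.FINISHED
        return verdict
```

A `None` verdict means the three nodes disagreed completely; how the real system handles that case is not described in the patent, so the sketch simply reports it.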
Note that the arbitration management module triggers the above arbitration flow only after it receives an arbitration request. The request may be sent by the host (that is, the local node's host) in real time or periodically. This embodiment places no restriction on how the arbitration request is sent.
The realization of the above system is illustrated with a concrete example. This embodiment applies fault injection: various faults are injected into the dual-machine fault-tolerant system in simulation, and the switchover of services and applications is observed while the dual-machine system runs, so as to verify the availability of the arbitration system and in particular its arbitration function. The injected fault types include network communication faults (packet loss and packet delay), service process faults, memory faults, and CPU faults. Each case is tested both without and with the arbitration management module, which is done by installing or unloading the arbitration processing module.
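As a rough illustration of the network fault injection mentioned above, the following sketch drops and delays packets in software. The patent does not describe its injection tooling, so the function name `inject_network_faults` and its parameters are hypothetical.

```python
import random
from typing import List, Optional, Tuple


def inject_network_faults(packets: List[bytes], loss_rate: float,
                          max_delay_s: float,
                          rng: Optional[random.Random] = None
                          ) -> List[Tuple[bytes, float]]:
    """Drop each packet with probability loss_rate; delay the survivors.

    Returns (packet, delay_in_seconds) pairs for the packets that get through.
    """
    rng = rng or random.Random()
    delivered: List[Tuple[bytes, float]] = []
    for packet in packets:
        if rng.random() < loss_rate:
            continue  # simulated packet loss
        delivered.append((packet, rng.uniform(0.0, max_delay_s)))
    return delivered
```

Feeding heartbeat packets through such a function with a high `loss_rate` would reproduce the case where the heartbeat link fails while both hosts remain healthy.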
Injecting a service process fault causes the service on the target machine to stop. Because the dual-machine fault-tolerant system monitors this service, the heartbeat check module and the arbitration management module successfully detect the fault and switch over. Injecting CPU and memory faults simulates a system crash and restart; in this case the arbitration check module is started, successfully detects the fault, and switches over.
Embodiment 2
This embodiment provides an arbitration method oriented to critical applications, comprising the following operations:
Using the system's built-in detection mechanism and a fault detection algorithm to detect faults of the local node's host system;
Detecting the fault information of the peer node;
Performing the final vote on the dual-machine nodes through an arbitration algorithm, according to the detected faults of the local node's host system and the fault information of the peer node.
Here, performing the final vote on the dual-machine nodes through an arbitration algorithm, according to the detected faults of the local node's host system and the fault information of the peer node, means:
When an arbitration request initiated by the node host is received, selecting the corresponding arbitration algorithm from the arbitration algorithm library according to the detected faults of the local node's host system and the fault information of the peer node;
Calling the selected arbitration algorithm, performing the computation on the detected faults of the local node's host system and the fault information of the peer node, and determining the final voting result for the dual-machine nodes according to the result.
In a preferred scheme, after the above computation on the detected faults of the local node's host system and the fault information of the peer node, triple-modular voting can be performed again to determine the final voting result for the dual-machine nodes. Specifically, after initialization and self-test, when an arbitration request from a host is received, an arbitration algorithm number is returned and the host is required to run the specified algorithm. The arbitration nodes take the algorithm results returned by the host and then perform triple-modular voting; the voting result is returned to the two hosts, and the arbitration finished state is entered. The system then resets and isolates the faulty host; the healthy host switches the shared peripherals over to itself, takes over the system services, and sends an alarm signal to the system so that the faulty host can be repaired. As can be seen, this preferred scheme greatly improves the reliability of the arbitration result.
In addition, detecting faults of the local node's host system generally includes detecting hardware faults and service-process faults of that host system.
It should also be noted that the above method can be implemented by the arbitration system of Embodiment 1. For other operational details of the method, refer to the corresponding content of Embodiment 1; they are not repeated here.
As the above embodiments show, to address the low fault coverage and low fault-diagnosis success rate of conventional arbitration mechanisms, this technical solution proposes an arbitration mechanism based on an arbitration processor for critical-application host systems, and designs the arbitration system and arbitration algorithms. The arbitration processor uses a triple-modular redundancy architecture and chip-level fault-tolerant design; the arbitration algorithms are organized hierarchically; and a fault monitoring mechanism that combines self-detection with heartbeat detection is used, which effectively addresses the low success rate of single-point fault detection.
A person of ordinary skill in the art will understand that all or some of the steps of the above method can be carried out by a program instructing the relevant hardware, and that the program can be stored in a computer-readable storage medium such as a read-only memory, magnetic disk, or optical disc. Alternatively, all or some of the steps of the above embodiments can be implemented with one or more integrated circuits. Accordingly, each module or unit in the above embodiments can be implemented in hardware or as a software function module. This application is not limited to any particular combination of hardware and software.
The above are only preferred embodiments of the invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (9)

1. An arbitration system oriented to critical applications, characterized in that it comprises at least a self-detection module, a heartbeat detection module, and an arbitration management module, wherein:
The self-detection module uses the system's built-in detection mechanism and a fault detection algorithm to detect faults of the local node's host system, and sends the detected fault information to the arbitration management module;
The heartbeat detection module detects the state information of the peer node and sends the detected fault information of other nodes to the arbitration management module;
The arbitration management module performs the final vote on the dual-machine nodes through its own arbitration algorithm, according to the fault information sent by the self-detection module and the heartbeat detection module.
2. The arbitration system of claim 1, characterized in that the arbitration management module comprises an arbitration processing unit running on the node host and an arbitration subsystem running on the arbitration board, wherein:
The arbitration processing unit receives the arbitration request initiated by the node host and forwards it to the arbitration subsystem; receives the fault information sent by the self-detection module and the heartbeat detection module and forwards it to the arbitration subsystem; and, as instructed by the arbitration subsystem, calls the corresponding arbitration algorithm, performs the computation on the fault information, and forwards the result to the arbitration subsystem;
The arbitration subsystem, after receiving the arbitration request, selects the corresponding arbitration algorithm from the arbitration algorithm library according to the received fault information, instructs the arbitration processing unit to perform the arbitration computation with the selected algorithm, and determines the final voting result for the dual-machine nodes according to the result sent by the arbitration processing unit.
3. The arbitration system of claim 2, characterized in that the arbitration processing unit comprises:
An arbitration processor driver, which handles communication between the node host and the arbitration board, receives the arbitration request initiated by the node host and forwards it to the arbitration subsystem, and receives the fault information sent by the self-detection module and the heartbeat detection module and forwards it to the arbitration subsystem;
An arbitration control submodule, which, as instructed by the arbitration subsystem, calls the corresponding arbitration algorithm, performs the computation on the fault information, and forwards the result to the arbitration subsystem.
4. The arbitration system of claim 3, characterized in that
The arbitration control submodule uses a triple-modular voting method.
5. The arbitration system of any one of claims 1 to 4, characterized in that
The self-detection module detecting faults of the local node's host system means:
Detecting hardware faults and service-process faults of the local node's host system.
6. An arbitration method oriented to critical applications, characterized in that the method comprises:
Using the system's built-in detection mechanism and a fault detection algorithm to detect faults of the local node's host system;
Detecting the fault information of the peer node;
Performing the final vote on the dual-machine nodes through an arbitration algorithm, according to the detected faults of the local node's host system and the fault information of the peer node.
7. The method of claim 6, characterized in that performing the final vote on the dual-machine nodes through an arbitration algorithm, according to the detected faults of the local node's host system and the fault information of the peer node, means:
When an arbitration request initiated by the node host is received, selecting the corresponding arbitration algorithm from the arbitration algorithm library according to the detected faults of the local node's host system and the fault information of the peer node;
Calling the selected arbitration algorithm, performing the computation on the detected faults of the local node's host system and the fault information of the peer node, and determining the final voting result for the dual-machine nodes according to the result.
8. The method of claim 7, characterized in that, after the computation on the detected faults of the local node's host system and the fault information of the peer node, triple-modular voting is performed again to determine the final voting result for the dual-machine nodes.
9. The method of any one of claims 6 to 8, characterized in that detecting faults of the local node's host system means:
Detecting hardware faults and service-process faults of the local node's host system.
CN201410377840.XA 2014-08-01 2014-08-01 Arbitration system and method oriented to critical applications Pending CN104133744A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410377840.XA CN104133744A (en) 2014-08-01 2014-08-01 Arbitration system and method oriented to critical applications

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410377840.XA CN104133744A (en) 2014-08-01 2014-08-01 Arbitration system and method oriented to critical applications

Publications (1)

Publication Number Publication Date
CN104133744A true CN104133744A (en) 2014-11-05

Family

ID=51806429

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410377840.XA Pending CN104133744A (en) 2014-08-01 2014-08-01 Arbitration system and method oriented to critical applications

Country Status (1)

Country Link
CN (1) CN104133744A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6785892B1 (en) * 2000-06-23 2004-08-31 Unisys Communications between partitioned host processors and management processor
US20040221198A1 (en) * 2003-04-17 2004-11-04 Vecoven Frederic Louis Ghislain Gabriel Automatic error diagnosis
CN101281483A (en) * 2008-05-12 2008-10-08 北京邮电大学 Double-machine redundant tolerant system and redundant switching method thereof
CN202004776U (en) * 2011-01-07 2011-10-05 北京捷世伟业电子科技有限公司 Redundant hot swapping system

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105487945A (en) * 2016-02-19 2016-04-13 中国航天科技集团公司第五研究院第五一三研究所 Self-monitoring fault-tolerant control method of non-similar dual-redundancy four processors
CN105487945B (en) * 2016-02-19 2017-11-03 中国航天科技集团公司第五研究院第五一三研究所 A kind of non-similar pair of remaining four machine monitors fault tolerant control method certainly
CN107229238A (en) * 2016-03-23 2017-10-03 通用汽车环球科技运作有限责任公司 Framework and device for the advanced arbitration in embedded Control
CN107229238B (en) * 2016-03-23 2019-08-30 通用汽车环球科技运作有限责任公司 The method for arbitrating the conflict output in redundancy control system
US10055315B2 (en) 2016-06-29 2018-08-21 Microsoft Technology Licensing, Llc Failure monitoring in distributed computing systems
US10114712B2 (en) 2016-06-29 2018-10-30 Microsoft Technology Licensing, Llc Failure detection via implicit leases in distributed computing systems
US10467126B2 (en) 2017-03-31 2019-11-05 Microsoft Technology Licensing, Llc Scenarios based fault injection

Similar Documents

Publication Publication Date Title
CN105659215B (en) A kind of fault handling method, relevant apparatus and computer
CN104133744A (en) Arbitration system and method oriented to critical applications
KR101908465B1 (en) Fault management method, entity and system
US11687391B2 (en) Serializing machine check exceptions for predictive failure analysis
EP2518627B1 (en) Partial fault processing method in computer system
CN107347018A (en) A kind of triple redundance 1553B bus dynamic switching methods
KR101331935B1 (en) Method and system of fault diagnosis and repair using based-on tracepoint
CN107729190B (en) IO path failover processing method and system
US7925922B2 (en) Failover method and system for a computer system having clustering configuration
CN109634171B (en) Dual-core dual-lock-step two-out-of-two framework and safety platform thereof
CN102664755B (en) Control channel fault determining method and device
CN102026042A (en) Keep-alive and self-healing method and device for advanced telecom computing architecture control surface
Lu et al. Iaso: an autonomous fault-tolerant management system for supercomputers
CN115408240B (en) Redundancy system active-standby method, redundancy system active-standby device, redundancy system active-standby equipment and redundancy system storage medium
Lee et al. Fault localization in NFV framework
KR101827052B1 (en) Distributed system management method for operating information processing function in battle system of naval vessel with multiple modes and system thereof
US10089200B2 (en) Computer apparatus and computer mechanism
Cisco Troubleshooting and Fault Management
JPWO2008120383A1 (en) Information processing apparatus and failure processing method
CN117311769B (en) Server log generation method and device, storage medium and electronic equipment
EP4361817A1 (en) 2*2oo2 security system based on cloud platform
RU2694008C1 (en) Method for dynamic reconfiguration of computing systems of modular architecture
KR100604552B1 (en) Method for dealing with system troubles through joint-owning of state information and control commands
RU2559767C2 (en) Method of providing fault-tolerance computer system based on task replication, self-reconfiguration and self-management of degradation
CN117407863A (en) Safety architecture for automatic driving, automatic driving system and vehicle

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20141105