CN102541697A - Switching method for processing fault of dual-redundancy computer - Google Patents

Switching method for processing fault of dual-redundancy computer

Info

Publication number: CN102541697A
Authority: CN (China)
Prior art keywords: slave unit, main equipment, fault, master, control
Legal status: Pending
Application number: CN201010620061XA
Other languages: Chinese (zh)
Inventors: 刘文学, 刘硕, 向建军
Current Assignee: AVIC No 631 Research Institute
Original Assignee: AVIC No 631 Research Institute
Priority date: 2010-12-31
Filing date: 2010-12-31
Publication date: 2012-07-04
Application filed by AVIC No 631 Research Institute
Priority to CN201010620061XA
Publication of CN102541697A

Landscapes

  • Hardware Redundancy (AREA)

Abstract

The invention discloses a switching method for fault handling in a dual-redundancy computer, comprising the following steps: 1) the system is powered on, and the control outputs of both the master device and the slave device are inhibited; 2) the slave device monitors the system state in real time; 3) the slave device judges whether the master device is working normally; if so, the master device drives the control output and the slave device performs backup monitoring; if not, step 4) is executed; 4) the master device notifies the slave device of the fault information; and 5) the control output of the master device is inhibited, and the slave device switches over to drive the control output while monitoring the master device. The method implements the fault logic with discrete signals and an FPGA (field-programmable gate array), and completes fault judgment, fault information transmission, and master/slave function switching between the master and slave devices in real time.

Description

A switching method for fault handling in a dual-redundancy computer
Technical field
The present invention relates to a fault handling method for computers, and in particular to a fault-handling switching method for a dual-redundancy computer.
Background art
To improve system reliability, many computer systems use dual-redundancy hot standby for the whole device or for certain key functions. When the master device malfunctions or fails, the standby device detects the failure and switches over automatically so that the system continues to operate normally. Embedded computers, however, now face strict requirements on reliability, power consumption, and volume. The conventional approach requires a full backup device, and the backup device can discover the master device's fault and switch over only after the system has already failed. It needs substantial extra resources, consumes considerable power, and makes the running and operating procedures quite complex.
Summary of the invention
To solve the technical problems described in the background art, the present invention proposes a switching method for fault handling in a dual-redundancy computer. It implements the fault logic with discrete signals and an FPGA, and completes fault judgment, fault information transmission, and master/slave function switching between the master and slave devices in real time.
The technical solution of the present invention is a switching method for fault handling in a dual-redundancy computer, characterized in that the method comprises the following steps:
1) The system is powered on; the control output of the master device is inhibited and the control output of the slave device is inhibited;
2) The slave device monitors the system state in real time;
3) The slave device judges whether the master device is working normally; if so, the master device drives the control output and the slave device performs backup monitoring; if not, proceed to step 4);
4) The master device notifies the slave device of the fault information;
5) The control output of the master device is inhibited; the slave device switches over and drives the control output while monitoring the master device.
After step 5), the method further comprises step 6): if the slave device develops a problem while the system is running, it is judged whether to force a switch; if so, control output is switched back to the master device.
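Steps 1) to 6) can be read as a small state machine. The following Python sketch is only an illustration under stated assumptions: the state names, the `next_state` function, and the `force_switch` flag are hypothetical and do not appear in the patent, which realizes this logic in FPGA hardware with discrete signals.

```python
from enum import Enum, auto


class State(Enum):
    POWER_ON = auto()        # step 1: both control outputs inhibited
    MASTER_CONTROL = auto()  # step 3, healthy branch: master drives output
    SLAVE_CONTROL = auto()   # step 5: slave drives output, master inhibited


def next_state(state, master_ok, slave_ok=True, force_switch=False):
    """Return the next state of this hypothetical switchover state machine."""
    if state in (State.POWER_ON, State.MASTER_CONTROL):
        # Steps 2-3: the slave monitors the master; a fault leads to steps 4-5.
        return State.MASTER_CONTROL if master_ok else State.SLAVE_CONTROL
    # Step 6 (assumed reading): hand control back to the master only when the
    # slave has a problem AND a forced switch is requested.
    if not slave_ok and force_switch:
        return State.MASTER_CONTROL
    return State.SLAVE_CONTROL


def outputs_enabled(state):
    """Return (master_output_enabled, slave_output_enabled) for a state."""
    return (state is State.MASTER_CONTROL, state is State.SLAVE_CONTROL)
```

In this sketch the slave keeps control after a switchover even if the master later reports healthy; control returns to the master only through the forced switch of step 6), which is one possible interpretation of the text.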
In step 3), the specific steps by which the slave device judges whether the master device is working normally are:
3.1) The slave device checks the master device, first judging whether the master device has a power failure; if so, the master device's fault register is changed from '1' to '0'; if not, proceed to step 3.2);
3.2) Judge whether the master device has a watchdog alarm; if so, the master device's fault register is changed from '1' to '0'; if not, proceed to step 3.3);
3.3) Judge whether the master device has a software BIT (built-in test) error; if so, the master device's fault register is changed from '1' to '0'; if not, return to step 3) and repeat.
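The ordered checks in steps 3.1) to 3.3) amount to a short priority chain. The sketch below is a minimal software model, assuming the three conditions are available as boolean flags; the `MasterStatus` type and the function name are hypothetical, and only the register values ('1' normal, '0' fault) come from the text.

```python
from dataclasses import dataclass

HEALTHY, FAULT = 1, 0  # fault register values from the text: '1' normal, '0' fault


@dataclass
class MasterStatus:
    """Hypothetical snapshot of the conditions checked in steps 3.1-3.3."""
    power_failed: bool = False
    watchdog_alarm: bool = False
    software_bit_error: bool = False


def evaluate_fault_register(status):
    """Check the conditions in the order given by steps 3.1 -> 3.2 -> 3.3.

    Returns (register_value, cause); cause is None when no check trips.
    """
    checks = (
        ("power failure", status.power_failed),             # step 3.1
        ("watchdog alarm", status.watchdog_alarm),          # step 3.2
        ("software BIT error", status.software_bit_error),  # step 3.3
    )
    for cause, tripped in checks:
        if tripped:
            return FAULT, cause
    return HEALTHY, None
```

When several conditions hold at once, the first one in the order of the steps is reported, which mirrors the sequential structure of the description.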
In step 4), the specific steps of the notification are:
4.1) The master device, through hardware, drives the C_Go/Nogo discrete fault signal from high level to low level;
4.2) After optocoupler isolation, this C_Go/Nogo discrete signal changes from low level to high level;
4.3) The high-level fault signal from step 4.2) is fed to the B_gonogo_in input of the slave device, notifying the slave device that the master device has failed.
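The level changes in steps 4.1) to 4.3) can be traced with a tiny logic model. This is a sketch, not the patent's circuit: it assumes the optocoupler stage behaves as a plain inverter (consistent with step 4.2) and that C_Go/Nogo follows the master's fault register, which the text implies but does not state explicitly. The Python function names are hypothetical.

```python
HIGH, LOW = 1, 0


def c_go_nogo(master_fault_register):
    """Step 4.1: the master drives C_Go/Nogo high when healthy, low on a fault.

    Assumption: the signal follows the fault register ('1' normal, '0' fault).
    """
    return HIGH if master_fault_register == 1 else LOW


def optocoupler(level):
    """Step 4.2: the isolation stage, modelled here as a logic inverter."""
    return LOW if level == HIGH else HIGH


def b_gonogo_in(master_fault_register):
    """Step 4.3: the level seen at the slave's B_gonogo_in input."""
    return optocoupler(c_go_nogo(master_fault_register))


def master_failed(b_gonogo_in_level):
    """The slave treats a high level at B_gonogo_in as a master failure."""
    return b_gonogo_in_level == HIGH
```

With these definitions a healthy master yields a low level at B_gonogo_in, and a fault yields the high level that step 4.3) describes.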
In step 5), the specific steps by which the control output of the master device is inhibited and the slave device switches over are:
5.1) The slave device notifies its application program by interrupt that the master device has failed and that control must switch to the slave device;
5.2) The slave device hardware sets the state of its internal fault-switching control register to '0', configuring it to the switching state;
5.3) The B_valid signal output by the switching control register goes low, indicating that system control is performed by the slave device;
5.4) After optocoupler isolation, the B_valid output is fed to the C_valid_in input of the master device to control the master device's switchover; the master device inhibits its control output through internal hardware;
5.5) The switching control register inside the slave device drives the switching logic to enable the slave device's control output.
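The sequence in steps 5.1) to 5.5) can be sketched as two cooperating objects. Several details here are assumptions rather than facts from the patent: the optocoupler is modelled as an inverter by analogy with step 4.2), so a high level at C_valid_in is taken to mean that the slave has taken control; the register value '1' for the non-switched state is assumed; and the class and method names are hypothetical.

```python
from dataclasses import dataclass

HIGH, LOW = 1, 0
SWITCHED, NORMAL = 0, 1  # '0' = switching state (from the text); '1' is assumed


def optocoupler(level):
    """Isolation stage, assumed to invert as in step 4.2."""
    return LOW if level == HIGH else HIGH


@dataclass
class Master:
    output_enabled: bool = True

    def on_c_valid_in(self, level):
        # Step 5.4 (assumed polarity): HIGH means the slave has taken control,
        # so the master inhibits its own control output.
        if level == HIGH:
            self.output_enabled = False


@dataclass
class Slave:
    switch_control_register: int = NORMAL
    output_enabled: bool = False
    app_notified: bool = False

    def b_valid(self):
        # Step 5.3: register '0' drives B_valid low (slave in control).
        return LOW if self.switch_control_register == SWITCHED else HIGH

    def handle_master_fault(self, master):
        self.app_notified = True                  # 5.1: interrupt to application
        self.switch_control_register = SWITCHED   # 5.2: enter switching state
        level = self.b_valid()                    # 5.3: B_valid goes low
        master.on_c_valid_in(optocoupler(level))  # 5.4: isolate, notify master
        # 5.5: the register enables the slave's control output.
        self.output_enabled = self.switch_control_register == SWITCHED
```

Calling `Slave().handle_master_fault(Master())` leaves the master's output inhibited and the slave's output enabled, so at most one device drives the control output after the handover in this model.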
Under normal conditions, the master device performs system control and the slave device performs monitoring and backup. When the master device fails, the system tasks are switched to the slave device, which then performs system control. After the system has entered the slave-device backup control state, the forced switching function can, as actual needs require, switch the system tasks from the slave device back to the master device. The method thus realizes fault judgment, fault information transmission, and master/slave function switching between the master and slave devices. The master and slave devices are completely electrically isolated, and the fault logic is implemented with discrete signals and an FPGA; the system's working mechanism is clear, the control is simple to implement, and operation is stable and reliable.
Description of drawings
Fig. 1 is a flow chart of the method of the present invention;
Fig. 2 is a schematic diagram of a specific embodiment of the fault handling and switching of the present invention;
Fig. 3 is a flow chart of the slave device's fault judgment in the method of the present invention;
Fig. 4 is a schematic diagram of the master-to-slave fault notification in the method of the present invention;
Fig. 5 is a flow chart of the slave device's switchover in the present invention.
Embodiment
Referring to Fig. 1 and Fig. 2, the switching method for fault handling in a dual-redundancy computer of the present invention comprises the following steps:
1) The system is powered on; the control output of the master device is inhibited and the control output of the slave device is inhibited;
2) The slave device monitors the system state in real time;
3) The slave device judges whether the master device is working normally; if so, the master device drives the control output and the slave device performs backup monitoring; if not, proceed to step 4);
Referring to Fig. 3, the specific steps by which the slave device judges whether the master device is working normally are:
3.1) The slave device checks the master device, first judging whether the master device has a power failure; if so, the master device's fault register is changed from '1' to '0'; if not, proceed to step 3.2);
3.2) Judge whether the master device has a watchdog alarm; if so, the master device's fault register is changed from '1' to '0'; if not, proceed to step 3.3);
3.3) Judge whether the master device has a software BIT (built-in test) error; if so, the master device's fault register is changed from '1' to '0'; if not, return to step 3) and repeat.
Referring to Fig. 4, 4) the master device notifies the slave device; the specific steps of the notification are:
4.1) The master device, through hardware, drives the C_Go/Nogo discrete fault signal from high level to low level;
4.2) After optocoupler isolation, this C_Go/Nogo discrete signal changes from low level to high level;
4.3) The high-level fault signal from step 4.2) is fed to the B_gonogo_in input of the slave device, notifying the slave device that the master device has failed.
Referring to Fig. 5, 5) the control output of the master device is inhibited; the slave device switches over and drives the control output while monitoring the master device. The specific implementation is:
5.1) The slave device notifies its application program by interrupt that the master device has failed and that control must switch to the slave device;
5.2) The slave device hardware sets the state of its internal fault-switching control register to '0', configuring it to the switching state;
5.3) The B_valid signal output by the switching control register goes low, indicating that system control is performed by the slave device;
5.4) After optocoupler isolation, the B_valid output is fed to the C_valid_in input of the master device to control the master device's switchover; the master device inhibits its control output through internal hardware;
5.5) The switching control register inside the slave device drives the switching logic to enable the slave device's control output.
6) If the slave device develops a problem while the system is running, it is judged whether to force a switch; if so, control output is switched to the master device.
The signal parameters used by the master/slave switching method of the present invention are listed in Table 1.
Table 1
[Table 1 appears only as an image in the original publication: Figure BDA0000042452180000041]

Claims (5)

1. A switching method for fault handling in a dual-redundancy computer, characterized in that the method comprises the following steps:
1) The system is powered on; the control output of the master device is inhibited and the control output of the slave device is inhibited;
2) The slave device monitors the system state in real time;
3) The slave device judges whether the master device is working normally; if so, the master device drives the control output and the slave device performs backup monitoring; if not, proceed to step 4);
4) The master device notifies the slave device of the fault information;
5) The control output of the master device is inhibited; the slave device switches over and drives the control output while monitoring the master device.
2. The switching method for fault handling in a dual-redundancy computer according to claim 1, characterized in that after step 5) the method further comprises step 6): if the slave device develops a problem while the system is running, it is judged whether to force a switch; if so, control output is switched to the master device.
3. The switching method for fault handling in a dual-redundancy computer according to claim 1 or 2, characterized in that the specific steps by which the slave device in step 3) judges whether the master device is working normally are:
3.1) The slave device checks the master device, first judging whether the master device has a power failure; if so, the master device's fault register is changed from '1' to '0'; if not, proceed to step 3.2);
3.2) Judge whether the master device has a watchdog alarm; if so, the master device's fault register is changed from '1' to '0'; if not, proceed to step 3.3);
3.3) Judge whether the master device has a software BIT (built-in test) error; if so, the master device's fault register is changed from '1' to '0'; if not, return to step 3) and repeat.
4. The switching method for fault handling in a dual-redundancy computer according to claim 3, characterized in that the specific steps of the notification in step 4) are:
4.1) The master device, through hardware, drives the C_Go/Nogo discrete fault signal from high level to low level;
4.2) After optocoupler isolation, this C_Go/Nogo discrete signal changes from low level to high level;
4.3) The high-level fault signal from step 4.2) is fed to the B_gonogo_in input of the slave device, notifying the slave device that the master device has failed.
5. The switching method for fault handling in a dual-redundancy computer according to claim 4, characterized in that the specific steps in step 5) by which the control output of the master device is inhibited and the slave device switches over are:
5.1) The slave device notifies its application program by interrupt that the master device has failed and that control must switch to the slave device;
5.2) The slave device hardware sets the state of its internal fault-switching control register to '0', configuring it to the switching state;
5.3) The B_valid signal output by the switching control register goes low, indicating that system control is performed by the slave device;
5.4) After optocoupler isolation, the B_valid output is fed to the C_valid_in input of the master device to control the master device's switchover; the master device inhibits its control output through internal hardware;
5.5) The switching control register inside the slave device drives the switching logic to enable the slave device's control output.
CN201010620061XA 2010-12-31 2010-12-31 Switching method for processing fault of dual-redundancy computer Pending CN102541697A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201010620061XA CN102541697A (en) 2010-12-31 2010-12-31 Switching method for processing fault of dual-redundancy computer

Publications (1)

Publication Number Publication Date
CN102541697A (en) 2012-07-04

Family

ID=46348647

Country Status (1)

Country Link
CN (1) CN102541697A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20120704