CN104050051A - Fault diagnosis method for on-board computer - Google Patents

Publication number
CN104050051A
Authority
CN
China
Prior art keywords
fault
logic
hardware
state
processor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410301310.7A
Other languages
Chinese (zh)
Other versions
CN104050051B (en)
Inventor
花秋琴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Aerospace Electronic Communication Equipment Research Institute
Original Assignee
Shanghai Aerospace Electronic Communication Equipment Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Aerospace Electronic Communication Equipment Research Institute
Priority to CN201410301310.7A
Publication of CN104050051A
Application granted
Publication of CN104050051B
Legal status: Active
Anticipated expiration

Landscapes

  • Hardware Redundancy (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a fault diagnosis method for an on-board computer. Fault diagnosis of the on-board computer is completed in a software and hardware cooperative work mode. The fault diagnosis method comprises the steps of fault detection based on assertion and hardware fault event driving. The step of fault detection based on assertion comprises the steps that an operation interface and a value range of hardware drive parameters are provided by a hardware system of the on-board computer and are read back and judged by the software; the software conducts input parameter and state return detection on a function interface; by asserting the working range of the input parameters, exception throwing is conducted in a soft interruption or callback function mode when it is judged that the input parameters exceed a threshold value, and fault diagnosis and fault recovery are completed in the process of exception handling by a processor. The step of hardware fault event driving comprises the steps that the current running process of the processor is interrupted in a bus access wait signal trigger mode, a bus access error signal trigger mode and a bus access interrupt signal trigger mode through synchronous state feedback of control flow and data flow; fault recognition and fault recovery are conducted according to an event driving source and feedback information.

Description

Fault diagnosis method for an on-board computer
Technical field
The present invention relates to the field of fault diagnosis technology, and in particular to a fault diagnosis method for an on-board computer.
Background
Adverse factors in the space environment, such as high-energy particle radiation, solar flares, and large temperature swings, make the logic resources and storage media of a computer prone to transient and permanent faults of all kinds. Spacecraft must operate with a high degree of autonomy, adapt to a complex space environment, and keep working without interruption during critical events. On-board electronic equipment, and its core control and computing units in particular, therefore needs automatic fault diagnosis and fault tolerance.
Fault diagnosis and fault-tolerance techniques diagnose and recover from equipment faults mainly by adding redundant information, backups, coding, and pattern recognition. Existing on-board electronic equipment is constrained by factors such as scale and component selection. The methods mainly used at present include autonomous switchover between multiple computers, cold and hot redundancy, and Hamming codes and two-out-of-three voting for storage resources. These methods effectively preserve as much of a unit's service capability as possible after a fault, and give good real-time error detection and correction for transient faults in storage resources. In modern on-board electronic equipment, however, rising performance, integration, and design scale, together with the extensive use of large-scale integrated circuits, mean that the existing diagnosis and fault-tolerance methods can no longer meet the application requirements of on-board electronic equipment, and of on-board computers in particular.
Summary of the invention
To address the above shortcomings of the prior art, the present invention provides a fault diagnosis method for an on-board computer. The invention is achieved through the following technical solutions:
A fault diagnosis method for an on-board computer, in which software and hardware cooperate to diagnose faults of the on-board computer, comprising: assertion-based fault detection, and hardware fault event driving.
The assertion-based fault detection comprises:
The hardware system of the on-board computer provides an operation interface and the value ranges of the hardware driver parameters, which the software reads back and checks. The software checks the input parameters and the returned state of each function interface. By asserting the working range of the input parameters, the software throws an exception by means of a soft interrupt or a callback function when it determines that an input parameter exceeds its threshold, and fault diagnosis and recovery are completed in the exception handling flow of the processor.
The hardware fault event driving comprises:
Synchronous state feedback of the control flow and data flow is used, so that the processor's current running flow is interrupted by one of three bus access triggers (a wait signal, an error signal, or an interrupt signal), and fault identification and recovery are carried out according to the event source and the feedback information.
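As a rough illustration of the assertion-based detection described above, the following Python sketch asserts a driver parameter's working range before a write and then reads the value back. It is only a sketch under stated assumptions: the names `ParameterFault` and `write_checked`, the dict that stands in for the hardware operation interface, and the use of a Python exception in place of a soft interrupt or callback are all hypothetical and do not come from the patent.

```python
class ParameterFault(Exception):
    """Hypothetical fault type: a parameter or read-back value failed its check."""


def write_checked(registers, port, value, valid_range):
    """Write `value` to `port`, asserting its range and the read-back state.

    `registers` is any dict-like object standing in for the hardware
    operation interface (an assumption made for this sketch).
    """
    low, high = valid_range
    # Assert the working range of the input parameter before touching hardware.
    if not (low <= value <= high):
        raise ParameterFault(f"{port}: value {value} outside [{low}, {high}]")
    registers[port] = value
    # Read back what the hardware holds and compare it with what was written.
    echoed = registers[port]
    if echoed != value:
        raise ParameterFault(f"{port}: wrote {value}, read back {echoed}")
    return echoed
```

In this sketch the caller's `except ParameterFault` block plays the role of the exception handling flow in which diagnosis and recovery would take place.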
Preferably, the synchronous state feedback of the control flow and data flow comprises:
The data flow of the on-board computer is partitioned by functional domain or clock domain into several functional units, and a state machine is established for each functional unit. The state machine has three states: Idle, Operating, and Confirm. At each partition point the control flows perform a synchronous handshake, which comprises state confirmation between the state machines and a data communication check between the two functional units. When a functional unit fails, so that synchronization between two functional units fails or the communication check is incorrect, the synchronous handshake between the two control flows fails and stalls the data flow link until the processor's bus access times out and the bus access exception flow is entered.
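The two parts of the synchronous handshake, state confirmation and a data communication check, can be pictured with the short Python sketch below. The patent does not say which check is used, so the 8-bit additive checksum here is purely an assumption, as are the function names `checksum8` and `handshake_succeeds` and their parameters.

```python
def checksum8(payload: bytes) -> int:
    """Illustrative data check: sum of all bytes, truncated to 8 bits."""
    return sum(payload) & 0xFF


def handshake_succeeds(upstream_start_valid, downstream_idle,
                       payload, received_checksum):
    """Return True only if both halves of the handshake pass.

    State confirmation: the upstream unit signals a start and the
    downstream unit is ready (Idle). Data check: the checksum that
    arrived with the payload matches the one computed locally.
    """
    state_confirmed = upstream_start_valid and downstream_idle
    data_verified = checksum8(payload) == received_checksum
    return state_confirmed and data_verified
```

A `False` result corresponds to the failed handshake described above, after which the link stays stalled until the bus access times out.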
Preferably, the functional units obtained by the partitioning comprise: processor decode-and-response logic, master device communication control logic, slave device communication control logic, storage control logic, and interface access logic.
Preferably, the state machine has three states (Idle, Operating, and Confirm), and the transition conditions between them comprise the following. In the Idle state, the current logic checks whether the start-flow flag of the upstream logic is valid and whether the downstream logic is in the Idle state; if both are true it enters the Operating state, otherwise it does not change state. In the Operating state, the current logic checks whether its own work flow has finished; if so it enters the Confirm state, otherwise it does not change state. In the Confirm state, the current logic checks whether the start-flow flag of the upstream logic is invalid and whether the downstream logic is in the Operating state; if both are true it enters the Idle state, otherwise it does not change state.
Preferably, for a processor that has no bus wait acknowledgment, a timeout counter is added in the Idle state, and when the counter exceeds a threshold the processor is notified by an interrupt or a bus error.
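The Idle-state timeout counter can be sketched as follows. This is a minimal model, assuming the counter advances once per clock tick while the unit is stuck in Idle and is cleared otherwise; the class name `IdleTimeoutCounter`, the `notify` callable that stands in for the interrupt or bus error, and the reset-after-notify behavior are all hypothetical.

```python
class IdleTimeoutCounter:
    """Counts ticks spent in Idle and notifies the processor past a threshold."""

    def __init__(self, threshold, notify):
        self.threshold = threshold
        self.notify = notify  # stand-in for raising an interrupt or bus error
        self.count = 0

    def tick(self, in_idle):
        """Advance one clock tick; return True if the notification fired."""
        if not in_idle:
            # Leaving Idle means the unit made progress, so clear the counter.
            self.count = 0
            return False
        self.count += 1
        if self.count > self.threshold:
            self.notify("idle timeout")
            self.count = 0
            return True
        return False
```

With a threshold of 3, the notification fires on the fourth consecutive tick spent in Idle.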
Brief description of the drawings
Fig. 1 is a flow chart of the synchronous state feedback based on the control flow according to the present invention;
Fig. 2 is the state machine diagram of the present invention;
Fig. 3 is the algorithm flow chart of the assertion-based fault detection of the present invention;
Fig. 4 is a schematic diagram of fault detection using transaction-level assertions in a multi-data-source acquisition system according to the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described and discussed clearly and completely below with reference to the accompanying drawings. Obviously, what is described here is only some of the examples of the invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
To aid understanding of the embodiments, specific embodiments are explained further below with reference to the drawings; the individual embodiments do not limit the embodiments of the present invention.
A fault diagnosis method for an on-board computer, in which software and hardware cooperate to diagnose faults of the on-board computer, comprising: assertion-based fault detection, and hardware fault event driving.
Assertion-based fault detection means that, when the processor accesses its peripherals, each functional operation is completed in an interactive request-response manner: the management software accesses the hardware functional interface through port operations, the hardware system responds to the software operation and feeds back the control state in real time, and the management software checks the fed-back hardware state against thresholds. When a value exceeds its threshold, the management software actively throws an exception. For simple functional interfaces of the on-board computer, such as device interfaces and communication logic, the hardware system can provide an operation interface and the value ranges of the hardware driver parameters, which the software reads back and checks. The software checks the input parameters and the returned state of each function interface. By asserting the working range of the input parameters, the software throws an exception by means of a soft interrupt or a callback function when it determines that an input parameter exceeds its threshold, and fault diagnosis and recovery are completed in the exception handling flow of the processor.
This detection method can judge the working state or communication state of a functional interface. For complex control logic in hardware and for the execution of algorithm strategies, however, the volume of process information is large and faults appear and clear quickly, so the rate at which a processor can sample the information cannot meet the requirements of fault detection. The present invention therefore proposes a transaction-level fault detection method that uses a built-in evaluation system for functional testing: the evaluation system monitors the functional logic, dynamically monitors process information and signal state transitions in real time, "asserts" information values and signal states, and records the error state when it detects abnormal process information or an abnormal signal state.
In the multi-data-source acquisition device shown in Fig. 4, multiple buffer queues store the data from the data sources, and a scheduler applies a scheduling policy to unload the data. The scheduling policy guarantees that the data of every channel can be received, that is, that no buffer queue may overflow. The evaluation system monitors the full flag and the write signal of each buffer queue and "asserts" that the two signals are never active at the same time. When the "assertion" fails, the evaluation system increments an error count and feeds it back to the processor or the master control system. When the processor or master control unit finds that the count is non-zero, it throws an exception by means of a software "assertion" and enters the exception handling flow described above.
The terms "assertion", "soft interrupt", and "callback function" used in the present invention are well-known names and techniques in the art and are not described in detail here.
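The buffer-queue assertion described for Fig. 4 can be pictured with the sketch below: a hardware-side monitor counts every sample in which the full flag and the write signal are both active, and a processor-side check raises an exception when that count is non-zero. The names `QueueMonitor`, `BufferOverflowFault`, and `check_monitor` are hypothetical, and modeling the evaluation system as a Python object sampled once per cycle is an assumption made for illustration.

```python
class BufferOverflowFault(Exception):
    """Hypothetical fault type raised when the error count is non-zero."""


class QueueMonitor:
    """Stands in for the evaluation system watching one buffer queue."""

    def __init__(self):
        self.error_count = 0

    def sample(self, full, write):
        # Assertion: the full flag and the write signal are never both active.
        if full and write:
            self.error_count += 1


def check_monitor(monitor):
    """Processor-side check of the error count fed back by the monitor."""
    if monitor.error_count != 0:
        raise BufferOverflowFault(
            f"{monitor.error_count} write(s) into a full queue")
```

Counting in the monitor rather than in software reflects the point made above: a violation may last only one cycle, which is faster than the processor can sample.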
The active, assertion-based fault detection described above is implemented for hardware device drivers. The logic it checks is the interface logic in the device that the driver software operates directly; the control flows and control logic that work autonomously inside the device are invisible to the device driver. For this reason, the present invention divides the control flow into multiple data processes according to the data flow and the control method. When they exchange information, the divided data processes are synchronized by control state handshakes, communication data checks, and similar means. The front end of the control logic, that is, the control logic that interacts with the processor, synchronizes with the processor through the processor's wait control signal, interrupt signal, and error indication signal. When a link in the control chain of the data flow becomes abnormal, the two groups of control logic involved cannot synchronize and both stop working. This eventually affects the synchronization of the whole data flow link, until the synchronization between the front-end control logic and the processor fails. The result is a processor wait timeout exception or a device error interrupt; a processor without a bus wait-state interface is notified by an interrupt or an error flag. The fault is then recovered through the exception handling of the management software.
As shown in Fig. 1, the information control flow of the on-board computer includes functions such as data communication processing, interface access operations, signal processing, and command driving. The whole flow is partitioned by functional type or clock domain into processor decode-and-response logic, master device communication control logic, slave device communication control logic, storage control logic, and interface access logic. The function of each part is as follows:
Processor decode-and-response logic: receives the processor's port access operations and maintains and responds to the processor's bus cycles, including the exchange of the processor wait signal (RDY). The wait signal (RDY) is the flag by which a peripheral tells the processor that data is ready on the bus. While the wait signal is inactive, the processor waits for the peripheral to put data on the bus, until the data is ready or the bus cycle times out.
Master device communication control logic: receives the operation commands after the processor's function conversion, and starts the external communication bus to complete the communication exchange with the slave device.
Slave device communication control logic: maintains communication with the master processor, and passes the data or commands sent over the bus to the storage control logic of the local module.
Storage control logic: receives the data or commands delivered over the bus, processes the bus data or decodes the commands, and submits the data or commands to the interface access logic.
Interface access logic: the functional unit that executes the processor's commands or carries out data communication; it completes the communication exchange with the external functional interface and executes the commands from the processor.
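The wait-signal behavior described for the processor decode-and-response logic can be modeled in a few lines. In this sketch the processor polls RDY once per bus cycle and gives up after a fixed number of cycles; the names `BusAccessTimeout` and `bus_read`, the callables `rdy_signal` and `read_data`, and the cycle-by-cycle polling model are assumptions for illustration only.

```python
class BusAccessTimeout(Exception):
    """Hypothetical exception standing in for the bus access exception flow."""


def bus_read(rdy_signal, read_data, timeout_cycles):
    """Wait for RDY, then read the bus; raise if RDY never becomes active.

    `rdy_signal(cycle)` returns True once the peripheral has data ready.
    `read_data()` returns the value on the bus.
    """
    for cycle in range(timeout_cycles):
        if rdy_signal(cycle):
            return read_data()
    raise BusAccessTimeout(
        f"RDY not active within {timeout_cycles} bus cycles")
```

A peripheral chain that is deadlocked never drives RDY active, so the call ends in `BusAccessTimeout`, which is where fault identification and recovery would begin.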
An interface access operation of the on-board computer passes through the above five functional units. A state machine as shown in Fig. 2 is established for each functional unit, and its state transition conditions are listed in Table 1. The Idle state and the Confirm state in Fig. 2 are the two synchronization points in this embodiment. In the Idle state, the functional unit detects the start-flow flag of the upstream logic, which includes a check of data validity, and judges whether the downstream logic unit is in an operable state; it moves to the Operating state only when both conditions are met. In the Confirm state, the functional unit checks whether the start flag of the upstream functional unit has ended and confirms whether the downstream flow has entered its working state; it returns to the Idle state only when both conditions are met. For the processor response logic, the processor's bus access is the start-flow flag of the unit, and the unit answers the processor's bus wait acknowledgment signal (RDY) when it is in the Idle state. When a functional unit is stuck in the Idle state because of a logic failure or a data check error, its upstream functional unit deadlocks in the Confirm state; when a functional unit is stuck in a busy state because of a fault, its upstream logic deadlocks in the Idle state. The deadlock propagates in the same way until the processor response logic is itself deadlocked, so that the processor's bus wait acknowledgment stays inactive until the processor's bus access times out and the processor enters the bus access exception flow. For a processor that has no bus wait acknowledgment, a timeout counter is added in the Idle state, and when the counter exceeds a threshold the processor is notified by an interrupt or a bus error.
No.   Source state   Target state   Transition condition
1     Idle           Operating      The start-flow flag is valid and the downstream logic is in the Idle state.
2     Operating      Confirm        The work flow has finished.
3     Confirm        Idle           The upstream flow flag is invalid and the downstream logic has left the Idle state.
Table 1
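The three transitions of Table 1 can be written as a small transition function. This is a sketch rather than the patent's implementation: the names `State` and `next_state` and the boolean inputs are hypothetical, and the state machine is evaluated one step at a time for a single functional unit.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    OPERATING = "operating"
    CONFIRM = "confirm"


def next_state(state, upstream_start_valid, downstream_state, work_done):
    """Apply the transitions of Table 1; stay in place if none applies."""
    if state is State.IDLE:
        # Row 1: start flag valid and downstream logic idle.
        if upstream_start_valid and downstream_state is State.IDLE:
            return State.OPERATING
    elif state is State.OPERATING:
        # Row 2: this unit's own work flow has finished.
        if work_done:
            return State.CONFIRM
    elif state is State.CONFIRM:
        # Row 3: start flag withdrawn and downstream logic has left Idle.
        if not upstream_start_valid and downstream_state is not State.IDLE:
            return State.IDLE
    return state
```

Because every transition out of Idle or Confirm depends on a neighbor, a unit whose neighbor is stuck simply keeps returning its current state, which is the deadlock the text relies on to trigger the bus timeout.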
A transaction on the on-board computer comprises several of the control flows described above, or is completed by repeated interactions of one control flow. To diagnose and detect faults in transactions, the present invention uses the processor's soft interrupt to perform assertion-based detection. The detection flow is shown in Fig. 3. In the normal flow, the processor controls access to a function port through a sequence of commands. The present invention adds a detection flow to the command sequence: the hardware state is checked after every operation. The hardware device provides operation feedback telemetry for each command, and the management software checks it against thresholds. When a measured parameter is outside its threshold range, the exception flow is entered through a callback function or a soft interrupt. Assertions are written in one of the following two forms:
Assert(Expression,function); (1)
Assert(Expression,kind); (2)
Both forms of assertion take two parameters: the first is a conditional expression and the second specifies the exception handling method. In form (1) the exception handling parameter is a callback function: when the conditional expression is not satisfied, the management software calls the corresponding exception handler (function). In form (2) the exception handling parameter is a soft interrupt number: when the conditional expression is not satisfied, a soft interrupt with the corresponding vector number is raised. The pseudocode for the two forms of assertion is as follows:
The assertion method of form (1) can be used for non-processor faults in the on-board computer. Handling a processor fault requires operating on the processor's system registers during exception handling, which needs access rights to those registers, so the assertion mechanism of form (2) is used in that case.
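The following is not the patent's pseudocode; it is a minimal Python sketch of how the two assertion forms could behave, under stated assumptions. The names `assert_callback`, `assert_interrupt`, and `SoftInterrupt` are hypothetical, and a Python exception carrying a vector number stands in for a real soft interrupt, which Python cannot raise.

```python
class SoftInterrupt(Exception):
    """Stand-in for a soft interrupt; carries the vector number."""

    def __init__(self, vector):
        super().__init__(f"soft interrupt, vector {vector}")
        self.vector = vector


def assert_callback(expression, function):
    """Form (1): call the exception handler when the condition fails."""
    if not expression:
        function()


def assert_interrupt(expression, kind):
    """Form (2): raise a soft interrupt with vector number `kind`."""
    if not expression:
        raise SoftInterrupt(kind)
```

A telemetry check after a command might then read `assert_callback(4.5 <= voltage <= 5.5, recover_supply)`, where `voltage` and `recover_supply` are again illustrative names.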
The above is only a preferred embodiment of the present invention, but the scope of protection of the invention is not limited to it. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. The scope of protection of the invention shall therefore be determined by the scope of protection of the claims.

Claims (5)

1. A fault diagnosis method for an on-board computer, characterized in that software and hardware cooperate to diagnose faults of the on-board computer, the method comprising: assertion-based fault detection, and hardware fault event driving;
The assertion-based fault detection comprises:
The hardware system of the on-board computer provides an operation interface and the value ranges of the hardware driver parameters, which the software reads back and checks; the software checks the input parameters and the returned state of each function interface; by asserting the working range of the input parameters, the software throws an exception by means of a soft interrupt or a callback function when it determines that an input parameter exceeds its threshold, and fault diagnosis and recovery are completed in the exception handling flow of the processor;
The hardware fault event driving comprises:
Synchronous state feedback of the control flow and data flow is used, so that the processor's current running flow is interrupted by one of three bus access triggers (a wait signal, an error signal, or an interrupt signal), and fault identification and recovery are carried out according to the event source and the feedback information.
2. The fault diagnosis method for an on-board computer according to claim 1, characterized in that the synchronous state feedback of the control flow and data flow comprises:
The data flow of the on-board computer is partitioned by functional domain or clock domain into several functional units, and a state machine is established for each functional unit; the state machine has three states: Idle, Operating, and Confirm; at each partition point the control flows perform a synchronous handshake, which comprises state confirmation between the state machines and a data communication check between the two functional units; when a functional unit fails, so that synchronization between two functional units fails or the communication check is incorrect, the synchronous handshake between the two control flows fails and stalls the data flow link until the processor's bus access times out and the bus access exception flow is entered.
3. The fault diagnosis method for an on-board computer according to claim 2, characterized in that the functional units obtained by the partitioning comprise: processor decode-and-response logic, master device communication control logic, slave device communication control logic, storage control logic, and interface access logic.
4. The fault diagnosis method for an on-board computer according to claim 2, characterized in that the state machine has three states (Idle, Operating, and Confirm), and the transition conditions between them comprise: in the Idle state, the current logic checks whether the start-flow flag of the upstream logic is valid and whether the downstream logic is in the Idle state, and if both are true it enters the Operating state, otherwise it does not change state; in the Operating state, the current logic checks whether its own work flow has finished, and if so it enters the Confirm state, otherwise it does not change state; in the Confirm state, the current logic checks whether the start-flow flag of the upstream logic is invalid and whether the downstream logic is in the Operating state, and if both are true it enters the Idle state, otherwise it does not change state.
5. The fault diagnosis method for an on-board computer according to claim 4, characterized in that, for a processor that has no bus wait acknowledgment, a timeout counter is added in the Idle state, and when the counter exceeds a threshold the processor is notified by an interrupt or a bus error.
CN201410301310.7A 2014-06-27 2014-06-27 Fault diagnosis method for an on-board computer Active CN104050051B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410301310.7A CN104050051B (en) 2014-06-27 2014-06-27 Fault diagnosis method for an on-board computer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410301310.7A CN104050051B (en) 2014-06-27 2014-06-27 Fault diagnosis method for an on-board computer

Publications (2)

Publication Number Publication Date
CN104050051A true CN104050051A (en) 2014-09-17
CN104050051B CN104050051B (en) 2016-10-26

Family

ID=51502944

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410301310.7A Active CN104050051B (en) 2014-06-27 2014-06-27 Fault diagnosis method for an on-board computer

Country Status (1)

Country Link
CN (1) CN104050051B (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101493809B (en) * 2009-03-03 2010-09-08 哈尔滨工业大学 Multi-core onboard spacecraft computer based on FPGA
CN103116535B (en) * 2011-11-17 2016-09-07 上海航天测控通信研究所 Spaceborne dual redundant main frame working state monitoring and the autonomous switching device of fault

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
马寅: "星载高速数据处理技术研究", 《中国优秀硕士学位论文全文数据库工程科技Ⅱ辑》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106815114A (en) * 2017-01-12 2017-06-09 西安科技大学 Computer system fault handling method based on software-hardware cooperation
CN108733539A (en) * 2018-05-24 2018-11-02 郑州云海信息技术有限公司 Method, apparatus, system and readable storage medium for stopping OSD services
CN110673975A (en) * 2019-08-23 2020-01-10 上海航天控制技术研究所 Security kernel structure and security operation method of satellite-borne computer software
CN110673975B (en) * 2019-08-23 2023-06-02 上海航天控制技术研究所 Secure kernel structure of spaceborne computer software and secure operation method

Also Published As

Publication number Publication date
CN104050051B (en) 2016-10-26

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant