CN101349978A - Method for restoring star load computer hardware scanning error - Google Patents

Publication number
CN101349978A
CN101349978A CNA2008101180400A CN200810118040A CN101349978A CN 101349978 A CN101349978 A CN 101349978A CN A2008101180400 A CNA2008101180400 A CN A2008101180400A CN 200810118040 A CN200810118040 A CN 200810118040A CN 101349978 A CN101349978 A CN 101349978A
Authority
CN
China
Prior art keywords
data
module
error correction
hardware scanning
error
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2008101180400A
Other languages
Chinese (zh)
Other versions
CN101349978B (en)
Inventor
施思寒
李孝同
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aerospace Dongfanghong Satellite Co Ltd
Original Assignee
Aerospace Dongfanghong Satellite Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aerospace Dongfanghong Satellite Co Ltd
Priority to CN2008101180400A
Publication of CN101349978A
Application granted
Publication of CN101349978B
Active
Anticipated expiration

Landscapes

  • Detection And Correction Of Errors (AREA)

Abstract

The invention relates to a hardware scanning error recovery method for a satellite on-board computer. First, an error detection and correction (EDAC) module reads data from external memory, corrects any errors, places the corrected data on the processor internal bus, and waits for commands from the CPU attached to that bus. Second, the CPU sets the scan area, scan rate, and enable function of a hardware scanning error recovery module and starts the module. Third, following the scan area and scan rate set by the CPU, the hardware scanning error recovery module writes the corrected data on the processor internal bus back to external memory through the EDAC module, which completes the error recovery. The invention improves the fault tolerance and reliability of the on-board computer and lowers the risk to a satellite operating in orbit: when a memory fault occurs in orbit, the memory is repaired automatically by hardware scanning and real-time error recovery.

Description

Method for restoring star load computer hardware scanning error
Technical field
The present invention relates to a scanning error recovery method for an on-board computer, and in particular to a hardware scanning error recovery method for a satellite on-board computer.
Background art
At present, many satellites in China and abroad implement error detection and correction (EDAC) on their on-board computers. One example is the housekeeping computer of China's small-satellite series, described in "Design and implementation of the RAM error detection and correction circuit of the TS-1.1 small-satellite housekeeping computer", Computer Engineering and Science, 2002, Vol. 24, No. 2. That design combines a dedicated chip with an EDAC function, a buffer, and an EDAC controller, as shown in Fig. 1. The approach makes the system fairly complex, and the computer must take part frequently during operation, which puts pressure on processor time.
Summary of the invention
The technical problem solved by the invention is to overcome the deficiencies of the prior art by providing a hardware scanning error recovery method for an on-board computer that reduces how often the computer must take part and improves the utilization of the on-board computer.
Technical solution of the invention: in the hardware scanning error recovery method for an on-board computer, an EDAC module is placed between the processor internal bus of the on-board computer and the data bus of the external memory, and a hardware scanning error recovery module is attached directly to the processor internal bus. The method is carried out as follows:
(1) The CPU issues a command to read the external memory. In response, the EDAC module reads the data in the external memory, performs error detection and correction, and places the corrected data on the processor internal bus;
(2) The CPU sets the scan area, scan rate, and enable of the hardware scanning error recovery module as required, and starts the module;
(3) If the CPU has started the hardware scanning error recovery module, the module performs the error recovery scan according to the state of the processor internal bus;
(4) Following the scan area and scan rate set by the CPU, and according to the state of the processor internal bus, the hardware scanning error recovery module writes the corrected data on the processor internal bus back to the external memory through the EDAC module, which completes the error recovery.
The function of the hardware scanning error recovery module is hardwired inside the processor using a hardware description language. It is realized as a state machine driven by the state of the internal bus, with the following states:
A. Idle state: when the CPU's control starts the module, the module switches from the idle state to the next state, the bus request state;
B. Bus request state: the module requests control of the processor internal bus; once it obtains control, it switches to the next state, the data read control state;
C. Data read control state: the module sends the address and control signals to the EDAC module over the processor internal bus, then switches to the next state, the data read wait state;
D. Data read wait state: the module waits for the data from the external memory, that is, for the EDAC module to place the corrected external memory data on the processor internal bus; once the module has obtained the data, it switches to the next state, the data write wait state;
E. Data write wait state: the module waits for the write to finish, that is, for the data it obtained to be written back to the external memory through the EDAC module; when the write completes, it switches to the next state, the idle state;
By repeating this cycle, the hardware scanning error recovery module writes the corrected data on the processor internal bus back to the external memory.
Compared with the prior art, the invention has the following advantages:
(1) By using an EDAC module and a hardware scanning error recovery module, the invention recovers errors in the external memory of the on-board computer. The scan and recovery run while the on-board computer is idle, which reduces how often the computer must take part and improves its utilization.
(2) The on-board computer only needs to configure the scan area and scan frequency, which greatly simplifies the task. The hardware scanning error recovery function is hardwired inside the chip using a hardware description language, so the system is simple and less manual work is needed for hardware schematic design and debugging, which improves the reliability and safety of the on-board computer. The invention thus solves the practical problem of repairing satellite memory automatically in orbit: when a memory fault occurs while the satellite is in orbit, it is repaired by automatic hardware scanning and real-time error recovery.
Brief description of the drawings
Fig. 1 is a schematic block diagram of a conventional computer EDAC circuit;
Fig. 2 is a schematic diagram of the hardware scanning error correction of the on-board computer according to the invention;
Fig. 3 is a block diagram of the error detection and correction principle of the invention;
Fig. 4 is the state transition diagram of the state machine of the hardware scanning error recovery module of the invention;
Fig. 5 is the overall system workflow diagram of the invention.
Detailed description
As shown in Fig. 2, the CPU of the on-board computer processor is the core component of the whole on-board computer and is responsible for instruction fetching, analysis, scheduling, and related work. It consists of five major units: instruction fetch, instruction decode, execute, memory access, and write-back. These correspond to a five-stage pipeline: instruction fetch, instruction decode, instruction execution, memory access, and write-back. The CPU is connected through the processor internal bus to the external memory module, the EDAC module, and the hardware scanning error recovery module, and it accesses and controls each functional unit over the internal bus. The function and structure of the on-board computer itself are well known in the art and are not the focus of the invention; they are introduced only briefly here to help explain the invention.
The focus of the invention is the hardware scanning error recovery module and the EDAC module, which are connected to the CPU of the on-board computer through the processor internal bus. The external memory is connected to the EDAC module by a data bus, and the EDAC module is in turn attached to the processor internal bus by an address bus. The hardware scanning error recovery module is attached directly to the processor internal bus. While the module is scanning, its data transfers on the processor internal bus are locked transfers. This ensures that no other bus device can modify a data word while the module is scanning that word, which would otherwise corrupt the data. The processor internal bus is an AMBA bus or a WISHBONE bus, and the CPU is an x86-series, SPARC-series, 8031-series, or 8051-series microprocessor.
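The reason for the locked transfer can be illustrated with a small deterministic simulation. This is only a sketch under simplifying assumptions: the function name `scrub_word`, the Python list standing in for memory, and the `cpu_write` tuple are hypothetical and do not model any real bus protocol.

```python
def scrub_word(memory, addr, cpu_write=None, locked=True):
    """Read one word and write it back, as the scan module does.

    cpu_write is an (address, value) pair that another bus master issues
    while the read/write-back pair is still in flight.
    """
    value = memory[addr]                      # scan module reads the (corrected) word
    pending = None
    if cpu_write is not None:
        if locked:
            pending = cpu_write               # bus is locked: the other master must wait
        else:
            memory[cpu_write[0]] = cpu_write[1]   # write slips in mid-transfer
    memory[addr] = value                      # scan module writes the word back
    if pending is not None:
        memory[pending[0]] = pending[1]       # deferred write lands after the unlock
    return memory


print(scrub_word([7], 0, cpu_write=(0, 99), locked=False))  # [7]: the CPU update is lost
print(scrub_word([7], 0, cpu_write=(0, 99), locked=True))   # [99]: the CPU update survives
```

Without the lock, the write-back of the stale value overwrites the newer data; with the lock, the read and write-back behave as one indivisible operation.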
As shown in Fig. 3, the function of the EDAC module is hardwired inside the processor using a hardware description language. The module is configured externally: according to the bus width, it selects the EDAC logic for 8-bit, 16-bit, 32-bit, or 64-bit memory, which can be implemented with a multiplexer.
When the on-board computer starts running, the bus width is set to 8, 16, 32, or 64 bits, and the EDAC logic for that width then begins to operate. The 8-, 16-, 32-, and 64-bit EDAC logic all use a Hamming code that corrects 1-bit errors and detects 2-bit errors.
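How many check bits each bus width needs can be estimated from the standard Hamming bound. The sketch below is illustrative only: `secded_check_bits` is a hypothetical helper name, and the patent itself states the check-bit count (six) only for the 16-bit case.

```python
def secded_check_bits(data_bits: int) -> int:
    """Minimum check bits for a single-error-correcting,
    double-error-detecting (SEC-DED) Hamming-type code."""
    r = 1
    # Single-error correction needs a distinct non-zero syndrome for every
    # data bit and every check bit: 2**r >= data_bits + r + 1.
    while 2 ** r < data_bits + r + 1:
        r += 1
    # One additional check bit raises the minimum distance from 3 to 4,
    # which is what allows double-bit errors to be detected.
    return r + 1


for width in (8, 16, 32, 64):
    print(width, secded_check_bits(width))
```

For the four supported widths this gives 5, 6, 7, and 8 check bits, so the 16-bit result agrees with the six check bits used in the worked example.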
The Hamming code is explained below using a 16-bit CPU data bus as the example; the 8-, 32-, and 64-bit EDAC logic use Hamming codes in the same way.
Suppose the source word has 16 bits, and a code is to be constructed that corrects single-bit errors and detects double-bit errors. By the error correction theorem, this requires a code with minimum Hamming distance ≥ 4. A linear block code can be used: six check bits are constructed from the data bits by the following linear relations:
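$$
\begin{aligned}
C_0 &= d_0 \oplus d_1 \oplus d_3 \oplus d_4 \oplus d_8 \oplus d_9 \oplus d_{10} \oplus d_{13} \\
C_1 &= d_0 \oplus d_2 \oplus d_3 \oplus d_5 \oplus d_6 \oplus d_8 \oplus d_{11} \oplus d_{14} \\
C_2 &= d_1 \oplus d_2 \oplus d_4 \oplus d_5 \oplus d_7 \oplus d_9 \oplus d_{12} \oplus d_{15} \\
C_3 &= d_0 \oplus d_1 \oplus d_2 \oplus d_6 \oplus d_7 \oplus d_{10} \oplus d_{11} \oplus d_{12} \\
C_4 &= d_3 \oplus d_4 \oplus d_5 \oplus d_6 \oplus d_7 \oplus d_{13} \oplus d_{14} \oplus d_{15} \\
C_5 &= d_8 \oplus d_9 \oplus d_{10} \oplus d_{11} \oplus d_{12} \oplus d_{13} \oplus d_{14} \oplus d_{15}
\end{aligned}
$$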
Here d0 to d15 are the 16 data bits (d15 is the most significant bit, MSB, and d0 is the least significant bit, LSB), C0 to C5 are the six generated check bits, and ⊕ denotes exclusive OR. When data is read, only the syndrome S = [S5 S4 S3 S2 S1 S0] needs to be examined, where:
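$$
\begin{aligned}
S_0 &= C_0 \oplus d_0 \oplus d_1 \oplus d_3 \oplus d_4 \oplus d_8 \oplus d_9 \oplus d_{10} \oplus d_{13} \\
S_1 &= C_1 \oplus d_0 \oplus d_2 \oplus d_3 \oplus d_5 \oplus d_6 \oplus d_8 \oplus d_{11} \oplus d_{14} \\
S_2 &= C_2 \oplus d_1 \oplus d_2 \oplus d_4 \oplus d_5 \oplus d_7 \oplus d_9 \oplus d_{12} \oplus d_{15} \\
S_3 &= C_3 \oplus d_0 \oplus d_1 \oplus d_2 \oplus d_6 \oplus d_7 \oplus d_{10} \oplus d_{11} \oplus d_{12} \\
S_4 &= C_4 \oplus d_3 \oplus d_4 \oplus d_5 \oplus d_6 \oplus d_7 \oplus d_{13} \oplus d_{14} \oplus d_{15} \\
S_5 &= C_5 \oplus d_8 \oplus d_9 \oplus d_{10} \oplus d_{11} \oplus d_{12} \oplus d_{13} \oplus d_{14} \oplus d_{15}
\end{aligned}
$$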
It is easy to show that errors can be diagnosed from the syndrome. With the syndrome bits written in the order S5 S4 S3 S2 S1 S0, representative cases are:
When S=[000000], the data is correct;
When S=[001011], one bit is in error and the error is in d0; inverting d0 corrects it;
When S=[001101], one bit is in error and the error is in d1; inverting d1 corrects it;
When S=[110100], one bit is in error and the error is in d15; inverting d15 corrects it;
When S=[000001], one bit is in error and the error is in check bit C0;
When S=[100000], one bit is in error and the error is in check bit C5;
The remaining data bits and check bits follow in the same way: a single error in a data bit sets exactly the syndrome bits whose equations contain that data bit, and a single error in check bit Ci sets only Si. When S takes any other value, at least two bits are in error.
As can be seen, this code meets the requirement of correcting single-bit errors automatically and detecting double-bit errors.
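The check-bit equations and the syndrome diagnosis above can be sketched in a few lines of Python. The parity sets are copied from the equations in the text; the names `PARITY_SETS`, `check_bits`, `DATA_SYNDROMES`, and `decode`, and the status strings, are hypothetical and chosen only for this illustration. The real logic would be written in a hardware description language.

```python
# Data-bit indices covered by each check bit C0..C5, copied from the equations.
PARITY_SETS = [
    (0, 1, 3, 4, 8, 9, 10, 13),       # C0
    (0, 2, 3, 5, 6, 8, 11, 14),       # C1
    (1, 2, 4, 5, 7, 9, 12, 15),       # C2
    (0, 1, 2, 6, 7, 10, 11, 12),      # C3
    (3, 4, 5, 6, 7, 13, 14, 15),      # C4
    (8, 9, 10, 11, 12, 13, 14, 15),   # C5
]


def check_bits(data: int) -> int:
    """Return the six check bits packed into an integer (C0 is bit 0)."""
    code = 0
    for i, members in enumerate(PARITY_SETS):
        parity = 0
        for bit in members:
            parity ^= (data >> bit) & 1
        code |= parity << i
    return code


# Syndrome caused by a single error in data bit j: one set bit for every
# check equation that contains d_j.
DATA_SYNDROMES = {
    sum(1 << i for i, members in enumerate(PARITY_SETS) if j in members): j
    for j in range(16)
}


def decode(data: int, stored_check: int):
    """Return (data, status) after single-error correction."""
    syndrome = check_bits(data) ^ stored_check   # S5..S0, with S0 as bit 0
    if syndrome == 0:
        return data, "ok"
    if syndrome in DATA_SYNDROMES:               # one data bit flipped: invert it
        return data ^ (1 << DATA_SYNDROMES[syndrome]), "corrected_data"
    if syndrome & (syndrome - 1) == 0:           # exactly one S bit set: a check bit flipped
        return data, "corrected_check"
    return data, "uncorrectable"                 # at least two bits in error


word = 0xA5C3
stored = check_bits(word)
print(format(check_bits(word ^ 0x0001) ^ stored, "06b"))  # 001011 (error in d0)
print(format(check_bits(word ^ 0x8000) ^ stored, "06b"))  # 110100 (error in d15)
print(decode(word ^ 0x0003, stored)[1])                   # uncorrectable
```

Every data-bit syndrome has exactly three set bits and every check-bit syndrome has one, so any double-bit error produces a non-zero syndrome with an even number of set bits, which matches none of the single-error patterns.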
As shown in Fig. 4, the hardware scanning error recovery module is implemented as a state machine with the following states:
(1) Idle state: when the CPU's control start bit is 1 and the pause bit is 0, the state machine of the hardware scanning error recovery module switches from the idle state to the next state, the bus request state;
(2) Bus request state: used to request control of the internal bus. Once the state machine obtains control, it switches from the bus request state to the next state, the data read control state;
(3) Data read control state: used to send the address signal, control signals, and so on. After the hardware scanning error recovery module has sent the address and control signals to the EDAC module over the processor internal bus, the state machine switches from the data read control state to the next state, the data read wait state;
(4) Data read wait state: used to wait for the data from the external memory, that is, for the EDAC module to place the corrected external memory data on the processor internal bus. Once the hardware scanning error recovery module has obtained the data, the state machine switches from the data read wait state to the next state, the data write wait state;
(5) Data write wait state: used to wait for the write to finish, that is, for the data the module obtained to be written back to the external memory through the EDAC module. When the write completes, the state machine switches from the data write wait state to the next state, the idle state;
By repeating this cycle, the hardware scanning error recovery module writes the corrected data on the processor internal bus back to the external memory.
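The five states above can be modeled as a small software state machine. This is a behavioral sketch, not the hardware description: the class and attribute names are hypothetical, the EDAC module is replaced by two callbacks, and the assumption that the address advances by one word per cycle and wraps at the end of the scan area is an illustration that the text does not spell out.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    BUS_REQUEST = auto()
    READ_CONTROL = auto()
    READ_WAIT = auto()
    WRITE_WAIT = auto()


class ScanRecovery:
    """One word is read, corrected, and written back per trip around the cycle."""

    def __init__(self, edac_read, edac_write, start, end):
        self.edac_read, self.edac_write = edac_read, edac_write
        self.start, self.end = start, end        # scan area (inclusive), assumed word-addressed
        self.addr = start
        self.state = State.IDLE
        self.start_bit, self.pause_bit = 0, 0    # control bits set by the CPU
        self.data = None

    def step(self, bus_granted=True):
        if self.state is State.IDLE:
            if self.start_bit == 1 and self.pause_bit == 0:
                self.state = State.BUS_REQUEST
        elif self.state is State.BUS_REQUEST:
            if bus_granted:                       # wait here until bus control is obtained
                self.state = State.READ_CONTROL
        elif self.state is State.READ_CONTROL:
            # Address and control signals are sent to the EDAC module here.
            self.state = State.READ_WAIT
        elif self.state is State.READ_WAIT:
            self.data = self.edac_read(self.addr)     # corrected data arrives on the bus
            self.state = State.WRITE_WAIT
        elif self.state is State.WRITE_WAIT:
            self.edac_write(self.addr, self.data)     # write-back completes
            self.addr = self.start if self.addr == self.end else self.addr + 1
            self.state = State.IDLE
        return self.state


golden = [10, 20, 30, 40]      # stand-in for what EDAC correction would return
memory = [10, 21, 30, 40]      # word 1 holds an upset

scanner = ScanRecovery(lambda a: golden[a], memory.__setitem__, start=0, end=3)
scanner.start_bit = 1
for _ in range(5 * len(memory)):   # five steps per word
    scanner.step()
print(memory)                      # [10, 20, 30, 40]
```

Keeping the bus request as its own state makes the wait for bus control explicit: calling `step(bus_granted=False)` simply holds the machine in the bus request state.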
As shown in Fig. 5, the overall workflow is as follows. At start-up, the CPU of the on-board computer initializes the scan frequency, scan area, and enable function of the hardware scanning error recovery module, that is, the settings that start and stop scanning. The CPU can then carry out other work and no longer takes part in the scan and recovery, which the hardware scanning error recovery module completes automatically.
The enable function is designed around the actual operating conditions of the on-board computer; scanning is started and stopped by ground telecommand. The scan frequency is set according to the clock frequency of the on-board computer. The scan area is set according to the size of the valid data region that the on-board computer stores in external memory. The guiding principle for setting these parameters is to follow the actual needs of the on-board computer, so the design is flexible.
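The relationship between these three parameters can be made concrete with a small configuration sketch. Everything here is an assumption made for illustration: the `ScanConfig` name, its fields, the idea of expressing the scan frequency as clock cycles per scanned word, and the example numbers are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class ScanConfig:
    start_addr: int        # first word of the scan area
    end_addr: int          # last word of the scan area (inclusive)
    cycles_per_word: int   # clock cycles between successive scanned words
    enable: bool = False   # set or cleared by ground telecommand

    def words(self) -> int:
        """Number of words in the scan area."""
        return self.end_addr - self.start_addr + 1

    def full_pass_seconds(self, clock_hz: float) -> float:
        """Time for one complete pass over the scan area."""
        return self.words() * self.cycles_per_word / clock_hz


# Hypothetical example: 64 Ki words, one word every 1000 cycles, 20 MHz clock.
cfg = ScanConfig(start_addr=0x0000, end_addr=0xFFFF, cycles_per_word=1000, enable=True)
print(cfg.words())
print(round(cfg.full_pass_seconds(clock_hz=20e6), 4))
```

With these example values a full pass takes a little over three seconds; a smaller scan area or a faster scan rate shortens the interval at which each word is refreshed, at the cost of more bus traffic.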
Content not described in detail in this specification belongs to the prior art known to those skilled in the art.
Although the preferred embodiment and drawings of the invention are disclosed for purposes of illustration, those skilled in the art will appreciate that various substitutions, changes, and modifications are possible without departing from the spirit and scope of the invention and the appended claims. The invention should therefore not be limited to what is disclosed in the preferred embodiment and drawings.

Claims (8)

1. A hardware scanning error recovery method for an on-board computer, characterized in that: an error detection and correction (EDAC) module is placed between the processor internal bus of the on-board computer and the data bus of the external memory, and a hardware scanning error recovery module is attached directly to the processor internal bus; the method is carried out as follows:
(1) The CPU issues a command to read the external memory; in response, the EDAC module reads the data in the external memory, performs error detection and correction, and places the corrected data on the processor internal bus;
(2) The CPU sets the scan area, scan rate, and enable of the hardware scanning error recovery module as required, and starts the module;
(3) If the CPU has started the hardware scanning error recovery module, the module performs the error recovery scan according to the state of the processor internal bus;
(4) Following the scan area and scan rate set by the CPU, and according to the state of the processor internal bus, the hardware scanning error recovery module writes the corrected data on the processor internal bus back to the external memory through the EDAC module, which completes the error recovery.
2. The hardware scanning error recovery method for an on-board computer according to claim 1, characterized in that: the function of the hardware scanning error recovery module is hardwired inside the processor using a hardware description language and is realized as a state machine driven by the state of the processor internal bus, with the following states:
A. Idle state: when the CPU's control starts the module, the module switches from the idle state to the next state, the bus request state;
B. Bus request state: the module requests control of the processor internal bus; once it obtains control, it switches to the next state, the data read control state;
C. Data read control state: the module sends the address and control signals to the EDAC module over the processor internal bus, then switches to the next state, the data read wait state;
D. Data read wait state: the module waits for the data from the external memory, that is, for the EDAC module to place the corrected external memory data on the processor internal bus; once the module has obtained the data, it switches to the next state, the data write wait state;
E. Data write wait state: the module waits for the write to finish, that is, for the data it obtained to be written back to the external memory through the EDAC module; when the write completes, it switches to the next state, the idle state;
By repeating this cycle, the hardware scanning error recovery module writes the corrected data on the processor internal bus back to the external memory.
3. The hardware scanning error recovery method for an on-board computer according to claim 1 or 2, characterized in that: while the hardware scanning error recovery module is scanning, its data transfers on the processor internal bus are locked transfers, which ensures that no other bus device can modify a data word while the module is scanning that word and thereby corrupt the data.
4. The hardware scanning error recovery method for an on-board computer according to claim 1 or, characterized in that: the function of the EDAC module is hardwired inside the processor using a hardware description language.
5. The hardware scanning error recovery method for an on-board computer according to claim 1 or 4, characterized in that: the EDAC module selects 8-bit, 16-bit, 32-bit, or 64-bit EDAC logic according to the bus width, implemented with a multiplexer.
6. The hardware scanning error recovery method for an on-board computer according to claim 5, characterized in that: the EDAC logic uses a Hamming code that corrects 1-bit errors and detects 2-bit errors.
7. The hardware scanning error recovery method for an on-board computer according to claim 1, characterized in that: the processor internal bus is an AMBA bus or a WISHBONE bus.
8. The hardware scanning error recovery method for an on-board computer according to claim 1, characterized in that: the CPU is an x86-series, SPARC-series, 8031-series, or 8051-series microprocessor.
CN2008101180400A 2008-08-07 2008-08-07 Method for restoring star load computer hardware scanning error Active CN101349978B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008101180400A CN101349978B (en) 2008-08-07 2008-08-07 Method for restoring star load computer hardware scanning error

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2008101180400A CN101349978B (en) 2008-08-07 2008-08-07 Method for restoring star load computer hardware scanning error

Publications (2)

Publication Number Publication Date
CN101349978A true CN101349978A (en) 2009-01-21
CN101349978B CN101349978B (en) 2010-09-08

Family

ID=40268782

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008101180400A Active CN101349978B (en) 2008-08-07 2008-08-07 Method for restoring star load computer hardware scanning error

Country Status (1)

Country Link
CN (1) CN101349978B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101794622A (en) * 2010-02-10 2010-08-04 成都市华为赛门铁克科技有限公司 Data scanning method and device for storage device
CN104484272A (en) * 2014-12-10 2015-04-01 深圳航天东方红海特卫星有限公司 Satellite borne electronic system capable of being debugged on orbit and on-orbit debugging method
CN107885611A (en) * 2017-11-24 2018-04-06 西安微电子技术研究所 Can active write-back classification instruction memory architecture fault-tolerance approach and device
WO2019184612A1 (en) * 2018-03-30 2019-10-03 华为技术有限公司 Terminal and electronic device

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101794622A (en) * 2010-02-10 2010-08-04 成都市华为赛门铁克科技有限公司 Data scanning method and device for storage device
CN101794622B (en) * 2010-02-10 2012-12-12 华为数字技术(成都)有限公司 Data scanning method and device for storage device
CN104484272A (en) * 2014-12-10 2015-04-01 深圳航天东方红海特卫星有限公司 Satellite borne electronic system capable of being debugged on orbit and on-orbit debugging method
CN104484272B (en) * 2014-12-10 2017-12-08 深圳航天东方红海特卫星有限公司 One kind can Debug on orbit satellite borne electronic system and Debug on orbit method
CN107885611A (en) * 2017-11-24 2018-04-06 西安微电子技术研究所 Can active write-back classification instruction memory architecture fault-tolerance approach and device
CN107885611B (en) * 2017-11-24 2021-02-19 西安微电子技术研究所 Fault-tolerant method and device for hierarchical instruction memory structure capable of actively writing back
WO2019184612A1 (en) * 2018-03-30 2019-10-03 华为技术有限公司 Terminal and electronic device

Also Published As

Publication number Publication date
CN101349978B (en) 2010-09-08

Similar Documents

Publication Publication Date Title
CN100347677C (en) Primary particle inversion resistant memory error correction and detection and automatic write back method for spacial computer
CN101349978B (en) Method for restoring star load computer hardware scanning error
EP0090175B1 (en) Memory system
CN102609334B (en) Nonvolatile flash memory is wiped abnormal memory block restorative procedure and device
CN100458692C (en) System and method for correcting fault of turn-on self-test
CN1761946A (en) Error detection and recovery within processing stages of an integrated circuit
CN1794196A (en) Securing time for identifying cause of asynchronism in fault-tolerant computer
CN101615147A (en) The skin satellite is based on the fault-tolerance approach of the memory module of FPGA
CN103150228A (en) Synthesizable pseudorandom verification method and device for high-speed buffer memory
WO2014183557A1 (en) Star sensor in-orbit maintenance method
CN106249840A (en) Power saving non-volatile microprocessor
CN101388256B (en) Controller and method for generating Low-level error-correction code for a memory device
CN220983766U (en) Periodic fault detection and repair circuit for dual-core lockstep
CN111858141B (en) System-on-chip memory control device and system-on-chip
CN103226977B (en) Quick NAND FLASH controller based on FPGA and control method thereof
CN105320575A (en) Self-checking and recovering device and method for dual-modular redundancy assembly lines
CN103279329B (en) The efficient fetching streamline supporting synchronous EDAC to verify
CN1716211A (en) Data error detects and corrects the positive and negative coding structure of intersection of usefulness and the method for decoding
CN103235921B (en) A kind of computer system
CN101521041A (en) Control circuit system based on nand gate structure memory
US20220100598A1 (en) Device, system and method to determine a structure of a crash log record
Zhang et al. Design and verification of sram self-detection repair based on ecc and bisr circuit
JPH07129427A (en) Comparative check method for data with ecc code
CN105511984A (en) Processor fault-tolerant structure based on active link backup data, and method thereof
CN110322979A (en) Nuclear power station digital control computer system core processing unit based on FPGA

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant