CN102521062A - Software fault-tolerant method capable of comprehensively on-line self-detection single event upset - Google Patents

Publication number
CN102521062A
Authority
CN
China
Prior art keywords
fault
program code
ram
district
passage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2011103879089A
Other languages
Chinese (zh)
Other versions
CN102521062B (en)
Inventor
吴国春
吴化军
陶晓霞
徐丽娜
钟兴旺
王一唯
林梦园
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Institute of Space Radio Technology
Original Assignee
Xian Institute of Space Radio Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Institute of Space Radio Technology
Priority to CN201110387908.9A
Publication of CN102521062A
Application granted
Publication of CN102521062B
Legal status: Active
Anticipated expiration

Landscapes

  • Hardware Redundancy (AREA)

Abstract

A software fault-tolerance method with comprehensive online self-detection of single event upsets (SEUs) comprises a memory address link configuration, a fault-tolerant processing parameter generation module, a fault-tolerant processing module A, and a fault-tolerant processing module B. Program memory contents are read segment by segment by direct memory access (DMA), fault-tolerant processing parameters are generated dynamically by a check algorithm, and the parameters are stored redundantly. Module B autonomously monitors, in real time, the application program and module A, while module A monitors module B in real time. Once an SEU occurs in the program, the corresponding code segment is reloaded from read-only memory (ROM), which corrects the application program code. The whole process is carried out by DMA and occupies no central processing unit (CPU) time, so the program keeps running in real time while errors are corrected. This improves the reliability and safety of software operating in orbit, saves substantial hardware and time cost, and improves efficiency.

Description

Software fault-tolerance method with comprehensive online self-detection of single event upsets
Technical Field
The present invention relates to a software fault-tolerance method with comprehensive online self-detection of single event upsets (SEUs). It is used in space applications to detect and correct program code errors caused by SEUs or by other sources.
Background Art
There are currently two main approaches to SEU-resistant design. The first relies on hardware: either hardware is designed to work with software to perform EDAC (error detection and correction) on the application program, or the software runs directly on high-grade devices that are almost immune to SEUs. The second does not rely on hardware design and uses a pure software design to detect and correct SEUs in the application program.
From publications and open channels in China and abroad in recent years, the following protection designs against SEU events in RAM are known:
Scheme (1) uses error-detecting and error-correcting codes, such as parity codes, CRC, Hamming codes, and Reed-Solomon (R-S) codes, to detect and correct SEUs. Two implementations of this measure are currently in main use in China and abroad:
a) CPU chips with built-in EDAC, such as the TSC695 chip and the AT697 chip series.
b) EDAC checking of DSP off-chip memory implemented with an FPGA (or ASIC). This cannot detect or correct errors in the DSP's internal memory space.
This scheme mainly provides SEU detection and correction for off-chip memory; its drawback is that it cannot do so for the processor's internal program memory space.
Scheme (2) implements error detection and correction through structural redundancy (TMR). Three processor chips run the same program simultaneously. A master computer controls selection through the processor chip ports and votes on the address bus and data, selecting one result from the three. When the voting circuit detects a bit error, an error-correction service subroutine starts and restores the system to normal according to the error level and error type identified by the voting circuit and the self-check program. This approach solves the SEU problem well, but its drawback is a large hardware overhead: it requires hardware redundancy, and the complex hardware structure brings a series of reliability problems of its own. The present method, by contrast, uses a pure software design and provides SEU resistance without modifying the hardware platform. See: David M. Hiemstra (Senior Member, IEEE), Bojan Miladinovic, and Fayez Chayab, "Single Event Upset Characterization of the SMJ320C6701 Digital Signal Processor Using Proton Irradiation."
Scheme (3) burns the program code directly into a PROM that is almost immune to SEUs and runs it from there. However, because PROM access speed is limited, most digital signal processing software cannot meet its performance requirements when run from PROM and must run at high speed in DSP internal RAM. The present method addresses precisely the SEU problem of DSP internal RAM.
Scheme (4) protects the program against SEUs by periodically refreshing the application software. The software is reloaded on a timer, and the whole software image must be loaded each time. If the refresh interval is long, there is no SEU protection during the interval; if refreshing is too frequent, repeatedly reading the program from PROM and writing it to RAM both disturbs the real-time operation of the program and introduces reliability problems. Current spaceborne software is generally refreshed at intervals on the order of hours or minutes at least. In contrast, the present method loads only the affected code from PROM, and only after an SEU has been detected in the application program, so its correction timing is better than that of scheme (4). Moreover, because scheme (4) refreshes periodically it cannot count SEU occurrences, whereas the present invention can, providing rich SEU information for downlink to the ground.
Scheme (5) is an SEU-resistant technique based on partial software reconfiguration. An error-correction routine stored in read-only ROM is loaded periodically to detect and correct errors in the application program: during operation the application saves its context, the loaded error-correction code checks and corrects the application code, and after the check the context is restored and execution continues. This scheme must load the routine from PROM before each refresh, so it also has a refresh-period problem and provides no SEU detection or correction during the interval. In addition, because the error-correction routine accesses program memory by direct programming, it is not suitable for the high-speed digital signal processors widely used in space today, such as the DSP 6X series. See patent application 201010527687.6.
No published literature has yet been found that uses DMA and a pure software design, with dual-redundant fault-tolerant processing, to provide comprehensive online self-detecting software fault tolerance against SEUs for processor-internal program code.
Summary of the Invention
Technical problem solved by the invention: to overcome the deficiencies of the prior art by providing a software fault-tolerance method with comprehensive online self-detection of single event upsets (SEUs). The method provides comprehensive online self-detection and correction of SEU errors in the program memory of processors with a DMA function used in space applications, thereby improving the reliability and safety of software in orbit.
Technical solution of the invention: a software fault-tolerance method with comprehensive online self-detection of SEUs, characterized by comprising a memory address link configuration, a fault-tolerant processing parameter generation module, a fault-tolerant processing module A, and a fault-tolerant processing module B, with the following steps:
(1) Memory address link configuration
The memory address space is divided into a ROM_A region and a ROM_B region which, through compilation and linking, correspond to the program memory regions RAM_A and RAM_B respectively. The ROM_A region holds only the program code of fault-tolerant processing module B; the ROM_B region holds the program code of all application programs together with the program code of the fault-tolerant processing parameter generation module and of fault-tolerant processing module A;
(2) Fault-tolerant processing parameter generation module
Two DMA channels, DMA1 and DMA2, transfer the program code of RAM_A and RAM_B respectively to the corresponding data areas. All check parameters for fault-tolerant processing modules A and B are generated dynamically by a check algorithm, and each check parameter is stored redundantly.
After a processor chip with DMA is powered on or reset, the fault-tolerant processing parameter generation function is called during initialization to generate the fault-tolerant parameters dynamically. This routine is executed only once after power-on or reset, and the generated parameters are stored redundantly in RAM;
(3) Fault-tolerant processing module A
Fault-tolerant processing module A autonomously monitors, in real time, the program code running in the program memory region RAM_A. If an SEU error occurs in this code, the module records the error information and starts error correction: it reads the program code from ROM_A and overwrites the program code in RAM_A, so that the program code in RAM_A keeps running safely and correctly;
(4) Fault-tolerant processing module B
Fault-tolerant processing module B autonomously monitors, in real time, the program code running in the program memory region RAM_B. If an SEU error occurs in this code, the module records the error information and starts error correction: it reads the program code from ROM_B and overwrites the program code in RAM_B, so that the program code in RAM_B keeps running safely and correctly;
(5) Obtain the start addresses of the program memory regions ROM_A, ROM_B, RAM_A, and RAM_B of step (1), and determine the program code lengths of the RAM_A and RAM_B regions by reading the program code information in the MAP file generated by compilation and linking. These values are passed as formal parameters to the fault-tolerant processing parameter generation module, fault-tolerant processing module A, and fault-tolerant processing module B, thereby achieving online self-detection of SEU errors.
Step (2) is implemented as follows:
(21) First start the DMA1 channel and transfer the program code from the program memory region RAM_B, which the CPU cannot read or write directly, to the data area;
(22) After the transfer completes, dynamically generate the check parameter of the transferred RAM_B program code using the check algorithm;
(23) Store the check parameter in data RAM using the redundant storage method;
(24) Start the DMA2 channel and transfer the program code from the program memory region RAM_A, which the CPU cannot read or write directly, to the data area;
(25) After the transfer completes, dynamically generate the check parameter of the transferred RAM_A program code using the check algorithm;
(26) Store the check parameter in data RAM using the redundant storage method.
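Steps (21) to (26) apply the same three actions to each region: copy the code to the data area, compute a check parameter, and store it redundantly. The following is a minimal host-side sketch of that sequence, not the patented implementation. The DMA transfer is simulated with memcpy, and the names dma_copy_sim, check_param, and gen_check_param are hypothetical; on real hardware the copy would be started by programming the DMA channel's registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one DMA channel transfer. A real DSP would
 * program source, destination, and control registers instead. */
static void dma_copy_sim(uint32_t *dst, const uint32_t *src, size_t words)
{
    memcpy(dst, src, words * sizeof *dst);
}

/* XOR-sum check: fold all 32-bit words together with XOR. */
static uint32_t xor_sum(const uint32_t *buf, size_t words)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < words; i++)
        sum ^= buf[i];
    return sum;
}

/* One check parameter kept as three copies (redundant storage). */
typedef struct {
    uint32_t copy[3];
} check_param;

/* Steps (21)-(23) for one region, or (24)-(26) for the other. */
static void gen_check_param(const uint32_t *region, size_t words,
                            uint32_t *data_buf, check_param *out)
{
    assert(region != NULL && data_buf != NULL && out != NULL);
    dma_copy_sim(data_buf, region, words);    /* transfer to data area  */
    uint32_t sum = xor_sum(data_buf, words);  /* generate the parameter */
    for (int i = 0; i < 3; i++)               /* store three copies     */
        out->copy[i] = sum;
}
```

Calling gen_check_param once per region after reset mirrors the one-time parameter generation described above; the data buffer must be at least as large as the region being checked.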
Step (3), fault-tolerant processing module A, is implemented as follows:
(31) Transfer function
Check whether the program code of the DMA1 channel satisfies the program transfer condition. If it does, set the controller parameters of the DMA1 channel, namely the source address, the destination address, and the primary and secondary control registers of the DMA1 channel: the source address is set to the RAM_A start address and the destination address to the data area. Start the program transfer through the primary and secondary control registers of the DMA1 channel, and set the program transfer condition to "not transferable". Set the program repair condition of the DMA1 channel to "repairable". The DMA1 channel performs the transfer automatically; when it completes, all program code in the source address range has been copied to the destination data area;
(32) Check and monitoring function
a. After the DMA1 transfer completes, perform SEU error detection: dynamically generate the check parameter of the transferred program code using the check algorithm, which is the same as that of step (2);
b. Refresh the dynamically generated check parameter of fault-tolerant processing module A so that its redundantly stored copies are consistent, and obtain the redundantly stored check parameter;
c. Compare the check parameter from step b with the check parameter from step a. If they differ, an SEU error is deemed to have occurred; this provides the monitoring function;
(33) Error correction
a. If the comparison in step c above shows a mismatch, set the source address of the DMA1 channel to the ROM_A start address and the destination address to the RAM_A start address; set the program repair condition of the DMA1 channel to "not repairable"; and start the program code transfer through the primary and secondary control registers of the DMA1 channel. The original program code stored in ROM_A is loaded into RAM and overwrites the corrupted program code, completing the correction of the SEU error;
b. When error correction completes, set the DMA1 channel transfer condition to "transferable" and the DMA1 channel repair condition to "repairable". Checking and correction of the RAM_A program code is repeated in this cycle.
Step (4), fault-tolerant processing module B, is implemented as follows:
(41) Transfer function
Check whether the program code of the DMA2 channel satisfies the program transfer condition. If it does, set the controller parameters of the DMA2 channel, namely the source address, the destination address, and the primary and secondary control registers of the DMA2 channel: the source address is set to the RAM_B start address and the destination address to the data area. Start the program transfer through the primary and secondary control registers of the DMA2 channel, and set the program transfer condition to "not transferable". Set the program repair condition of the DMA2 channel to "repairable". The DMA2 channel performs the transfer automatically; when it completes, all program code in the source address range has been copied to the destination data area;
(42) Check and monitoring function
a. After the DMA2 transfer completes, perform SEU error detection: dynamically generate the check parameter of the transferred program code using the check algorithm, which is the same as that of step (2);
b. Refresh the dynamically generated check parameter of fault-tolerant processing module B so that its redundantly stored copies are consistent, and obtain the redundantly stored check parameter;
c. Compare the check parameter from step b with the check parameter from step a. If they differ, an SEU error is deemed to have occurred; this provides the monitoring function;
(43) Error correction
a. If the comparison in step c above shows a mismatch, set the source address of the DMA2 channel to the ROM_B start address and the destination address to the RAM_B start address; set the program repair condition of the DMA2 channel to "not repairable"; and start the program code transfer through the primary and secondary control registers of the DMA2 channel. The original program code stored in ROM_B is loaded into RAM and overwrites the corrupted program code, completing the correction of the SEU error;
b. When error correction completes, set the DMA2 channel transfer condition to "transferable" and the DMA2 channel repair condition to "repairable". Checking and correction of the RAM_B program code is repeated in this cycle.
The check algorithm in step (2) is an XOR-sum check, an additive checksum, or a CRC check.
The redundant storage in step (2) is 2-out-of-3 redundant storage.
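The text names the alternative check algorithms and 2-out-of-3 storage without giving code. As an illustration only, the sketch below shows an additive checksum, a bitwise CRC-32, and a 2-out-of-3 voter that also rewrites a disagreeing copy. The function names are hypothetical, and the CRC-32 polynomial (reflected 0xEDB88320) is an assumption, since the patent does not say which CRC is meant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Additive checksum: sum of all words modulo 2^32. */
static uint32_t add_sum(const uint32_t *buf, size_t words)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < words; i++)
        sum += buf[i];
    return sum;
}

/* Bitwise CRC-32 over bytes (reflected polynomial 0xEDB88320). */
static uint32_t crc32_bytes(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i++) {
        crc ^= p[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* 2-out-of-3 vote. On success the agreed value is written back to all
 * three copies (the refresh) and returned through *out. Returns false
 * when no two copies agree. */
static bool vote_2oo3(uint32_t c[3], uint32_t *out)
{
    assert(c != NULL && out != NULL);
    uint32_t v;
    if (c[0] == c[1] || c[0] == c[2])
        v = c[0];
    else if (c[1] == c[2])
        v = c[1];
    else
        return false;
    c[0] = c[1] = c[2] = v;
    *out = v;
    return true;
}
```

One trade-off worth noting: an XOR sum or additive checksum is cheap, but two upsets at the same bit position in different words can cancel in an XOR sum, whereas a CRC detects more multi-bit patterns at a higher CPU cost.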
Compared with the prior art, the present invention has the following advantages:
(1) The invention primarily targets program memory that the CPU cannot read or write directly. Error detection and correction of the code in this memory is implemented purely in software, and the main process is carried out by DMA, taking very little CPU time, so that the program code continues to run in real time during error detection and correction. No published literature has yet been found that uses DMA and dual-redundant fault-tolerant processing to provide comprehensive online self-detecting software fault tolerance against single event upsets (SEUs) for processor-internal program code; the invention therefore possesses novelty and inventiveness. It also saves substantial hardware and time cost and improves efficiency;
(2) The prior art mainly protects against SEUs by periodically refreshing the application software or by partial software reconfiguration. These techniques are limited: if the interval is long, there is no SEU protection during the interval; if the interval is short, frequent loading disturbs the real-time operation of the program. In the present invention, the timing of error detection and correction for program memory is determined by the application's calling period, so its correction timing is better than that of the prior art;
(3) Because the prior art relies on periodic refreshing or software reconfiguration, which leave gaps between refreshes, its count of SEU occurrences is inaccurate. In the present invention detection is driven by the application's calling period, so the SEU count is comprehensive and rich SEU information can be downlinked to the ground.
Brief Description of the Drawings
Fig. 1 shows the mapping and transfer between the ROM space and the internal memory space in the present invention;
Fig. 2 shows the DMA transfer in the present invention;
Fig. 3 shows how the fault-tolerant processing modules are called in the present invention;
Fig. 4 shows the execution of the dynamic fault-tolerant processing parameter generation function of the present invention;
Fig. 5 is the flowchart of fault-tolerant processing module A of the present invention;
Fig. 6 is the flowchart of fault-tolerant processing module B of the present invention.
Detailed Description
The implementation is described below using the widely used DSP 6X series processors as an example.
1. Memory address link configuration
After the DSP processor chip is reset, the ROM load mode is used: according to the memory map, the processor chip copies all code in the ROM space to the program memory address space, as shown in Fig. 1.
Load address segmentation: the ROM_B region holds the program code of all application programs together with the program code of the fault-tolerant processing parameter generation module and of fault-tolerant processing module A, and is loaded into the corresponding program memory region RAM_B at run time. The ROM_A region holds the program code of fault-tolerant processing module B and is loaded into the corresponding program memory region RAM_A at run time. The sizes of the ROM_B and ROM_A regions are determined by the size of the modules loaded into each.
2. Fault-tolerant processing parameter generation
The fault-tolerant processing parameter generation module generates the parameters dynamically only once, at CPU reset or power-on; this design maximizes the modularity of the redundancy and correction program. The module is called as void xor_init(UNIT32 *pAdd1, UNIT32 *pAdd2, UNIT32 pPro_len1, UNIT32 pPro_len2), where pAdd1 is the RAM_A start address, pAdd2 is the RAM_B start address, pPro_len1 is the RAM_A code length, and pPro_len2 is the RAM_B code length. The calling process is shown in Fig. 3. The execution of the module is shown in Fig. 4 and consists mainly of the following steps:
(1) First configure the DMA1 channel: the source address is the RAM_A start address and the destination address is the data-area array address &xor_ram_data_sig. Set the DMA1 primary and secondary control registers and start the DMA1 channel to transfer the program code from the program memory region RAM_A, which the CPU cannot read or write directly, to the data area;
(2) After the transfer completes, compute the check parameter of the transferred RAM_A program code. The check algorithm used is the XOR-sum check (an additive checksum or a CRC check may also be used);
(3) Store the check parameter in Xor_Single_Sum in data RAM using 2-out-of-3 redundant storage;
(4) Configure the DMA2 channel: the source address is the RAM_B start address and the destination address is the data-area array address &xor_ram_data_sig. Set the DMA2 primary and secondary control registers and start the DMA2 channel to transfer the program code from the program memory region RAM_B, which the CPU cannot read or write directly, to the data area;
(5) After the transfer completes, compute the check parameter of the transferred RAM_B program code. The check algorithm used is the XOR-sum check (an additive checksum or a CRC check may also be used);
(6) Store the check parameter in Xor_All_Sum in data RAM using 2-out-of-3 redundant storage.
3. Execution of fault-tolerant processing module A
The application code calls fault-tolerant processing module A periodically. The module is called as void sig_code_To_Ram(UNIT32 *pAdd1, UNIT32 pPro_len1), where pAdd1 is the RAM_A code start address and pPro_len1 is the RAM_A code length. The calling process is shown in Fig. 3. The execution of module A is shown in Fig. 5; its functions comprise program code transfer, check monitoring, and error correction, implemented as follows:
(1) Program code transfer
a. Check whether the program code of the DMA1 channel satisfies the program transfer condition (0xAA means satisfied; the initial value is 0xAA). If it does, set the controller parameters of the DMA1 channel: the source address is set to the RAM_A start address and the destination address to the data area. Start the program transfer through the primary and secondary control registers of the DMA1 channel, and set the program transfer condition to "not transferable" (0x0);
b. Set the program repair condition of the DMA1 channel to "repairable" (0xAA). The DMA1 channel performs the transfer automatically; when it completes, all program code in the source address range has been copied to the destination data area;
(2) Program code check monitoring
a. After the DMA1 transfer completes, dynamically generate the check parameter of the transferred RAM_A program code using the check algorithm. The check algorithm used is the XOR-sum check (an additive checksum or a CRC check may also be used); it must be the same algorithm that was used when the fault-tolerant processing parameters were generated;
b. Refresh the dynamically generated check parameter Xor_Single_Sum of fault-tolerant processing module A so that its redundantly stored copies are consistent, and obtain the redundantly stored check parameter;
c. Compare the check parameter from step b with the check parameter from step a. If they differ, an SEU error is deemed to have occurred; this provides the monitoring function;
(3) Program code error correction
a. If the comparison in step c above shows a mismatch, set the source address of the DMA1 channel to the ROM_A start address and the destination address to the RAM_A start address; set the program repair condition of the DMA1 channel to "not repairable" (0x0); and start the program code transfer through the primary and secondary control registers of the DMA1 channel. The original program code stored in ROM_A is loaded into RAM and overwrites the corrupted program code, completing the correction of the SEU error;
b. When error correction completes, set the DMA1 channel transfer condition to "transferable" (0xAA) and the DMA1 channel repair condition to "repairable" (0xAA). Checking and correction of the RAM_A program code is repeated in this cycle.
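The transfer, check, and correction phases of module A, together with the 0xAA and 0x0 condition values given in the text, can be sketched as a small state machine. This is a host-side simulation under stated assumptions: DMA transfers are replaced by synchronous memcpy calls, the struct ft_ctx and the function ft_pass are hypothetical names, and re-arming both conditions at the end of every pass (not only after a correction) is an assumption, since the text describes re-arming only for the correction path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COND_YES 0xAAu  /* transferable / repairable         */
#define COND_NO  0x00u  /* not transferable / not repairable */

typedef struct {
    uint32_t *ram;        /* monitored program region, e.g. RAM_A */
    const uint32_t *rom;  /* original code, e.g. ROM_A            */
    uint32_t *data_buf;   /* data-area buffer for the DMA copy    */
    size_t words;
    uint32_t check[3];    /* redundantly stored check parameter   */
    uint8_t move_cond;
    uint8_t repair_cond;
    uint32_t seu_count;   /* recorded error information           */
} ft_ctx;

static uint32_t xor_sum(const uint32_t *buf, size_t words)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < words; i++)
        sum ^= buf[i];
    return sum;
}

/* Bitwise majority of three words (2-out-of-3). */
static uint32_t vote3(const uint32_t c[3])
{
    return (c[0] & c[1]) | (c[1] & c[2]) | (c[0] & c[2]);
}

/* One periodic call. Returns true if a mismatch was found and fixed. */
static bool ft_pass(ft_ctx *c)
{
    assert(c != NULL);
    size_t bytes = c->words * sizeof *c->ram;
    if (c->move_cond != COND_YES)
        return false;                      /* transfer not permitted  */
    c->move_cond = COND_NO;
    c->repair_cond = COND_YES;
    memcpy(c->data_buf, c->ram, bytes);    /* simulated DMA: RAM->data */

    bool bad = xor_sum(c->data_buf, c->words) != vote3(c->check);
    if (bad && c->repair_cond == COND_YES) {
        c->seu_count++;
        c->repair_cond = COND_NO;
        memcpy(c->ram, c->rom, bytes);     /* simulated DMA: ROM->RAM  */
    }
    c->move_cond = COND_YES;               /* re-arm for the next pass */
    c->repair_cond = COND_YES;
    return bad;
}
```

On real hardware the DMA transfer runs in the background, so the check would be performed only after the channel signals completion; the synchronous copy here hides that detail.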
4. Execution of fault-tolerant processing module B
The application code calls fault-tolerant processing module B periodically. The module is called as void all_code_To_Ram(UNIT32 *pAdd2, UNIT32 pPro_len2), where pAdd2 is the RAM_B code start address and pPro_len2 is the RAM_B code length. The calling process is shown in Fig. 3. The execution of module B is shown in Fig. 6; its functions comprise program code transfer, check monitoring, and error correction, implemented as follows:
(1) Program code transfer
a. Check whether the program code of the DMA2 channel satisfies the program transfer condition (0xAA means satisfied; the initial value is 0xAA). If it does, set the controller parameters of the DMA2 channel: the source address is set to the RAM_B start address and the destination address to the data area. Start the program transfer through the primary and secondary control registers of the DMA2 channel, and set the program transfer condition to "not transferable" (0x0);
b. Set the program repair condition of the DMA2 channel to "repairable" (0xAA). The DMA2 channel performs the transfer automatically; when it completes, all program code in the source address range has been copied to the destination data area;
(2) Program code check monitoring
a. After the DMA2 transfer completes, dynamically generate the check parameter of the transferred program code using the check algorithm. The check algorithm used is the XOR-sum check (an additive checksum or a CRC check may also be used); it must be the same algorithm that was used when the fault-tolerant processing parameters were generated;
b. Refresh the dynamically generated check parameter Xor_All_Sum of fault-tolerant processing module B so that its redundantly stored copies are consistent, and obtain the redundantly stored check parameter;
c. Compare the check parameter from step b with the check parameter from step a. If they differ, an SEU error is deemed to have occurred; this provides the monitoring function;
(3) Program code error correction
a. If the comparison in step c above shows a mismatch, set the source address of the DMA2 channel to the ROM_B start address and the destination address to the RAM_B start address; set the program repair condition of the DMA2 channel to "not repairable" (0x0); and start the program code transfer through the primary and secondary control registers of the DMA2 channel. The original program code stored in ROM_B is loaded into RAM and overwrites the corrupted program code, completing the correction of the SEU error;
b. When error correction completes, set the DMA2 channel transfer condition to "transferable" (0xAA) and the DMA2 channel repair condition to "repairable" (0xAA). Checking and correction of the RAM_B program code is repeated in this cycle.
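Because module B's code resides in RAM_A and module A's code (with the application) resides in RAM_B, the two modules guard each other. The sketch below illustrates that mutual arrangement and the per-region upset counters that could be downlinked. It is a simplified simulation: the region struct, check_and_repair, and ft_cycle are hypothetical names, the checksum is read directly from the simulated RAM rather than through a DMA copy, and a single stored check value stands in for the redundant copies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t *ram;        /* monitored program region              */
    const uint32_t *rom;  /* original code for that region         */
    size_t words;
    uint32_t check;       /* check parameter generated after reset */
    uint32_t seu_count;   /* upsets detected in this region        */
} region;

static uint32_t xor_sum(const uint32_t *buf, size_t words)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < words; i++)
        sum ^= buf[i];
    return sum;
}

/* Check one region; reload it from ROM and count the event on mismatch. */
static bool check_and_repair(region *r)
{
    assert(r != NULL);
    if (xor_sum(r->ram, r->words) == r->check)
        return false;
    memcpy(r->ram, r->rom, r->words * sizeof *r->ram);
    r->seu_count++;
    return true;
}

/* One application cycle. Module A (resident in RAM_B) guards RAM_A,
 * which holds module B; module B (resident in RAM_A) guards RAM_B,
 * which holds the application and module A. Returns repairs made. */
static unsigned ft_cycle(region *ram_a, region *ram_b)
{
    unsigned repaired = 0;
    if (check_and_repair(ram_a))   /* role of module A */
        repaired++;
    if (check_and_repair(ram_b))   /* role of module B */
        repaired++;
    return repaired;
}
```

In this arrangement an upset in either monitor's own code is repaired by the other monitor on its next call, which is the point of the dual-redundant design.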
Foregoing has been introduced error detection, error correction to RAM district single-particle inversion with DSP internal RAM code area, and in fact through changing the parameter input, this method is applicable to the error-detection error-correction of all RAM area codes, data, to the error correction of data field constant.The situation that the application programs code is bigger can be carried out the segmentation error correction and detection, to guarantee the real-time of application code operation, reaches the single-particle inversion error detection to code, the purpose of error correction simultaneously.

Claims (6)

1. A software fault-tolerant method capable of comprehensive online self-detection of single event upsets, characterized by comprising a storage address link configuration, a fault-tolerant processing parameter generation module, and the execution of a fault-tolerant processing A module and a fault-tolerant processing B module, with the following steps:
(1) Storage address link configuration
The storage addresses are divided into a ROM_A area and a ROM_B area, which correspond through compiling and linking to the program memory areas RAM_A and RAM_B respectively; the ROM_A area holds only the program code of the fault-tolerant processing B module, and the ROM_B area holds the program code of all application programs together with the program code of the fault-tolerant processing parameter generation module and the fault-tolerant processing A module;
(2) Fault-tolerant processing parameter generation module
Through the two DMA channels DMA1 and DMA2, the program code of RAM_A and RAM_B is moved to the data area corresponding to that program code; all check parameters of the fault-tolerant processing A module and the fault-tolerant processing B module are generated dynamically by the check algorithm, and each check parameter is stored redundantly;
(3) Fault-tolerant processing A module
The fault-tolerant processing A module autonomously monitors, in real time, the operation of the program code in the program memory RAM_A area; once a single event upset error occurs in that program code, it records the error information and starts error correction, reading the program code from ROM_A and overwriting the RAM_A program code, so as to ensure safe and healthy operation of the RAM_A program code;
(4) Fault-tolerant processing B module
The fault-tolerant processing B module autonomously monitors, in real time, the operation of the program code in the program memory RAM_B area; once a single event upset error occurs in that program code, it records the error information and starts error correction, reading the program code from ROM_B and overwriting the RAM_B program code, so as to ensure safe and healthy operation of the RAM_B program code;
(5) Obtain the start addresses of the ROM_A, ROM_B, RAM_A and RAM_B areas of step (1); read the program code information in the MAP table generated by compiling and linking to determine the program code lengths of the RAM_A and RAM_B areas; pass this information as actual arguments in the calls to the fault-tolerant processing parameter generation module, the fault-tolerant processing A module and the fault-tolerant processing B module, thereby achieving comprehensive online error detection and correction of single event upset errors across the whole program code.
2. The software fault-tolerant method capable of comprehensive online self-detection of single event upsets according to claim 1, characterized in that step (2) is implemented as follows:
(21) first start the DMA1 channel and move the program code from the program memory RAM_B area, which the CPU cannot read or write directly, to the data area;
(22) after the move completes, dynamically generate the check parameter of the moved RAM_B program code using the check algorithm;
(23) store the check parameter in data RAM using the redundant storage method;
(24) start the DMA2 channel and move the program code from the program memory RAM_A area, which the CPU cannot read or write directly, to the data area;
(25) after the move completes, dynamically generate the check parameter of the moved RAM_A program code using the check algorithm;
(26) store the checksum in data RAM using the redundant storage method.
3. The software fault-tolerant method capable of comprehensive online self-detection of single event upsets according to claim 1, characterized in that the fault-tolerant processing A module of step (3) is implemented as follows:
(31) Move function
Check whether the program code of the DMA1 channel satisfies the program move condition; if so, set the controller parameters of the DMA1 channel, which comprise the source address, the destination address, and the settings of the primary and secondary control registers of the DMA1 channel. The source address is set to the start address of the RAM_A area and the destination address to the address already defined in the data area; the primary and secondary control registers of the DMA1 channel are started to perform the program move; the program move condition is set to "moving"; the program repair condition of the DMA1 channel is set to repairable. The move itself is performed automatically by the DMA1 channel; when it completes, all program code in the source address region has been copied to the destination data area;
(32) Check monitoring function
a. After the DMA1 channel completes the move, perform error detection for single event upset errors: dynamically generate the check parameter of the moved program code using the check algorithm, which is the same as that of step (2) of claim 1;
b. Refresh the check parameter of the fault-tolerant processing A module so that the redundantly stored copies agree, and obtain the redundantly stored check parameter;
c. Compare the check parameter from step b with the check parameter from step a. If they differ, a single event upset error is deemed to have occurred; this provides the monitoring function;
(33) Error correction
a. If the comparison in step c above shows a mismatch, set the source address of the DMA1 channel to the start address of the ROM_A area and the destination address to the start address of the RAM_A area; set the program repair condition of the DMA1 channel to not repairable; start the program code move of the DMA1 channel through the channel's primary and secondary control register settings. The original program code stored in the ROM_A area is loaded into RAM and overwrites the corrupted program code, completing correction of the single event upset error;
b. After error correction completes, set the move condition of the DMA1 channel to movable and its repair condition to repairable. The checking and correction of the RAM_A program code then repeats cyclically.
4. The software fault-tolerant method capable of comprehensive online self-detection of single event upsets according to claim 1, characterized in that the fault-tolerant processing B module of step (4) is implemented as follows:
(41) Move function
Check whether the program code of the DMA2 channel satisfies the program move condition; if so, set the controller parameters of the DMA2 channel, which comprise the source address, the destination address, and the settings of the primary and secondary control registers of the DMA2 channel. The source address is set to the start address of the RAM_B area and the destination address to the address already defined in the data area; the primary and secondary control registers of the DMA2 channel are started to perform the program move; the program move condition is set to "moving"; the program repair condition of the DMA2 channel is set to repairable. The move itself is performed automatically by the DMA2 channel; when it completes, all program code in the source address region has been copied to the destination data area;
(42) Check monitoring function
a. After the DMA2 channel completes the move, perform error detection for single event upset errors: dynamically generate the check parameter of the moved program code using the check algorithm, which is the same as that of step (2) of claim 1;
b. Refresh the check parameter of the fault-tolerant processing B module so that the redundantly stored copies agree, and obtain the redundantly stored check parameter;
c. Compare the check parameter from step b with the check parameter from step a. If they differ, a single event upset error is deemed to have occurred; this provides the monitoring function;
(43) Error correction
a. If the comparison in step c above shows a mismatch, set the source address of the DMA2 channel to the start address of the ROM_B area and the destination address to the start address of the RAM_B area; set the program repair condition of the DMA2 channel to not repairable; start the program code move of the DMA2 channel through the channel's primary and secondary control register settings. The original program code stored in the ROM_B area is loaded into RAM and overwrites the corrupted program code, completing correction of the single event upset error;
b. After error correction completes, set the move condition of the DMA2 channel to movable and its repair condition to repairable. The checking and correction of the RAM_B program code then repeats cyclically.
5. The software fault-tolerant method capable of comprehensive online self-detection of single event upsets according to claim 1, characterized in that the check algorithm in step (2) is an XOR-sum check, an additive checksum, or a CRC check.
6. The software fault-tolerant method capable of comprehensive online self-detection of single event upsets according to claim 1, characterized in that the redundant storage in step (2) is 2-out-of-3 redundant storage.
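The three check algorithms named in claim 5 can be compared with a short Python sketch. It is illustrative only: the function names, the byte-wise granularity and the 16-bit width of the additive checksum are assumptions, and `zlib.crc32` is used as one readily available CRC because the text does not name a polynomial.

```python
import zlib


def xor_sum(data: bytes) -> int:
    """XOR of all bytes."""
    acc = 0
    for b in data:
        acc ^= b
    return acc


def add_sum(data: bytes, bits: int = 16) -> int:
    """Additive checksum, truncated to the given width."""
    return sum(data) & ((1 << bits) - 1)


def crc32(data: bytes) -> int:
    """CRC-32 as implemented by zlib (an assumed choice of CRC)."""
    return zlib.crc32(data) & 0xFFFFFFFF


CHECKERS = {"xor": xor_sum, "add": add_sum, "crc": crc32}


def detects_flip(name: str, image: bytes, byte_index: int, bit: int) -> bool:
    """Return True if the named check sees a single bit flip in the image."""
    corrupted = bytearray(image)
    corrupted[byte_index] ^= 1 << bit
    check = CHECKERS[name]
    return check(image) != check(bytes(corrupted))
```

All three checks detect any single bit flip. They differ on multiple upsets: an XOR sum cannot see two flips that hit the same bit position in different bytes, whereas the additive checksum and the CRC can, at a higher computation cost.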
CN201110387908.9A 2011-11-29 2011-11-29 Software fault-tolerant method capable of comprehensively on-line self-detection single event upset Active CN102521062B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110387908.9A CN102521062B (en) 2011-11-29 2011-11-29 Software fault-tolerant method capable of comprehensively on-line self-detection single event upset

Publications (2)

Publication Number Publication Date
CN102521062A true CN102521062A (en) 2012-06-27
CN102521062B CN102521062B (en) 2015-02-11

Family

ID=46291997

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110387908.9A Active CN102521062B (en) 2011-11-29 2011-11-29 Software fault-tolerant method capable of comprehensively on-line self-detection single event upset

Country Status (1)

Country Link
CN (1) CN102521062B (en)

Also Published As

Publication number Publication date
CN102521062B (en) 2015-02-11

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant