CN112506701B - Multiprocessor chip error recovery method based on three-mode lockstep - Google Patents


Info

Publication number
CN112506701B
Authority
CN
China
Prior art keywords
processor
lockstep
slave
mode
slave processor
Prior art date
Legal status
Active
Application number
CN202011394672.7A
Other languages
Chinese (zh)
Other versions
CN112506701A (en)
Inventor
陈道品
罗春风
武利会
何子兰
倪伟东
黄凯
张铖洪
蒋小文
张晓旭
刘智力
Current Assignee
Foshan Power Supply Bureau of Guangdong Power Grid Corp
Original Assignee
Foshan Power Supply Bureau of Guangdong Power Grid Corp
Priority date
2020-12-02
Filing date
2020-12-02
Publication date
2022-01-21
2020-12-02 Application filed by Foshan Power Supply Bureau of Guangdong Power Grid Corp
2020-12-02 Priority to CN202011394672.7A
2021-03-16 Publication of CN112506701A
Application granted
2022-01-21 Publication of CN112506701B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/1441 Resetting or repowering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals

Abstract

The invention discloses a multiprocessor chip error recovery method based on three-mode lockstep. The method builds on a dual-core lockstep structure with checkpoint-based rollback error recovery. By bringing a second slave processor into the comparison when an error occurs, it can quickly identify which CPU is faulty, and the subsequent self-check also gives some ability to detect hard errors in a CPU. This greatly improves the reliability and real-time performance of processor error handling while reducing the resource cost of three-mode lockstep.

Description

Multiprocessor chip error recovery method based on three-mode lockstep
Technical Field
The invention relates to the technical field of computers, in particular to a multiprocessor chip error recovery method based on three-mode lockstep.
Background
As integrated-circuit process nodes continue to shrink, circuit reliability draws increasing attention, and soft errors are an important factor affecting it. A soft error, also known as a single-event upset, occurs when a transistor is struck by an energetic charged particle (for example, neutrons from cosmic rays or alpha particles from packaging materials), which changes the charge stored at a PN junction and in turn flips the logic state of the data from 0 to 1 or from 1 to 0. Such a flip can cause a logic failure and prevent the circuit from working correctly. Compared with hard errors, which are caused by permanent device damage, soft errors are more random and more frequent, but they are recoverable and can be cleared by rewriting or resetting.
The processor, as the core of an integrated circuit system, is naturally the focus of reliability design, and lockstep is the main method of handling processor soft errors. Lockstep adds a hardware unit identical to the master processor system so that the two processors monitor each other and the correctness of their operation is checked continuously. This ensures that the processors function correctly and that errors can be detected, isolated and, to some extent, recovered. The main recovery method used with lockstep is checkpoint-based rollback: errors are detected by consistency checks between the processors; at set time intervals a checkpoint saves the processor's correct state; and when an error occurs, the system rolls back by reloading the saved correct state into the processor and re-executing the program section in which the error occurred.
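The checkpoint-and-rollback loop described above can be illustrated with a small Python simulation. It is a sketch under simplifying assumptions, not the patent's implementation: the `CpuState` class, the toy `step` instruction, the fixed checkpoint interval and the injected bit flip are all hypothetical, and the "memory" holding the checkpoint is just a deep copy.

```python
import copy
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class CpuState:
    """Hypothetical architectural state: a few registers plus a program counter."""
    regs: list = field(default_factory=lambda: [0, 0, 0, 0])
    pc: int = 0


def step(state: CpuState) -> int:
    """Toy instruction: add the PC to r0, advance the PC, and emit r0 as output."""
    state.regs[0] += state.pc
    state.pc += 1
    return state.regs[0]


def run_with_rollback(n_steps: int, interval: int,
                      upset_at_pc: Optional[int] = None) -> Tuple[CpuState, int]:
    """Run a lockstep pair, compare outputs every step, checkpoint every
    `interval` instructions, and roll both cores back on a mismatch."""
    master, slave = CpuState(), CpuState()
    checkpoint = copy.deepcopy(master)        # last known-good state
    rollbacks = 0
    upset_pending = upset_at_pc is not None
    while master.pc < n_steps:
        if upset_pending and master.pc == upset_at_pc:
            master.regs[0] ^= 1 << 3          # single-event upset in a master register
            upset_pending = False             # transient: gone after re-execution
        out_m, out_s = step(master), step(slave)
        if out_m != out_s:                    # consistency check failed: roll back
            master = copy.deepcopy(checkpoint)
            slave = copy.deepcopy(checkpoint)
            rollbacks += 1
        elif master.pc % interval == 0:
            checkpoint = copy.deepcopy(master)  # save a new correct state
    return master, rollbacks


final, n = run_with_rollback(10, interval=4, upset_at_pc=6)
print(final.regs[0], final.pc, n)  # prints: 45 10 1
```

Because the upset is transient, re-executing from the checkpoint taken at PC 4 yields the same result as a fault-free run (the sum 0 + 1 + ... + 9 = 45), at the cost of one rollback.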
Chinese patent publication No. CN104699550A, published 10/06/2015, discloses an error recovery method based on a lockstep architecture, comprising the following steps: (1) operating states and transitions of the Lock-Step module: after the Lock-Step module is powered on, state saving is performed on a time basis; a hardware timer sends a hardware signal after a set period, and the processor state is saved once software has read it; (2) hardware state switching: hardware saving and recovery are divided into two states, an operating state and a saving state; if, in the operating state, the processor writes address data to the SM, the SM completes the read and write within the operating-state time slice so that the data stays consistent. However, this method cannot determine which CPU produced the error, and it has no ability to detect hard errors.
Disclosure of Invention
The invention provides a multiprocessor chip error recovery method based on three-mode lockstep that can identify the specific CPU in which an error occurred.
In order to solve the technical problems, the technical scheme of the invention is as follows:
A multiprocessor chip error recovery method based on three-mode lockstep comprises the following steps:
S1: a checking module compares the outputs of a master processor and a first slave processor; when the two outputs are inconsistent, the checking module generates an interrupt, and error recovery is performed on the master processor and the first slave processor;
S2: a second slave processor performs a checkpoint operation and saves its own processor state to memory;
S3: the second slave processor is set to lockstep mode, with its input obtained from the master processor and its output sent to the checking module;
S4: the master processor state saved in memory is loaded into the master processor, the first slave processor and the second slave processor;
S5: the checking module compares the outputs of the master processor, the first slave processor and the second slave processor simultaneously; if no error is found, the previous error is regarded as a soft error that has been recovered, and the process goes to step S6; if the outputs of the three processors are inconsistent, the process goes to step S7;
S6: the checking module sends an interrupt to the second slave processor, which exits lockstep mode and reloads its own last saved processor state, while the master processor and the first slave processor continue to operate in the normal lockstep architecture;
S7: the outputs of the master processor, the first slave processor and the second slave processor are voted on to determine the inconsistent processor, which is marked as the processor with the error, and the other two processors, whose outputs agree, continue to operate in dual-mode lockstep.
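Steps S2 to S7 can be traced with the following Python sketch. The `Core` class, its toy `execute` instruction, the `stuck_bit` fault mask and the `recover` function are hypothetical stand-ins chosen for illustration. The rollback target follows S4 above (the master's checkpoint), and since the text does not say what happens if all three outputs disagree, that case is merely reported.

```python
import copy
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Core:
    name: str
    state: Dict[str, int] = field(default_factory=dict)
    lockstep: bool = False
    stuck_bit: int = 0  # hypothetical permanent-fault mask (0 = healthy)

    def execute(self, x: int) -> int:
        """Toy instruction: accumulate the input and emit the accumulator."""
        self.state["acc"] = self.state.get("acc", 0) + x
        return self.state["acc"] ^ self.stuck_bit


def recover(master: Core, slave1: Core, slave2: Core,
            master_checkpoint: Dict[str, int],
            replay_inputs: List[int]) -> Tuple[str, Optional[str]]:
    """Steps S2-S7, entered after S1 flags a master/slave1 mismatch."""
    cores = (master, slave1, slave2)
    slave2_saved = copy.deepcopy(slave2.state)  # S2: spare saves its own state
    slave2.lockstep = True                      # S3: spare joins lockstep
    for core in cores:                          # S4: roll all three back
        core.state = copy.deepcopy(master_checkpoint)
    for x in replay_inputs:                     # S5: three-way comparison
        outs = [core.execute(x) for core in cores]
        if len(set(outs)) > 1:                  # S7: 2-of-3 majority vote
            majority, votes = Counter(outs).most_common(1)[0]
            if votes < 2:
                return "no-majority", None      # case not covered by the source
            faulty = next(c for c, o in zip(cores, outs) if o != majority)
            return "S7", faulty.name
    slave2.lockstep = False                     # S6: soft error recovered,
    slave2.state = slave2_saved                 # spare resumes its own work
    return "S6", None


m, s1, s2 = Core("master"), Core("slave1"), Core("slave2", state={"acc": 99})
print(recover(m, s1, s2, {"acc": 10}, [1, 2, 3]))  # ('S6', None)
m, s1, s2 = Core("master"), Core("slave1", stuck_bit=4), Core("slave2")
print(recover(m, s1, s2, {"acc": 10}, [1, 2, 3]))  # ('S7', 'slave1')
```

With healthy cores the replay matches and the spare returns to its own saved work (S6); giving one core a permanent fault mask makes the replay disagree, and the 2-of-3 vote names that core (S7).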
Preferably, the master processor and the first slave processor in step S1 form a lockstep pair, specifically:
the first slave processor selects normal mode or lockstep mode through a configuration register;
in normal mode, the input and output of the first slave processor go through the bus;
in lockstep mode, the input of the first slave processor is a copy of the master processor's input, and the output of the first slave processor is sent to the checking module.
Preferably, in lockstep mode, the checking module records the output of the master processor at each clock edge and compares it with the output of the first slave processor; if the two outputs match, it clears the stored comparison data and waits for the next comparison.
Preferably, after a configurable number of write operations, the checking module initiates an interrupt instructing the master processor to perform a checkpoint operation, so that the master processor's state is saved to memory.
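The checking module's per-cycle behavior in the two preceding paragraphs might be modeled as below. This is a behavioral sketch only: the class and method names, the event strings and the choice to count only writes on cycles where the outputs agree are assumptions, not details given in the source.

```python
from typing import List, Optional


class CheckingModule:
    """Behavioral model of the checking module for one lockstep pair (hypothetical API)."""

    def __init__(self, writes_per_checkpoint: int) -> None:
        self.writes_per_checkpoint = writes_per_checkpoint  # the configurable count
        self.latched: Optional[int] = None  # master output recorded at this edge
        self.write_count = 0
        self.events: List[str] = []         # interrupts raised, in order

    def clock_edge(self, master_out: int, slave_out: int, is_write: bool) -> None:
        self.latched = master_out           # record the master's output
        if self.latched != slave_out:
            self.events.append("error")     # S1: mismatch -> error-recovery interrupt
            return                          # keep the latched value for diagnosis
        self.latched = None                 # outputs agree: clear, wait for next edge
        if is_write:
            self.write_count += 1
            if self.write_count == self.writes_per_checkpoint:
                self.write_count = 0
                self.events.append("checkpoint")  # ask the master to checkpoint


checker = CheckingModule(writes_per_checkpoint=3)
for cycle in range(6):
    checker.clock_edge(master_out=cycle, slave_out=cycle, is_write=True)
checker.clock_edge(master_out=7, slave_out=5, is_write=False)
print(checker.events)  # ['checkpoint', 'checkpoint', 'error']
```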
Preferably, when the registers and internal caches of the master and slave processors are saved to memory, the program data segment changed between two successive saves is moved to a designated location in memory through a recording module, which records the addresses in the program data segment and the corresponding data.
Preferably, the recording module operates in ping-pong mode with two built-in FIFOs; each time a checkpoint operation is performed, the FIFOs are switched and DMA is started to move the FIFO data to a specified location in memory.
Preferably, a checkpoint operation does not need to wait for the DMA transfer to complete, but the DMA transfer must be completed between two checkpoints.
Preferably, when one FIFO of the recording module is full, a checkpoint operation is also initiated so that the FIFOs are switched and data is stored in the other FIFO.
Preferably, in step S7, if the output of the master processor is the inconsistent one, one of the slave processors is selected as the new master processor and the original master processor is set to normal mode.
Preferably, in step S7, the processor with the error is reset and self-checked; if a hard error has occurred, the processor is stopped; if not, the processor is set to normal mode and loads the last saved state of the second slave processor to continue running.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
on the basis of the traditional dual-mode lockstep architecture, a third CPU is added to the structure when an error occurs to form three-mode lockstep, so the CPU with the error can be identified quickly and the subsequent operations provide some ability to detect CPU hard errors. This greatly improves the reliability and real-time performance of processor error handling while reducing the resource cost of three-mode lockstep.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention.
FIG. 2 is a block diagram of the processor modes in conventional dual-mode lockstep.
FIG. 3 is a diagram of the dual-mode lockstep architecture during normal operation.
FIG. 4 is a schematic diagram of a three-mode lockstep architecture for error recovery.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product;
it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Example 1
This embodiment provides a multiprocessor chip error recovery method based on three-mode lockstep, shown in FIGS. 1 to 4, comprising the following steps:
S1: a checking module compares the outputs of a master processor and a first slave processor; when the two outputs are inconsistent, the checking module generates an interrupt, and error recovery is performed on the master processor and the first slave processor;
S2: a second slave processor performs a checkpoint operation and saves its own processor state to memory;
S3: the second slave processor is set to lockstep mode, with its input obtained from the master processor and its output sent to the checking module;
S4: the master processor state saved in memory is loaded into the master processor, the first slave processor and the second slave processor;
S5: the checking module compares the outputs of the master processor, the first slave processor and the second slave processor simultaneously; if no error is found, the previous error is regarded as a soft error that has been recovered, and the process goes to step S6; if the outputs of the three processors are inconsistent, the process goes to step S7;
S6: the checking module sends an interrupt to the second slave processor, which exits lockstep mode and reloads its own last saved processor state, while the master processor and the first slave processor continue to operate in the normal lockstep architecture;
S7: the outputs of the master processor, the first slave processor and the second slave processor are voted on to determine the inconsistent processor, which is marked as the processor with the error, and the other two processors, whose outputs agree, continue to operate in dual-mode lockstep.
The master processor and the first slave processor in step S1 form a lockstep pair, specifically:
the first slave processor selects normal mode or lockstep mode through a configuration register;
in normal mode, the input and output of the first slave processor go through the bus;
in lockstep mode, the input of the first slave processor is a copy of the master processor's input, and the output of the first slave processor is sent to the checking module.
In lockstep mode, the checking module records the output of the master processor at each clock edge and compares it with the output of the first slave processor; if the two outputs are consistent, it clears the stored comparison data and waits for the next comparison.
After a configurable number of write operations, the checking module initiates an interrupt instructing the master processor to perform a checkpoint operation, so that the master processor's state is saved to memory.
When the registers and internal caches of the master and slave processors are saved to memory, the program data segment changed between two successive saves is moved to a specified location in memory through the recording module, which records the addresses in the program data segment and the corresponding data.
The recording module works in ping-pong mode with two FIFOs; the FIFOs are switched each time a checkpoint operation is performed, and DMA is started to move the FIFO data to a specified location in memory.
A checkpoint operation does not need to wait for the DMA transfer to complete, but the DMA transfer must be completed between two checkpoints.
When one FIFO of the recording module is full, a checkpoint operation is also initiated so that the FIFOs are switched and data is stored in the other FIFO.
A recording module is added to record changes to the program data segment so that DMA can move them efficiently;
the recording module operates in ping-pong mode to reduce its impact on system operating efficiency;
in step S7, if the output of the master processor is the inconsistent one, one of the slave processors is selected as the new master processor and the original master processor is set to normal mode. Because each processor can switch between normal mode and lockstep mode, system operation is more flexible.
In step S7, the processor with the error is reset and self-checked; if a hard error has occurred, the processor is stopped; if not, the processor is set to normal mode and loads the last saved state of the second slave processor to continue running.
During error recovery, the dual-mode lockstep structure is converted into a three-mode lockstep structure, the last saved correct processor state is loaded into all three processors for rollback, and the faulty processor is identified by voting;
self-checking the faulty processor during error recovery determines whether it has a hard error;
in a specific implementation, the basic structure of the method is a dual-core lockstep structure with checkpoint-based rollback error recovery. Under normal operating conditions, two processors form a lockstep pair, and each processor selects normal mode or lockstep mode through a configuration register. In normal mode, the processor's input and output go through the bus. In lockstep mode, the processor's input and output are decoupled from the bus: its input is a copy of the other processor's input, and its output is sent to the checking module. A lockstep pair is formed by setting the master processor to normal mode and the slave processor to lockstep mode, with the slave's input in lockstep mode copied from the master.
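The routing implied by the configuration register can be summarized in a few lines of Python. The `Mode` enum, the `route` function and the core names are hypothetical; the sketch only encodes the rules stated above: a lockstep-mode core receives a copy of the master's input and sends its output to the checking module, a normal-mode core uses the bus, and the checking module also latches the master's output.

```python
from enum import Enum
from typing import Dict, Tuple


class Mode(Enum):
    NORMAL = 0    # core uses the bus for its input and output
    LOCKSTEP = 1  # core shadows the master: copied input, output to the checker


def route(modes: Dict[str, Mode], master: str,
          bus_inputs: Dict[str, int]) -> Tuple[Dict[str, int], Dict[str, str]]:
    """Return the input each core sees and where each core's output goes.

    `modes` stands in for each core's configuration register (hypothetical).
    """
    inputs: Dict[str, int] = {}
    outputs: Dict[str, str] = {}
    for core, mode in modes.items():
        if mode is Mode.LOCKSTEP:
            inputs[core] = bus_inputs[master]  # copy of the master's input
            outputs[core] = "checker"          # decoupled from the bus
        else:
            inputs[core] = bus_inputs[core]
            # The checking module also latches the master's output for comparison.
            outputs[core] = "bus+checker" if core == master else "bus"
    return inputs, outputs


modes = {"cpu0": Mode.NORMAL, "cpu1": Mode.LOCKSTEP, "cpu2": Mode.NORMAL}
print(route(modes, master="cpu0", bus_inputs={"cpu0": 5, "cpu1": 0, "cpu2": 7}))
```

In this example cpu2 plays the role of the third processor, running independently in normal mode until an error triggers recovery.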
The checking module records the output of the master processor at each clock edge and compares it with the output of the slave processor; if the two outputs are consistent, the stored comparison data is cleared and the module waits for the next comparison. After a configurable number of write operations, the checking IP initiates an interrupt instructing the processor to perform a checkpoint operation that saves the CPU state to memory. The checkpoint operation lets the processor return to the previous correct point after a soft error and re-run the program, which amounts to re-executing the code interval in which the soft error occurred and thereby eliminates it. After the processor receives the interrupt, a write-through cache needs no further saving, because it has already written the data of every write operation to external memory; a write-back cache must be flushed so that its data is written to external memory. The registers (general-purpose and status registers), the C bit and the PC are then saved to the corresponding addresses in memory (DDR with ECC protection).
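The checkpoint save sequence just described (flush a write-back cache, then store the registers, C bit and PC) could look roughly like the sketch below, with memory modeled as a Python dict. The layout starting at `base`, the `dirty` map and the field names are assumptions for illustration; the matching `restore_checkpoint` shows how the same layout would be read back during rollback.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Cache:
    write_back: bool
    dirty: Dict[int, int] = field(default_factory=dict)  # addr -> data not yet in memory


@dataclass
class Cpu:
    gprs: List[int]
    status: int
    c_bit: int
    pc: int
    cache: Cache


def save_checkpoint(cpu: Cpu, memory: Dict[int, int], base: int) -> None:
    # A write-through cache has already written every store to memory;
    # a write-back cache must be flushed so memory holds the latest data.
    if cpu.cache.write_back:
        memory.update(cpu.cache.dirty)
        cpu.cache.dirty.clear()
    # Hypothetical layout: GPRs, then status register, C bit and PC.
    words = cpu.gprs + [cpu.status, cpu.c_bit, cpu.pc]
    for offset, word in enumerate(words):
        memory[base + offset] = word


def restore_checkpoint(cpu: Cpu, memory: Dict[int, int], base: int) -> None:
    n = len(cpu.gprs)
    cpu.gprs = [memory[base + i] for i in range(n)]
    cpu.status, cpu.c_bit, cpu.pc = (memory[base + n + k] for k in range(3))


mem: Dict[int, int] = {}
cpu = Cpu(gprs=[1, 2, 3], status=0x10, c_bit=1, pc=0x400,
          cache=Cache(write_back=True, dirty={0x2000: 42}))
save_checkpoint(cpu, mem, base=0x1000)
cpu.gprs, cpu.pc = [0, 0, 0], 0            # corrupt the live state
restore_checkpoint(cpu, mem, base=0x1000)
print(cpu.gprs, hex(cpu.pc), mem[0x2000])  # [1, 2, 3] 0x400 42
```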
Since the code section of a program is generally not rewritten while the processor runs and only the data section is frequently rewritten, the data section must additionally be saved as part of the processor state. However, if the CPU itself saved this data during the checkpoint, it would take too long and reduce system efficiency, so the architecture adds a recording module that tracks rewrites of the program data segment. When a write address falls in the program data segment, the recording module records the address and the corresponding data. The recording module works in ping-pong mode with two FIFOs: each checkpoint operation switches the FIFOs and starts DMA to move the FIFO data to the designated location in memory. Because the transfer may take a long time, the checkpoint operation does not wait for the DMA transfer to complete, but the transfer must be completed between two checkpoints; when the next checkpoint interrupt is initiated, the checkpoint operation proceeds only after the DMA transfer has finished. In addition, when one FIFO of the recording module is full, a checkpoint operation is also initiated so that the FIFOs are switched and data continues to be saved.
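A behavioral sketch of the ping-pong recording module follows. DMA is modeled as a deferred copy that `finish_dma` completes, and `image` stands for the designated memory area; these names, the FIFO depth and the callback used to request a checkpoint are hypothetical. The sketch enforces the two rules above: a checkpoint does not wait for the transfer it starts, but it does wait for any transfer still in flight from the previous checkpoint.

```python
from typing import Callable, Dict, List, Tuple


class RecordingModule:
    """Behavioral model of the ping-pong recording module (hypothetical API)."""

    def __init__(self, data_segment: range, fifo_depth: int,
                 image: Dict[int, int], request_checkpoint: Callable[[], None]) -> None:
        self.data_segment = data_segment  # address range of the program data segment
        self.fifo_depth = fifo_depth
        self.image = image                # designated memory area for saved data
        self.request_checkpoint = request_checkpoint
        self.fifos: List[List[Tuple[int, int]]] = [[], []]
        self.active = 0                   # index of the FIFO recording writes
        self.dma_busy = False             # DMA draining the other FIFO

    def on_write(self, addr: int, data: int) -> None:
        if addr not in self.data_segment:
            return                        # only data-segment writes are recorded
        fifo = self.fifos[self.active]
        fifo.append((addr, data))
        if len(fifo) == self.fifo_depth:
            self.request_checkpoint()     # a full FIFO also triggers a checkpoint

    def finish_dma(self) -> None:
        """Complete the in-flight transfer of the inactive FIFO."""
        src = self.fifos[1 - self.active]
        for addr, data in src:
            self.image[addr] = data
        src.clear()
        self.dma_busy = False

    def on_checkpoint(self) -> None:
        if self.dma_busy:
            self.finish_dma()             # the previous transfer must finish first
        self.active ^= 1                  # ping-pong: new writes go to the other FIFO
        self.dma_busy = True              # start DMA; this checkpoint does not wait


image: Dict[int, int] = {}
forced: List[bool] = []
rec = RecordingModule(range(0x100, 0x200), fifo_depth=2, image=image,
                      request_checkpoint=lambda: forced.append(True))
rec.on_write(0x050, 1)   # outside the data segment: ignored
rec.on_write(0x100, 7)
rec.on_checkpoint()      # switch FIFOs; DMA of (0x100, 7) now in flight
rec.on_write(0x104, 3)
rec.on_write(0x108, 4)   # FIFO full: requests a checkpoint
rec.on_checkpoint()      # waits for the first DMA, then switches again
print(image, forced)     # {256: 7} [True]
```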
When the checking module finds that the outputs of the two processors are inconsistent, an error is deemed to have occurred; the checking module generates an interrupt and the system starts the error recovery process. The third processor, outside the lockstep pair, performs a checkpoint operation and saves its own processor state to memory, including its registers, program counter (PC) and cache. The third processor is then set to lockstep mode: its input is obtained from the master processor, and its output no longer goes to the bus but to the checking module, which now compares the outputs of all three processors. If the checking IP finds no error before the next checkpoint, the previous error is regarded as a soft error that has been recovered; the checking IP then sends an interrupt to the third processor, which exits lockstep mode and reloads its last saved processor state, while the other two processors continue in the normal lockstep architecture. If the outputs of the three processors become inconsistent before the next checkpoint, the three processors are voted on to determine the inconsistent one, which is marked as the processor with the error. The two remaining processors, whose outputs agree, continue to operate in dual-mode lockstep; if the master processor was the inconsistent one, one of the remaining processors must be selected as the new master processor and set to normal mode.
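After the vote, the survivors are reconfigured as a dual-mode pair. The helper below is a hypothetical sketch of that bookkeeping; in particular, promoting the first surviving core when the master is faulty is an assumption, since the text only says that one of the remaining processors is selected.

```python
from typing import Dict, List


def reconfigure(cores: List[str], master: str, faulty: str) -> Dict[str, str]:
    """Return each core's role after the S7 vote (role names are hypothetical)."""
    survivors = [c for c in cores if c != faulty]
    if master == faulty:
        master = survivors[0]     # assumption: promote the first surviving core
    roles = {faulty: "isolated"}  # to be reset and self-checked next
    for core in survivors:
        # The master drives the bus in normal mode; its partner shadows it in lockstep.
        roles[core] = "master/normal" if core == master else "slave/lockstep"
    return roles


print(reconfigure(["cpu0", "cpu1", "cpu2"], master="cpu0", faulty="cpu0"))
# {'cpu0': 'isolated', 'cpu1': 'master/normal', 'cpu2': 'slave/lockstep'}
```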
The processor with the error is reset and self-checked to determine whether a hard error has occurred. If a hard error has occurred, the processor is stopped; otherwise, it is set to normal mode and loads the last saved state of the third processor to continue running.
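The source does not say what the self-check consists of. Purely as an illustration, the sketch below uses a simple pattern test of a toy register file, where the hypothetical `stuck_at_one` mask models a permanent fault. A core that fails is halted, and a core that passes resumes in normal mode with the state the third processor saved in S2.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Core:
    name: str
    regs: List[int] = field(default_factory=lambda: [0] * 8)
    stuck_at_one: int = 0  # hypothetical permanent-fault mask (0 = healthy)
    mode: str = "isolated"

    def write(self, i: int, value: int) -> None:
        self.regs[i] = (value | self.stuck_at_one) & 0xFF  # a stuck bit always reads 1

    def reset(self) -> None:
        self.regs = [0] * len(self.regs)


def self_test(core: Core) -> bool:
    """Pattern test of the register file (a stand-in for the unspecified self-check)."""
    for pattern in (0x00, 0xFF, 0x55, 0xAA):
        for i in range(len(core.regs)):
            core.write(i, pattern)
            if core.regs[i] != pattern:
                return False
    return True


def handle_faulty_core(core: Core, spare_saved_regs: List[int]) -> bool:
    """Reset and self-check the isolated core; return True if it can run again."""
    core.reset()
    if not self_test(core):
        core.mode = "halted"            # hard error: stop the core
        return False
    core.reset()
    core.mode = "normal"                # soft error only: back to normal mode
    core.regs = list(spare_saved_regs)  # take over the state the spare saved in S2
    return True


saved = [3, 1, 4, 1, 5, 9, 2, 6]
print(handle_faulty_core(Core("cpu1"), saved))                     # True
print(handle_faulty_core(Core("cpu2", stuck_at_one=0x04), saved))  # False
```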
The same or similar reference numerals correspond to the same or similar parts;
the terms describing positional relationships in the drawings are for illustrative purposes only and are not to be construed as limiting the patent;
it should be understood that the above embodiments of the present invention are merely examples for clearly illustrating the present invention and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to list all embodiments exhaustively here. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (10)

1. A multiprocessor chip error recovery method based on three-mode lockstep, characterized by comprising the following steps:
S1: comparing the outputs of a master processor and a first slave processor by a checking module, and, when the two outputs are inconsistent, generating an interrupt by the checking module and performing error recovery on the master processor and the first slave processor;
S2: performing a checkpoint operation by a second slave processor and saving the processor state of the second slave processor to memory;
S3: setting the second slave processor to lockstep mode, with its input obtained from the master processor and its output sent to the checking module;
S4: loading the state of the second slave processor saved in memory into the master processor, the first slave processor and the second slave processor;
S5: comparing, by the checking module, the outputs of the master processor, the first slave processor and the second slave processor simultaneously; if no error is found, regarding the previous error as a soft error that has been recovered and going to step S6; if the outputs of the three processors are inconsistent, going to step S7;
S6: sending, by the checking module, an interrupt to the second slave processor, which exits lockstep mode and reloads its last saved processor state, while the master processor and the first slave processor continue to operate in the normal lockstep architecture;
S7: voting on the outputs of the master processor, the first slave processor and the second slave processor to determine the inconsistent processor, marking it as the processor with the error, and continuing to operate the other two processors, whose outputs agree, in dual-mode lockstep.
2. The multiprocessor chip error recovery method based on three-mode lockstep according to claim 1, wherein the master processor and the first slave processor in step S1 form a lockstep pair, specifically:
the first slave processor selects normal mode or lockstep mode through a configuration register;
in normal mode, the input and output of the first slave processor go through the bus;
in lockstep mode, the input of the first slave processor is a copy of the master processor's input, and the output of the first slave processor is sent to the checking module.
3. The multiprocessor chip error recovery method based on three-mode lockstep according to claim 2, wherein, in lockstep mode, the checking module records the output of the master processor at each clock edge and compares it with the output of the first slave processor, and if the two outputs are consistent, the saved comparison data is cleared and the next comparison is awaited.
4. The multiprocessor chip error recovery method based on three-mode lockstep according to claim 3, wherein, after a configurable number of write operations, the checking module initiates an interrupt instructing the master processor to initiate a checkpoint operation that saves the state of the master processor to memory.
5. The multiprocessor chip error recovery method based on three-mode lockstep according to claim 4, wherein, when the registers and internal caches of the master and slave processors are saved to memory, the program data segment changed between two successive saves is moved to a designated location in memory through a recording module, and the recording module records the addresses in the program data segment and the corresponding data.
6. The multiprocessor chip error recovery method based on three-mode lockstep according to claim 5, wherein the recording module works in ping-pong mode with two built-in FIFOs, the FIFOs are switched at each checkpoint operation, and DMA is started to move the data in the FIFOs to a specified location in memory.
7. The multiprocessor chip error recovery method based on three-mode lockstep according to claim 6, wherein the checkpoint operation does not need to wait for the DMA transfer to complete, but the DMA transfer must be completed between two checkpoints.
8. The multiprocessor chip error recovery method based on three-mode lockstep according to claim 7, wherein, when one FIFO of the recording module is full, a checkpoint operation is also initiated and the FIFOs are switched to save data.
9. The multiprocessor chip error recovery method based on three-mode lockstep according to claim 8, wherein, in step S7, if the output of the master processor is inconsistent, one of the slave processors is selected as the new master processor and the original master processor is set to normal mode.
10. The multiprocessor chip error recovery method based on three-mode lockstep according to claim 9, wherein, in step S7, the processor with the error is reset and self-checked; if a hard error has occurred, the processor is stopped; if no hard error has occurred, the processor is set to normal mode and the last saved state of the second slave processor is loaded to continue running.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011394672.7A CN112506701B (en) 2020-12-02 2020-12-02 Multiprocessor chip error recovery method based on three-mode lockstep

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011394672.7A CN112506701B (en) 2020-12-02 2020-12-02 Multiprocessor chip error recovery method based on three-mode lockstep

Publications (2)

Publication Number Publication Date
CN112506701A (en) 2021-03-16
CN112506701B (en) 2022-01-21

Family

ID=74969429

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011394672.7A Active CN112506701B (en) 2020-12-02 2020-12-02 Multiprocessor chip error recovery method based on three-mode lockstep

Country Status (1)

Country Link
CN (1) CN112506701B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104718532A (en) * 2012-10-16 2015-06-17 大陆-特韦斯贸易合伙股份公司及两合公司 Interface for interchanging data between redundant programs for controlling a motor vehicle
CN105630732A (en) * 2015-12-17 2016-06-01 西北工业大学 Hot switching method for dual-mode redundant microprocessor
CN107247644A (en) * 2017-07-03 2017-10-13 上海航天控制技术研究所 Reconstruct-down method for a triple-redundancy computer system
CN107533493A (en) * 2015-04-17 2018-01-02 微软技术许可有限责任公司 Recover service to accelerate
CN108132857A (en) * 2017-12-15 2018-06-08 天津津航计算技术研究所 FPGA off-position exact recovery method
CN108446189A (en) * 2018-06-12 2018-08-24 中国科学院上海技术物理研究所 Fault-tolerant activation system and method for spaceborne embedded software
CN110121698A (en) * 2016-12-31 2019-08-13 英特尔公司 System, method and apparatus for Heterogeneous Computing
CN110147343A (en) * 2019-05-09 2019-08-20 中国航空工业集团公司西安航空计算技术研究所 Lockstep processor architecture with full comparison
CN110192186A (en) * 2017-01-24 2019-08-30 Arm有限公司 Error detection using vector processing circuitry
CN111104243A (en) * 2019-12-26 2020-05-05 江南大学 Low-delay dual-mode lockstep soft error-tolerant processor system
CN111176908A (en) * 2019-12-11 2020-05-19 北京遥测技术研究所 Program on-orbit loading and refreshing method based on triple modular redundancy
CN111538369A (en) * 2020-04-17 2020-08-14 北京中科宇航技术有限公司 Triple-modular redundancy computer clock synchronization method and system

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US7467326B2 (en) * 2003-02-28 2008-12-16 Maxwell Technologies, Inc. Self-correcting computer
CN104699550B (en) * 2014-12-05 2017-09-12 The 631st Research Institute of Aviation Industry Corporation of China A kind of error recovery method based on lockstep frameworks
US20180011768A1 (en) * 2016-07-05 2018-01-11 International Business Machines Corporation Control state preservation during transactional execution

Non-Patent Citations (3)

Title
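Carles Hernandez. "Timely Error Detection for Effective Recovery in Light-Lockstep Automotive Systems". IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2015, vol. 34, no. 11. *
EEWORLD. "Four steps of ISO 26262 safety mechanism insertion and verification explained" (ISO 26262安全机构插入和验证四步骤讲解). news.eeworld.com.cn/qcdz/ic495077.html, 2020. *
Wang Rui (王锐). "Research on a high-availability fault-tolerant computer based on FPGA" (一种基于FPGA的高可用容错计算机的研究). Information & Communications (信息通信), 2016, no. 168. *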

Also Published As

Publication number Publication date
CN112506701A (en) 2021-03-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant