CN111581003B - Full-hardware dual-core lock-step processor fault-tolerant system - Google Patents

Publication number
CN111581003B
Authority
CN
China
Prior art keywords
processor
fault
write operation
buffer area
slave
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010356342.2A
Other languages
Chinese (zh)
Other versions
CN111581003A (en
Inventor
黄凯
陈群
蒋小文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Research Institute of Southern Power Grid Co Ltd
Original Assignee
Zhejiang University ZJU
CSG Electric Power Research Institute
Application filed by Zhejiang University ZJU and CSG Electric Power Research Institute
Priority to CN202010356342.2A
Publication of CN111581003A
Application granted
Publication of CN111581003B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0796 Safety measures, i.e. ensuring safe condition in the event of error, e.g. for controlling element

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention belongs to the field of microprocessors and provides a full-hardware dual-core lockstep processor fault-tolerant system comprising a master processor, a slave processor and a hardware fault-tolerant module. The hardware fault-tolerant module comprises a fault detection module, a fault recovery module and a fault isolation module. The master processor and the slave processor receive the same input signals; the master processor outputs signals to the outside, and the slave processor does not. The system detects faults rapidly, speeds up fault recovery, does not affect system performance during fault isolation, and reduces the area overhead of fault tolerance while preserving the reliability and real-time performance of processor fault tolerance.

Description

Full-hardware dual-core lock-step processor fault-tolerant system
Technical Field
The invention belongs to the field of microprocessors, and particularly relates to a fault-tolerant system of a full-hardware dual-core lockstep processor.
Background
With the advent of the Industry 4.0 era, industrial microcontrollers play an increasingly important role in the development of industrial automation in China. Compared with general consumer-grade applications, industrial microcontrollers face stricter requirements on reliability, cost and real-time performance. Embedded processors, the core of industrial microcontrollers, face growing reliability challenges as process nodes shrink and low-power techniques develop. Smaller feature sizes and lower threshold voltages make semiconductor integrated circuits increasingly sensitive to circuit crosstalk, atmospheric radiation, high-energy particles produced by the decay of packaging materials, extreme temperatures, electromagnetic interference and similar factors, so the probability of interference-induced faults keeps rising. Most such faults are transient: random, temporary state changes in the semiconductor caused by external conditions, after which the affected device can be restored to normal function by a reset. However, while the processor is running, even a single-bit error can produce an erroneous output or bring down the whole system, which in industrial applications may cause huge property losses or even casualties.
The two fault tolerance methods most commonly used in commercial processors are triple modular redundancy and checkpoint-based dual-core lockstep. The former compares three processors in hardware in real time and outputs the result of a majority vote; its reliability and real-time performance are high, but its area overhead is too large. The latter compares two processors in hardware in real time to detect a fault, but recovers from the fault in software: the correct processor state must be saved periodically as a checkpoint, and when a fault occurs the processor is restored to the previous checkpoint. This recovery is less reliable because only the processor state visible to software can be restored, and when a hang-type error occurs, recovery may fail because the software cannot respond. In addition, fault tolerance of the processor's embedded cache is generally not considered. So although checkpoint-based dual-core lockstep saves area by combining software and hardware, it falls short in reliability, performance and real-time behavior.
Disclosure of Invention
In order to solve the technical problems in the prior art, the invention provides a fault-tolerant system of a full-hardware dual-core lockstep processor, and the specific technical scheme is as follows.
A fault-tolerant system of a full-hardware dual-core lockstep processor comprises a master processor, a slave processor and a hardware fault-tolerant module, wherein the hardware fault-tolerant module comprises: the system comprises a fault detection module, a fault recovery module and a fault isolation module; the master processor and the slave processor have the same input signals, the master processor outputs signals to the outside, and the slave processor does not output signals to the outside.
Furthermore, the fault detection module taps the relevant internal signals of the master processor and the slave processor through hard-wired connections and compares them. The relevant signals include the signals of the internal control state registers of the master and slave processors, the bus interface signals and the cache interface signals. The internal control state registers include the general-purpose registers, the program counter, the status registers, and the control state registers of IP tightly coupled to the processor.
Further, the fault recovery module performs fault recovery in the following two steps:
a. While no fault is present, the state information of the master processor and the slave processor at a correct node is saved in a rollback buffer. A correct node is an execution point before any fault occurs, at which the master and slave processors are running normally and their states have not diverged because of a transient error. The state information consists of the control state register values of the master and slave processors.
b. After a fault occurs, the master processor and the slave processor are reset by hardware. Once the reset completes, both processors fetch instructions from address 0 again; at the same time, the content returned for address 0 on the instruction bus is changed, and the state information saved at the correct node in the rollback buffer is loaded into both processors, so that they resume execution from the most recently saved correct node.
Further, the state information is loaded into the master processor and the slave processor as follows: the relevant control state registers in both processors are identified, a data source carrying the state information to be restored is added to the conditional assignment of each such register, and the register value is restored when a pulse on the set signal is detected. The set signal is a pulse generated after the hardware reset of the master and slave processors completes.
Further, the fault isolation module blocks erroneous write operations from the master processor and the slave processor and rolls back the external state.
Further, the external state includes the state of the external memory, the state of the peripheral interfaces or system IP, and the state of the caches inside the master processor and the slave processor.
Further, the memory is attached to the data bus of the master processor and the slave processor, and fault isolation of the memory is achieved with a write operation buffer comprising a write address buffer, a write data buffer, a PC buffer and a fault PC buffer, each consisting of 3 registers. The write address buffer stores the write address of each write operation, the write data buffer stores its write data, the PC buffer stores the PC of the instruction retiring at the time of each write operation, and the fault PC buffer stores the PCs of the instructions executed between the occurrence of the fault and the reset of the master and slave processors.
Further, every write operation from the master processor and the slave processor to the memory is first held in the write operation buffer. Once three write operations are held and the processors issue another write, the earliest-stored write operation whose write address is not 0 is sent to the memory, and so on for subsequent writes. When the processors read from the memory, the read address is matched against the addresses in the write operation buffer; if an address matches and is not 0, the data held in the write operation buffer is returned to the processors. When a fault occurs and the state must be rolled back, the write operation buffer invalidates every write operation whose PC equals a PC in the fault PC buffer, by setting the corresponding write address in the buffer to 0. When a bus master other than the master and slave processors needs to access the memory, software ensures that the processors perform three write operations to an unused memory address, so that the write operations still held in the buffer are written out to the memory.
Furthermore, the peripheral interfaces and the system IP are attached to the system bus of the master processor and the slave processor; write operations from the processors are delayed by three cycles, and the timing of read operations is unchanged.
Further, the internal caches of the master processor and the slave processor operate in write-through mode, and when a fault occurs, the following 8 cache lines are invalidated during fault recovery:
when the fault is not a cache read-data error, the addresses of the last 4 write operations to the master processor's cache and of the last 4 write operations to the slave processor's cache are used as the addresses of the cache lines to be invalidated;
when a read-data error occurs in the cache, the 1 address at which the read-data error occurred, the addresses of the last 3 write operations to the master processor's cache and the addresses of the last 4 write operations to the slave processor's cache are used as the addresses of the cache lines to be invalidated.
Advantageous effects:
The full-hardware dual-core lockstep processor fault-tolerant system detects faults rapidly, speeds up fault recovery, does not affect system performance during fault isolation, and reduces the area overhead of fault tolerance while preserving the reliability and real-time performance of processor fault tolerance.
Drawings
FIG. 1 is a block diagram of a fault tolerant architecture for a dual core processor of the present invention;
FIG. 2 is a block diagram of a fault detection module of the present invention;
FIG. 3 is a schematic diagram of the preservation of correct node state of the present invention;
FIG. 4 is a block diagram of a status information reset circuit of the present invention;
FIG. 5 is a write operation buffer structure of the present invention;
FIG. 6 is a timing diagram illustrating an invalidation operation of a cache during a reset of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and embodiments.
The full-hardware dual-core lockstep processor fault-tolerant system provided by the invention adds a hardware fault-tolerant module to a dual modular redundant processor pair, achieving real-time fault detection and recovery as well as fault tolerance of the on-chip cache in write-through mode.
As shown in Fig. 1, the full-hardware dual-core lockstep processor fault-tolerant system includes a master processor, a slave processor and a hardware fault-tolerant module. The hardware fault-tolerant module comprises a fault detection module, a fault recovery module and a fault isolation module. The master processor and the slave processor receive the same input signals; the master processor outputs signals to the outside, and the slave processor does not.
As shown in Fig. 2, the fault detection module taps the relevant internal signals of the master processor and the slave processor through hard-wired connections and compares them. The relevant signals include the signals of the internal control state registers of the master and slave processors, the bus interface signals and the cache interface signals. The internal control state registers include the general-purpose registers; control registers within the processor, such as the program counter and the processor state registers; and the control state registers of tightly coupled IP, such as timers and the interrupt controller.
Because faults occur at random times, the comparison result may be captured in a metastable state. To keep such an indeterminate state from propagating, the error alarm signal produced by a comparison mismatch is passed through a two-stage synchronizer before it is used as the fault isolation and recovery signal.
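The detection path described above can be sketched behaviorally in Python. This is only an illustrative model under stated assumptions, not the patented circuit: the class name `TwoStageSynchronizer`, the function `mismatch` and the signal names `pc` and `gpr1` are hypothetical, and the synchronizer is modeled as two flip-flops in series so that the alarm appears two clock edges after the raw mismatch.

```python
from dataclasses import dataclass
from typing import Dict


def mismatch(master_signals: Dict[str, int], slave_signals: Dict[str, int]) -> bool:
    """Raw comparator output: True if any monitored signal differs."""
    return master_signals != slave_signals


@dataclass
class TwoStageSynchronizer:
    """Two flip-flops in series (hypothetical model of the two-stage sync)."""
    stage1: bool = False
    stage2: bool = False

    def clock(self, raw: bool) -> bool:
        # On each clock edge stage2 takes the old stage1, and stage1 samples raw.
        self.stage2, self.stage1 = self.stage1, raw
        return self.stage2


sync = TwoStageSynchronizer()
master = {"pc": 0x100, "gpr1": 7}
slave_ok = {"pc": 0x100, "gpr1": 7}
slave_bad = {"pc": 0x100, "gpr1": 5}  # a single flipped register value

# One comparison per clock edge: the fault is injected on the second edge.
trace = [sync.clock(mismatch(master, s))
         for s in (slave_ok, slave_bad, slave_bad, slave_bad)]
# trace is [False, False, True, True]: the alarm lags the mismatch by one extra edge
```

The model ignores metastability itself, which is an analog effect; it only shows the latency cost that the second register stage adds to the alarm.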
As shown in Fig. 3, the fault recovery module performs fault recovery in the following two steps:
a. While no fault is present, the state information of the master processor and the slave processor at a correct node is saved in a rollback buffer. A correct node is an execution point before any fault occurs, at which the master and slave processors are running normally and their states have not diverged because of a transient error. The state information consists of the control state register values of the master and slave processors.
b. After a fault occurs, the master processor and the slave processor are reset by hardware. Once the reset completes, both processors fetch instructions from address 0 again; at the same time, the content returned for address 0 on the instruction bus is changed, and the state information saved at the correct node in the rollback buffer is loaded into both processors, so that they resume execution from the most recently saved correct node.
As shown in Fig. 4, the state information is loaded into the master processor and the slave processor as follows: the relevant control state registers in both processors are identified, a data source carrying the state information to be restored is added to the conditional assignment of each such register, and the register value is restored when a pulse on the set signal is detected. The set signal is a pulse generated after the hardware reset of the master and slave processors completes.
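The two recovery steps and the set-pulse restore can be sketched as a small Python model. It is a simplification under stated assumptions: `Core`, `RollbackBuffer` and the register names are hypothetical, the register file is a plain dictionary, and the change to the content of address 0 on the instruction bus is not modeled (the program counter is simply restored like any other register).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Core:
    """Hypothetical stand-in for one processor's control state registers."""
    regs: Dict[str, int] = field(default_factory=dict)

    def hardware_reset(self) -> None:
        # A hardware reset clears every register; fetch restarts at address 0.
        self.regs = {name: 0 for name in self.regs}

    def set_pulse(self, saved: Dict[str, int]) -> None:
        # Models the extra data source added to each register's conditional
        # assignment: on the set pulse, the saved value is loaded.
        self.regs.update(saved)


class RollbackBuffer:
    def __init__(self) -> None:
        self.saved: Optional[Dict[str, int]] = None

    def save(self, master: Core, slave: Core, fault: bool) -> bool:
        # Step a: store state only while no fault is flagged and both cores agree.
        if fault or master.regs != slave.regs:
            return False
        self.saved = dict(master.regs)
        return True

    def recover(self, master: Core, slave: Core) -> None:
        # Step b: reset both cores, then load the last correct node into both.
        if self.saved is None:
            raise RuntimeError("no correct node has been saved yet")
        for core in (master, slave):
            core.hardware_reset()
            core.set_pulse(self.saved)
```

A typical use is to call `save` at every candidate node and `recover` once the synchronized alarm fires; both cores then hold identical state again, which is the precondition for resuming lockstep.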
The fault recovery module can roll back the state of the master processor and the slave processor but not their external state, namely the state of the external memory, the state of the peripheral interfaces or system IP, and the state of the caches inside the two processors. The fault isolation module is therefore needed to block erroneous write operations from the processors and to roll back the external state.
The memory is attached to the data bus of the master processor and the slave processor, so fault isolation requires modifying write operations on the data bus. For the memory, it does not matter whether the data of a write operation has actually been written; what matters is that when the processors access the address again, they obtain the previously written value.
As shown in Fig. 5, fault isolation of the memory is achieved with a write operation buffer, which mainly comprises a write address buffer, a write data buffer, a PC buffer and a fault PC buffer. The write address buffer stores the write address of each write operation, the write data buffer stores its write data, the PC buffer stores the PC of the instruction retiring at the time of each write operation, and the fault PC buffer stores the PCs of the instructions executed between the occurrence of the fault and the reset of the master and slave processors.
After a fault occurs in the master processor and the slave processor, at most two further write operations can be executed, so no more than three erroneous write operations need to be isolated. Each write operation buffer therefore consists of 3 registers. Every write operation from the processors to the memory is first held in the write operation buffer; once three write operations are held and the processors issue another write, the earliest-stored write operation whose write address is not 0 is sent to the memory, and so on for subsequent writes.
When the processors read from the memory, the read address is matched against the addresses in the write operation buffer; if an address matches and is not 0, the data held in the write operation buffer is returned to the processors.
When a fault occurs and the state must be rolled back, the write operation buffer invalidates every write operation whose PC equals a PC in the fault PC buffer, by setting the corresponding write address in the buffer to 0; the data of an invalidated write operation is neither written to the memory nor returned to the processors on a read. When a bus master other than the master and slave processors, such as a DMA controller, needs to access the memory, software ensures that the processors perform three write operations to an unused memory address, so that the write operations still held in the buffer are written out to the memory and the DMA controller obtains the latest data.
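The buffering, read forwarding, invalidation and flush behavior described above can be illustrated with a short Python model. It is a sketch under assumptions rather than the actual hardware: the class `WriteOperationBuffer` and its method names are hypothetical, memory is a dictionary, address 0 marks an invalid entry as in the text, and an evicted entry that was invalidated is simply discarded.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List

INVALID_ADDR = 0  # an entry whose write address is 0 is treated as invalid
DEPTH = 3         # three registers per buffer, as stated in the text


class WriteOperationBuffer:
    def __init__(self) -> None:
        self.memory: Dict[int, int] = {}
        self.entries: Deque[List[int]] = deque()  # each entry: [addr, data, pc]

    def write(self, addr: int, data: int, pc: int) -> None:
        # When three writes are already held, the oldest one leaves the buffer
        # and reaches memory only if it has not been invalidated.
        if len(self.entries) == DEPTH:
            old_addr, old_data, _ = self.entries.popleft()
            if old_addr != INVALID_ADDR:
                self.memory[old_addr] = old_data
        self.entries.append([addr, data, pc])

    def read(self, addr: int) -> int:
        # Forward the newest matching valid entry, otherwise read memory.
        for e_addr, e_data, _ in reversed(self.entries):
            if e_addr == addr and e_addr != INVALID_ADDR:
                return e_data
        return self.memory.get(addr, 0)

    def rollback(self, fault_pcs: Iterable[int]) -> None:
        # Invalidate every held write issued by an instruction whose PC is
        # recorded in the fault PC buffer.
        bad = set(fault_pcs)
        for entry in self.entries:
            if entry[2] in bad:
                entry[0] = INVALID_ADDR

    def flush(self, unused_addr: int, pc: int) -> None:
        # Software-driven flush: three writes to an unused address push every
        # previously held write out of the buffer.
        for _ in range(DEPTH):
            self.write(unused_addr, 0, pc)
```

For example, after four writes only the first has reached memory; calling `rollback` with the PCs of the last two and then `flush` leaves memory holding only the two writes that completed before the fault.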
The peripheral interfaces and the system IP are attached to the system bus of the master processor and the slave processor, so fault isolation requires modifying write operations on the system bus. The processors access the peripheral interfaces and the system IP mainly to control the IP's operating mode and state, so whether the data is actually written into the IP matters. Because the processors generally do not access these IP blocks frequently, write operations from the processors are simply delayed by three cycles on the system bus, while the timing of read operations is unchanged. On an AHB bus, the stall is implemented by pulling HREADY low.
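The effect of the three-cycle delay can be sketched as follows. The text states only that writes are delayed and that HREADY is held low; the assumption made here, labeled as such, is that a fault flagged during the wait cancels the pending write. The function name `delayed_write` and its arguments are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

WAIT_CYCLES = 3  # write delay stated in the text


def delayed_write(peripheral: Dict[int, int], addr: int, data: int,
                  fault_by_cycle: Sequence[bool]) -> Tuple[bool, List[int]]:
    """Hold a system-bus write for WAIT_CYCLES before committing it.

    fault_by_cycle[i] is the fault alarm sampled during wait cycle i.
    Returns (committed, hready_trace).
    """
    hready_trace: List[int] = []
    for cycle in range(WAIT_CYCLES):
        hready_trace.append(0)  # HREADY low: the bus master is stalled
        if fault_by_cycle[cycle]:
            # Assumption: a fault seen while waiting cancels the write.
            return False, hready_trace
    peripheral[addr] = data
    hready_trace.append(1)      # HREADY high: the transfer completes
    return True, hready_trace
```

Reads are not routed through this function in the model, matching the statement that read timing is unchanged.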
The internal caches of the master processor and the slave processor operate in write-through mode. When a fault occurs, fault isolation of the cache is achieved by invalidating the erroneous data or the data from a state ahead of the rollback point in the cache. To ensure that the cache is correctly restored and isolated while keeping the fault recovery time short, the following 8 cache lines are invalidated during fault recovery:
when the fault is not a cache read-data error, the addresses of the last 4 write operations to the master processor's cache and of the last 4 write operations to the slave processor's cache are used as the addresses of the cache lines to be invalidated;
when a read-data error occurs in the cache, the 1 address at which the read-data error occurred, the addresses of the last 3 write operations to the master processor's cache and the addresses of the last 4 write operations to the slave processor's cache are used as the addresses of the cache lines to be invalidated.
Specifically, as shown in Fig. 6, when a fault occurs, during the interval between the fault and the reset of the master processor and the slave processor, CEN on the SRAM interface of the cache's tag memory is pulled low and 0 is written in turn to the tag entries at the 8 cache line addresses, invalidating the corresponding cache lines and completing fault isolation of the cache.
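The selection of the 8 cache lines and the tag write can be sketched in Python. This is an illustrative model only: the function names, the 16-byte line size, the 64-line direct-mapped tag array and the convention that a tag value of 0 means invalid are all assumptions chosen for the example, not details given in the text.

```python
from typing import List, Optional, Sequence

LINE_SIZE = 16  # bytes per cache line (assumed for illustration)
NUM_LINES = 64  # lines in the tag memory (assumed for illustration)


def lines_to_invalidate(master_writes: Sequence[int],
                        slave_writes: Sequence[int],
                        read_error_addr: Optional[int] = None) -> List[int]:
    """Pick the 8 addresses whose cache lines are invalidated during recovery."""
    if read_error_addr is None:
        # No read-data error: last 4 master writes plus last 4 slave writes.
        return list(master_writes[-4:]) + list(slave_writes[-4:])
    # Read-data error: the failing address replaces one master write slot.
    return ([read_error_addr] + list(master_writes[-3:])
            + list(slave_writes[-4:]))


def invalidate(tag_ram: List[int], addresses: Sequence[int]) -> None:
    # Write 0 to the tag entry of each selected line, one after another.
    for addr in addresses:
        tag_ram[(addr // LINE_SIZE) % NUM_LINES] = 0
```

Because both cores run in lockstep, the two write histories usually overlap, so fewer than 8 distinct lines may actually change; writing the same tag entry twice is harmless in this model.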

Claims (7)

1. A full-hardware dual-core lockstep processor fault-tolerant system, comprising a master processor, a slave processor and a hardware fault-tolerant module, wherein the hardware fault-tolerant module comprises a fault detection module, a fault recovery module and a fault isolation module; the master processor and the slave processor have the same input signals, the master processor outputs signals to the outside, and the slave processor does not output signals to the outside; the fault isolation module is used for blocking erroneous write operations of the master processor and the slave processor and for rolling back an external state, wherein the external state comprises an external memory state, a peripheral interface or system IP state, and a state of the caches inside the master processor and the slave processor;
characterized in that the memory is attached to a data bus of the master processor and the slave processor, and fault isolation of the memory is achieved by establishing a write operation buffer, wherein the write operation buffer comprises a write address buffer, a write data buffer, a PC buffer and a fault PC buffer, each consisting of 3 registers; the write address buffer stores the write address of each write operation, the write data buffer stores the write data of each write operation, the PC buffer stores the PC of the instruction retiring at the time of each write operation, and the fault PC buffer stores the PCs of the instructions executed between the occurrence of the fault and the reset of the master processor and the slave processor.
2. The full-hardware dual-core lockstep processor fault-tolerant system according to claim 1, wherein the fault detection module taps relevant internal signals of the master processor and the slave processor through hard-wired connections and compares them, wherein the relevant signals comprise signals of internal control state registers in the master processor and the slave processor, signals of a bus interface and signals of a cache interface; and wherein the internal control state registers comprise general-purpose registers, program counters, status registers, and control state registers of IP tightly coupled to the processor.
3. The full-hardware dual-core lockstep processor fault-tolerant system according to claim 1, wherein the fault recovery module performs fault recovery in the following two steps:
a. while no fault is present, the state information of the master processor and the slave processor at a correct node is saved in a rollback buffer; the correct node is an execution point before a fault occurs, at which the master processor and the slave processor are running normally and their states have not diverged because of a transient error; the state information is the control state register values in the master processor and the slave processor;
b. after a fault occurs, the master processor and the slave processor are reset by hardware; after the reset completes, the master processor and the slave processor fetch instructions from address 0 again, the content returned for address 0 on the instruction bus is changed at the same time, and the state information saved at the correct node in the rollback buffer is loaded into the master processor and the slave processor, so that the master processor and the slave processor resume execution from the most recently saved correct node.
4. The full-hardware dual-core lockstep processor fault-tolerant system according to claim 3, wherein the state information is loaded into the master processor and the slave processor as follows: the relevant control state registers in the master processor and the slave processor are identified, a data source carrying the state information to be restored is added to the conditional assignment of each control state register, and the value of the control state register is restored when a pulse on a set signal is detected; the set signal is a pulse generated after the hardware reset of the master processor and the slave processor completes.
5. The full-hardware dual-core lockstep processor fault-tolerant system according to claim 1, wherein every write operation from the master processor and the slave processor to the memory is first held in the write operation buffer; once three write operations are held and the master processor and the slave processor issue another write operation, the earliest-stored write operation whose write address is not 0 is sent out, and so on; when the master processor and the slave processor need to read data from the memory, the read address is matched against the addresses in the write operation buffer, and if an address matches and is not 0, the data held in the write operation buffer is returned to the master processor and the slave processor; when a fault occurs and the state must be rolled back, the write operation buffer invalidates every write operation whose PC equals a PC in the fault PC buffer, by setting the corresponding write address in the write operation buffer to 0; when a bus master other than the master processor and the slave processor needs to access the memory, software ensures that the master processor and the slave processor perform three write operations to an unused memory address, so that the write operations still held in the write operation buffer are written out to the memory.
6. The full-hardware dual-core lockstep processor fault-tolerant system according to claim 1, wherein the peripheral interface and the system IP are attached to a system bus of the master processor and the slave processor, write operations of the master processor and the slave processor are delayed by three cycles, and the timing of read operations is unchanged.
7. The full-hardware dual-core lockstep processor fault-tolerant system according to claim 1, wherein the internal caches of the master processor and the slave processor operate in write-through mode, and when a fault occurs, the following 8 cache lines are invalidated during fault recovery:
when the fault is not a cache read-data error, the addresses of the last 4 write operations to the master processor's cache and of the last 4 write operations to the slave processor's cache are used as the addresses of the cache lines to be invalidated;
when a read-data error occurs in the cache, the 1 address at which the read-data error occurred, the addresses of the last 3 write operations to the master processor's cache and the addresses of the last 4 write operations to the slave processor's cache are used as the addresses of the cache lines to be invalidated.
CN202010356342.2A 2020-04-29 2020-04-29 Full-hardware dual-core lock-step processor fault-tolerant system Active CN111581003B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010356342.2A CN111581003B (en) 2020-04-29 2020-04-29 Full-hardware dual-core lock-step processor fault-tolerant system

Publications (2)

Publication Number Publication Date
CN111581003A (en) 2020-08-25
CN111581003B (en) 2021-12-28

Family

ID=72126428

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010356342.2A Active CN111581003B (en) 2020-04-29 2020-04-29 Full-hardware dual-core lock-step processor fault-tolerant system

Country Status (1)

Country Link
CN (1) CN111581003B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112667450B (en) * 2021-01-07 2022-05-06 浙江大学 Dynamically configurable fault-tolerant system with multi-core processor
US11645155B2 (en) 2021-02-22 2023-05-09 Nxp B.V. Safe-stating a system interconnect within a data processing system
US11782777B1 (en) 2022-06-22 2023-10-10 International Business Machines Corporation Preventing extraneous messages when exiting core recovery
CN116643935B (en) * 2023-07-21 2023-09-26 天津国芯科技有限公司 Dual-core lockstep chip capable of configuring delay time
CN116821038B (en) * 2023-08-28 2023-12-26 英特尔(中国)研究中心有限公司 Lock step control apparatus and method for processor

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544087A (en) * 2013-10-30 2014-01-29 中国航空工业集团公司第六三一研究所 Lockstep processor bus monitoring method and computer
CN105653411A (en) * 2015-12-28 2016-06-08 哈尔滨工业大学 Multi-core processor chip reconfigurable system capable of supporting local permanent fault recovery
CN110147343A (en) * 2019-05-09 2019-08-20 中国航空工业集团公司西安航空计算技术研究所 A kind of Lockstep processor architecture compared entirely

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060107116A1 (en) * 2004-10-25 2006-05-18 Michaelis Scott L System and method for reestablishing lockstep for a processor module for which loss of lockstep is detected
JP2006178616A (en) * 2004-12-21 2006-07-06 Nec Corp Fault tolerant system, controller used therefor, operation method and operation program
US7669073B2 (en) * 2005-08-19 2010-02-23 Stratus Technologies Bermuda Ltd. Systems and methods for split mode operation of fault-tolerant computer systems
CN104699550B (en) * 2014-12-05 2017-09-12 中国航空工业集团公司第六三一研究所 A kind of error recovery method based on lockstep frameworks
CN108228391B (en) * 2016-12-14 2021-08-03 中国航空工业集团公司西安航空计算技术研究所 LockStep processor and management method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
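Research on processor Lockstep technology; Chen Hao; Digital Technology and Application (《数字技术与应用》); 2012-08-31 (No. 8); pp. 56-58 *
Chen Hao. Research on processor Lockstep technology. Digital Technology and Application (《数字技术与应用》). 2012, (No. 8), pp. 56-58. *
Dual-core lockstep mechanism for commercial APSoC devices; Sun Yue; Science and Technology Innovation Herald (《科技创新导报》); 2019-09-11 (No. 26); pp. 9, 11 *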

Also Published As

Publication number Publication date
CN111581003A (en) 2020-08-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210420

Address after: 310013 No. 866 Tong Road, Xihu District, Zhejiang, Hangzhou, Yuhang

Applicant after: ZHEJIANG University

Applicant after: China Southern Power Grid Research Institute Co.,Ltd.

Address before: 310013 No. 866 Tong Road, Xihu District, Zhejiang, Hangzhou, Yuhang

Applicant before: ZHEJIANG University

GR01 Patent grant