CN113836079A - Software and hardware cooperative processing reconfigurable circuit and self-repairing method thereof - Google Patents

Software and hardware cooperative processing reconfigurable circuit and self-repairing method thereof

Info

Publication number
CN113836079A
Authority
CN
China
Prior art keywords
fault
module
reconfigurable
processor
logic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111113334.6A
Other languages
Chinese (zh)
Other versions
CN113836079B (en)
Inventor
黄莉莉
张砦
王涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN202111113334.6A
Publication of CN113836079A
Application granted
Publication of CN113836079B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7867 Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
    • G06F15/7871 Reconfiguration support, e.g. configuration loading, configuration switching, or hardware OS

Abstract

The invention discloses a reconfigurable circuit with software and hardware cooperative processing and a self-repairing method thereof. The circuit comprises a logic end and a processor end; the logic end is connected to the processor end through an AXI bus, and a HWICAP controller is connected to both the AXI bus and the logic end. The logic end comprises an auxiliary selection module and three reconfigurable modules with the same function, and the reconfigurable modules transmit fault enable signals to a fault interrupt responder at the processor end through the AXI bus. The auxiliary selection module selects the reconfigurable module that currently needs to be enabled and masks the reconfigurable modules that are not currently working. The processor end comprises the fault interrupt responder, a fault processor and an error correction result judger. With the invention, a fault can be located quickly by the dual-mode comparison functional modules inside a logic-end reconfigurable module, and the fault signal is transmitted to the processor end; by responding to the fault signal, the processor end interrupts its internal cyclic fault self-detection, which improves the self-repair speed.

Description

Software and hardware cooperative processing reconfigurable circuit and self-repairing method thereof
Technical Field
The invention belongs to the field of fault-tolerant design methods for reconfigurable hardware, and particularly relates to a reconfigurable circuit with software and hardware cooperative processing and a self-repairing method thereof.
Background
Commercial off-the-shelf SRAM (Static Random-Access Memory) based FPGAs (Field-Programmable Gate Arrays) are increasingly used in fields such as aerospace, railways, autonomous driving and medical care because of their low cost, short development cycle, good performance and capability of on-orbit reconfiguration. Unlike the ground environment, the aerospace environment exposes electronic systems to high-energy particle radiation, which is a critical threat to them; because of their manufacturing process, SRAM-based FPGAs are particularly susceptible to radiation particles, which cause single event effects.
Faults caused by space radiation through single event effects can be divided into transient faults and permanent faults. A transient fault does not damage the memory cell structure of the device itself and can be repaired by power cycling or by refreshing the configuration information. A permanent fault mostly manifests as permanent damage to the device and cannot be repaired by power cycling or by refreshing the configuration information.
Because they are reconfigurable, SRAM-based FPGAs can implement different logic functions with limited hardware resources. The regions that can be swapped to implement different functions are referred to as reconfigurable modules. Because a reconfigurable module is built from SRAM cells at the lowest level, it is also susceptible to space radiation and can suffer both transient and permanent faults. To ensure that a reconfigurable module continues to operate reliably after a fault occurs, it needs a fault-tolerant design.
Current fault tolerance methods for reconfigurable modules include multi-mode redundancy, configuration refreshing, and error detection and correction codes. Multi-mode redundancy keeps redundant backups of the reconfigurable module, masks the erroneous module after a fault occurs, and continues to output the correct result from the remaining modules. It is divided into cold backup and hot backup multi-mode redundancy according to whether the redundant modules run continuously. In cold backup multi-mode redundancy, the redundant modules do not run their function after power-on and stay in a standby state, which effectively saves operating power. In hot backup multi-mode redundancy, the redundant modules run continuously after power-on and take part in comparison with the corresponding reconfigurable module to output the correct result; a correct result is therefore available quickly when a fault occurs and the start-up process of a cold backup module is avoided, but power consumption is high and extra comparators are needed to compare the outputs of the reconfigurable modules.
The configuration refreshing technique uses the processor system to read back, compare and refresh, in real time, the configuration information in the configuration memory layer that corresponds to the reconfigurable module. Current self-repairing methods mainly rely on external global refreshing and on identifying and locating the critical bits of the global configuration information. A single pass of global refreshing takes a long time, and the result of analyzing and locating configuration information by key-frame positioning still covers the configuration information of every module in use. This volume of configuration information is large, so these methods are poorly suited to current aerospace applications, in which an SRAM-based FPGA implements few real-time functions but requires high flexibility.
Error detection and correction codes add extra configuration information to locate and correct transient faults such as single event upsets that affect a limited number of bits. However, the number of faulty bits they can repair is limited: they cannot handle multi-bit single event upsets that may occur in several redundant modules at the same time, and they cannot handle permanent faults at all.
In summary, combining multi-mode redundancy with configuration refreshing makes it possible both to locate the faulty region of a reconfigurable module and to self-repair its transient and permanent faults. However, the cyclic self-detection time of configuration refreshing is long, and the power consumption of multiple redundant backups grows with the number of redundant modules. How to keep repair power consumption low, shorten the fault self-detection time, speed up fault self-repair and handle permanent faults at the same time is therefore a problem that urgently needs to be solved.
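The readback, compare and refresh cycle described in this paragraph can be illustrated with a small software model. This is only a sketch under stated assumptions, not the method of any particular device: configuration frames are modeled as plain integers, and the names `scrub`, `config_frames` and `golden_frames` are hypothetical.

```python
from typing import List


def scrub(config_frames: List[int], golden_frames: List[int]) -> List[int]:
    """Read back every frame, compare it with the golden copy, rewrite mismatches.

    Returns the indices of the frames that had to be refreshed.
    All names here are hypothetical; frames are modeled as integers.
    """
    repaired = []
    for index, golden in enumerate(golden_frames):
        if config_frames[index] != golden:   # readback and compare
            config_frames[index] = golden    # refresh the corrupted frame
            repaired.append(index)
    return repaired


# Illustrative data: four frames, with a single-bit upset injected into frame 2.
golden = [0xA5A5, 0x0F0F, 0x3C3C, 0xFFFF]
live = list(golden)
live[2] ^= 0x0004
refreshed = scrub(live, golden)
```

Because every frame is visited on each pass, the cost of one pass grows with the total number of frames, which is the drawback of global refreshing noted in the text.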
Disclosure of Invention
Purpose of the invention: to reduce the repair power consumption of reconfigurable module circuits in an FPGA and shorten the fault self-detection time, thereby speeding up fault self-repair and solving the problem of self-repairing permanent faults, the invention provides a reconfigurable circuit structure based on software and hardware cooperative processing and a self-repairing method thereof.
Technical scheme: the reconfigurable circuit comprises a logic end and a processor end; the logic end is connected to the processor end through an AXI bus, and a HWICAP controller is connected to both the AXI bus and the logic end;
the logic end comprises an auxiliary selection module and three reconfigurable modules with the same function, and each reconfigurable module contains three functional modules with the same structure; the reconfigurable modules transmit a fault enable signal to a fault interrupt responder at the processor end through the AXI bus;
the auxiliary selection module selects the reconfigurable module that currently needs to be enabled and masks the reconfigurable modules that are not currently working; it contains a multiplexer that outputs only one enable signal at any time and masks the enable signals of the other two reconfigurable modules;
the processor end comprises the fault interrupt responder, a fault processor and an error correction result judger;
the fault interrupt responder responds to the fault signal transmitted from the logic end to the processor end and interrupts the fault detection process currently running at the processor end, so that the cooperative processing responds to the fault quickly and enters the fault repair state;
the fault processor self-repairs the reconfigurable module to restore it to the normal working state;
after the configuration information of the reconfigurable module has been refreshed and repaired by the processor end, the error correction result judger starts timing and judges whether the reconfigurable module at the logic end fails again; if a fault signal is still being transmitted, the last repair is considered invalid and the permanent fault repair state is entered;
in the fault processing state, the processor end reads the correct configuration information from the external memory, the HWICAP controller passes this configuration information to the ICAP interface, and the ICAP interface completes the automatic refresh of the configuration information.
The three reconfigurable modules with the same function are divided by working state into one working module and two cold backup modules. The reconfigurable module in the working state is the working module and outputs the real-time logic function result; the two cold backup modules stay in a standby state after power-on and take over the logic function when the working module suffers a permanent fault.
Each reconfigurable module comprises two hot backup functional modules, which form dual-mode redundancy, and one cold backup functional module; the dual-mode redundant hot backup functional modules detect in real time whether a functional module has failed;
when no fault has occurred, the result of either hot backup functional module is selected for output; when a fault occurs, the cold backup functional module is enabled to form a triple modular redundancy structure.
The self-repairing method of the reconfigurable circuit comprises logic-end self-repair and processor-end self-repair. The logic end and the processor end handle faults cooperatively, each at its own level, to respond to faults quickly: the logic end detects faults in the user logic layer through dual-mode comparison, and the processor end continuously monitors and responds to the fault signal output by the logic end. If a fault signal is output, the processor end immediately enters the fault response stage; otherwise, the processor end performs cyclic fault self-detection in real time;
(41) the logic-end self-repair process is as follows:
(411) after system initialization is complete, the first reconfigurable module of the logic end enters the dual-mode comparison working and fault detection state and continuously monitors whether a fault occurs; when the dual-mode comparison shows that a fault has occurred, the cold backup functional module is enabled and, at the same time, a fault signal is output to the processor end;
(412) after the cold backup functional module is enabled, it must be judged whether the faulty module can be located; if it is located, the faulty module is masked, the correct result is output by the two correct functional modules so that normal operation resumes promptly, and the module waits to be refreshed so that the faulty module returns to normal; if it cannot be located, all three functional modules are masked immediately so that erroneous output does not affect system operation, and the reconfigurable module waits for refresh repair;
(413) after the refresh repair is finished, it must be judged whether the fault was a transient fault that has been repaired; if so, the first reconfigurable module resumes its working state; if the fault is permanent, the second reconfigurable module enters the working state.
(42) the processor-end self-repair process is as follows:
(421) after system initialization, the processor continuously reads back and compares the configuration information to detect configuration faults; during this cyclic detection, when a fault signal from the logic end reaches the processor end, the fault interrupt responder is triggered and the processor end repairs the fault of the reconfigurable module through a configuration refresh performed by the fault processor;
(422) after the repair is finished, timing starts; if the fault interrupt responder is triggered again within the specified time, the last fault was not repaired, so the processor repairs the fault by replacing the reconfigurable module and also changes which reconfigurable module's configuration information is to be checked; if the fault interrupt responder is not triggered again within the specified time, the last repair was valid and the processor returns to the fault detection state.
Further, the dual-mode comparison working and fault detection state is as follows: when the functional modules have no fault, the result of either of the two hot backup functional modules is selected for output; when a functional module fails, a fault signal is output by the dual-mode comparison, the cold backup functional module is enabled to form a triple modular redundancy structure, and the correct result is output through a voter, so that an erroneous result does not affect the system function for a long time;
the dual-mode comparison works as follows: the first and second functional modules, as hot backup functional modules, start working after the system is powered on, and their output results are compared by a judger; the third functional module, as the cold backup functional module, is not enabled after power-on and produces no internal signal output; at this point only the first and second functional modules need to be compared: if there is no fault, their outputs agree and the system continues to operate normally; if a single event upset occurs in either of them, the selector outputs an error signal, and the error signal enables the third functional module.
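The dual-mode comparison described above can be sketched as a behavioral model. This is a minimal illustration, assuming each functional module can be treated as a pure function of its input; `dual_mode_step`, `healthy` and `upset` are hypothetical names, and in the actual circuit the comparison is performed in hardware.

```python
from typing import Callable, Optional, Tuple


def dual_mode_step(
    fm1: Callable[[int], int],
    fm2: Callable[[int], int],
    x: int,
) -> Tuple[Optional[int], bool]:
    """Compare the two hot backup functional modules for one input.

    Returns (output, error_en). When the outputs agree, the result of the
    first module is forwarded and error_en stays False. When they differ,
    no output is trusted and error_en is raised, which in the described
    circuit enables the cold backup (third) functional module.
    """
    y1, y2 = fm1(x), fm2(x)
    if y1 == y2:
        return y1, False
    return None, True


# Hypothetical functional modules: one healthy, one with a flipped output bit.
def healthy(x: int) -> int:
    return x + 1


def upset(x: int) -> int:
    return (x + 1) ^ 1


no_fault = dual_mode_step(healthy, healthy, 4)
with_fault = dual_mode_step(healthy, upset, 4)
```

A two-way comparison can only say that the modules disagree, not which one is wrong, which is why the third functional module is needed to locate the fault.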
Further, when the processor finds that the first reconfigurable module has a permanent fault, it masks the first reconfigurable module by replacement and refresh and enables the second reconfigurable module;
when the processor finds that the second reconfigurable module has a permanent fault, it masks the second reconfigurable module by replacement and refresh and enables the third reconfigurable module; if the third reconfigurable module then suffers a permanent fault, the entire system fails.
Compared with the prior art, the invention has the following notable effects: 1. a fault can be located quickly by the dual-mode comparison functional modules in a logic-end (hardware) reconfigurable module, and the fault signal is transmitted to the processor end (software); by responding to the fault signal, the processor end interrupts its internal cyclic fault self-detection, which improves the self-repair speed; 2. by adding cold backup reconfigurable modules at the logic end and constraining the physical regions of the reconfigurable modules in advance, failure of the whole system due to a permanent fault in one reconfigurable module is avoided, so the circuit tolerates permanent faults in addition to repairing transient faults.
Drawings
FIG. 1 is a schematic diagram of the reconfigurable circuit structure of the invention;
FIG. 2 is a structural diagram of fault self-repair with software and hardware cooperative processing according to the invention;
FIG. 3(a) is an operation diagram of the logic-end circuit of the invention when the third functional module is not operating,
(b) is an operation diagram of fault judgment when a single-mode fault occurs in the logic-end circuit of the invention,
(c) is an operation diagram of the logic-end circuit of the invention when the first functional module is not operating,
(d) is an operation diagram of the logic-end circuit of the invention when the second functional module is not operating;
FIG. 4 is a flow chart of the logic-end self-repairing method of the reconfigurable circuit with software and hardware cooperative processing according to the invention;
FIG. 5 is a flow chart of the processor-end self-repairing method of the reconfigurable circuit with software and hardware cooperative processing according to the invention;
FIG. 6(a) is a timing diagram of transient fault location in the reconfigurable module with software and hardware cooperative processing according to the invention,
(b) is a timing diagram of transient fault refresh self-repair in the reconfigurable module with software and hardware cooperative processing according to the invention,
(c) is a timing diagram of transient fault function recovery in the reconfigurable module with software and hardware cooperative processing according to the invention;
FIG. 7(a) is a timing diagram of permanent fault location in the reconfigurable module with software and hardware cooperative processing according to the invention,
(b) is a timing diagram of permanent fault replacement self-repair in the reconfigurable module with software and hardware cooperative processing according to the invention,
(c) is a timing diagram of permanent fault function recovery in the reconfigurable module with software and hardware cooperative processing according to the invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and the detailed description.
Fig. 1 is a schematic diagram of the circuit structure of the invention, which comprises a logic end and a processor end. The logic end is connected to the processor end through an AXI bus, and both the fault signal and the enable signals of the reconfigurable modules are transmitted over the AXI bus; the HWICAP controller is connected to both the AXI bus and the logic end and transfers configuration information from the external memory to the configuration memory.
The logic end comprises an auxiliary selection module and three reconfigurable modules with the same function. Each reconfigurable module contains three functional modules with the same structure: two hot backup functional modules, which form dual-mode redundancy, and one cold backup functional module. The dual-mode redundant hot backup functional modules detect in real time whether a functional module has failed by comparing their output results. When the functional modules have no fault, the result of either hot backup functional module is selected for output; when a functional module fails, the dual-mode comparison outputs a fault signal, the cold backup functional module is enabled to form a triple modular redundancy structure, and the correct result is output through the voter, so that an erroneous result does not affect the system function for a long time. Logic-end dual-mode redundancy comparison thus locates faults in the user logic layer quickly and transmits the fault signal to the processor end over the AXI bus, which provides the basis for cooperative processing.
In each reconfigurable module, the two functional modules in the working state detect whether the logic end has a fault through dual-mode comparison. When the comparison shows a fault, a fault enable signal is transmitted over the AXI bus to the fault interrupt responder at the processor end, and the processor end interrupts its current work and enters the fault response state. In the fault response state, the fault processor handles the fault according to its current type, using configuration refresh for a transient fault and replacement reconfiguration for a permanent fault, and enters the error correction result judgment state once the self-repair process has finished. In the error correction result judgment state, the processor end waits for a specified time to see whether the fault is triggered again: if no fault occurs, the last repair was effective; if the fault is triggered again within the specified time, the last repair is judged invalid and the fault processing state is entered again, this time performing self-repair by permanent fault replacement reconfiguration.
The auxiliary selection module selects the reconfigurable module (RM) that currently needs to be enabled according to the reconfigurable module enable signal output by the processor end, and masks the reconfigurable modules that are not currently working. It contains a multiplexer that outputs only one enable signal at any time and masks the enable signals of the other two reconfigurable modules. When the system has just been powered on, it outputs the signal that enables the first reconfigurable module. After a permanent fault occurs, it masks the faulty first reconfigurable module according to the signal from the processor end and enables the second reconfigurable module, which is idle and intact; the second reconfigurable module then enters stable operation and outputs the correct result. The device as a whole fails only when all reconfigurable modules have failed. In this way the auxiliary selection module provides the reconfigurable module enabling function for cooperative processing.
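The one-at-a-time enabling behavior of the auxiliary selection module can be modeled in a few lines. This is a hedged sketch rather than the circuit itself: the class `AuxiliarySelector` and its method names are hypothetical, and the enable vector stands in for the enable signals of the three reconfigurable modules.

```python
from typing import List


class AuxiliarySelector:
    """Behavioral model of a selector that enables exactly one module.

    Hypothetical names throughout; the real module is a hardware multiplexer
    driven by an enable signal from the processor end.
    """

    def __init__(self, module_count: int = 3) -> None:
        self.module_count = module_count
        self.active = 0  # the first reconfigurable module is enabled at power-on

    def enables(self) -> List[int]:
        """Enable vector, one entry per module; all zeros once every module has failed."""
        return [int(i == self.active) for i in range(self.module_count)]

    def replace_on_permanent_fault(self) -> bool:
        """Mask the active module and enable the next spare.

        Returns False when no spare remains, i.e. the whole device has failed.
        """
        if self.active < self.module_count:
            self.active += 1
        return self.active < self.module_count


selector = AuxiliarySelector()
history = [selector.enables()]
while selector.replace_on_permanent_fault():
    history.append(selector.enables())
final_state = selector.enables()
```

Keeping a single index as the only state makes it impossible for two modules to be enabled at once, which mirrors the requirement that only one enable signal is output at any time.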
The processor end comprises a fault interrupt responder, a fault processor and an error correction result judger.
The fault interrupt responder responds to the fault signal transmitted from the logic end to the processor end and interrupts the conventional fault detection process currently running at the processor end, so that the cooperative processing responds to the fault quickly and enters the fault repair state.
The fault processor self-repairs the current reconfigurable module in the appropriate way to restore it to the normal working state.
After the configuration information of the reconfigurable module has been refreshed and repaired by the processor end, the error correction result judger starts timing and judges whether the reconfigurable module at the logic end fails again; if a fault signal is still being transmitted, the last repair is considered invalid and the permanent fault repair state is entered. Throughout the cooperative processing, the error correction result judger determines the type of the fault, transient or permanent, which provides the basis for subsequent fault classification.
In the fault processing state, the processor end reads the correct configuration information from the external memory, the HWICAP controller passes this configuration information to the ICAP interface, and the ICAP interface completes the automatic refresh of the configuration information.
As shown in fig. 2, the three reconfigurable modules of the logic end all implement the same function, but only one is in the operating state at any time; it receives the first enable signal en_RM1. The other two are cold backup redundant modules (black boxes) that receive the second enable signal en_RM2 and the third enable signal en_RM3; in the cold backup state these signals are not asserted and the modules implement no function.
As shown in fig. 3(a), each reconfigurable module contains three functional modules that implement the same function: the first and second functional modules are the working functional modules, and the third is the cold backup functional module. Faults are monitored by dual-mode comparison while the third functional module stays in cold backup; here cold backup means that the functional circuit has already been mapped in the configuration memory layer but the function it implements is not enabled. When the dual-mode comparison reports a fault, the cold backup functional module is enabled and the faulty functional module can be located by three-mode comparison. The faulty module is masked, the other two functional modules continue to output the correct signal, and the fault signal is output to the processor, which classifies and self-repairs the fault.
The reconfigurable circuit of the invention speeds up self-repair of the configuration memory layer in the reconfigurable module and can repair both transient and permanent faults in both the user logic layer and the configuration memory layer, thereby achieving fast self-repair of multiple fault types across both layers of the reconfigurable module.
The conventional method of repairing faults in the configuration memory layer of a reconfigurable module detects them by global frame-by-frame readback. This requires traversing all frames, so one detection cycle takes a long time: if an error occurs in a frame just after that frame has been checked, the fault is found only after all remaining frames have been traversed and the scan returns to that frame. The longer fault self-detection time therefore lengthens the fault repair time. The reconfigurable circuit based on software and hardware cooperative processing disclosed by the invention implements a fast fault self-repairing method that can quickly repair transient faults and also respond quickly to permanent faults.
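To make the detection-latency argument concrete, the sketch below counts how many frame readbacks a cyclic frame-by-frame scan performs before it reaches a corrupted frame. The function name and the frame count of 1000 are assumptions chosen for illustration; the source gives no figures.

```python
def frames_until_detected(total_frames: int, scan_position: int, faulty_frame: int) -> int:
    """Readbacks needed by a cyclic scan before it checks faulty_frame.

    scan_position is the index of the next frame the scan will read.
    Hypothetical helper for illustration only.
    """
    return (faulty_frame - scan_position) % total_frames + 1


TOTAL_FRAMES = 1000  # illustrative value, not taken from the source

# Best case: the upset hits the frame that is about to be read.
best_case = frames_until_detected(TOTAL_FRAMES, scan_position=500, faulty_frame=500)

# Worst case: the upset hits frame 500 just after the scan has moved on to 501.
worst_case = frames_until_detected(TOTAL_FRAMES, scan_position=501, faulty_frame=500)
```

In the worst case the scan needs a full pass over every frame, whereas a fault signal raised by the logic end reaches the processor end regardless of where the scan currently is.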
The self-repairing circuit structure of the logic-end functional modules of the invention is explained with reference to fig. 3:
In fig. 3(a), the first and second functional modules, as hot backup functional modules, start working after the system is powered on, and their output results are compared by the judger; the third functional module, as the cold backup functional module, is not enabled after power-on, so it does not run and produces no internal signal output. At this point only the first and second functional modules need to be compared: if there is no fault, their outputs agree and the system continues to operate normally; if a single event upset occurs in either of them, the selector outputs the error signal error_en, which enables the third functional module, and the circuit enters the single-mode fault operation state shown in fig. 3(b).
As shown in fig. 3(b), the first and second functional modules are each compared with the third functional module (now enabled and in the working state) to locate the erroneous functional module. If the first functional module has failed, it is disabled, the correct result is output through the two-out-of-three multiplexer, the circuit enters the single-mode fault operation state shown in fig. 3(c), and it waits for the processor to refresh and repair the whole reconfigurable module. If the second functional module has failed, it is disabled, the correct result is output through the two-out-of-three multiplexer, the circuit enters the single-mode fault operation state shown in fig. 3(d), and it waits for the processor to refresh and repair the whole reconfigurable module. Once the refresh repair is complete, the circuit returns to the normal operation state shown in fig. 3(a).
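The fault location step of fig. 3(b) can be sketched as a comparison of each hot backup output against the newly enabled third module. This is an illustrative model under stated assumptions: `locate_and_vote` and the labels `"FM1"` and `"FM2"` are hypothetical, and the function is only meant to be called after the dual-mode comparison has already reported a mismatch.

```python
from typing import Optional, Tuple


def locate_and_vote(y1: int, y2: int, y3: int) -> Tuple[Optional[int], Optional[str]]:
    """Locate the faulty hot backup module using the third module's output.

    Returns (output, faulty). output is the value agreed on by two modules,
    or None when the fault cannot be located, in which case all three
    modules would be masked. Hypothetical helper for illustration only.
    """
    if y1 == y2:
        raise ValueError("called without a dual-mode mismatch")
    if y2 == y3:
        return y3, "FM1"   # the first module disagrees with the other two
    if y1 == y3:
        return y3, "FM2"   # the second module disagrees with the other two
    return None, None      # all three differ: the fault cannot be located


first_faulty = locate_and_vote(7, 5, 5)
second_faulty = locate_and_vote(5, 7, 5)
unlocated = locate_and_vote(1, 2, 3)
```

The unlocated case corresponds to the branch in which all three functional modules are masked until the reconfigurable module has been refreshed.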
Fig. 4 is a flow chart of the logic-end self-repairing method. After system initialization is complete, the first reconfigurable module of the logic end enters the dual-mode (first hot backup and second hot backup) comparison working and fault detection state and continuously monitors whether a fault occurs. When the dual-mode comparison result is wrong, the cold backup functional module is enabled and, at the same time, a fault signal is output to the processor. After the cold backup functional module is enabled, it must be judged whether the fault is a single-module fault, that is, whether the faulty module can be located. If the faulty module is located, it is masked, the correct result is output by the two correct functional modules so that normal operation resumes promptly, and the module still waits to be refreshed so that the faulty module returns to normal. If the faulty module cannot be located, all three functional modules are masked immediately so that erroneous output does not affect system operation, and the reconfigurable module waits for refresh repair. After the refresh repair is finished, it must be judged whether the fault was a transient fault that has been repaired: if so, the first reconfigurable module resumes its working state; if the fault is permanent, the second reconfigurable module enters the working state.
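The flow of fig. 4 can be summarized as a small state machine. The sketch below is one possible reading of the text, not a reproduction of the figure: the state names, the event names and the helper `run` are all hypothetical.

```python
# Transition table: (current state, event) -> next state.
# State and event names are assumptions made for this illustration.
TRANSITIONS = {
    ("DUAL_MODE", "mismatch"): "LOCATE",          # cold backup enabled, fault signal sent
    ("LOCATE", "located"): "DEGRADED",            # faulty module masked, two modules output
    ("LOCATE", "not_located"): "ALL_MASKED",      # all three functional modules masked
    ("DEGRADED", "repaired"): "DUAL_MODE",        # transient fault: resume normal operation
    ("ALL_MASKED", "repaired"): "DUAL_MODE",
    ("DEGRADED", "permanent"): "SWITCH_RM",       # permanent fault: next module takes over
    ("ALL_MASKED", "permanent"): "SWITCH_RM",
    ("SWITCH_RM", "enabled"): "DUAL_MODE",        # the new module starts dual-mode monitoring
}


def run(events, state="DUAL_MODE"):
    """Apply a sequence of events and return the final state."""
    for event in events:
        state = TRANSITIONS[(state, event)]
    return state


transient_path = run(["mismatch", "located", "repaired"])
permanent_path = run(["mismatch", "not_located", "permanent"])
```

Writing the flow as a table makes it easy to check that every fault path ends either back in dual-mode monitoring or in a switch to the next reconfigurable module.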
Fig. 5 is a flow chart of the self-repairing method at the processor end. The invention performs self-repair through software and hardware cooperation, so the processor end must cooperate to complete the self-repair of faults. After the system is initialized, the processor continuously reads back and compares the configuration information to detect configuration faults, but one full detection cycle takes a relatively long time. Therefore, when a fault signal from the logic end reaches the processor end during cyclic detection, the fault interrupt responder is triggered, and the processor end repairs the fault of the reconfigurable module through the configuration refresh technique of the fault processor. After the repair is finished, timing starts. If the fault interrupt responder is triggered again within the specified time, the previous fault was not repaired; the processor then repairs the reconfigurable module by replacement and, at the same time, changes the configuration information of the reconfigurable module that needs to be detected. If the fault interrupt responder is not triggered again within the specified time, the previous repair was valid, and the processor returns to the fault detection state.
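The timing decision made after each repair can be illustrated with a small model. The sketch below is an assumption-laden simplification: the class name `ErrorCorrectionJudge`, its method names, and the explicit microsecond timestamps are hypothetical, and the length of the specified time window is left as a parameter because the text does not give a value.

```python
from typing import Optional


class ErrorCorrectionJudge:
    """Decide between refresh repair and replacement repair (illustrative)."""

    def __init__(self, window_us: float) -> None:
        self.window_us = window_us  # "specified time" after a repair
        self.last_repair_us: Optional[float] = None

    def on_repair_done(self, now_us: float) -> None:
        """Start timing when a repair has finished."""
        self.last_repair_us = now_us

    def on_fault_interrupt(self, now_us: float) -> str:
        """Return 'replace' if the fault recurs within the window."""
        recurred = (
            self.last_repair_us is not None
            and now_us - self.last_repair_us <= self.window_us
        )
        if recurred:
            # Previous refresh did not fix the fault: treat it as permanent.
            self.last_repair_us = None
            return "replace"
        # First occurrence, or previous repair was valid: refresh again.
        return "refresh"
```

A fault interrupt that arrives after the window has expired is treated as a new fault and handled by refresh, which matches the case where the previous repair was valid.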
Through software and hardware cooperative processing, the invention can quickly locate and repair transient faults and can also repair permanent faults. Self-repair of permanent faults is described with reference to fig. 2. At initial power-on of the system, only the first reconfigurable module is in the operating state, i.e., the first enable signal en_RM1 is enabled; the second and third reconfigurable modules are both in a cold backup state (black box state) with no circuit mapped onto the underlying hardware, i.e., the second enable signal en_RM2 and the third enable signal en_RM3 are not enabled. When the processor finds that the first reconfigurable module has a permanent fault, it shields the first reconfigurable module by replacement and refresh and enables the second reconfigurable module. In the same way, when the second reconfigurable module has a permanent fault, the third reconfigurable module is enabled and the second reconfigurable module is shielded; if a permanent fault occurs again at this point, the whole system fails. The number of reconfigurable modules therefore depends on the service life of the whole chip and the aerospace environment in which it operates.
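The replacement sequence among the reconfigurable modules can be sketched as a one-hot enable selector. This Python model is illustrative only: the class `AuxiliarySelector` and its methods are hypothetical names, and the returned tuple simply stands in for the enable signals en_RM1, en_RM2, and en_RM3.

```python
from typing import Tuple


class AuxiliarySelector:
    """Model of one-hot enable selection over reconfigurable modules."""

    def __init__(self, n_modules: int = 3) -> None:
        self.n_modules = n_modules
        self.active = 0  # index of the working module; first one at power-on

    def enables(self) -> Tuple[int, ...]:
        """Return the enable signals; at most one is 1 at any time."""
        return tuple(int(i == self.active) for i in range(self.n_modules))

    def on_permanent_fault(self) -> bool:
        """Shield the active module and enable the next cold backup.

        Returns False when no backup remains, i.e. the system has failed.
        """
        if self.active < self.n_modules:
            self.active += 1
        return self.active < self.n_modules
```

With three modules, two permanent faults can be absorbed; the third call to `on_permanent_fault` returns `False` and all enable signals are 0.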
Following the circuit structure and self-repairing method above, transient faults generated in the functional modules were tested. Fig. 6(a) is a signal observation diagram of the functional module transient fault self-repair mode. An injected multi-bit flip fault signal is detected at observation point 4904. At observation point 4992, both the output signal s2p_dout_1 of the first functional module and the output signal s2p_dout_2 of the second functional module fail, and at observation point 4994 the output signal s2p_dout_3 of the third functional module, started from cold backup, is detected. Because two functional modules have failed, however, the fault cannot be located by starting the cold backup module, and self-repair can only be performed by reconfiguration refresh. The multi-module fault signal (en_wrong_multiple) is therefore enabled, and timing of the multi-bit flip fault self-repair begins. In fig. 6(b), at observation point 4995 the multi-bit flip fault signal is transferred through the AXI bus at the processor end; after it is detected, the reconfigurable module enable signal (ps2pl) is set to 0. Refresh repair starts at observation point 4998, and the repair completion timer (repair_time) starts counting at the same time. In fig. 6(c), the refresh repair finishes at observation point 5000 and en_RM1 returns to 1, indicating that the first reconfigurable module has resumed normal operation. At observation point 5316 the functional modules resume dual-mode comparison and return to the normal operation state, completing one self-repair of a multi-bit transient fault affecting two functional modules.
When a permanent fault is detected in the first functional module, it is tolerated by shielding the first functional module, and the second and third functional modules continue operating under dual-mode comparison monitoring. If another permanent fault occurs at this point, no cold backup functional module remains, and the permanent fault must be handled by reconfiguration replacement.
As shown in fig. 7(a), after the first permanent fault has been tolerated, the reconfigurable module enters a single-mode shielding state: the second functional module with the fault is shielded, and only the first functional module and the third functional module run in real time. A second fault is triggered at observation point 4984, the reconfigurable module shielding state is entered, all functional modules of the first reconfigurable module are shielded at observation point 5004, and the whole reconfigurable module fails.
In fig. 7(b), at observation point 5000, the processor receives the signal that the first reconfigurable module has failed entirely (perm_error = 4). The enable signal of the first reconfigurable module (en_RM1) is then set to 0 so that the first reconfigurable module is shielded, and the second reconfigurable module is enabled to repair the permanent fault of the first reconfigurable module.
As shown in fig. 7(c), at observation point 5015, self-repair of the permanent fault of the first reconfigurable module is completed by reconfiguration replacement, and a second reconfigurable module with the same function is enabled. At observation point 5717, the first functional module and the second functional module in the second reconfigurable module enter the normal operation detection state.
The overall speed of self-repair in the software and hardware cooperative processing reconfigurable circuit is illustrated below by the repair time.
The transient fault response time of the software and hardware cooperative processing reconfigurable circuit provided by the invention is 1372 µs for both single-module and multi-module faults, whereas conventional frame-by-frame cyclic self-detection of the configuration memory requires up to 66520 µs. The 66520 µs figure is the longest detection time, i.e., the worst case for detecting a transient fault, and only 923 µs is needed in the best case. However, the best case requires that the processor happens to be checking the very frame in which the transient fault occurs, and the probability of this is very low. The invention therefore ensures fast and stable fault response.
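The comparison above can be put into numbers with a short calculation. The three timing constants (in microseconds) are taken from the text; the constant and function names are hypothetical, and the mean detection time is computed under the added assumption, not stated in the source, that the conventional detection latency is uniformly distributed between its best and worst cases.

```python
# Timing figures from the description, in microseconds.
COOP_RESPONSE_US = 1372      # cooperative software/hardware response time
FRAME_SCAN_WORST_US = 66520  # conventional frame-by-frame scan, worst case
FRAME_SCAN_BEST_US = 923     # conventional frame-by-frame scan, best case


def speedup(baseline_us: float, proposed_us: float) -> float:
    """Ratio of baseline latency to proposed latency."""
    return baseline_us / proposed_us


# Worst-case comparison, using only figures stated in the text.
worst_case_speedup = speedup(FRAME_SCAN_WORST_US, COOP_RESPONSE_US)

# Assumption (not from the source): latency uniformly distributed
# between the best and worst cases of the conventional scan.
frame_scan_mean_us = (FRAME_SCAN_BEST_US + FRAME_SCAN_WORST_US) / 2
mean_speedup = speedup(frame_scan_mean_us, COOP_RESPONSE_US)

print(f"worst-case speedup: {worst_case_speedup:.1f}x")
print(f"assumed mean scan time: {frame_scan_mean_us:.1f} us")
print(f"speedup over assumed mean: {mean_speedup:.1f}x")
```

In the worst case the cooperative scheme responds roughly 48 times faster than the conventional scan. The conventional best case of 923 microseconds is shorter than 1372 microseconds, so the advantage lies in a bounded, predictable response time rather than in beating every individual detection.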

Claims (6)

1. A software and hardware cooperative processing reconfigurable circuit is characterized by comprising a logic end and a processor end, wherein the logic end is connected with the processor end through an AXI bus, and an HWICAP interface is respectively connected with the AXI bus and the logic end;
the logic end comprises an auxiliary selection module and three reconfigurable modules with the same function, and each reconfigurable module internally comprises three functional modules with the same structure; the reconfigurable module transmits a fault enabling signal to a fault interrupt responder at a processor end through an AXI bus;
the auxiliary selection module selects the reconfigurable module which needs to be enabled currently, and simultaneously shields the reconfigurable module which does not work currently; the auxiliary selection module is internally provided with a multiplexer, only one enabling signal is output at the same time, and the enabling signals of the other two reconfigurable modules are shielded;
the processor end comprises a fault interrupt responder, a fault processor and an error correction result judger;
the fault interrupt responder responds to a fault signal transmitted to the processor end by the logic end and interrupts the processor end's current fault detection process, thereby providing quick fault response for the cooperative processing and entering a fault repairing state;
the fault processor is used for self-repairing the reconfigurable module to restore the reconfigurable module to a normal working state;
the error correction result judger starts to time and judge whether the reconfigurable module of the logic end has faults again after the configuration information of the reconfigurable module is refreshed and repaired through the processor end, and if a fault signal is continuously transmitted, the last repair is considered to be invalid and enters a permanent fault repair state;
and when the fault processing state is detected, the processor end reads correct configuration information from the external memory, the HWICAP controller reads the configuration information from the external memory to the ICAP interface, and the ICAP interface finishes automatic configuration refreshing of the configuration information.
2. The software and hardware cooperative processing reconfigurable circuit according to claim 1, wherein three reconfigurable modules with the same function are divided into a working module and two cold backup modules according to working states; the reconfigurable module in the working state is a working module and is used for outputting a real-time logic function result; the two cold backup modules are in a standby state after being electrified and used for replacing the logic function when the working module has permanent failure.
3. The software and hardware co-processing reconfigurable circuit according to claim 1, wherein each reconfigurable module internally comprises two hot backup functional modules and a cold backup functional module which form dual-mode redundancy; the hot backup function module of the dual-mode redundancy detects whether the function module fails in real time;
when no fault occurs, selecting the result of any hot backup functional module to output; and when a fault occurs, the cold backup functional module is started to form a triple modular redundancy structure.
4. A self-repairing method for a software and hardware cooperative processing reconfigurable circuit, characterized by comprising logic end self-repairing and processor end self-repairing, wherein the logic end and the processor end achieve quick response to faults through classified cooperative processing: the logic end detects faults of the user logic layer through dual-mode comparison, and the processor end continuously monitors and responds to fault signals output by the logic end; if a fault signal is output, the processor end quickly enters a fault response stage; otherwise, the processor end performs cyclic self-detection of faults in real time;
(41) the self-repairing process of the logic terminal is as follows:
(411) after the system initialization is completed, the first reconfigurable module of the logic end enters a dual-mode comparison working and fault detection state and continuously monitors whether a fault occurs; when the dual-mode comparison result shows that a fault occurs, the cold backup function module is enabled and, at the same time, a fault signal is output to the processor end;
(412) after the cold backup function module is enabled, whether a fault module is positioned at the moment needs to be judged; if the fault module is located, the fault module is shielded, meanwhile, the correct result is output through the correct two functional modules, the normal working state is recovered in time, and meanwhile, the module is waited for refreshing, so that the fault module is recovered to be normal; if the fault module cannot be located, all three functional modules are immediately shielded, the situation that the operation of the system is influenced by error output is avoided, and refreshing and repairing of the reconfigurable module are waited;
(413) after the refreshing and repairing are finished, it is judged whether the fault was a transient fault; if it was a transient fault, the first reconfigurable module resumes its working state; and if the fault is a permanent fault, the second reconfigurable module working state is entered.
(42) The implementation process of self-repair at the processor end is as follows:
(421) after the system is initialized, the processor continuously reads back and compares the configuration information, and detects the fault of the configuration information; in the cycle detection process, when a fault signal of a logic end is transmitted to a processor end, a fault interrupt responder is triggered, and the processor end refreshes and repairs the fault of the reconfigurable module through the configuration of the fault processor;
(422) after the repair is finished, timing is started, if the fault interrupt responder is triggered again within the specified time, the last fault is not repaired, the state of the reconfigurable module is replaced and repaired by the processor, and meanwhile, the configuration information of the reconfigurable module needing to be detected is changed by the processor; if the fault interrupt responder is not triggered again within the specified time, the last repair is a valid repair, and the processor returns to the fault detection state.
5. The software and hardware co-processing reconfigurable circuit self-repairing method according to claim 4, wherein the dual-mode comparison working and fault detection states are: when the functional module has no fault, selecting the result of any functional module in the two hot backups to output; when the functional module fails, a fault signal is output through dual-mode comparison, the cold backup functional module is started to form a triple-modular redundancy structure, a correct result is output through a voter, and the function of the system is prevented from being influenced by the wrong result for a long time;
the dual-mode comparison work is as follows: when the first functional module and the second functional module are used as hot backup functional modules, the first functional module and the second functional module work after the system is powered on, and output results are judged through a judger; the third functional module is used as a cold backup functional module, is not enabled after the system is powered on, and has no signal output inside; at the moment, only the first functional module and the second functional module need to be compared, and if no fault exists, the output results of the first functional module and the second functional module are consistent, and the system continues to operate normally; if any one of the first functional module and the second functional module has single event upset, the selector outputs an error signal at the moment, and the error signal enables the third functional module.
6. The software and hardware co-processing reconfigurable circuit self-repairing method according to claim 5, wherein when the processor finds that the first reconfigurable module has a permanent fault, the first reconfigurable module is shielded by adopting a replacement and refresh mode, so that the second reconfigurable module is enabled;
when the processor finds that the second reconfigurable module has a permanent fault, the second reconfigurable module is shielded by adopting a replacement and refresh mode, a third reconfigurable module is enabled, and the second reconfigurable module is shielded; at this time, if the third reconfigurable module has a permanent fault, the entire system fails.
CN202111113334.6A 2021-09-23 2021-09-23 Reconfigurable circuit for software and hardware cooperative processing and self-repairing method thereof Active CN113836079B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111113334.6A CN113836079B (en) 2021-09-23 2021-09-23 Reconfigurable circuit for software and hardware cooperative processing and self-repairing method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111113334.6A CN113836079B (en) 2021-09-23 2021-09-23 Reconfigurable circuit for software and hardware cooperative processing and self-repairing method thereof

Publications (2)

Publication Number Publication Date
CN113836079A true CN113836079A (en) 2021-12-24
CN113836079B CN113836079B (en) 2024-03-19

Family

ID=78969156

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111113334.6A Active CN113836079B (en) 2021-09-23 2021-09-23 Reconfigurable circuit for software and hardware cooperative processing and self-repairing method thereof

Country Status (1)

Country Link
CN (1) CN113836079B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101930052A (en) * 2010-07-21 2010-12-29 电子科技大学 Online detection fault-tolerance system of FPGA (Field programmable Gate Array) digital sequential circuit of SRAM (Static Random Access Memory) type and method
CN112269686A (en) * 2020-10-29 2021-01-26 南京航空航天大学 LUTRAM self-repairing structure and method based on cold backup dual-mode error detection code

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ŞINCA RĂZVAN 等: "Software Redundancy Implementation Strategy in Reconfigurable Hardware Framework", 2019 8TH INTERNATIONAL CONFERENCE ON MODERN POWER SYSTEMS, 31 December 2019 (2019-12-31), pages 1 - 6 *
张砦 等, 航空学报, vol. 42, no. 7, 25 July 2021 (2021-07-25), pages 1 - 12 *

Also Published As

Publication number Publication date
CN113836079B (en) 2024-03-19

Similar Documents

Publication Publication Date Title
US10078565B1 (en) Error recovery for redundant processing circuits
US5923830A (en) Non-interrupting power control for fault tolerant computer systems
Nelson Fault-tolerant computing: Fundamental concepts
Rennels Fault-tolerant computing—Concepts and examples
CN101930052B (en) Online detection fault-tolerance system of FPGA (Field programmable Gate Array) digital sequential circuit of SRAM (Static Random Access Memory) type and method
CN111352338B (en) Dual-redundancy flight control computer and redundancy management method
KR20010005956A (en) Fault tolerant computer system
US20230350746A1 (en) Fault-tolerant system with multi-core cpus capable of being dynamically configured
CN104731670B (en) A kind of rotation formula spaceborne computer tolerant system towards satellite
US10754760B1 (en) Detection of runtime failures in a system on chip using debug circuitry
CN102521066A (en) On-board computer space environment event fault tolerance method
CN105279049A (en) Method for designing triple-modular redundancy type fault-tolerant computer IP core with fault spontaneous restoration function
CN109634171B (en) Dual-core dual-lock-step two-out-of-two framework and safety platform thereof
US9952579B2 (en) Control device
CN108958987B (en) Low-orbit small satellite fault-tolerant system and method
CN107807902B (en) FPGA dynamic reconfiguration controller resisting single event effect
CN113836079A (en) Software and hardware cooperative processing reconfigurable circuit and self-repairing method thereof
CN111856991B (en) Signal processing system and method with five-level protection on single event upset
CN111785310A (en) FPGA (field programmable Gate array) reinforcement system and method for resisting single event upset
Su et al. An overview of fault-tolerant digital system architecture
Xu et al. Fault tolerance technique based on state real-time synchronization
US20230064905A1 (en) Semiconductor device
CN117170279A (en) Design method based on dual-multi-core PSOC redundant flight control system
CN113721135B (en) SRAM type FPGA fault online fault tolerance method
CN110750391B (en) TMR monitoring type-based high-performance anti-irradiation reinforcing system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant