CN113836079B - Reconfigurable circuit for software and hardware cooperative processing and self-repairing method thereof - Google Patents

Publication number
CN113836079B
Authority
CN
China
Prior art keywords
fault
module
reconfigurable
processor
logic
Legal status
Active
Application number
CN202111113334.6A
Other languages
Chinese (zh)
Other versions
CN113836079A (en
Inventor
黄莉莉
张砦
王涛
Current Assignee
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN202111113334.6A
Publication of CN113836079A
Application granted
Publication of CN113836079B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7867 Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
    • G06F15/7871 Reconfiguration support, e.g. configuration loading, configuration switching, or hardware OS

Abstract

The invention discloses a reconfigurable circuit for cooperative software and hardware processing and a self-repairing method thereof. The circuit comprises a logic end and a processor end; the logic end is connected with the processor end through an AXI bus, and a HWIAP controller is connected with both the AXI bus and the logic end. The logic end comprises an auxiliary selection module and three reconfigurable modules with the same function, and the reconfigurable modules transmit fault enable signals to a fault interrupt responder of the processor end through the AXI bus. The auxiliary selection module selects the reconfigurable module that currently needs to be enabled and shields the reconfigurable modules that are not currently working. The processor end comprises a fault interrupt responder, a fault processor and an error correction result judging device. The invention can rapidly locate faults through the dual-mode comparison of functional modules inside the reconfigurable module at the logic end while transmitting a fault signal to the processor end; the processor end responds to the fault signal by interrupting its internal cyclic fault self-detection, thereby improving the self-repairing speed.

Description

Reconfigurable circuit for software and hardware cooperative processing and self-repairing method thereof
Technical Field
The invention belongs to the field of design of fault-tolerant methods of reconfigurable hardware, and particularly relates to a reconfigurable circuit for cooperative processing of software and hardware and a self-repairing method thereof.
Background
Commercial off-the-shelf SRAM-type (Static Random-Access Memory) FPGAs (Field-Programmable Gate Arrays) are increasingly used in fields such as aerospace, railways, autonomous driving and medical treatment because of their low cost, short development cycle, good performance and on-orbit reconfigurability. Unlike the ground environment, the aerospace environment exposes electronic systems to high-energy particle radiation, which poses a deadly threat; owing to their manufacturing process, SRAM-type FPGAs are particularly susceptible to radiation particles, which gives rise to single-event effects.
Faults caused by space radiation through single-event effects can be classified into transient faults and permanent faults. A transient fault can be repaired by power cycling or by refreshing the configuration information, and the memory cell structure of the device is not damaged. A permanent fault usually manifests as permanent damage to the device itself and cannot be repaired by power cycling or by refreshing the configuration information.
Because they are reconfigurable, SRAM-type FPGAs can implement different logic functions with limited hardware resources. The replaceable regions that implement these different functions are called reconfigurable modules. Since the underlying layer of a reconfigurable module consists of SRAM cells, it is likewise susceptible to transient and permanent faults caused by space radiation. A fault-tolerant design is therefore required so that the reconfigurable modules can continue to operate reliably after a fault occurs.
Current methods for improving the fault tolerance of reconfigurable modules include multimode redundancy, configuration refreshing, and error detection and correction codes. Multimode redundancy keeps redundant backups of the reconfigurable module; after a fault occurs, the faulty module is shielded and the remaining modules continue to output the correct result. Depending on whether the redundant modules run continuously, multimode redundancy is divided into cold backup multimode redundancy and hot backup multimode redundancy. A cold backup redundant module does not run its function after power-on and stays in a standby state, which effectively saves operating power. Hot backup redundant modules run continuously after power-on and take part in comparison with the corresponding reconfigurable modules so that a correct result is output; when a fault occurs, hot backup can output the correct result quickly because no cold backup module has to be started, but its power consumption is high and more comparators are needed to compare the outputs of the reconfigurable modules.
The configuration refreshing technique uses the processor system to read back, compare and refresh, in real time, the configuration information of the configuration memory layer corresponding to the reconfigurable module. Common self-repairing methods currently rely mainly on external global refreshing and on identifying and locating the critical bits of the global configuration information. A single global refresh takes a long time, and with critical-frame locating both the analysis of the configuration information and the locating result still cover the configuration information of all utilized modules. Because this amount of configuration information is large, the approach is poorly suited to the current space environment, in which the real-time functions implemented on an SRAM-type FPGA are few in number and must be highly flexible.
Error detection and correction codes can locate and correct transient faults, such as single-event upsets of a limited number of bits, by adding extra configuration information. However, the number of faulty bits they can repair is limited, so they cannot handle multi-bit single-event upsets that may occur simultaneously in multiple redundant modules, nor can they handle permanent faults.
In summary, combining multimode redundancy with configuration refreshing makes it possible both to locate the faulty region of a reconfigurable module and to self-repair its transient and permanent faults. However, the cyclic detection of configuration refreshing takes a long time, and the power consumption of multi-backup redundancy grows as redundant modules are added. A solution is therefore needed that lowers repair power consumption and shortens the fault self-detection time, thereby speeding up fault self-repair while also handling permanent faults.
Disclosure of Invention
Purpose of the invention: in order to reduce the repair power consumption of a reconfigurable modular circuit in an FPGA, shorten the fault self-detection time and thereby speed up fault self-repair, and at the same time solve the problem of self-repairing permanent faults, the invention provides a reconfigurable circuit structure based on cooperative software and hardware processing and a self-repairing method thereof.
Technical scheme: the reconfigurable circuit comprises a logic end and a processor end, wherein the logic end is connected with the processor end through an AXI bus, and a HWIAP interface is connected with both the AXI bus and the logic end;
the logic end comprises an auxiliary selection module and three reconfigurable modules with the same function, and each reconfigurable module comprises three functional modules with the same structure; the reconfigurable module transmits a fault enable signal to a fault interrupt responder at the processor end through the AXI bus;
the auxiliary selection module selects the reconfigurable module that currently needs to be enabled and shields the reconfigurable modules that are not currently working; a multiplexer arranged in the auxiliary selection module outputs only one enable signal at a time and shields the enable signals of the other two reconfigurable modules;
the processor end comprises a fault interrupt responder, a fault processor and an error correction result judging device;
the fault interrupt responder responds to the fault signal transmitted from the logic end to the processor end and interrupts the fault detection process currently running at the processor end, giving the cooperative processing a fast response to the fault and entering a fault repair state;
the fault processor self-repairs the reconfigurable module so that it recovers to a normal working state;
the error correction result judging device starts timing after the configuration information of the reconfigurable module has been refreshed and repaired by the processor end, in order to judge whether the reconfigurable module at the logic end fails again; if a fault signal is still transmitted, the last repair is considered invalid and a permanent fault repair state is entered;
in the fault processing state, the processor end reads the correct configuration information from the external memory, transfers it to an ICAP interface through the HWIAP controller, and completes the automatic refresh of the configuration information through the ICAP interface.
According to their working states, the three reconfigurable modules with the same function are divided into one working module and two cold backup modules. The working module is the reconfigurable module in the working state and outputs the real-time result of the logic function; the two cold backup modules remain in a standby state after power-on and take over the logic function when the working module suffers a permanent fault.
Each reconfigurable module internally comprises one cold backup functional module and two hot backup functional modules that form dual-mode redundancy; the dual-mode redundant hot backup functional modules detect in real time whether a functional module has failed;
when no fault occurs, the output of either hot backup functional module is selected as the result; when a fault occurs, the cold backup functional module is enabled to form a triple-mode redundancy structure.
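As a non-limiting illustration, the dual-mode comparison with on-demand enabling of the cold backup can be sketched in software as follows. This is a behavioural model only, not the hardware of the invention; the function names (`dual_mode_compare`, `vote_2_of_3`, `module_output`) and the use of integers as module outputs are assumptions made for the example.

```python
from typing import Callable, Optional, Tuple


def dual_mode_compare(out1: int, out2: int) -> Tuple[int, bool]:
    """Compare the two hot backup outputs.

    Returns (selected_output, error_en). Without a fault the outputs agree,
    so either one may be forwarded; a mismatch raises error_en.
    """
    return out1, out1 != out2


def vote_2_of_3(a: int, b: int, c: int) -> Optional[int]:
    """Majority vote; returns None when all three outputs disagree."""
    if a == b or a == c:
        return a
    if b == c:
        return b
    return None


def module_output(out1: int, out2: int,
                  cold_backup: Callable[[], int]) -> Tuple[Optional[int], bool]:
    """Return (result, fault_signal) for one reconfigurable module.

    The cold backup is modelled as a callable so that it is evaluated
    only after a mismatch, mirroring the fact that it is normally disabled.
    """
    selected, error_en = dual_mode_compare(out1, out2)
    if not error_en:
        return selected, False
    out3 = cold_backup()  # enabled only after the dual-mode mismatch
    return vote_2_of_3(out1, out2, out3), True
```

In this model the third module costs nothing while the two hot backups agree, which corresponds to the power saving the cold backup is meant to provide.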
The self-repairing method of the reconfigurable circuit comprises logic end self-repairing and processor end self-repairing. The logic end and the processor end divide the work and process faults cooperatively to respond to them quickly: the logic end detects faults of the user logic layer through dual-mode comparison, and the processor end continuously monitors and responds to the fault signal output by the logic end. If a fault signal is output, the processor end quickly enters a fault response stage; otherwise, the processor end carries out cyclic fault self-detection in real time;
(41) The implementation process of the logic end self-repairing is as follows:
(411) After system initialization is completed, the first reconfigurable module at the logic end enters the dual-mode comparison work and fault detection state and continuously monitors whether a fault occurs; when the dual-mode comparison indicates a fault, the cold backup functional module is enabled and, at the same time, a fault signal is output to the processor end;
(412) After the cold backup functional module is enabled, it is judged whether the faulty module can be located; if the faulty module is located, it is shielded, the two correct functional modules output the correct result so that normal operation resumes promptly, and the refresh of the module is awaited to restore the faulty module to normal; if the faulty module cannot be located, all three functional modules are shielded immediately to prevent erroneous output from affecting system operation, and the refresh repair of the reconfigurable module is awaited;
(413) After the refresh repair is completed, it is judged whether the fault was a transient fault that has been repaired; if so, the first reconfigurable module resumes its working state; if the fault is a permanent fault, the second reconfigurable module enters the working state.
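Steps (411) to (413) can be read as a small state machine. The sketch below is one possible software model of it; the state names, the event strings and the `logic_end_step` function are hypothetical and chosen only for illustration.

```python
from enum import Enum, auto
from typing import Tuple


class LogicState(Enum):
    DUAL_MODE = auto()     # two hot backups compared, cold backup idle
    FAULT_MASKED = auto()  # faulty functional module located and shielded
    ALL_MASKED = auto()    # fault not locatable, all three modules shielded


def logic_end_step(state: LogicState, active_rm: int,
                   event: str) -> Tuple[LogicState, int, bool]:
    """Advance the logic end by one event.

    Returns (new_state, active_rm, raise_fault), where raise_fault tells
    whether a fault signal is newly sent to the processor end.
    Event names are assumptions made for this sketch.
    """
    if state is LogicState.DUAL_MODE:
        if event == "mismatch_located":      # step (412), located branch
            return LogicState.FAULT_MASKED, active_rm, True
        if event == "mismatch_unlocated":    # step (412), unlocated branch
            return LogicState.ALL_MASKED, active_rm, True
        return state, active_rm, False
    # Waiting for the refresh repair carried out by the processor end.
    if event == "repaired_transient":        # step (413), same module resumes
        return LogicState.DUAL_MODE, active_rm, False
    if event == "repaired_permanent":        # step (413), next module takes over
        return LogicState.DUAL_MODE, active_rm + 1, False
    return state, active_rm, False
```

Exhaustion of the spare reconfigurable modules is deliberately left out of this model to keep it short.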
(42) The implementation process of the self-repairing of the processor end is as follows:
(421) After system initialization, the processor end continuously reads back and compares the configuration information to detect faults in it; during this cyclic detection, when a fault signal from the logic end reaches the processor end, the fault interrupt responder is triggered and the processor end repairs the fault of the reconfigurable module through the configuration refresh of the fault processor;
(422) After the repair is finished, timing starts; if the fault interrupt responder is triggered again within the specified time, the last fault was not repaired, so the processor end enters the state of replacing the repaired reconfigurable module and, at the same time, changes the configuration information of the reconfigurable module that it monitors; if the fault interrupt responder is not triggered again within the specified time, the last repair was effective and the processor end returns to the fault detection state.
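The timing rule of step (422) can be illustrated with a minimal sketch. The class name `ProcessorEnd`, its methods and the window length are assumptions; the repair is treated as instantaneous, and restarting the timing window after a replacement is also an assumption of this sketch, since the text does not state it.

```python
from typing import Optional


class ProcessorEnd:
    """Toy model of fault classification by the error correction result judging device."""

    def __init__(self, judge_window: float) -> None:
        self.judge_window = judge_window          # the "specified time"
        self.repair_time: Optional[float] = None  # when the last repair finished

    def on_fault(self, now: float) -> str:
        """Handle a fault interrupt and return the repair action taken."""
        in_window = (self.repair_time is not None
                     and now - self.repair_time <= self.judge_window)
        self.repair_time = now  # timing (re)starts after every repair
        # A fault inside the window means the last refresh did not help,
        # so the fault is treated as permanent and the module is replaced.
        return "replace" if in_window else "refresh"

    def on_tick(self, now: float) -> str:
        """Return the current activity when no fault interrupt is pending."""
        if (self.repair_time is not None
                and now - self.repair_time > self.judge_window):
            self.repair_time = None  # window elapsed: repair was effective
        return "judge" if self.repair_time is not None else "detect"
```

A first fault therefore leads to a configuration refresh, and only a repeat fault inside the window escalates to replacement.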
Further, the dual-mode comparison work and fault detection state is as follows: when no functional module has failed, the output of either of the two hot backup functional modules is selected as the result; when a functional module fails, a fault signal is output through the dual-mode comparison, the cold backup functional module is enabled to form a triple-mode redundancy structure, and a voter outputs the correct result so that an erroneous result does not affect the system function for a long time;
the dual-mode comparison work is as follows: the first functional module and the second functional module serve as hot backup functional modules, start working after the system is powered on, and have their output results judged by a judging device; the third functional module serves as the cold backup functional module, is not enabled after the system is powered on, and outputs no signal internally; at this point only the first and second functional modules need to be compared: if there is no fault, their output results are consistent and the system continues to operate normally; if either of them suffers a single-event upset, the selector outputs an error signal, and the error signal enables the third functional module.
Further, when the processor finds that the first reconfigurable module has a permanent fault, it shields the first reconfigurable module by means of replacement refreshing and enables the second reconfigurable module;
when the processor finds that the second reconfigurable module has a permanent fault, it shields the second reconfigurable module by means of replacement refreshing and enables the third reconfigurable module; if the third reconfigurable module then suffers a permanent fault, the entire system fails.
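The replacement order described above amounts to stepping through the spare modules until none is left. A minimal sketch, with the hypothetical function name `next_reconfigurable_module` and 1-based module numbers:

```python
from typing import Optional


def next_reconfigurable_module(active_rm: int, total_rms: int = 3) -> Optional[int]:
    """Return the module to enable after a permanent fault of active_rm.

    Modules are numbered from 1. None means that no spare is left,
    which corresponds to failure of the whole system.
    """
    candidate = active_rm + 1
    return candidate if candidate <= total_rms else None
```

The default of three modules follows the embodiment; the parameter is exposed only because the number of spares is a design choice.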
Compared with the prior art, the invention has the following notable effects: 1. faults can be located rapidly through the dual-mode comparison of functional modules inside the reconfigurable module at the logic end (hardware), while the fault signal is transmitted to the processor end (software), which responds by interrupting its internal cyclic fault self-detection, so the self-repairing speed is improved; 2. by adding cold backup reconfigurable modules at the logic end and constraining the physical region of each reconfigurable module in advance, failure of the whole system due to a permanent fault of a reconfigurable module is avoided, so that permanent faults can be tolerated in addition to transient faults being repaired.
Drawings
FIG. 1 is a schematic diagram of the reconfigurable circuit structure of the invention;
FIG. 2 is a structure diagram of fault self-repairing by software and hardware cooperative processing according to the invention;
FIG. 3 (a) is a diagram of the logic end circuit of the invention with the third functional module not operating,
FIG. 3 (b) is a diagram of fault determination when the logic end circuit of the invention has a single-mode fault,
FIG. 3 (c) is a diagram of the logic end circuit of the invention with the first functional module not operating,
FIG. 3 (d) is a diagram of the logic end circuit of the invention with the second functional module not operating;
FIG. 4 is a flow chart of the logic end self-repairing method of the reconfigurable circuit of the software and hardware system of the invention;
FIG. 5 is a flow chart of the processor end self-repairing method of the reconfigurable circuit of the software and hardware system of the invention;
FIG. 6 (a) is a timing diagram of transient fault locating for the reconfigurable module with software and hardware cooperative processing according to the invention,
FIG. 6 (b) is a timing diagram of transient fault refresh self-repair for the reconfigurable module with software and hardware cooperative processing according to the invention,
FIG. 6 (c) is a timing diagram of function recovery after a transient fault for the reconfigurable module with software and hardware cooperative processing according to the invention;
FIG. 7 (a) is a timing diagram of permanent fault locating for the reconfigurable module with software and hardware cooperative processing according to the invention,
FIG. 7 (b) is a timing diagram of permanent fault replacement self-repair for the reconfigurable module with software and hardware cooperative processing according to the invention,
FIG. 7 (c) is a timing diagram of function recovery after a permanent fault for the reconfigurable module with software and hardware cooperative processing according to the invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and the detailed description.
FIG. 1 is a schematic diagram of the circuit structure of the invention, which comprises a logic end and a processor end. The logic end is connected with the processor end through an AXI bus, over which the fault signals and enable signals of the reconfigurable modules are transmitted; the HWIAP interface is connected with both the AXI bus and the logic end and transfers the configuration information from the external memory to the configuration memory.
The logic end comprises an auxiliary selection module and three reconfigurable modules with the same function. Each reconfigurable module internally comprises three functional modules with the same structure: one cold backup functional module and two hot backup functional modules that form dual-mode redundancy. The dual-mode redundant hot backup functional modules detect in real time, by comparing their output results, whether a functional module has failed. When no functional module has failed, the output of either of the two hot backup functional modules is selected as the result; when a functional module fails, a fault signal is output through the dual-mode comparison, the cold backup functional module is enabled to form a triple-mode redundancy structure, and a voter outputs the correct result so that an erroneous result does not affect the system function for a long time. The dual-mode redundancy comparison at the logic end thus locates faults of the user logic layer quickly while transmitting the fault signal to the processor end through the AXI bus, which provides the basis for cooperative processing.
In each reconfigurable module, the two functional modules in the working state detect through dual-mode comparison whether the logic end has failed. When the comparison indicates a fault, a fault enable signal is transmitted through the AXI bus to the fault interrupt responder of the processor end, and the processor end interrupts its current working process and enters a fault response state. In the fault response state, the fault processor handles the fault according to its current type, using configuration refreshing for a transient fault and replacement reconstruction for a permanent fault, and enters the error correction result judging state once the self-repairing process is finished. In the error correction result judging state, the processor end waits for a specified time to see whether a fault is triggered again: if no fault occurs, the last repair was effective; if a fault is triggered again within the specified time, the last repair is judged invalid, the fault processing state is entered again, and self-repair is performed by permanent fault replacement reconstruction.
The auxiliary selection module selects the reconfigurable module (RM) that currently needs to be enabled according to the reconfigurable module enable signal output by the processor end, and shields the reconfigurable modules that are not currently working. It contains a multiplexer that outputs only one enable signal at a time and shields the enable signals of the other two reconfigurable modules. When the system has just been powered on, the enable signal of the first reconfigurable module is output. After a permanent fault occurs, the faulty first reconfigurable module is shielded according to the signal transmitted by the processor end, and an idle, valid second reconfigurable module is enabled so that the system returns to a stable running state and continues to output correct results. This continues until all reconfigurable modules have failed, at which point the whole device fails. The auxiliary selection module thereby provides the reconfigurable module enabling function for cooperative processing.
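The behaviour of the multiplexer inside the auxiliary selection module can be modelled as producing a one-hot enable vector. In the sketch below the function name `auxiliary_select` is hypothetical, and the tuple positions stand for the enable signals of the first, second and third reconfigurable modules.

```python
from typing import Tuple


def auxiliary_select(selected_rm: int, total_rms: int = 3) -> Tuple[bool, ...]:
    """Return the enable signals for all reconfigurable modules.

    Exactly one entry is True when selected_rm is within 1..total_rms;
    every other module is shielded. An out-of-range selection enables
    nothing, which models the state in which all modules have failed.
    """
    return tuple(index == selected_rm for index in range(1, total_rms + 1))
```

For example, `auxiliary_select(1)` models the power-on state in which only the first reconfigurable module is enabled.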
The processor side comprises a fault interrupt responder, a fault processor and an error correction result judging device.
The fault interrupt responder responds to the fault signal transmitted from the logic end to the processor end and interrupts the conventional fault detection process at the processor end, giving the cooperative processing a fast response to the fault and entering a fault repair state.
The fault processor self-repairs the current reconfigurable module in an appropriate way so that it recovers to a normal working state.
The error correction result judging device starts timing after the configuration information of the reconfigurable module has been refreshed and repaired by the processor end, in order to judge whether the reconfigurable module at the logic end fails again; if a fault signal is still transmitted, the last repair is considered invalid and a permanent fault repair state is entered. Throughout the cooperative processing, the error correction result judging device determines whether the fault is transient or permanent, which provides the basis for subsequent fault classification.
In the fault processing state, the processor end reads the correct configuration information from the external memory, transfers it to the ICAP interface through the HWIAP controller, and completes the automatic refresh of the configuration information through the ICAP interface.
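The effect of a configuration refresh can be illustrated with a purely in-memory model in which the correct configuration is copied over the region of one reconfigurable module. The sketch does not model the HWIAP controller or the ICAP interface themselves; the byte-array representation of the configuration memory, the function name `refresh_module` and the `start` offset are assumptions made for the example.

```python
def refresh_module(config_memory: bytearray, golden: bytes, start: int) -> int:
    """Overwrite one module's configuration region with the correct copy.

    config_memory models the configuration memory, golden models the
    correct configuration read from external memory, and start is the
    offset of the module's region. Returns how many bytes differed
    before the refresh.
    """
    end = start + len(golden)
    if start < 0 or end > len(config_memory):
        raise ValueError("module region lies outside the configuration memory")
    corrupted = sum(1 for current, good in zip(config_memory[start:end], golden)
                    if current != good)
    config_memory[start:end] = golden  # refresh restricted to this module
    return corrupted
```

Restricting the write to one module's region, rather than rewriting the whole memory, reflects the contrast with global refreshing drawn in the background section.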
As shown in FIG. 2, the three reconfigurable modules at the logic end all implement the same function, but only one of them is in the running state at any moment, with the first enable signal en_RM1 asserted; the other two are cold backup redundant modules (that is, black boxes) that receive the second enable signal en_RM2 and the third enable signal en_RM3, are not enabled while in the cold backup state, and implement no function.
As shown in FIG. 3 (a), each reconfigurable module comprises three functional modules that implement the same function: the first and second functional modules are working functional modules, and the third functional module is a cold backup functional module. Faults are monitored through dual-mode comparison while the third functional module stays in the cold backup state, meaning that its functional circuit is mapped in the configuration memory layer but the function it is to implement is not enabled. When the dual-mode comparison detects a fault, the cold backup functional module is enabled, the faulty functional module can be located through triple-mode comparison and shielded, and the other two functional modules continue to output the correct signal; at the same time, a fault signal is output to the processor end, and the processor classifies the fault and performs self-repair.
The reconfigurable circuit provided by the invention speeds up the self-repair of the configuration memory layer in the reconfigurable module and repairs both transient and permanent faults of the user logic layer and the configuration memory layer, thereby achieving rapid self-repair of multiple fault types across both layers of the reconfigurable module.
Conventionally, faults in the configuration memory layer of a reconfigurable module are repaired through global frame-by-frame readback detection. This method has to traverse all frames, so one detection cycle takes a long time: if an error occurs in a frame immediately after that frame has been checked, the fault is found and repaired only when the scan returns to that frame after traversing all the other frames, and the longer fault self-detection time lengthens the fault repair time. The invention implements a rapid fault self-repairing method based on a reconfigurable circuit with cooperative software and hardware processing: when the logic end detects a fault through dual-mode comparison, the fault is immediately reported to the processor core, the current process is interrupted, the fault handling mechanism responds at once, and the refresh self-repair of the reconfigurable module is completed. The method can quickly repair transient faults and can also respond quickly to permanent faults.
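The latency argument for cyclic readback can be made concrete with a short calculation. The sketch assumes that every frame takes the same time to check and that a fault strikes at a uniformly random moment of the scan cycle; the function name and both parameters are hypothetical, and no measured figures from the invention are implied.

```python
from typing import Tuple


def readback_detection_latency(num_frames: int, frame_time: float) -> Tuple[float, float]:
    """Return (worst_case, mean) detection latency of a cyclic frame scan.

    Worst case: the fault appears right after its frame was checked, so
    almost a full cycle passes before the scan returns to that frame.
    Mean: for a uniformly distributed fault time, half a cycle.
    """
    cycle = num_frames * frame_time
    return cycle, cycle / 2
```

With interrupt-driven reporting from the logic end, the detection latency no longer scales with the number of frames, which is the source of the speed-up described above.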
The self-repairing circuit structure of the logic end functional modules is described below with reference to FIG. 3:
in FIG. 3 (a), the first functional module and the second functional module serve as hot backup functional modules, start working after the system is powered on, and have their output results judged by the judging device; the third functional module serves as the cold backup functional module and is not enabled after the system is powered on, so it does not operate and outputs no signal internally; at this point only the first and second functional modules need to be compared: if there is no fault, their output results are consistent and the system continues to operate normally; if either of them suffers a single-event upset, the selector outputs the error signal error_en, which enables the third functional module, and the single-mode fault operation state shown in FIG. 3 (b) is entered.
As shown in FIG. 3 (b), the first functional module and the second functional module are each compared with the third functional module (which has been enabled and is now running) to locate the faulty functional module; if the first functional module has failed, it is disabled, the correct result is output through the two-out-of-three multiplexer, the single-mode fault running state shown in FIG. 3 (c) is entered, and the refresh repair of the whole reconfigurable module by the processor is awaited; if the second functional module has failed, it is disabled, the correct result is output through the two-out-of-three multiplexer, the located-fault state shown in FIG. 3 (d) is entered, and the refresh repair of the whole reconfigurable module by the processor is awaited; once the refresh repair is completed, the normal operation state shown in FIG. 3 (a) is restored.
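The locating step of FIG. 3 (b) can be sketched as two comparisons against the enabled third functional module. The function name `locate_fault` and the integer outputs are assumptions for illustration; the function is meant to be called only after the first and second outputs have been found to differ.

```python
from typing import Optional


def locate_fault(out1: int, out2: int, out3: int) -> Optional[int]:
    """Locate the faulty hot backup by comparing each with module 3.

    Returns 1 or 2 for the functional module to disable, or None when
    the fault cannot be attributed to a single module (both hot backups
    disagree with module 3, or neither does).
    """
    first_differs = out1 != out3
    second_differs = out2 != out3
    if first_differs and not second_differs:
        return 1   # corresponds to the state of FIG. 3 (c)
    if second_differs and not first_differs:
        return 2   # corresponds to the state of FIG. 3 (d)
    return None    # not locatable: all three modules are shielded
```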
FIG. 4 is a flow chart of the logic end self-repairing method. After system initialization is completed, the first reconfigurable module at the logic end enters the dual-mode (first hot backup and second hot backup) comparison work and fault detection state and continuously monitors whether a fault occurs. When the dual-mode comparison indicates an error, the cold backup functional module is enabled and, at the same time, a fault signal is output to the processor end. After the cold backup functional module is enabled, it must be judged whether a single-module fault has occurred, that is, whether the faulty module can be located: if the faulty module is located accurately, it is shielded, the two correct functional modules output the correct result so that normal operation resumes promptly, and the refresh of the module is awaited to restore the faulty module to normal; if the faulty module cannot be located, all three functional modules are shielded immediately to prevent erroneous output from affecting system operation, and the refresh repair of the reconfigurable module is awaited. After the refresh repair is completed, it is judged whether the fault was a transient fault that has been repaired; if so, the first reconfigurable module resumes its working state; if the fault is a permanent fault, the second reconfigurable module enters the working state.
FIG. 5 is a flow chart of the processor-side self-repairing method. Because self-repair in the invention is handled cooperatively by software and hardware, fault repair is completed with the cooperation of the processor end. After the system is initialized, the processor end continuously reads back the configuration information, compares it, and detects faults in it; one full detection cycle takes a long time. Therefore, during cyclic detection, when a fault signal from the logic end reaches the processor end, the fault interrupt responder is triggered and the processor end repairs the fault of the reconfigurable module through the configuration refresh of the fault processor. After the repair is finished, timing starts. If the fault interrupt responder is triggered again within the specified time, the previous fault was not repaired; the processor then replaces the reconfigurable module and at the same time changes the configuration information of the reconfigurable module that it monitors. If the fault interrupt responder is not triggered again within the specified time, the previous repair was effective and the processor returns to the fault detection state.
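The timing judgment made after each repair can be sketched as follows. This is a minimal model in Python, assuming timestamps in microseconds; the class name `ErrorCorrectionJudge`, its method names, and the window length are hypothetical, since the description does not state the length of the specified time.

```python
from typing import Optional


class ErrorCorrectionJudge:
    """Decides between refresh repair and module replacement (illustrative only)."""

    def __init__(self, window_us: int) -> None:
        self.window_us = window_us                  # assumed length of the specified time
        self.repair_done_us: Optional[int] = None   # when the last refresh repair finished

    def repair_finished(self, now_us: int) -> None:
        """Start timing when a refresh repair completes."""
        self.repair_done_us = now_us

    def on_fault_interrupt(self, now_us: int) -> str:
        """Return 'replace' for a repeated fault inside the window, else 'refresh'."""
        repeated = (self.repair_done_us is not None
                    and now_us - self.repair_done_us <= self.window_us)
        self.repair_done_us = None                  # timing restarts after the next repair
        return "replace" if repeated else "refresh"
```

A fault that returns shortly after a refresh is treated as permanent and leads to replacement of the reconfigurable module; a fault that arrives after the window has elapsed is treated as a new transient fault.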
Through the cooperative processing of software and hardware, the invention can quickly locate and repair transient faults and can also repair permanent faults. The self-repair of permanent faults is described with reference to fig. 2. When the system is first powered up, only the first reconfigurable module operates, i.e. the first enable signal en_rm1 is enabled. The second and third reconfigurable modules are in a cold standby state (black box state) with no circuit mapped onto the underlying hardware, i.e. the second enable signal en_RM2 and the third enable signal en_RM3 are not enabled. When the processor finds that the first reconfigurable module has a permanent fault, it shields the first reconfigurable module by replacement refresh and enables the second reconfigurable module. In the same way, when the second reconfigurable module fails permanently, it is shielded and the third reconfigurable module is enabled. If a permanent fault occurs yet again, the whole system fails, so the number of reconfigurable modules should be chosen according to the service time and the space environment of the chip.
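The replacement chain for permanent faults amounts to a one-hot enable that advances by one module per permanent fault. The sketch below models this in Python; the function names are hypothetical, and the returned list stands in for the enable signals en_rm1, en_RM2 and en_RM3 named in the description.

```python
from typing import List, Optional


def enable_signals(active_module: Optional[int], total: int = 3) -> List[int]:
    """One-hot enables [en_rm1, en_RM2, en_RM3]; all zeros models a failed system."""
    return [1 if i == active_module else 0 for i in range(1, total + 1)]


def replace_on_permanent_fault(active_module: int, total: int = 3) -> Optional[int]:
    """Return the next cold-standby module, or None when no spare module remains."""
    nxt = active_module + 1
    return nxt if nxt <= total else None
```

With three modules, the active module moves from 1 to 2 to 3, and a further permanent fault leaves no module enabled, which corresponds to failure of the whole system.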
Transient faults in the functional modules were tested using the circuit structure and self-repairing method described above. Fig. 6 (a) is a signal observation diagram of the transient fault self-repairing mode of the functional modules. A multi-bit flip fault is injected at observation point 4904. At observation point 4992, the output signal s2p_dout_1 of the first functional module and the output signal s2p_dout_2 of the second functional module are both faulty, and at observation point 4994 the output signal s2p_dout_3 of the cold backup third functional module is started. Because two functional modules are faulty at this time, enabling the cold backup module cannot locate the fault, so the fault can only be repaired by reconfiguration refresh. The multi-module fault signal (en_wrong_multiple) is therefore enabled, and timing of the multi-bit flip fault self-repair starts. In fig. 6 (b), at observation point 4995 the processor receives the multi-bit flip fault signal over the AXI bus, and after detecting it sets the reconfigurable module enable signal (ps2pl) to 0. At observation point 4998, refresh repair starts and the repair completion timer (repair_time) begins counting. In fig. 6 (c), the refresh repair finishes at observation point 5000 and en_rm1 returns to 1, indicating that the first reconfigurable module has resumed normal operation. At observation point 5316 the functional modules resume dual-mode comparison and return to the normal operation state, completing one self-repair of a multi-bit transient fault affecting two functional modules.
When a permanent fault is detected in the first functional module, it is tolerated by shielding the first functional module; the second and third functional modules then continue working in the dual-mode comparison monitoring state. If a permanent fault occurs again at this point, no cold backup functional module remains, and the permanent fault must be handled by reconfiguration replacement.
As shown in (a) of fig. 7, after the first permanent fault has been tolerated, the reconfigurable module enters the single-mode shielding state: the failed second functional module is shielded, and only the first and third functional modules operate in real time. A second fault is triggered at observation point 4984 and the circuit enters the reconfigurable-module shielding state; at observation point 5004 all functional modules of the first reconfigurable module are shielded, so the whole reconfigurable module fails.
At observation point 5000 in fig. 7 (b), the processor receives the signal (perm_error=4) indicating that the first reconfigurable module has failed entirely; the enable signal of the first reconfigurable module (en_rm1) is then set to 0, shielding the first reconfigurable module, and the second reconfigurable module is enabled to repair the permanent failure of the first reconfigurable module.
As shown in (c) of fig. 7, at observation point 5015 the self-repair of the permanent failure of the first reconfigurable module is completed by reconfiguration replacement and a second reconfigurable module with the same function is enabled; at observation point 5717 the first and second functional modules in the second reconfigurable module enter the normal operation detection state.
To illustrate how quickly the software and hardware co-processing reconfigurable circuit repairs itself, repair times are given below.
The reconfigurable circuit with software and hardware cooperative processing has a transient fault response time of 1372 us for both single-module and multi-module faults, whereas traditional frame-by-frame cyclic self-detection of the configuration memory needs up to 66520 us. That 66520 us is the longest detection time, i.e. the worst case for transient fault detection; in the best case the traditional method needs 923 us. However, the best case arises only when the transient fault occurs in the frame that the processor is about to check, having just finished the preceding frame, and the probability of this is low. The invention can therefore ensure fast and stable fault response.
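The comparison can be made concrete with a short calculation on the three figures quoted above. The Python below is only arithmetic on those numbers; the variable names are illustrative.

```python
# Response and detection times quoted in the description, in microseconds.
co_processing_us = 1372      # co-processing response, single or multiple modules
readback_worst_us = 66520    # traditional cyclic readback, worst case
readback_best_us = 923       # traditional cyclic readback, best case

# Against the readback worst case the co-processing response is about 48.5 times shorter.
worst_case_speedup = readback_worst_us / co_processing_us

# The readback best case is about 0.67 of the co-processing response time.
best_case_ratio = readback_best_us / co_processing_us

print(f"worst-case speedup: {worst_case_speedup:.1f}x")
print(f"best-case ratio:    {best_case_ratio:.2f}")
```

The fixed 1372 us response thus lies between the two extremes of the readback method, close to its best case and far from its worst case, which is consistent with the stability argument made above.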

Claims (4)

1. A reconfigurable circuit for software and hardware cooperative processing, characterized by comprising a logic end and a processor end, wherein the logic end is connected with the processor end through an AXI bus, and a HWIAP interface is connected with the AXI bus and with the logic end;
the logic end comprises an auxiliary selection module and three reconfigurable modules with the same functions, and each reconfigurable module comprises three functional modules with the same structure: two hot backup functional modules, which form dual-mode redundancy, and a cold backup functional module; the dual-mode redundant hot backup functional modules detect in real time whether a functional module fails;
when no fault occurs, selecting any hot backup function module to output a result; when a fault occurs, a cold backup functional module is started to form a triple-modular redundancy structure;
the reconfigurable module transmits a fault enabling signal to a fault interrupt responder at a processor end through an AXI bus;
the auxiliary selection module selects a reconfigurable module which is required to be enabled currently, and shields the reconfigurable module which does not work currently; a multiplexer is arranged in the auxiliary selection module, and only outputs one enabling signal and shields enabling signals of the other two reconfigurable modules at the same time;
the processor end comprises a fault interrupt responder, a fault processor and an error correction result judging device;
the fault interrupt responder responds to the fault signal transmitted to the processor end by the logic end, interrupts the current fault detection process of the processor end, provides fast fault response for the cooperative processing, and enters a fault repair state;
the fault processor is used for self-repairing the reconfigurable module to enable the reconfigurable module to recover to a normal working state;
the error correction result judging device starts timing after the configuration information of the reconfigurable module has been refreshed and repaired through the processor end, and judges whether the reconfigurable module at the logic end fails again; if the fault signal continues to be transmitted, the last repair is considered an invalid repair and a permanent fault repair state is entered;
and in the fault processing state, the processor end reads the correct configuration information from the external memory, transfers it to an ICAP interface through the HWIAP controller, and completes the automatic refresh of the configuration information through the ICAP interface.
2. The reconfigurable circuit for cooperative processing of software and hardware according to claim 1, wherein three reconfigurable modules with the same function are divided into a working module and two cold backup modules according to working states; the reconfigurable module in the working state is a working module and is used for outputting the real-time logic function result; the two cold backup modules are in a standby state after being electrified and are used for replacing logic functions when the working modules have permanent faults.
3. A self-repairing method for a reconfigurable circuit with cooperative processing of software and hardware, characterized by comprising self-repairing of a logic end and self-repairing of a processor end, wherein the logic end and the processor end process faults cooperatively by class so as to respond to faults quickly: the logic end detects faults of a user logic layer through dual-mode comparison, and the processor end continuously monitors and responds to fault signals output by the logic end; if a fault signal is output, the processor rapidly enters a fault response stage; otherwise, the processor end carries out cyclic self-detection of faults in real time;
(41) The implementation process of the logic end self-repairing is as follows:
(411) After the system initialization is completed, the first reconfigurable module at the logic end enters a dual-mode comparison work and fault detection state, and whether faults occur is continuously monitored; when the dual-mode comparison result indicates that a fault has occurred, the cold backup functional module is enabled and, at the same time, a fault signal is output to the processor end;
(412) After enabling the cold backup function module, judging whether a fault module is positioned at the moment; if the fault module is located, shielding the fault module, outputting a correct result through the correct two functional modules, and recovering to a normal working state in time, and waiting for refreshing of the module to recover the fault module to be normal; if the fault module cannot be positioned, immediately shielding all three functional modules, avoiding that the error output influences the operation of the system, and waiting for refreshing and repairing the reconfigurable module;
(413) After the refresh repair is completed, it is judged whether the fault was a transient fault; if so, the first reconfigurable module returns to its working state; if the fault is a permanent fault, the working state of a second reconfigurable module is entered;
(42) The implementation process of the self-repairing of the processor end is as follows:
(421) After the system is initialized, the processor end continuously compares the configuration information through readback and detects faults of the configuration information; in the cyclic detection process, when a fault signal of a logic end is transmitted to a processor end, triggering a fault interrupt responder, and refreshing and repairing the fault of a reconfigurable module by the processor end through the configuration of a fault processor;
(422) After the repair is finished, starting timing, if the fault interrupt responder is triggered again within a specified time, indicating that the last fault is not repaired, entering a state that the processor replaces the repaired reconfigurable module, and simultaneously changing the configuration information of the reconfigurable module to be detected by the processor; if the fault interrupt responder is not triggered again within the set time, the last repair is the effective repair, and the processor fault detection state is returned;
the dual-mode comparison work and fault detection states are: when no fault occurs to the functional module, selecting any functional module in the two hot backups to output a result; when the functional module fails, a failure signal is output through dual-mode comparison, a cold backup functional module is started to form a triple-mode redundancy structure, and a correct result is output through a voter to prevent the incorrect result from affecting the function of the system for a long time;
the dual-mode comparison operation is: the first functional module and the second functional module serve as hot backup functional modules, work after the system is powered on, and have their output results judged by a judging device; the third functional module serves as a cold backup functional module, is not enabled after the system is powered on, and has no internal signal output; at this time, only the first functional module and the second functional module need to be compared; if no fault exists, the output results of the first functional module and the second functional module are consistent, and the system continues to operate normally; if either the first functional module or the second functional module suffers a single event upset, the selector outputs an error signal, and the error signal enables the third functional module.
4. The software and hardware co-processing reconfigurable circuit self-repairing method of claim 3, wherein when the processor finds that the first reconfigurable module has a permanent fault, the processor shields the first reconfigurable module by adopting a replacement refreshing mode to enable the second reconfigurable module;
when the processor finds that the second reconfigurable module has a permanent fault, the second reconfigurable module is shielded by adopting a replacement refreshing mode, and a third reconfigurable module is enabled; at this time, if the third reconfigurable module fails permanently, the entire system fails.
CN202111113334.6A 2021-09-23 2021-09-23 Reconfigurable circuit for software and hardware cooperative processing and self-repairing method thereof Active CN113836079B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111113334.6A CN113836079B (en) 2021-09-23 2021-09-23 Reconfigurable circuit for software and hardware cooperative processing and self-repairing method thereof

Publications (2)

Publication Number Publication Date
CN113836079A (en) 2021-12-24
CN113836079B (en) 2024-03-19

Family

ID=78969156

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111113334.6A Active CN113836079B (en) 2021-09-23 2021-09-23 Reconfigurable circuit for software and hardware cooperative processing and self-repairing method thereof

Country Status (1)

Country Link
CN (1) CN113836079B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101930052A (en) * 2010-07-21 2010-12-29 University of Electronic Science and Technology of China Online detection fault-tolerance system of FPGA (Field Programmable Gate Array) digital sequential circuit of SRAM (Static Random Access Memory) type and method
CN112269686A (en) * 2020-10-29 2021-01-26 Nanjing University of Aeronautics and Astronautics LUTRAM self-repairing structure and method based on cold backup dual-mode error detection code

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Software Redundancy Implementation Strategy in Reconfigurable Hardware Framework; Şinca Răzvan et al.; 2019 8th International Conference on Modern Power Systems; 2019-12-31; 1-6 *
Zhang Zhai (张砦) et al. Acta Aeronautica et Astronautica Sinica (航空学报). 2021, Vol. 42 (No. 7), 1-12. *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant