CN107992376B

CN107992376B - Active fault tolerance method and device for data storage of DSP (digital signal processor)

Info

Publication number: CN107992376B
Application number: CN201711192783.8A
Authority: CN
Inventors: 曹辉; 何卫强; 于飞
Original assignee: Xian Microelectronics Technology Institute
Current assignee: Xian Microelectronics Technology Institute
Priority date: 2017-11-24
Filing date: 2017-11-24
Publication date: 2020-10-30
Anticipated expiration: 2037-11-24
Also published as: CN107992376A

Abstract

The invention provides an active fault-tolerant method and device for the data memory of a DSP (digital signal processor). The device is arranged between the DSP processor core pipeline and the in-core data memory and performs active fault-tolerant refreshing of the data memory. It comprises a LOAD instruction decoding module for reading the data memory, a STORE instruction decoding module for writing the data memory, a queue access module, an RSEC instruction decoding module, the data memory, a data error detection and correction module, a general register file, a correctable error status register, a circular Record queue, a data memory write operation module, and an interrupt processing module for hardware interrupt processing. With appropriate pipeline partitioning, the frequency performance of the DSP processor is essentially unaffected. The invention allows flexible control over the strategy and timing of hardware fault-tolerance handling, meets system reliability requirements at lower cost, and preserves the execution efficiency of the DSP processor under error conditions.

Description

Active fault tolerance method and device for data storage of DSP (digital signal processor)

Technical Field

The invention belongs to the technical field of microelectronics, and relates to a high-reliability and high-performance fault-tolerant structure of a processor, in particular to an active fault-tolerant method and device for a data memory of a DSP (digital signal processor).

Background

Memory is the most sensitive component in modern processors. As semiconductor manufacturing processes advance, the feature sizes of integrated circuits shrink dramatically. On the one hand, the steadily decreasing supply voltage, increasing operating frequency, decreasing node capacitance and rapidly growing transistor count in nanometer-scale integrated circuits make memory cells increasingly sensitive to the operating environment. When a memory circuit is affected by high-energy particle strikes, power supply noise, electromagnetic interference or cosmic rays, the content stored in its memory cells is corrupted transiently or permanently. On the other hand, on-chip memories integrate a large number of transistors and often occupy a large share of the overall processor area. The large transistor count and area increase the likelihood of memory upset errors and reduce the overall reliability of the device. Targeted hardening of the on-chip memory is therefore an important reliability concern in ASIC and processor design. Furthermore, in many processors whose reliability requirements are not high, only the on-chip memory is hardened to improve reliability.

Reliability hardening of memory includes measures at the process level, the device layout level and the system level. System-level hardening provides a higher level of protection and is independent of the specific implementation process, so it is the more common measure. System-level hardening techniques include error-check coding protection using parity or ECC check codes, built-in self-repair using redundant storage rows and columns, shutting down idle storage units, writing back data blocks early, and isolating faulty regions. Multi-core DSP processors, which target data-intensive processing and exchange in application fields such as image processing and signal processing, are characterized by real-time operation and high throughput.

Most of the published literature addresses fault-tolerant hardening of memories in Central Processing Unit (CPU) type devices, and few references address memory fault tolerance in Digital Signal Processor (DSP) type devices. A data memory hardening method is disclosed in Gaisler, 2002, "A portable and fault-tolerant microprocessor based on the SPARC V8 architecture". When data is accessed, it is parity-checked; if the check finds an error, the data is invalidated and fetched again from external memory. This is a passive fault-tolerant method. It is feasible for a single on-chip core or a few processor cores sharing memory, but in a processor based on Network-on-Chip (NoC) interconnection, the DSP processor incurs a large waiting delay when it accesses external memory through the on-chip network. In addition, the DSP processor is oriented to data stream processing; accessing external memory every time data is in error may interrupt the DSP data stream, which works against DSP processor performance. The same document also discloses a pipeline processing method that, when a correctable error is detected in an operand read from memory, flushes the pipeline and writes the corrected data back to memory. When the pipeline is long and the error detection and correction of the data takes place just before the write-back stage, many instructions must be flushed. However, a subsequent instruction that does not depend on the result of the current instruction could continue to execute. Simply flushing the pipeline wastes instructions that have already entered it.

Most DSP processors have only one level of memory structure, for example those described in "MSC8102 Technical Data" published by Freescale in 2005, "Trimedia TM-1300" published by Philips in 2000, and "TMS320C6000 CPU and Instruction Set Reference Guide" published by Texas Instruments in 2000. These do not write corrected data back to the data memory in time, so memory errors accumulate. If the on-chip first-level storage is disabled to avoid error events and data is fetched from external memory instead, correct data can be obtained, but the memory access delay is large and DSP processor performance suffers.

Disclosure of Invention

To address the problems in the prior art, the invention provides an active fault-tolerant method and device for the data memory of a DSP processor. After an error occurs in the stored content of the data memory, the invention performs "active" error-correcting write-back without interrupting the instruction pipeline and without instruction rollback. Reloading after a data access error is avoided, which preserves the execution efficiency of the processor under error conditions.

The invention is realized by the following technical scheme:

The active fault-tolerant device of the DSP processor data memory is arranged between the DSP processor core pipeline and the in-core data memory and is used for active fault-tolerant refreshing of the data memory; it comprises a LOAD instruction decoding module for reading the data memory, a STORE instruction decoding module for writing the data memory, a queue access module, an RSEC instruction decoding module, the data memory, a data error detection and correction module, a general register file, a correctable error status register, a circular Record queue, a data memory write operation module and an interrupt processing module for hardware interrupt processing;

the LOAD instruction decoding module receives a DSP processor program instruction, determines whether the current program instruction is a LOAD instruction, and outputs the decoded control logic of the LOAD instruction to the data memory; the STORE instruction decoding module receives a DSP processor program instruction, determines whether the current program instruction is a STORE instruction, and outputs the decoded control logic of the STORE instruction to the data memory through the circular Record queue, the queue access module and the data memory write operation module in sequence; the RSEC instruction decoding module receives a DSP processor program instruction, determines whether the current program instruction is an RSEC instruction, and outputs the decoded control logic of the RSEC instruction to the circular Record queue through the queue access module; the input of the data error detection and correction module is the data and data check code output by the data memory, and its output is the error state of the currently accessed data together with the corrected data and check code; the register file receives the corrected data output by the data error detection and correction module and stores the data accessed by the LOAD instruction after processing by that module; the correctable error status register is connected to the data error detection and correction module and samples the error state of the current data; the data memory write operation module outputs the write control signal of the data memory; the input of the interrupt processing module is a hardware interrupt request signal of the DSP processor, and its output is connected to the RSEC instruction decoding module through the data correctable-error interrupt service routine.

Preferably, the data error detection and correction module receives parallel input data and a check code from the data memory and performs error detection and correction on the data read; it outputs the correctable-error status Single_Error, the uncorrectable-error status Multiple_Error, the data after error detection and correction, and the check code information; the correctable-error status Single_Error is connected to the correctable error status register and the interrupt flag register; the data after error detection and correction is connected to the circular Record queue and the destination register file; the check code after error detection and correction is connected to the circular Record queue;

the correctable error status register stores the correctable-error status information and indicates whether the DSP processor is currently in the data correctable-error state; it receives the correctable-error status Single_Error from the data error detection and correction module and the correctable-error status clearing control from the RSEC instruction decoding module; when Single_Error is valid, the correctable error status register is set; when the clearing control signal is valid, the correctable error status register is cleared; when Single_Error and the clearing control signal are valid at the same time, Single_Error has the higher priority and the correctable error status register is set; several successive correctable data error events set the correctable error status register repeatedly until the correctable-error status bit is cleared.

Preferably, the circular Record queue comprises a Record queue and a queue status register;

the queue status register marks the current status of the queue, including whether the queue has overflowed and whether processing has reached the tail of the queue; the depth of the circular Record queue is greater than the sum of the interrupt response delay and the number of pipeline stages between the pipeline decode stage and the pipeline stage containing the data error detection and correction module; the circular Record queue has two groups of inputs, the first being the corrected data and check code from the data error detection and correction module together with the control information Core_Read_Control decoded from the current LOAD instruction on the DSP processor pipeline, and the second being the data and check code output by STORE instruction decoding together with the control information Core_Write_Control of the current memory write instruction;

if the data error detection and correction module detects a correctable error in the currently accessed data, the first group of inputs is written into the circular Record queue, the correctable-error status information is written into the correctable error status register, and the queue status register is updated; while the correctable error status register is set, if a subsequent instruction is a STORE instruction, the second group of inputs is written into the circular Record queue and the queue status register is updated;

each record of the circular Record queue contains the access parallelism Pon, the parallel access base address BaseAddr, the word/byte mode BW, the corrected data or the write-back data, and the check code of that data; the data memory write operation module writes the data and check code of a STORE instruction to the data memory, or writes the corrected data and check code held in a record back to the data memory.

Preferably, the interrupt processing module assigns one of the hardware interrupt requests to the correctable-error interrupt for the currently accessed data, and the data memory correctable-error interrupt has a higher interrupt priority; the interrupt processing module comprises hardware interrupt processing logic, an interrupt flag register, an interrupt enable register, an interrupt vector table and a hardware interrupt service routine area; the hardware interrupt processing logic interrupts the normally executing program on the DSP processor pipeline and jumps to the interrupt vector table to obtain the entry of the hardware interrupt service routine; the interrupt flag register and the interrupt enable register work with the hardware interrupt processing logic to enter interrupt service: when the interrupt enable register enables the data correctable-error interrupt, the hardware interrupt processing logic checks whether the data correctable-error interrupt flag in the interrupt flag register is set, and if it is set, the DSP processor enters the data correctable-error service routine; the correctable-error status output Single_Error of the data error detection and correction module is connected to the data correctable-error interrupt bit of the interrupt flag register; the interrupt flag register samples and records this status signal; the hardware interrupt service routine area is entered from the entry specified by the interrupt vector table; when a data correctable-error interrupt occurs and is enabled, the DSP processor jumps from the data correctable-error interrupt entry in the interrupt vector table to the data correctable-error interrupt service routine.
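By way of illustration only, the flag/enable gating described above can be sketched as a small software model. The class name `InterruptController`, the bit position `CORRECTABLE_ERR_BIT` and the dictionary used as a vector table are hypothetical and are not taken from the invention; priorities among several interrupt sources are not modeled.

```python
# Hypothetical bit position of the data correctable-error interrupt source.
CORRECTABLE_ERR_BIT = 0


class InterruptController:
    """Toy model of the interrupt flag and interrupt enable registers."""

    def __init__(self, vector_table):
        self.flag = 0                     # interrupt flag register
        self.enable = 0                   # interrupt enable register
        self.vector_table = vector_table  # bit position -> service routine entry

    def sample(self, single_error):
        # The flag register samples and latches the Single_Error status.
        if single_error:
            self.flag |= 1 << CORRECTABLE_ERR_BIT

    def pending_entry(self):
        # The interrupt is taken only when it is both flagged and enabled;
        # the entry is then looked up in the vector table.
        if self.flag & self.enable & (1 << CORRECTABLE_ERR_BIT):
            return self.vector_table[CORRECTABLE_ERR_BIT]
        return None
```

In this model a latched Single_Error has no effect until the corresponding enable bit is set, which mirrors the initialization performed in step 1 of the method.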

Preferably, the RSEC instruction decoding module accesses one entry in the circular Record queue, outputs the entry to the data memory write operation module, and updates the data and the check bits stored in the Record entry back to the data memory through the data memory write operation module.

The active fault-tolerant method for the DSP processor data memory comprises the following steps:

Step 1, initializing the DSP processor and enabling the correctable-error interrupt response, so that the DSP processor responds to the correctable-error hardware interrupt when a correctable error occurs in the data memory;

Step 2, accessing data from the data memory according to the LOAD instruction, and checking the error state of the data according to the extended operation of the LOAD instruction; handling the correctable-error and uncorrectable-error states that arise during execution of the LOAD instruction;

after correctable-error handling, executing step 3; if an uncorrectable data error occurs, generating an uncorrectable-error interrupt signal, finishing execution of the current instruction of the DSP processor, and processing other instructions;

Step 3, in the case of a correctable data error, determining whether a STORE instruction exists in the current processor pipeline or in the subsequent program instructions; if so, continuing with step 4; otherwise, going to step 5;

Step 4, executing the STORE instruction and, according to the extended operation of the STORE instruction, writing the data of the STORE instruction into the memory while recording the control information and data of the STORE instruction into the circular Record queue;

Step 5, the processor responds to the correctable-error hardware interrupt; the correctable data error triggers a hardware interrupt of the processor, which enters the data correctable-error interrupt service routine;

Step 6, in the interrupt service routine, the RSEC instruction is called in a loop to process the records in the circular Record queue until the queue has been read empty; the corrected data and the memory write data of the STORE instructions refresh the data memory, which completes the write-back of the corrected data to the data memory;

Step 7, returning from the correctable-error interrupt;

Step 8, the DSP processor resumes normal operation.

Preferably, the extended LOAD instruction processing in step 2 includes:

Step A-1, the DSP processor fetches an instruction;

Step A-2, determining from the instruction opcode whether it is a LOAD instruction; if it is a LOAD instruction, going to step A-3; otherwise, going to step A-12;

Step A-3, decoding the LOAD instruction;

Step A-4, accessing the data memory to obtain the data;

Step A-5, sending the data to the data error detection and correction module for error detection and correction, and determining the correctable-error status Single_Error and the uncorrectable-error status Multiple_Error;

Step A-6, determining whether an uncorrectable error has occurred; if so, going to step A-10; otherwise, going to step A-7;

Step A-7, writing the data output by the data error detection and correction module into the destination register file according to the destination register index of the LOAD instruction;

Step A-8, determining whether a correctable error Single_Error has occurred; if so, setting the correctable error status register and going to step A-9; otherwise, going to step A-11;

Step A-9, writing the corrected data output by the data error detection and correction module, the check code and the access control information on the LOAD instruction pipeline into the circular Record queue in sequence; going to step A-11;

Step A-10, an uncorrectable multi-bit error has occurred; generating an uncorrectable-error interrupt signal;

Step A-11, finishing execution of the current instruction of the DSP processor;

Step A-12, the DSP processor performs other instruction processing.
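Steps A-4 to A-10 can be sketched in software as follows. This is a behavioral sketch under stated assumptions, not the hardware of the invention: the function `execute_load`, the exception `UncorrectableError`, the dictionary-based memory and register file, and the `edac` callable (standing in for the data error detection and correction module) are all hypothetical, and each record is reduced to an (address, data, check code) tuple without Pon and BW.

```python
from collections import deque

NO_ERROR, SINGLE_ERROR, MULTIPLE_ERROR = "No_Error", "Single_Error", "Multiple_Error"


class UncorrectableError(Exception):
    """Stands in for the uncorrectable-error interrupt signal of step A-10."""


def execute_load(addr, dest, memory, regs, state, queue, edac):
    # A-4: access the data memory to obtain data and check code.
    raw_data, raw_check = memory[addr]
    # A-5: error detection and correction (edac is an assumed callable
    # returning (status, data, check)).
    status, data, check = edac(raw_data, raw_check)
    # A-6 / A-10: an uncorrectable error raises the interrupt signal.
    if status == MULTIPLE_ERROR:
        raise UncorrectableError(hex(addr))
    # A-7: write the (possibly corrected) data to the destination register.
    regs[dest] = data
    # A-8 / A-9: on a correctable error, set the status register and
    # append the corrected data and check code to the Record queue.
    if status == SINGLE_ERROR:
        state["correctable_error"] = True
        queue.append((addr, data, check))
```

Note that the destination register receives the corrected value immediately, so the pipeline continues without a flush; only the write-back to memory is deferred to the queue.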

Preferably, the extended STORE instruction processing in step 4 includes:

Step B-1, the DSP processor fetches an instruction;

Step B-2, determining from the instruction opcode whether it is a STORE instruction; if it is a STORE instruction, going to step B-3; otherwise, going to step B-9;

Step B-3, decoding the STORE instruction and reading the operand from the register file;

Step B-4, generating the operand check code;

Step B-5, determining whether the correctable error status register is currently set; if so, going to step B-6; otherwise, going to step B-7;

Step B-6, writing the operand data, the operand check code and the access control information on the STORE instruction pipeline into the circular Record queue in sequence;

Step B-7, writing the operand data and the operand check code into the data memory;

Step B-8, finishing execution of the current instruction of the DSP processor;

Step B-9, the DSP processor performs other instruction processing.
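Steps B-3 to B-7 can be sketched in the same style. The sketch assumes that step B-6 falls through to step B-7, which is consistent with step 4 of the method (the data is written to memory while it is also recorded in the queue). The names `execute_store` and `parity` are hypothetical, and single-bit even parity is only a placeholder for whatever check code the data memory actually uses.

```python
from collections import deque


def parity(word):
    """Placeholder check code: even parity over the bits of `word`."""
    return bin(word).count("1") & 1


def execute_store(addr, src, memory, regs, state, queue, encode=parity):
    # B-3: read the operand from the register file.
    data = regs[src]
    # B-4: generate the operand check code.
    check = encode(data)
    # B-5 / B-6: while the correctable-error state is set, also record
    # the write in the Record queue.
    if state["correctable_error"]:
        queue.append((addr, data, check))
    # B-7: write operand data and check code into the data memory.
    memory[addr] = (data, check)
```

For example, with `state["correctable_error"]` cleared a STORE only updates `memory`; once the flag has been set by an earlier LOAD, the same STORE also appends a record so that it can be replayed in order by the interrupt service routine.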

Further, in step A-9 and step B-6, records are written into the circular Record queue as follows:

Step E-1, if Single_Error is true, writing the corrected data and check code, together with the read base address, parallelism and word/byte access mode of the data memory access, to the tail of the queue, and going to step E-3; otherwise, going to step E-2;

Step E-2, if the correctable error status register is currently set and the write data valid signal Write_Valid of the memory write instruction STORE is valid, writing the data and the check code generated by the encoding method, together with the write base address, parallelism and word/byte access mode BW of the current data memory access, to the tail of the queue, and going to step E-3; otherwise, going to step E-3;

Step E-3, detecting whether the queue has overflowed and setting the corresponding status bit.
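The selection logic of steps E-1 to E-3 can be sketched as follows. The class `RecordQueue` and its method names are hypothetical. The text does not say what happens to a record that arrives when the queue is full; this sketch assumes that the record is dropped and only the overflow status bit is set.

```python
from collections import deque


class RecordQueue:
    """Toy model of the circular Record queue with an overflow status bit."""

    def __init__(self, depth):
        self.depth = depth
        self.records = deque()
        self.overflow = False  # queue status bit

    def record(self, single_error, load_rec, state_set, write_valid, store_rec):
        rec = None
        if single_error:                 # E-1: corrected LOAD data
            rec = load_rec
        elif state_set and write_valid:  # E-2: STORE while the state is set
            rec = store_rec
        if rec is not None:
            if len(self.records) < self.depth:
                self.records.append(rec)  # write to the tail of the queue
            else:
                self.overflow = True      # E-3: flag the overflow (assumed drop)
        return self.overflow
```

A queue whose depth satisfies the sizing rule given for the circular Record queue should not reach the overflow branch in normal operation.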

Preferably, in step 6, when the RSEC instruction is called in a loop to process the records in the circular Record queue, the RSEC instruction reads one record from the head of the circular Record queue and writes the data in the record back to the data memory according to the access control information; the RSEC instruction operates as follows:

Step C-1, fetching an instruction;

Step C-2, determining from the instruction opcode whether it is an RSEC instruction; if it is an RSEC instruction, going to step C-3; otherwise, going to step C-9;

Step C-3, decoding the RSEC instruction and outputting a control signal to the queue access module;

Step C-4, the queue access module reads the head record Record(head) from the head of the circular Record queue;

Step C-5, sending the data and control information of Record(head) to the data memory write operation module;

Step C-6, writing the data and check code of the record into the data memory;

Step C-7, clearing the correctable error status register;

Step C-8, finishing execution of the instruction;

Step C-9, other instruction processing.
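Steps C-4 to C-7, together with the service-routine loop of step 6, can be sketched as follows. The function names `execute_rsec` and `drain` are hypothetical, records are reduced to (address, data, check code) tuples, and the memory is modeled as a dictionary.

```python
from collections import deque


def execute_rsec(queue, memory, state):
    # C-4: read the head record from the Record queue.
    addr, data, check = queue.popleft()
    # C-5 / C-6: write the record's data and check code to the data memory.
    memory[addr] = (data, check)
    # C-7: clear the correctable-error state.
    state["correctable_error"] = False


def drain(queue, memory, state):
    """Service-routine loop of step 6: issue RSEC until the queue is empty."""
    count = 0
    while queue:
        execute_rsec(queue, memory, state)
        count += 1
    return count
```

Because the records are replayed in first-in, first-out order, a STORE that was recorded after a corrected LOAD to the same address is written last in this model, so the memory ends up holding the newer value.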

Compared with the prior art, the invention has the following beneficial technical effects:

The fault-tolerant device of the DSP processor data memory of the invention places data error detection and correction on the DSP processor core pipeline; with appropriate pipeline partitioning, the frequency performance of the DSP processor is essentially unaffected. Conventional approaches place data error detection and correction inside the memory system, which increases the access latency of memory data.

The fault-tolerant method of the invention for the DSP processor data memory combines hardware with the software of an interrupt service routine. Through the service routine, the strategy and timing of hardware fault-tolerance handling can be controlled flexibly according to the reliability targets of the system, and system reliability is met at lower cost. After a correctable error is detected in content accessed from the data memory, the DSP processor pipeline actively corrects the error, and the corrected data is promptly written back to refresh the data memory. This avoids the situation in which an error in the data content causes a cache miss and the whole corresponding data block has to be reloaded from external memory or the next memory level, and so preserves the execution efficiency of the DSP under error conditions.

Drawings

FIG. 1 is a diagram of a DSP processor data storage fault tolerant device of the present invention;

FIG. 2 is a process flow of the data memory read operation instruction LOAD according to the present invention;

FIG. 3 is a flow chart illustrating the processing of a data memory write command STORE according to the present invention;

FIG. 4 is a process flow of a queue access instruction RSEC instruction of the present invention;

FIG. 5 is a diagram illustrating an entry queue for correctable errors and memory write operations according to the present invention;

FIG. 6 is a flow chart of a fault-tolerant processing method for a processor data storage according to the present invention.

Detailed Description

The present invention will now be described in further detail with reference to specific examples, which are intended to be illustrative, but not limiting, of the invention.

The invention provides a fault-tolerant device and a fault-tolerant method for a DSP processor data memory, which are oriented to the field of high-reliability processor design and provide hardware structure support and a software solution for the application-oriented processor reliability design.

The invention discloses an active fault-tolerant device for the data memory of a DSP processor. It is a circuit arranged between the DSP processor core pipeline and the in-core data memory and is used for active fault-tolerant refreshing of the data memory. The active fault-tolerant device comprises the following circuit modules: a LOAD instruction decoding module for reading the data memory, a STORE instruction decoding module for writing the data memory, a queue access module, a queue access write-back instruction decoding module (RSEC instruction decoding module), the data memory, a data error detection and correction module, a general register file, a correctable error status register recording that correctable erroneous data has been accessed, a circular Record queue, a data memory write operation module and an interrupt processing module for hardware interrupt processing.

The modules are connected as follows. The LOAD instruction decoding module receives a DSP processor program instruction, determines whether the current program instruction is a LOAD instruction, and outputs the decoded control logic of the LOAD instruction to the data memory. The STORE instruction decoding module receives a DSP processor program instruction, determines whether the current program instruction is a STORE instruction, and outputs the decoded control logic of the STORE instruction to the data memory. The RSEC instruction decoding module receives a DSP processor program instruction, determines whether the current program instruction is an RSEC instruction, and outputs the decoded control logic of the RSEC instruction to the circular Record queue. The data memory takes the data read control of a LOAD instruction as input and outputs the data and data check code accessed under that control; the data memory also receives, from the data memory write operation module, the write control information and write data. The input of the data error detection and correction module is the data and data check code output by the data memory, and its output is the error state of the currently accessed data together with the corrected data and check code. The register file receives the corrected data output by the data error detection and correction module and stores the data accessed by the LOAD instruction after processing by that module; the register file also outputs the data that a STORE instruction is to write to the data memory. The correctable error status register samples the error state of the current data. The inputs of the circular Record queue are the corrected data and data memory access control information of a LOAD access, and the data memory access control information of a STORE instruction. The data memory write operation module takes as input the memory write control signal decoded from a STORE instruction and the write control signal read out of the circular Record queue, and outputs the write control signal of the data memory. The input of the hardware interrupt processing is a hardware interrupt request signal of the DSP processor.

Further, the data error detection and correction module receives parallel input data and a check code from the data memory and performs error detection and correction on the data read; it outputs the correctable-error status Single_Error and the uncorrectable-error status, the data after error detection and correction, and the check code information. The correctable-error status Single_Error is connected to the correctable error status register and the interrupt flag register; the data after error detection and correction is connected to the circular Record queue and the destination register file; the check code after error detection and correction is connected to the circular Record queue.
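The text does not name the check code. As one possible illustration, the sketch below assumes an extended Hamming code with single-error correction and double-error detection (SEC-DED) over a 4-bit data word with a 4-bit check code; the word width, the code, and the names `encode` and `detect_and_correct` are assumptions made for this example only.

```python
DATA_POS = (3, 5, 6, 7)  # Hamming codeword positions of data bits d0..d3


def _popcount(x):
    return bin(x).count("1")


def _hamming_bits(data):
    """Parity bits p1, p2, p4 packed into 3 bits (XOR of set data positions)."""
    h = 0
    for i, pos in enumerate(DATA_POS):
        if (data >> i) & 1:
            h ^= pos
    return h


def encode(data):
    """Return the 4-bit check code: p1, p2, p4 plus overall parity in bit 3."""
    h = _hamming_bits(data)
    p0 = (_popcount(data) + _popcount(h)) & 1
    return h | (p0 << 3)


def detect_and_correct(data, check):
    """Return (status, data, check) for a 4-bit word and its check code."""
    syndrome = _hamming_bits(data) ^ (check & 0b111)
    overall = (_popcount(data) + _popcount(check & 0b1111)) & 1
    if syndrome == 0 and overall == 0:
        return "No_Error", data, check
    if overall == 1:
        # Odd number of flipped bits: treat as a single, correctable error.
        if syndrome in DATA_POS:
            data ^= 1 << DATA_POS.index(syndrome)
        return "Single_Error", data, encode(data)
    # Non-zero syndrome with even overall parity: double-bit error.
    return "Multiple_Error", data, check
```

For instance, `detect_and_correct(0b1111, encode(0b1011))` reports `Single_Error` and returns the corrected word `0b1011` with its regenerated check code, which are the values that would be forwarded to the register file and the circular Record queue.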

Further, the correctable error status register stores the correctable-error status information and indicates whether the DSP processor is currently in the data correctable-error state. It receives the correctable-error status Single_Error from the data error detection and correction module and the correctable-error status clearing control from the RSEC instruction decoding module. When Single_Error is valid, the correctable error status register is set; when the clearing control signal is valid, the correctable error status register is cleared; when Single_Error and the clearing control signal are valid at the same time, Single_Error has the higher priority and the correctable error status register is set. Several successive correctable data error events set the correctable error status register repeatedly until the correctable-error status bit is cleared.
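The set and clear behavior can be sketched as a one-bit register model. The sketch assumes that, because Single_Error has the higher priority, the register ends up set when both inputs are asserted in the same cycle; the class name `CorrectableErrorStatus` and the method `clock` are hypothetical.

```python
class CorrectableErrorStatus:
    """One-bit model of the correctable error status register."""

    def __init__(self):
        self.value = False

    def clock(self, single_error, clear):
        # Assumed priority: a valid Single_Error wins over the clear control.
        if single_error:
            self.value = True
        elif clear:
            self.value = False
        return self.value
```

With this priority, an error detected in the same cycle in which an RSEC instruction clears the register is not lost: the register stays set and the new record is still drained by the service routine.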

The circular Record queue comprises a Record queue and a queue status register. The circular Record queue has two groups of inputs: the first is the corrected data and check code from the data error detection and correction module together with the control information Core_Read_Control decoded from the current LOAD instruction on the DSP processor pipeline; the second is the data and check code output by STORE instruction decoding together with the control information Core_Write_Control of the current memory write instruction. If the data error detection and correction module detects a correctable error in the currently accessed data, the first group of inputs is written into the circular Record queue, the correctable-error status information is written into the correctable error status register, and the queue status register is updated. While the correctable error status register is set, if a subsequent instruction is a STORE instruction, the second group of inputs is written into the circular Record queue and the queue status register is updated.

The control information Core_Read_Control of the memory read instruction comprises: the read data access parallelism Pon, the read base address ReadBaseAddr, the word/byte mode BW, and the read data valid signal Read_Valid. The control information Core_Write_Control of the memory write instruction STORE comprises: the write data access parallelism Pon, the write base address WriteBaseAddr, the word/byte mode BW, and the write data valid signal Write_Valid.

Each record of the circular Record queue contains: the access parallelism Pon, the parallel access base address BaseAddr, the word/byte mode BW, the corrected data or the write-back data, and the check code of that data.
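The record layout can be illustrated with a small data structure. The field names follow the text (Pon, BaseAddr, BW), while the Python types, the dictionary representation of Core_Read_Control and Core_Write_Control, and the helper `record_from_control` are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    pon: int        # access parallelism Pon
    base_addr: int  # parallel access base address BaseAddr
    bw: int         # word/byte mode BW (numeric encoding assumed)
    data: int       # corrected data (LOAD) or write-back data (STORE)
    check: int      # check code of the data


def record_from_control(control, data, check):
    """Build a Record from Core_Read_Control or Core_Write_Control fields."""
    # A read control carries ReadBaseAddr, a write control WriteBaseAddr;
    # both map onto the single BaseAddr field of the record.
    if "ReadBaseAddr" in control:
        base = control["ReadBaseAddr"]
    else:
        base = control["WriteBaseAddr"]
    return Record(control["Pon"], base, control["BW"], data, check)
```

Storing the same fields for LOAD and STORE entries lets one write path, the data memory write operation module, replay either kind of record.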

Preferably, the circular Record queue records effective input Record information into Record in a first-in first-out mode_kIn a queue.

Further, the circular Record queue comprises a group of queue status registers, which mark the current status of the queue, including whether the queue has overflowed and whether it has been processed to its end. The depth of the circular Record queue should be greater than the interrupt response latency plus the number of pipeline stages between the decode stage and the stage in which the data error detection and correction module is located.
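The depth rule can be expressed as a one-line calculation. The sketch below is illustrative only: the function name and the example figures (a 4-cycle interrupt response latency and 5 pipeline stages between decode and error correction) are assumptions, not values stated in this description.

```python
def min_record_queue_depth(interrupt_latency, stages_decode_to_edac):
    """Smallest integer depth strictly greater than latency + stage count."""
    if interrupt_latency < 0 or stages_decode_to_edac < 0:
        raise ValueError("counts must be non-negative")
    return interrupt_latency + stages_decode_to_edac + 1


# Hypothetical figures, chosen only to exercise the rule.
depth = min_record_queue_depth(interrupt_latency=4, stages_decode_to_edac=5)
```

A queue sized this way can absorb every STORE already in flight between the moment the correctable error is detected and the moment the interrupt service routine starts draining the queue.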

Preferably, the STORE instruction decoding module detects whether the currently decoded instruction is a data memory write instruction and outputs the decoded control information Core_Write_Control together with the data and check code of the STORE; when the correctable error status register is set, this control information, data and check code are also input into the circular Record queue.

Furthermore, the RSEC instruction decoding module accesses a Record entry in the circular Record queue, outputs it to the data memory write operation module, and updates the data and check bits stored in the entry back to the data memory through that module. Preferably, the RSEC instruction accesses the queue from the head of the circular Record queue and outputs the head record Record(head) to the data memory write operation module.

The data memory write operation module generates the base address Addr, the access parallelism Pon and the word/byte mode BW of the data memory write operation from the control information decoded from the STORE instruction or from the control information output by the circular Record queue. It either writes the data and check code of the STORE instruction to the data memory, or writes the corrected data and check code held in the Record back to the data memory.

Further, the interrupt processing module connects one of the hardware interrupt requests to the correctable error state interrupt of the currently accessed data, and the correctable error interrupt of the data memory has higher interrupt priority.

Preferably, the interrupt handling module includes hardware interrupt handling logic, an interrupt flag register, an interrupt enable register, an interrupt vector table, and a hard interrupt service routine area.

The hardware interrupt processing logic interrupts the program normally executing on the DSP processor pipeline and jumps to the interrupt vector table to obtain the entry of the hardware interrupt service routine. Preferably, the interrupt flag register and the interrupt enable register cooperate with the hardware interrupt processing logic to enter the interrupt service: when the interrupt enable register enables the data-correctable-error interrupt, the hardware interrupt processing logic checks whether the data-correctable-error interrupt flag in the interrupt flag register is valid, and if it is, the DSP processor enters the data-correctable-error interrupt service routine. The correctable error state output Single_Error of the data error detection and correction module is connected to the data-correctable-error interrupt bit of the interrupt flag register; the interrupt flag register samples and records the data-correctable-error state signal.

Preferably, the hard interrupt service routine area is entered through the entry specified by the interrupt vector table. When a data-correctable-error interrupt occurs and is enabled, the DSP processor jumps from the data-correctable-error entry in the interrupt vector table to the correctable-error interrupt service routine.

The invention discloses an active fault-tolerant method for a DSP processor data memory, which comprises the following steps:

Step 1, the DSP processor is initialized and the correctable-error interrupt response is enabled, so that the DSP processor responds to the correctable-error hardware interrupt when a correctable error occurs in the data memory;

Step 2, a data memory access instruction (LOAD) reads data from the data memory. The error status of the data is checked according to the extended LOAD instruction flow, and correctable-error or uncorrectable-error handling is performed for the respective error state during execution of the LOAD instruction;

Step 3, when a correctable data error has occurred, it is judged whether a memory write (STORE) instruction exists in the current processor pipeline or in the subsequent program instructions. If yes, continue with step 4; otherwise, go to step 5;

Step 5, the processor responds to the correctable-error hardware interrupt. The correctable data error triggers a hardware interrupt of the processor, which enters the data-correctable-error interrupt service routine;

Step 6, in the interrupt service routine, the Records in the circular Record queue are processed by repeatedly executing the RSEC instruction until the queue has been read empty. The corrected data are updated back to the data memory, and after the correctable errors have been corrected, the data written by the STORE instructions are refreshed back to the data memory;

Step 7, the correctable-error interrupt returns;

Step 8, the DSP processor resumes normal operation.

The active fault-tolerant control method for the processor data memory extends the processing of the data memory LOAD instruction of the processor. The extended LOAD instruction is processed as follows:

Step A-1, the DSP processor fetches an instruction;

Step A-2, judge from the instruction opcode whether it is a LOAD instruction. If it is a LOAD instruction, proceed to step A-3; otherwise, go to step A-12;

Step A-3, decode the LOAD instruction;

Step A-4, access the data memory to obtain the data;

Step A-5, send the data to the data error detection and correction module for error detection and correction and for determination of the correctable error Single_Error and the uncorrectable error Multiple_Error;

Step A-6, judge whether an uncorrectable error has occurred. If yes, go to step A-10; otherwise, go to step A-7;

Step A-7, write the data output by the data error detection and correction module into the destination register file according to the destination register index of the LOAD instruction;

Step A-8, judge whether a correctable error Single_Error has occurred. If yes, set the correctable error status register and proceed to step A-9; otherwise, go to step A-11;

Step A-9, sequentially write the error-corrected data output by the data error detection and correction module, the check code and the access control information on the LOAD instruction pipeline into the circular Record queue; go to step A-11;

Step A-10, an uncorrectable multi-bit error has occurred; generate an uncorrectable-error interrupt signal;

Step A-11, instruction execution ends;

Step A-12, other instruction processing.

In step A-5 of the LOAD instruction flow, the data error detection and correction circuit is divided into N groups according to the data parallelism; each group corresponds to one 32-bit word, and error detection and correction is performed word by word. The error detection and correction logic uses the 32 data bits

$D = d_{32}d_{31}d_{30}\cdots d_{2}d_{1}$ ($d_i \in \{0,1\}$, $1 \le i \le 32$)

and the 8 check bits $P' = p'_8p'_7p'_6p'_5p'_4p'_3p'_2p'_1$ ($p'_i \in \{0,1\}$, $1 \le i \le 8$)

to generate the 8 error identification bits $P = p_8p_7p_6p_5p_4p_3p_2p_1$ ($p_i \in \{0,1\}$, $1 \le i \le 8$).

The logic for generating the identification bits is:

where ⊕ is the exclusive-or operation; let

Error detection and correction according to the value of P proceeds as follows:

(1) if P = "0000000", the data and the check code are error-free; Multiple_Error = false and Single_Error = false;

(2) if weight = '0' and $p_8p_7p_6p_5p_4p_3p_2p_1 \ne$ "0000000", two bits of the data D or the check code P' are in error; Multiple_Error = true, and neither the data nor the check code is corrected;

(3) if weight = '1' and $p_8p_7p_6p_5p_4p_3p_2p_1 \ne$ "0000000", one bit of the data D or the check code P' is in error; Single_Error = true. The data bits are corrected as follows:

wherein, "·" is a bitwise and operation; "+" is a bitwise OR operation;

is the bitwise negation operation; $d_i'$ ($1 \le i \le 32$) is the data bit $d_i$ after correction of the correctable error; $k_i$ is computed as follows:

The check bits are corrected as follows:

if $p_i \ne$ '0' ($1 \le i \le 8$), and

then a correctable error has occurred in $p'_j$,

where $p'_j$ ($1 \le j \le 8$) on the right-hand side of the equation is the check bit after correction of the correctable error, and $p'_j$ ($1 \le j \le 8$) on the left-hand side of the equation is the value of the check bit before correction.
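The decision structure above (an error identification word plus an overall parity "weight") can be illustrated in software. The generating equations of the invention's 8-bit code are not reproduced in this text, so the sketch below substitutes a textbook extended Hamming code for a 32-bit word (6 Hamming bits plus 1 overall parity bit). It is an assumption-laden stand-in that shows the same three-way classification, not the code used by the invention; all function and variable names are hypothetical.

```python
# Codeword positions 1..38: powers of two hold Hamming bits, the rest hold data.
PARITY_POSITIONS = [1, 2, 4, 8, 16, 32]
DATA_POSITIONS = [p for p in range(1, 39) if p & (p - 1)]  # 32 positions


def _syndrome(data, hamming):
    """XOR of the positions of all set bits (zero for a valid codeword)."""
    s = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (data >> i) & 1:
            s ^= pos
    for k, pos in enumerate(PARITY_POSITIONS):
        if (hamming >> k) & 1:
            s ^= pos
    return s


def secded_encode(data):
    """Return a 7-bit check code: bits 0-5 Hamming, bit 6 overall parity."""
    hamming = _syndrome(data, 0)  # choose Hamming bits so the syndrome is zero
    overall = (bin(data).count("1") + bin(hamming).count("1")) & 1
    return hamming | (overall << 6)


def secded_check(data, code):
    """Return (data, code, single_error, multiple_error) after correction."""
    hamming, overall = code & 0x3F, (code >> 6) & 1
    syndrome = _syndrome(data, hamming)
    weight = (bin(data).count("1") + bin(hamming).count("1") + overall) & 1
    if syndrome == 0 and weight == 0:
        return data, code, False, False          # case (1): no error
    if weight == 0:
        return data, code, False, True           # case (2): double error
    # case (3): single error, locate and flip the faulty bit
    if syndrome == 0:
        code ^= 1 << 6                           # the overall parity bit itself
    elif syndrome in PARITY_POSITIONS:
        code ^= 1 << PARITY_POSITIONS.index(syndrome)
    elif syndrome in DATA_POSITIONS:
        data ^= 1 << DATA_POSITIONS.index(syndrome)
    else:
        return data, code, False, True           # no such position: uncorrectable
    return data, code, True, False


check_code = secded_encode(0x00000084)
clean = secded_check(0x00000084, check_code)
fixed = secded_check(0x00000080, check_code)          # one data bit flipped
fixed_code = secded_check(0x00000084, check_code ^ 1)  # one check bit flipped
double = secded_check(0x00000084 ^ 0b11, check_code)   # two data bits flipped
```

As in the method described above, a single flipped bit is corrected whether it lies in the data or in the check code, while a double error is only reported.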

Furthermore, for a processor supporting multiple degrees of parallelism, the above method performs error detection and correction word by word. The error detection and correction status of the current parallel access is obtained from the data parallelism Pon as follows:

(1) if the data parallelism is 1, i.e. Pon = 1, then Single_Error = $s_0$ and Multiple_Error = $m_0$, where $s_i$ ($0 \le i \le 3$) is the correctable-error state and $m_i$ ($0 \le i \le 3$) is the double-error state of the i-th parallel word;

(2) if the data parallelism is 2, i.e. Pon = 2, then Single_Error = $s_0 \lor s_1$ and Multiple_Error = $m_0 \lor m_1$, where ∨ is the bitwise OR operation;

(3) if the data parallelism is 4, i.e. Pon = 4, then Single_Error = $s_0 \lor s_1 \lor s_2 \lor s_3$ and Multiple_Error = $m_0 \lor m_1 \lor m_2 \lor m_3$, where ∨ is the bitwise OR operation.
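The three cases reduce to OR-ing the per-word flags of the first Pon words. A minimal sketch follows, assuming the per-word flags are supplied as four-element lists; the function name and argument layout are hypothetical.

```python
def aggregate_error_status(pon, s, m):
    """Combine per-word flags into (Single_Error, Multiple_Error).

    s[i] is the correctable-error flag and m[i] the multi-bit-error flag
    of parallel word i; only the first `pon` words take part in the access.
    """
    if pon not in (1, 2, 4):
        raise ValueError("Pon must be 1, 2 or 4")
    single_error = any(s[:pon])
    multiple_error = any(m[:pon])
    return single_error, multiple_error


# Word 1 has a correctable error; it only matters when Pon >= 2.
flags_s = [False, True, False, False]
flags_m = [False, False, False, False]
status_pon1 = aggregate_error_status(1, flags_s, flags_m)
status_pon2 = aggregate_error_status(2, flags_s, flags_m)
```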

The active fault-tolerant control method for the processor data memory also extends the processing of the data memory write (STORE) instruction of the processor. The extended STORE instruction is processed as follows:

Step B-1, fetch an instruction;

Step B-2, judge from the instruction opcode whether it is a STORE instruction. If it is a STORE instruction, proceed to step B-3; otherwise, go to step B-9;

Step B-3, decode the STORE instruction and read the operand from the register file;

Step B-4, generate the operand check code;

Step B-5, judge whether the correctable error status register is currently set. If yes, proceed to step B-6; otherwise, go to step B-7;

Step B-6, sequentially write the operand data, the operand check code and the access control information on the STORE instruction pipeline into the circular Record queue;

Step B-7, write the operand data and the operand check code into the data memory;

Step B-8, instruction execution ends;

Step B-9, other instruction processing.

In step B-4 of the operation flow of the extended write to memory instruction STORE, the logic for generating the operand parity bits is:

where the data check code is generated per 32-bit data word D:

($d_i \in \{0,1\}$, $1 \le i \le 32$)

generating the 8-bit check bits $P' = p'_7p'_6p'_5p'_4p'_3p'_2p'_1$ ($p'_i \in \{0,1\}$, $1 \le i \le 7$).

Further, for a processor supporting multiple parallelism, the check code of the data is generated in units of words by the method. The processor is provided with a parallel multi-bit error detection function by taking a word as a unit.

In steps A-9 and B-6, records are written into the circular Record queue as follows:

Step E-1, if Single_Error is true, the data D and the check code P' corrected by the above error correction method are written to the tail of the queue together with the read base address ReadBaseAddr, the parallelism Pon and the word/byte access mode BW of the current data memory access:

Record(tail) = Record(Single_Error) = {D, P', ReadBaseAddr, Pon, BW}; go to step E-3. Otherwise, proceed to step E-2;

Step E-2, if the correctable error status register is currently set and the write-valid flag Write_Valid of the memory write instruction STORE is valid, the data D and the check code P' generated by the above encoding method are written to the tail of the queue together with the write base address WriteBaseAddr, the parallelism Pon and the word/byte access mode BW of the current data memory access: Record(tail) = Record(Write_Instruction) = {D, P', WriteBaseAddr, Pon, BW}; go to step E-3. Otherwise, go to step E-3;

In step B-6 of the extended STORE instruction flow, once a correctable error has occurred in the data memory and the correctable error status register has been set, the control information Record(Write_Instruction) of every subsequent data memory write instruction is recorded in order in the circular Record queue. Recording Record(Write_Instruction) avoids a write-after-write hazard on the data memory when Record(Single_Error) is written back.
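The recording rules of steps E-1 and E-2 can be modelled with a bounded FIFO. The sketch below is a software illustration only; the class, method and field names are hypothetical, base addresses are represented by register names, and the sample values are arbitrary.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    data: tuple       # corrected LOAD data or STORE write data, one entry per word
    check: tuple      # check code of each word
    base_addr: str    # ReadBaseAddr or WriteBaseAddr
    pon: int          # access parallelism
    bw: int           # 1 = word mode, 0 = byte mode


class RecordQueue:
    def __init__(self, depth):
        self.depth = depth
        self.records = deque()
        self.overflow = False      # queue status bit
        self.error_state = False   # correctable error status register

    def _write_tail(self, record):
        if len(self.records) == self.depth:
            self.overflow = True   # record is dropped; status flags the overflow
        else:
            self.records.append(record)

    def on_load(self, single_error, data, check, read_base_addr, pon, bw):
        """Step E-1: record the corrected data of a LOAD with a correctable error."""
        if single_error:
            self.error_state = True
            self._write_tail(Record(data, check, read_base_addr, pon, bw))

    def on_store(self, write_valid, data, check, write_base_addr, pon, bw):
        """Step E-2: record a STORE only while the error state is set."""
        if self.error_state and write_valid:
            self._write_tail(Record(data, check, write_base_addr, pon, bw))


q = RecordQueue(depth=8)
q.on_store(True, (0x11,), (0x00,), "AR4", 1, 1)             # before any error: ignored
q.on_load(True, (0x84, 0x204), (0xAC, 0x4B), "AR5", 2, 1)   # correctable error on LOAD
q.on_store(True, (0x288,), (0x07,), "AR5", 1, 1)            # later STORE: recorded
```

Because the later STORE to the same base address sits behind the corrected LOAD data in the queue, replaying the queue in order leaves the STORE value in memory, which is the write-after-write protection described above.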

Further, in step 6, the RSEC instruction reads a Record from the head of the circular Record queue and updates the data in the Record back to the data memory according to the access control information. The RSEC instruction operates as follows:

Step C-1, fetch an instruction;

Step C-2, judge from the instruction opcode whether it is an RSEC instruction. If it is an RSEC instruction, proceed to step C-3; otherwise, go to step C-9;

Step C-3, decode the RSEC instruction and output a control signal to the queue access module;

Step C-4, the queue access module reads a record Record(head) from the head of the circular Record queue;

Step C-5, send the data and control information of Record(head) to the memory write operation module;

Step C-6, write the operand data and the operand check code into the data memory;

Step C-7, clear the correctable error status register;

Step C-8, instruction execution ends;

Step C-9, other instruction processing.

Further, the memory write operation module in step C-5 is the same hardware logic as step B-7 of the STORE instruction flow; it reuses the hardware logic of the STORE instruction pipeline in the processor pipeline.

In step C-4 of the RSEC instruction flow, an entry is read from the circular Record queue as follows:

Step F-1, detect whether the queue is currently empty and set the corresponding status bit; if it is not empty, execute step F-2; if it is empty, return an empty-queue identifier;

Step F-2, read the head record Record(head), after which the queue head points to the record following the current one.

Further, in step 6, the correctable-error interrupt service routine proceeds as follows:

Step D-1, judge whether the circular Record queue has been read empty. If yes, go to step D-3; otherwise, proceed to step D-2;

Step D-2, execute the RSEC instruction, then go to step D-1;

Step D-3, end the interrupt service and return from the interrupt.
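Steps D-1 to D-3, together with the queue read of steps F-1 and F-2, amount to draining a FIFO into memory. The following sketch models this in software under stated assumptions: memory is a word-addressed dictionary of (data, check) pairs, each queue entry is a (base address, list of words) tuple, and the numeric addresses are hypothetical.

```python
from collections import deque


def rsec(queue, memory, status):
    """Model of one RSEC instruction: read Record(head) and write it back."""
    if not queue:                              # F-1: queue is empty
        status["queue_empty"] = True
        return False
    base_addr, words = queue.popleft()         # F-2: head moves to the next record
    for offset, word in enumerate(words):      # one (data, check) pair per parallel word
        memory[base_addr + offset] = word
    status["correctable_error"] = False        # clear the correctable error status
    return True


def correctable_error_isr(queue, memory, status):
    """Model of the interrupt service routine, steps D-1 to D-3."""
    while queue:                               # D-1: queue not yet read empty
        rsec(queue, memory, status)            # D-2: execute RSEC
    status["queue_empty"] = True
    # D-3: return from interrupt


# Memory as seen after the STOREs executed, with one word still corrupted.
memory = {0x100: (0x288, 0x07), 0x101: (0x004, 0x4B), 0x104: (0x40, 0x4C)}
queue = deque([
    (0x100, [(0x84, 0xAC), (0x204, 0x4B)]),    # corrected LOAD data, Pon = 2
    (0x100, [(0x288, 0x07)]),                  # later STORE to the same address
    (0x104, [(0x40, 0x4C)]),                   # later STORE to another address
])
status = {"correctable_error": True, "queue_empty": False}
correctable_error_isr(queue, memory, status)
```

After the routine returns, address 0x100 again holds the value of the later STORE rather than the corrected LOAD data, and the corrupted word at 0x101 has been refreshed with its corrected value.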

In the preferred embodiment, the beneficial effects of the invention are illustrated with a high-performance DSP processor with a SIMD architecture. The processor employs a Harvard architecture and integrates a 16 KB data memory and a 16 KB program memory. Access to the data memory supports a parallelism Pon ∈ {1, 2, 4}, that is, 1×, 2× or 4× 32-bit data can be accessed in parallel in each clock cycle. The instruction that reads the data memory is the memory-to-register load instruction LOAD; the instruction that writes the data memory is the register-to-memory write instruction STORE.

FIG. 1 illustrates an example of the device of the invention supporting active fault tolerance of the data memory. The active fault-tolerant device of the processor data memory is a circuit for active fault-tolerant refreshing of the data memory, arranged between the processor core pipeline and the on-core data memory. The fault-tolerant device comprises the following circuit modules:

the LOAD instruction decoding receives the program instruction, judges whether the current instruction is the LOAD instruction or not, and outputs the decoding control logic of the LOAD instruction to the data memory. The STORE instruction decoding receives the program instruction, judges whether the current instruction is the STORE instruction or not, and outputs the decoding control logic of the STORE instruction to the data memory.

The data error detection and correction module: this module receives N parallel words of input data (N × 32-bit data 111) and the corresponding check codes (N × 8-bit check code 112) from the data memory and performs error detection and correction on the read data. It outputs the Single_Error 101 and Multiple_Error 102 status of the current N-way data, together with the data and check code after error detection and correction 103. The correctable error state Single_Error is connected to the correctable error status register and to the interrupt flag register; the corrected data are connected to the circular queue and to the destination register file; the corrected check code is connected to the circular queue.

The correctable error status register stores the correctable error state and indicates whether the processor is currently in a data-correctable-error state. It receives the correctable error status Single_Error from the data error detection and correction module and the correctable-error-state clear control 105 from the RSEC instruction decoding module. When Single_Error is asserted, the correctable error status register is set; when the clear control signal is asserted, the register is cleared; when Single_Error and the clear control signal are asserted simultaneously, Single_Error has the higher priority and the register is set. Multiple consecutive correctable-error events that occur before the status bit is cleared simply set the register repeatedly.

The circular Record queue comprises a Record queue and a queue status register. It has two inputs: the first input carries the corrected data and check code from the data error detection and correction module together with the control information Core_Read_Control 106 of the current memory read instruction on the processor pipeline; the second input carries the data and check code 107 output by the memory write instruction detection module together with the control information Core_Write_Control 108 of the current memory write instruction. If the pipeline detects a correctable error in the data being read, the first input is written into the circular Record queue, the correctable error status register is set and the queue status register is updated. While the correctable error status register is set, if a data memory write instruction occurs among the subsequent instructions, the second input must be written into the circular Record queue and the queue status register updated. The output of the circular queue is connected to the queue access module.

The control information Core_Read_Control of the memory read instruction comprises: the read access parallelism Pon, the read base address ReadBaseAddr, the word/byte mode BW and the read-valid flag Read_Valid. The control information Core_Write_Control of the memory write instruction STORE comprises: the write access parallelism Pon, the write base address WriteBaseAddr, the word/byte mode BW and the write-valid flag Write_Valid. Each Record of the circular Record queue contains: the access parallelism Pon, the access base address ReadBaseAddr or WriteBaseAddr, the word/byte mode BW, the error-corrected data or the write-back data, and the check code of that data.

The circular Record queue records each piece of input Record information into the Record_k queue in First-In First-Out (FIFO) order.

The queue status register 113 marks the current status of the queue, including whether the queue has overflowed and whether it has been processed to its end. The depth of the queue should be greater than the interrupt response latency plus the number of pipeline stages between the decode stage and the stage in which the data error detection and correction module is located.

The STORE instruction decoding detects whether the currently decoded instruction is a data memory write instruction and outputs the decoded control information Core_Write_Control together with the data and check code of the STORE; when the correctable error status register is set, this information is also input into the circular Record queue.

The queue access module: this module receives the output control 109 from the RSEC instruction decoding module and the output 114 of the circular Record queue. It accesses the queue from the head of the circular Record queue and outputs the head record Record(head) to the memory write operation module.

A data memory write operation module: this module inputs output record (head)110 from the queue access module. And the data memory write operation module writes the data and the error correction code in the record (record) into the corresponding data memory and the error correction and detection memory according to the base address BaseAddr in the record (record), the access parallelism Pon and the control information of the byte mode BW.

RSEC instruction decoding module: the module inputs the RSEC instruction 115 for the error correctable interrupt service routine from the DSP processor program memory and outputs the error correctable clear state control signal 105 and the queue access control signal 109.

The invention uses interrupt processing: the update write-back of correctable errors to the data memory is completed by an interrupt service routine. Because the corrected data are recorded in the circular Record queue, a special instruction, RSEC (Record access and Single Error Correction), is added to the DSP processor instruction set. It is used to read, one after another, the Record at the head of the queue and to complete the update write-back of the data memory according to the record information. This example illustrates an RSEC instruction implemented on a DSP processor.

The correctable-error interrupt service routine is entered through the entry specified by the interrupt vector table of the processor interrupt processing module. The correctable error state output Single_Error 101 of the data error detection and correction module is connected to the highest-priority interrupt bit of the interrupt flag register; the interrupt flag register samples and records the data-correctable-error state signal. When a data-correctable-error interrupt occurs and is enabled, the processor jumps from the data-correctable-error entry in the interrupt vector table to the correctable-error interrupt service routine.

According to the arrangement of the device, the processing flows of the LOAD instruction and the STORE instruction are expanded, and are respectively shown in fig. 2 and fig. 3. The method comprises the following specific steps:

the processing flow of the read data memory instruction LOAD is as follows:

step A-1: fetching an instruction;

step A-2: judge from the instruction opcode whether it is a LOAD instruction. If it is a LOAD instruction, proceed to step A-3; otherwise, go to step A-12;

step A-3: decoding a LOAD instruction;

step A-4: accessing a data memory to obtain data;

step A-5: send the data to the data error detection and correction module for error detection and correction and for determination of the correctable error Single_Error and the uncorrectable error Multiple_Error;

step A-6: judge whether an uncorrectable error has occurred. If yes, go to step A-10; otherwise, go to step A-7;

step A-7: writing the data output by the data error correction and detection module into a target register group according to the target register index of the LOAD instruction;

step A-8: judge whether a correctable error Single_Error has occurred. If yes, set the correctable error status register and proceed to step A-9; otherwise, go to step A-11;

step A-9: sequentially writing the error-corrected data output by the data error correction and detection module, the check code and the access control information on the LOAD instruction pipeline into a circular Record queue; turning to step A-11;

step A-10: an uncorrectable multi-bit error occurs, generating an uncorrectable error interrupt signal.

Step A-11: the instruction execution ends.

Step A-12: other instructions are processed.
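The branch order of steps A-6 to A-11 is the part of this flow that is easiest to get wrong: an uncorrectable error bypasses the register write, while a correctable error still delivers corrected data to the register before being recorded. The sketch below models only that ordering; the function signature, the dictionary-based register file and status flags, and the sample values are hypothetical.

```python
def execute_load(dest, data, check, single_error, multiple_error, ctrl,
                 regs, status, record_queue):
    """Model of steps A-6 to A-11 for one decoded LOAD."""
    if multiple_error:                             # A-6: uncorrectable error?
        status["uncorrectable_irq"] = True         # A-10: raise the interrupt signal
        return                                     # A-11: instruction ends
    regs[dest] = data                              # A-7: write (corrected) data
    if single_error:                               # A-8: correctable error?
        status["correctable_error"] = True         #      set the status register
        record_queue.append((data, check, ctrl))   # A-9: record for later write-back
    # A-11: instruction ends


regs = {}
status = {"correctable_error": False, "uncorrectable_irq": False}
record_queue = []
ctrl = {"base_addr": "AR5", "pon": 1, "bw": 1}

execute_load("R1", (0x84,), (0xAC,), True, False, ctrl, regs, status, record_queue)
execute_load("R2", (0x00,), (0x00,), False, True, ctrl, regs, status, record_queue)
```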

The processing flow of the extended STORE instruction STORE is as follows:

step B-1: fetching an instruction;

step B-2: judge from the instruction opcode whether it is a STORE instruction. If it is a STORE instruction, proceed to step B-3; otherwise, go to step B-9;

step B-3: STORE instruction decoding; reading an operand from a register set;

step B-4: generating an operand check code;

step B-5: judge whether the correctable error status register is currently set. If yes, proceed to step B-6; otherwise, go to step B-7;

step B-6: sequentially writing the operand data, the operand check code and the access control information on the STORE instruction pipeline into a circular Record queue;

step B-7: writing the operand data and the operand check code into a data memory;

step B-8: the instruction execution ends.

Step B-9: other instructions are processed.

The processor of this embodiment supports three parallel access modes: 1×, 2× or 4× 32-bit. Check code generation and error detection and correction (correctable-error and double-error detection) are all performed on 32-bit data words. The data error detection and correction module therefore requires four parallel paths of error detection and correction logic.

The data error detection and correction logic of the invention is implemented according to the following algorithm: the data information consists of the 32 data bits

($d_i \in \{0,1\}$, $1 \le i \le 32$)

and the 8 check bits $P' = p'_8p'_7p'_6p'_5p'_4p'_3p'_2p'_1$ ($p'_i \in \{0,1\}$, $1 \le i \le 8$).

The logic for check bit generation in the STORE instruction flow is:

the error detection and correction logic in the LOAD instruction flow compares the 8-bit check code with the 8-bit error identification bits to generate the correctable-error Single_Error and uncorrectable-error Multiple_Error status signals.

The logic for generating the 8 error identification bits $P = p_8p_7p_6p_5p_4p_3p_2p_1$ ($p_i \in \{0,1\}$, $1 \le i \le 8$) is:

where ⊕ is the exclusive-or operation; let

(1) if P = "0000000", the data and the check code are error-free; Multiple_Error = false and Single_Error = false;

(2) if weight = '0' and $p_8p_7p_6p_5p_4p_3p_2p_1 \ne$ "0000000", two bits of the data D or the check code P' are in error; Multiple_Error = true, and neither the data nor the check code is corrected;

(3) if weight = '1' and $p_8p_7p_6p_5p_4p_3p_2p_1 \ne$ "0000000", one bit of the data D or the check code P' is in error; Single_Error = true. The data bit is corrected as described in the summary of the invention.

The processor of the invention supports accesses of up to 128 bits of data at a parallelism of 4×, and because error detection and correction is performed word by word, it provides word-wise error detection and correction for the full 128 bits. In addition, the error detection and correction logic of the invention avoids 64-bit or 128-bit error detection and correction logic, because such logic requires more check bits and its encoding and decoding logic requires more area and circuit delay than 32-bit logic.

Through a cooperative software and hardware mechanism, the invention refreshes the corrected data back to the data memory under software control when a correctable error occurs in the data (for example a single-bit error, a common error in electronic devices caused by single-event effects; in the remainder of this embodiment, "correctable error" refers to such an error). The corrected data and their control information are recorded in the circular Record queue and used to refresh the corrected data back to the memory. When a correctable error occurs in the EDAC check, whichever group of the parallel data it occurs in, the correctable error status register is set, and a record of the currently corrected data, the EDAC check code and related information is written into the Record queue for write-back updating of the data memory in subsequent fault-tolerant processing. In addition, to prevent the write-back of the corrected data from overwriting data written later to the same address, the DSP structure also records in the queue, in order, the data and write control information of every write instruction that follows the correctable error. The process of recording into the queue is shown in FIG. 5. The processor in FIG. 5 includes a 10-stage pipeline including PF, FE, DC, EX 1-4, EDAC, and EX 6. Each stage of the pipeline contains different functional units according to the flow of the instruction. The recording of correctable error information and of write instruction information both occur at the EDAC stage of the pipeline. The process of recording into the queue is illustrated by the following program example:

In the above example, suppose the LOAD instruction reads the following from memory, where the first 32 bits are data bits and the last 8 bits are check bits: 0x0000000080, AC and 0x0000000004, 4B. According to the error detection and correction logic, the first word has a correctable error in bit d₃, and the corrected data is 0x0000000084, AC; the second word has a correctable error in bit d₁₀, and the corrected data is 0x0000000204, 4B. Two errors occur in this data access, but both are correctable. The correctable error state output of the data error detection and correction module is therefore Single_Error = true. According to the access information of the current LOAD instruction, the following information must be recorded in the queue:

Data and check code	0x0000000084, AC; 0x0000000204, 4B
Read base address	AR5
Parallelism	2
Word/byte mode	1 ('1' for word mode; '0' for byte mode)

The first STORE instruction stores the contents of R6 at the address in AR5. Because a correctable error occurred in the data of the LOAD instruction, the information of this STORE instruction must be recorded in the queue:

Data and check code	0x0000000288, 07
Write base address	AR5
Parallelism	1
Word/byte mode	1

The second STORE instruction stores the contents of R3 at the address held in AR6. Because the preceding LOAD instruction encountered a correctable error, the information of this STORE instruction must also be recorded in the queue:

Data and check code	0x00000040, 4C
Write base address	AR6
Degree of parallelism	1
Word/byte mode	1
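
The recording rule illustrated by the three tables above can be modelled in a few lines of Python. This is only a behavioural sketch, not the hardware: the `Record` and `RecordQueue` names, the queue depth of 8, and the use of register names as base addresses are assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Record:
    words: List[Tuple[int, int]]  # (data, check code) pairs, one per parallel word
    base_addr: str                # address register name, e.g. "AR5"
    parallelism: int
    word_mode: int                # 1 = word mode, 0 = byte mode


class RecordQueue:
    def __init__(self, depth: int):
        self.depth = depth
        self.records = deque()
        self.overflow = False
        self.single_error_flag = False  # models the correctable error status register

    def _push(self, rec: Record) -> None:
        if len(self.records) >= self.depth:
            self.overflow = True        # record is dropped and overflow is flagged
            return
        self.records.append(rec)

    def on_load(self, rec: Record, single_error: bool) -> None:
        # A LOAD is recorded only when a correctable error was detected.
        if single_error:
            self.single_error_flag = True
            self._push(rec)

    def on_store(self, rec: Record) -> None:
        # A STORE is recorded only while the status register is set.
        if self.single_error_flag:
            self._push(rec)


q = RecordQueue(depth=8)  # depth is a hypothetical value
q.on_load(Record([(0x0000000084, 0xAC), (0x0000000204, 0x4B)], "AR5", 2, 1), single_error=True)
q.on_store(Record([(0x0000000288, 0x07)], "AR5", 1, 1))
q.on_store(Record([(0x00000040, 0x4C)], "AR6", 1, 1))
order = [r.base_addr for r in q.records]
print(order)  # ['AR5', 'AR5', 'AR6']
```

Keeping the records in program order is what lets a later replay write the corrected LOAD data first and the newer STORE data to AR5 afterwards.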

The invention uses software to refresh the correctable-error data detected in the program example back to the data memory. Because the information needed to refresh the data memory is held in the hardware circular Record queue, a dedicated instruction, the RSEC instruction, is provided to access the queue. The write-back data, its error-correcting code, and the control information are decoded and sent to the memory write operation module, which completes the write-back of the data.

The RSEC instruction is a 32-bit instruction. Because the access port of the queue can be regarded as a special register in the processor core structure, the RSEC instruction uses the register-access instruction encoding format: it is encoded as an access to an unused register address, which generates the corresponding micro-operations. The processing flow of the RSEC instruction is shown in FIG. 4. The specific steps are as follows:

Step C-1: fetch an instruction;

Step C-2: determine from the instruction opcode whether it is an RSEC instruction. If it is, go to step C-3; otherwise, go to step C-9;

Step C-3: decode the RSEC instruction and output a control signal to the queue access module;

Step C-4: the queue access module reads a record, Record(head), from the head of the circular Record queue;

Step C-5: send the data and control information of Record(head) to the memory write operation module;

Step C-6: write the operand data and the operand check code into the data memory;

Step C-7: clear the correctable error status register;

Step C-8: instruction execution ends;

Step C-9: process other instructions.
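
Steps C-3 to C-8 can be sketched as a small behavioural model. The numeric addresses chosen for AR5 and AR6, the layout of parallel words at consecutive addresses, and the behaviour of RSEC on an empty queue are all assumptions for illustration; the text does not specify them. The data values are those of the earlier example.

```python
from collections import deque


def rsec(queue, memory, status):
    """Model one RSEC instruction; returns True if a record was written back."""
    if not queue:
        return False  # assumption: RSEC on an empty queue has no effect
    base_addr, words = queue.popleft()                # C-4: read Record(head)
    for offset, (data, check) in enumerate(words):
        memory[base_addr + offset] = (data, check)    # C-5, C-6: write data and check code
    status["single_error"] = False                    # C-7: clear the status register
    return True


AR5, AR6 = 0x100, 0x200  # hypothetical address register contents

# Memory just before the interrupt service routine runs: both STOREs have
# already been written, and the second erroneous word is still uncorrected.
memory = {
    AR5: (0x0000000288, 0x07),
    AR5 + 1: (0x0000000004, 0x4B),
    AR6: (0x00000040, 0x4C),
}
queue = deque([
    (AR5, [(0x0000000084, 0xAC), (0x0000000204, 0x4B)]),  # corrected LOAD data
    (AR5, [(0x0000000288, 0x07)]),                        # first STORE
    (AR6, [(0x00000040, 0x4C)]),                          # second STORE
])
status = {"single_error": True}

trace = []
while rsec(queue, memory, status):
    trace.append(memory[AR5][0])  # data at AR5 after each RSEC

print([hex(v) for v in trace])  # ['0x84', '0x288', '0x288']
```

The trace shows why the STORE records are needed: the first RSEC temporarily overwrites the newer STORE data at AR5 with the corrected LOAD data, and the second RSEC restores it.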

The processing flow shown in FIG. 6 is the control method for active fault tolerance of the processor data memory provided by the invention, and comprises the following steps:

Step 1: initialize the DSP processor and enable the correctable error interrupt response, so that the DSP processor responds to the correctable error hardware interrupt when a correctable error occurs in the data memory;

Step 2: access data from the data memory with a data memory access instruction (LOAD), and check the error state of the data according to the extended LOAD instruction. During LOAD instruction execution, handle the correctable-error or uncorrectable-error state according to the error state detected;

Step 3: when a data correctable error has occurred, determine whether a STORE instruction exists in the current processor pipeline or in the subsequent program instructions. If so, continue with step 4; otherwise, go to step 5;

Step 4: execute the STORE instruction. According to the extended STORE operation, the data of the STORE instruction is written into the memory and, at the same time, the control information and data of the STORE instruction are recorded in the circular Record queue;

Step 5: the processor responds to the correctable error hardware interrupt. The correctable data error triggers a hardware interrupt of the processor, which enters the data correctable error interrupt service routine;

Step 6: in the interrupt service routine, the RSEC instruction is called in a loop to process the records in the circular Record queue until the queue is empty. The corrected data is updated back to the data memory, and after the corrected data has been written back, the data written to memory by the recorded STORE instructions is refreshed back to the data memory;

Step 7: return from the correctable error interrupt;

Step 8: the DSP processor resumes normal operation.

In one embodiment, an example interrupt service routine is designed as follows:

In the program example, the loop count is set to the queue depth, that is, the number of RSEC executions is fixed. This avoids the data memory reads and writes that context saving in the interrupt service routine would otherwise cause.
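
The fixed-count loop can be illustrated with a behavioural sketch. It is not the routine referred to above: the queue depth of 8, the addresses, and the assumption that an RSEC executed on an empty queue has no effect are all hypothetical, and each record is simplified to a single (address, data) pair.

```python
from collections import deque

QUEUE_DEPTH = 8  # hypothetical queue depth


def rsec(queue, memory):
    """One RSEC: write the head record back; no effect if the queue is empty (assumed)."""
    if queue:
        addr, value = queue.popleft()
        memory[addr] = value


def correctable_error_isr(queue, memory):
    """Run RSEC exactly QUEUE_DEPTH times, with no test of the queue state."""
    executed = 0
    for _ in range(QUEUE_DEPTH):
        rsec(queue, memory)
        executed += 1
    return executed


memory = {}
queue = deque([(0x100, 0x84), (0x100, 0x288), (0x200, 0x40)])
executed = correctable_error_isr(queue, memory)
print(executed, len(queue))  # 8 0
```

Because the loop bound is a constant, the routine never has to read the queue status into a working register, which is consistent with the stated goal of avoiding extra data memory traffic inside the interrupt service routine.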

The invention provides an active fault-tolerant device and method for the data memory of a DSP processor. By combining software and hardware, data with correctable errors is refreshed to the data memory in time, and the fault-tolerance strategies of the software and hardware can be controlled more flexibly to meet the reliability targets of the system.

Claims

1. An active fault-tolerant device for the data memory of a DSP processor, characterized in that the device is arranged between the DSP processor core pipeline and the data memory in the core and is used for active fault-tolerant refreshing of the data memory; the device comprises a LOAD instruction decoding module for loading from the data memory, a STORE instruction decoding module for writing to the data memory, a queue access module, an RSEC instruction decoding module, the data memory, a data error correction and detection module, a general register file, a correctable error status register, a circular Record queue, a data memory write operation module, and an interrupt processing module for hardware interrupt processing;

the LOAD instruction decoding module receives a DSP processor program instruction, determines whether the current program instruction is a LOAD instruction, and outputs the decoded control logic of the LOAD instruction to the data memory;

the STORE instruction decoding module receives a DSP processor program instruction, determines whether the current program instruction is a STORE instruction, and outputs the decoded control logic of the STORE instruction to the data memory through the circular Record queue, the queue access module, and the data memory write operation module in sequence;

the RSEC instruction decoding module receives a DSP processor program instruction, determines whether the current program instruction is an RSEC instruction, and outputs the decoded control logic of the RSEC instruction to the circular Record queue through the queue access module;

the data error correction and detection module takes as input the data and data check codes output by the data memory, and outputs the error state of the currently accessed data together with the corrected data and check codes;

the general register file receives the corrected data output by the data error correction and detection module and stores the data accessed by the LOAD instruction after processing by that module;

the correctable error status register is connected to the data error correction and detection module and samples the error state of the current data;

the data memory write operation module outputs the write control signals of the data memory;

the interrupt processing module takes as input a hardware interrupt request signal of the DSP processor, and its output is connected to the RSEC instruction decoding module through the data correctable error interrupt service routine.
2. The active fault-tolerant device for the data memory of a DSP processor according to claim 1, wherein the data error correction and detection module receives parallel input data and check codes from the data memory and performs error correction and detection on the data read; the module outputs the correctable-error state information Single_Error, the uncorrectable-error state information Multiple_Error, and the data and check code information after error correction and detection;

the correctable-error state Single_Error is connected to the correctable error status register and the interrupt flag register; the data after error correction and detection is connected to the circular Record queue and the destination register file; the check code after error correction and detection is connected to the circular Record queue;

the correctable error status register stores the correctable-error state information and indicates whether the DSP processor has currently entered the data correctable-error state; the register receives the correctable-error state information Single_Error from the data error correction and detection module and the correctable-error state clear control from the RSEC instruction decoding module;

when Single_Error is valid, the correctable error status register is set; when the correctable-error state clear control signal is valid, the correctable error status register is cleared; when Single_Error and the clear control signal are valid at the same time, Single_Error has the higher priority and the correctable error status register is set; before the correctable-error status bit is cleared, successive data correctable-error events set the correctable error status register repeatedly.
3. The active fault-tolerant device for the data memory of a DSP processor according to claim 1, wherein the circular Record queue comprises a Record queue and a queue status register;

the queue status register marks the current status of the queue, including whether the queue has overflowed and whether the queue has been processed to its tail; the depth of the circular Record queue is greater than the sum of the number of interrupt response delay cycles and the number of pipeline stages between the pipeline decode stage and the pipeline stage where the data error correction and detection module is located;

the circular Record queue has two groups of inputs: the first group consists of the corrected data and error correction codes from the data error correction and detection module and the control information Core_Read_Control decoded from the current LOAD instruction on the DSP processor pipeline; the second group consists of the data and check codes output by STORE instruction decoding and the control information Core_Write_Control of the current memory write instruction;

if the data error correction and detection module detects a correctable error in the currently accessed data, the first group of input information is written into the circular Record queue, the correctable-error state information is written into the correctable error status register, and the queue status register is updated;

while the correctable error status register is set, if a subsequent instruction is a STORE instruction, the second group of input information is written into the circular Record queue and the queue status register is updated;

each record of the circular Record queue contains the access parallelism Pon, the parallel access base address BaseAddr, the word/byte mode BW, and either the error-corrected data or the write-back data, together with its check code;

the data memory write operation module writes the data and check code of a STORE instruction back to the data memory, or writes the corrected data and check code held in a record back to the data memory.
4. The active fault-tolerant device for the data memory of a DSP processor according to claim 1, wherein the interrupt processing module connects one of the hardware interrupt requests to the correctable-error state of the currently accessed data, and the data memory correctable error interrupt has a higher interrupt priority;

the interrupt processing module comprises hardware interrupt processing logic, an interrupt flag register, an interrupt enable register, an interrupt vector table, and a hardware interrupt service routine area;

the hardware interrupt processing logic interrupts the program normally executing on the current DSP processor pipeline and jumps to the interrupt vector table to obtain the entry of the hardware interrupt service routine;

the interrupt flag register and the interrupt enable register cooperate with the hardware interrupt processing logic to enter interrupt service: when the interrupt enable register enables the data correctable error interrupt, the hardware interrupt processing logic checks whether the data correctable error interrupt flag in the interrupt flag register is valid, and if it is valid, the DSP processor enters the data correctable error service routine; the correctable-error state output Single_Error of the data error correction and detection module is connected to the data correctable error interrupt bit of the interrupt flag register; the interrupt flag register samples and records the data correctable-error state signal;

the hardware interrupt service routine area is entered from the entry specified by the interrupt vector table; when a data correctable error interrupt occurs and is enabled, the DSP processor jumps from the data correctable error interrupt entry in the interrupt vector table to the data correctable error interrupt service routine.
5. The active fault-tolerant device for the data memory of a DSP processor according to claim 1, wherein the RSEC instruction decoding module accesses an entry in the circular Record queue, outputs the entry to the data memory write operation module, and writes the data and check bits stored in the record back to the data memory through the data memory write operation module.
6. An active fault-tolerant method for the data memory of a DSP processor, comprising the following steps:

step 1, initializing the DSP processor and enabling the correctable error interrupt response, so that the DSP processor responds to the correctable error hardware interrupt when a correctable error occurs in the data memory;

step 2, accessing data from the data memory according to the LOAD instruction, and checking the error state of the data according to the extended LOAD instruction; handling the data correctable-error or uncorrectable-error state according to the error state detected during LOAD instruction execution;

after a data correctable error has been handled, step 3 is executed; if an uncorrectable data error occurs, an uncorrectable error interrupt signal is generated, execution of the current instruction of the DSP processor ends, and other instructions are processed;

step 3, when a data correctable error has occurred, determining whether a STORE instruction exists in the current processor pipeline or in the subsequent program instructions; if so, continuing with step 4; otherwise, going to step 5;

step 4, executing the STORE instruction: according to the extended STORE operation, writing the data of the STORE instruction into the memory and, at the same time, recording the control information and data of the STORE instruction in the circular Record queue;

step 5, the processor responding to the correctable error hardware interrupt: the correctable data error triggers a hardware interrupt of the processor, which enters the data correctable error interrupt service routine;

step 6, in the interrupt service routine, calling the RSEC instruction in a loop to process the records in the circular Record queue until the queue has been read empty; after the correctable errors have been corrected, refreshing the data memory with the data written by the STORE instructions, which completes the write-back of the corrected data to the data memory;

step 7, returning from the correctable error interrupt;

step 8, the DSP processor operating normally.
7. The active fault-tolerant method for the data memory of a DSP processor according to claim 6, wherein the extended LOAD instruction processing in step 2 comprises:

step A-1, the DSP processor fetching an instruction;

step A-2, determining from the instruction opcode whether it is a LOAD instruction; if it is, going to step A-3; otherwise, going to step A-12;

step A-3, decoding the LOAD instruction;

step A-4, accessing the data memory to obtain data;

step A-5, sending the data to the data error correction and detection module for error correction and detection, and determining the correctable-error state Single_Error and the uncorrectable-error state Multiple_Error;

step A-6, determining whether an uncorrectable error has occurred; if so, going to step A-10; otherwise, going to step A-7;

step A-7, writing the data output by the data error correction and detection module into the destination register file according to the destination register index of the LOAD instruction;

step A-8, determining whether a correctable error Single_Error has occurred; if so, setting the correctable error status register and going to step A-9; otherwise, going to step A-11;

step A-9, writing the corrected data output by the data error correction and detection module, the check code, and the access control information on the LOAD instruction pipeline into the circular Record queue in sequence; going to step A-11;

step A-10, an uncorrectable multi-bit error having occurred, generating an uncorrectable error interrupt signal;

step A-11, ending execution of the current instruction of the DSP processor;

step A-12, the DSP processor processing other instructions.
8. The active fault-tolerant method for the data memory of a DSP processor according to claim 6, wherein the extended STORE instruction processing in step 4 comprises:

step B-1, the DSP processor fetching an instruction;

step B-2, determining from the instruction opcode whether it is a STORE instruction; if it is, going to step B-3; otherwise, going to step B-9;

step B-3, decoding the STORE instruction and reading an operand from the register file;

step B-4, generating the operand check code;

step B-5, determining whether the correctable error status register is currently set; if so, going to step B-6; otherwise, going to step B-7;

step B-6, writing the operand data, the operand check code, and the access control information on the STORE instruction pipeline into the circular Record queue in sequence;

step B-7, writing the operand data and the operand check code into the data memory;

step B-8, ending execution of the current instruction of the DSP processor;

step B-9, the DSP processor processing other instructions.
9. The active fault-tolerant method for the data memory of a DSP processor according to claim 7 or 8, wherein in step A-9 or step B-6, the recording method of the circular Record queue is:

step E-1, if Single_Error is true, writing the data and check code corrected by the error correction method, together with the read base address, parallelism, and word/byte access mode of the data memory access, to the tail of the queue, and going to step E-3; otherwise, going to step E-2;

step E-2, if the correctable error status register is currently valid and the write-data valid signal Write_Valid of the memory write instruction STORE is valid, writing the data and the check code generated according to the encoding method, together with the write base address, parallelism, and word/byte access mode BW of the current data memory access, to the tail of the queue, and going to step E-3; otherwise, going to step E-3;

step E-3, detecting whether the queue has overflowed and setting the corresponding status bits.
10. The active fault-tolerant method for the data memory of a DSP processor according to claim 6, wherein

in step 6, when the RSEC instruction is called in a loop to process the records in the circular Record queue, each RSEC instruction reads one record from the head of the circular Record queue and updates the data in the record to the data memory according to the access control information; the operation of the RSEC instruction comprises the following steps:

step C-1, fetching an instruction;

step C-2, determining from the instruction opcode whether it is an RSEC instruction; if it is, going to step C-3; otherwise, going to step C-9;

step C-3, decoding the RSEC instruction and outputting a control signal to the queue access module;

step C-4, the queue access module reading a record, Record(head), from the head of the circular Record queue;

step C-5, sending the data and control information of Record(head) to the data memory write operation module;

step C-6, writing the operand data and the operand check code into the data memory;

step C-7, clearing the correctable error status register;

step C-8, ending the instruction execution;

step C-9, processing other instructions.