CN107992376B - Active fault tolerance method and device for data storage of DSP (digital Signal processor) - Google Patents

Publication number
CN107992376B
Authority
CN
China
Prior art keywords
data
error
instruction
correctable
interrupt
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711192783.8A
Other languages
Chinese (zh)
Other versions
CN107992376A (en
Inventor
曹辉
何卫强
于飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Microelectronics Technology Institute
Original Assignee
Xian Microelectronics Technology Institute
Application filed by Xian Microelectronics Technology Institute
Priority to CN201711192783.8A
Publication of CN107992376A
Publication of CN107992376B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3861 Recovery, e.g. branch miss-prediction, exception handling

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)
  • Detection And Correction Of Errors (AREA)
  • Advance Control (AREA)

Abstract

The invention provides an active fault-tolerant method and device for a data memory of a DSP (digital signal processor). The device is arranged between the DSP processor core pipeline and the in-core data memory and is used for active fault-tolerant refreshing of the data memory. It comprises a LOAD instruction decoding module for loading from the data memory, a STORE instruction decoding module for writing the data memory, a queue access module, an RSEC instruction decoding module, the data memory, a data error detection and correction module, a general register file, a correctable error status register, a circular Record queue, a data memory write operation module and an interrupt processing module for hard interrupt processing. With proper pipeline partitioning, the frequency performance of the DSP processor is essentially unaffected. The invention allows the strategy and timing with which the hardware handles fault tolerance to be controlled flexibly, meets system reliability at lower cost, and guarantees the execution efficiency of the DSP processor under error exception conditions.

Description

Active fault tolerance method and device for data storage of DSP (digital Signal processor)
Technical Field
The invention belongs to the technical field of microelectronics, and relates to a high-reliability and high-performance fault-tolerant structure of a processor, in particular to an active fault-tolerant method and device for a data memory of a DSP (digital signal processor).
Background
Memory is the most sensitive component in modern processors. As semiconductor manufacturing processes continue to advance, the feature sizes of integrated circuits shrink dramatically. On the one hand, the steadily decreasing supply voltage, increasing operating frequency, decreasing node capacitance and rapidly growing transistor count in nanometer-scale integrated circuits make memory cells increasingly sensitive to the operating environment. When a memory circuit is affected by high-energy particle strikes, power supply noise, electromagnetic interference or cosmic rays, the content stored in its memory cells is corrupted transiently or permanently. On the other hand, on-chip memories integrate a large number of transistors and often occupy a large share of the total processor area. The large transistor count and area increase the likelihood of memory upset errors and reduce the overall reliability of the device. Targeted hardening of the on-chip memory is therefore an important reliability concern in ASIC and processor design. Furthermore, in many processors whose reliability requirements are not high, only the on-chip memory is hardened to improve reliability.
The reliability hardening design of a memory comprises measures at the process level, the device layout level and the system level. System-level hardening offers a higher protection level and is independent of the specific implementation process, so it is the more commonly used measure. System-level hardening measures include error check coding protection with parity codes or ECC codes, built-in self-repair through redundant storage rows and columns, shutting down idle storage units, writing back data blocks in advance, isolating faulty regions, and the like. Multi-core DSP processors target data-intensive processing and exchange in application fields such as image processing and signal processing, and are therefore characterized by real-time operation and high throughput.
Most of the currently published literature addresses fault-tolerant hardening of memories in Central Processing Unit (CPU) type devices, and few references address fault tolerance of memories in Digital Signal Processor (DSP) type devices. A data storage hardening method is disclosed in Gaisler, 2002, "A portable and fault-tolerant microprocessor based on the SPARC V8 architecture". When data is accessed, it is parity-checked; if the check finds an error, the data is invalidated and fetched again from external memory. This is a passive fault-tolerant method. It is feasible for a single on-chip core or a few processor cores with shared memory, but in a processor based on Network-on-Chip (NoC) interconnection, the DSP processor incurs a large waiting delay when it accesses external memory through the on-chip network. In addition, the DSP processor is oriented to data stream processing, and accessing external memory whenever data is in error may interrupt the DSP data stream, which hurts the performance of the DSP processor. The same document also discloses a pipeline processing method that, when a correctable error is detected in an operand read from memory, flushes the pipeline and writes the corrected data back to memory. When the pipeline is long, the error detection and correction of the data is located before the write-back stage, so more instructions need to be flushed. However, subsequent instructions that do not depend on the result of the current instruction could continue executing. Simply flushing the pipeline wastes instructions that have already entered it.
Most DSP processors have only one level of memory, such as those described in "MSC8102 Technical Data" published by Freescale in 2005, "Trimedia TM-1300" published by Philips in 2000, and "TMS320C6000 CPU and Instruction Set Reference Guide" published by Texas Instruments in 2000. These processors do not write corrected data back to the data memory in time, so memory errors accumulate. If the on-chip first-level storage is disabled to avoid error events and data is fetched from external memory instead, correct data can be obtained, but the memory access delay is large and the performance of the DSP processor suffers.
Disclosure of Invention
In view of the problems in the prior art, the invention provides an active fault-tolerant method and device for a data memory of a DSP processor, which can perform "active" error-correcting write-back after the stored content of the data memory becomes erroneous, without interrupting the instruction pipeline and without "instruction rollback". Reloading after a data access error is avoided, which guarantees the execution efficiency of the processor under error exception conditions.
The invention is realized by the following technical scheme:
the active fault-tolerant device of the DSP processor data memory is arranged between the DSP processor core pipeline and the in-core data memory and is used for active fault-tolerant refreshing of the data memory; it comprises a LOAD instruction decoding module for loading from the data memory, a STORE instruction decoding module for writing the data memory, a queue access module, an RSEC instruction decoding module, the data memory, a data error detection and correction module, a general register file, a correctable error status register, a circular Record queue, a data memory write operation module and an interrupt processing module for hard interrupt processing;
the LOAD instruction decoding module receives DSP processor program instructions, judges whether the current program instruction is a LOAD instruction, and outputs the decoding control logic of the LOAD instruction to the data memory; the STORE instruction decoding module receives DSP processor program instructions, judges whether the current program instruction is a STORE instruction, and outputs the decoding control logic of the STORE instruction to the data memory through, in sequence, the circular Record queue, the queue access module and the data memory write operation module; the RSEC instruction decoding module receives DSP processor program instructions, judges whether the current program instruction is an RSEC instruction, and outputs the decoding control logic of the RSEC instruction to the circular Record queue through the queue access module; the input of the data error detection and correction module is the data and data check code output by the data memory, and its output is the error state of the currently accessed data together with the corrected data and check code; the register file receives the corrected data output by the data error detection and correction module and stores the data accessed by the LOAD instruction after processing by that module; the correctable error status register is connected to the data error detection and correction module and samples the error state of the current data; the data memory write operation module outputs the write control signal of the data memory; the input of the interrupt processing module is a hardware interrupt request signal of the DSP processor, and its output is connected to the RSEC instruction decoding module through the data correctable error interrupt service program.
Preferably, the data error detection and correction module receives parallel input data and check codes from the data memory and performs error detection and correction on the read data; it outputs the correctable-error state Single_Error, the uncorrectable-error state Multiple_Error, the data after error detection and correction, and the check code; the Single_Error state is connected to the correctable error status register and the interrupt flag register; the data after error detection and correction is connected to the circular Record queue and the destination register file; the check code after error detection and correction is connected to the circular Record queue;
the correctable error status register stores the correctable error state and identifies whether the DSP processor has currently entered the data correctable error state; it receives the correctable error state Single_Error from the data error detection and correction module and the correctable error state clearing control from the RSEC instruction decoding module; when Single_Error is asserted, the correctable error status register is set; when the clearing control signal is asserted, the register is cleared; when Single_Error and the clearing control signal are asserted simultaneously, Single_Error has the higher priority and the register is set; a sequence of several data correctable error events sets the correctable error status register repeatedly until the correctable error status bit is cleared.
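The set and clear behaviour of the correctable error status register can be sketched as a small behavioural model. The class and method names are hypothetical, and the sketch assumes that a simultaneous Single_Error wins over the clear request, which is what the stated priority of Single_Error implies.

```python
class CorrectableErrorStatusRegister:
    """Behavioural model of a one-bit status register (names are illustrative)."""

    def __init__(self):
        self.value = False

    def clock(self, single_error, clear):
        """Apply one cycle of inputs and return the new register value."""
        if single_error:
            # Assumption: set has priority over clear when both are asserted.
            self.value = True
        elif clear:
            self.value = False
        return self.value


reg = CorrectableErrorStatusRegister()
reg.clock(single_error=True, clear=False)   # first correctable error sets the bit
reg.clock(single_error=True, clear=False)   # a repeated error leaves it set
reg.clock(single_error=False, clear=True)   # the clearing control resets it
reg.clock(single_error=True, clear=True)    # simultaneous inputs: the bit ends up set
```

Giving the set input priority means an error that arrives in the same cycle as a clear is not lost.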
Preferably, the circular Record queue comprises a Record queue and a queue status register;
the queue status register marks the current status of the queue, including whether the queue has overflowed and whether processing has reached the tail of the queue; the depth of the circular Record queue is greater than the sum of the interrupt response latency and the number of pipeline stages between the decode stage and the stage where the data error detection and correction module is located; the circular Record queue has two groups of inputs: the first group is the corrected data and check code from the data error detection and correction module together with the control information Core_Read_Control decoded from the current LOAD instruction on the DSP processor pipeline, and the second group is the data and check code output by STORE instruction decoding together with the control information Core_Write_Control of the current memory write instruction;
if the data error detection and correction module detects a correctable error in the currently accessed data, the first group of inputs is written into the circular Record queue, the correctable error state is written into the correctable error status register, and the queue status register is updated; while the correctable error status register is set, if a subsequent instruction is a STORE instruction, the second group of inputs is written into the circular Record queue and the queue status register is updated;
each Record of the circular Record queue contains the access parallelism Pon, the parallel access base address BaseAddr, the word/byte mode BW, the corrected data or the write-back data, and the check code of that data; the data memory write operation module writes the data and check code of a STORE instruction to the data memory, or writes the corrected data and check code held in a Record back to the data memory.
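A minimal software sketch of the circular Record queue follows. The field names mirror Pon, BaseAddr and BW from the text; the class layout, the helper `min_queue_depth` and the overflow policy (reject the record and raise the overflow status) are assumptions made for illustration, since the text only says that overflow is flagged.

```python
from dataclasses import dataclass


@dataclass
class Record:
    pon: int         # access parallelism (Pon)
    base_addr: int   # parallel access base address (BaseAddr)
    bw: str          # word/byte mode (BW), e.g. "word" or "byte"
    data: int        # corrected data or STORE write data
    check: int       # check code for `data`


class CircularRecordQueue:
    def __init__(self, depth):
        self.slots = [None] * depth
        self.head = 0
        self.count = 0
        self.overflow = False   # queue status: an overflow occurred

    @property
    def empty(self):            # queue status: processed up to the tail
        return self.count == 0

    def push(self, record):
        if self.count == len(self.slots):
            self.overflow = True    # assumption: reject the record and flag it
            return False
        tail = (self.head + self.count) % len(self.slots)
        self.slots[tail] = record
        self.count += 1
        return True

    def pop(self):
        if self.empty:
            return None
        record = self.slots[self.head]
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return record


def min_queue_depth(interrupt_latency, stages_decode_to_ecc):
    """Smallest depth strictly greater than the sum named in the text."""
    return interrupt_latency + stages_decode_to_ecc + 1


# Hypothetical numbers: 4 cycles of interrupt latency, 3 stages from decode to ECC.
queue = CircularRecordQueue(min_queue_depth(interrupt_latency=4, stages_decode_to_ecc=3))
queue.push(Record(pon=1, base_addr=0x100, bw="word", data=0xABCD, check=0x5))
```

The depth rule reflects that records can keep arriving from instructions already in the pipeline until the interrupt is actually taken.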
Preferably, the interrupt processing module assigns one of the hardware interrupt requests to the correctable error interrupt of the currently accessed data, and the correctable error interrupt of the data memory has a higher interrupt priority; the interrupt processing module comprises hardware interrupt processing logic, an interrupt flag register, an interrupt enable register, an interrupt vector table and a hard interrupt service program area; the hardware interrupt processing logic interrupts the normally executing program on the current DSP processor pipeline and jumps to the interrupt vector table to obtain the entry of the hardware interrupt service program; the interrupt flag register and the interrupt enable register cooperate with the hardware interrupt processing logic to enter interrupt service: when the interrupt enable register enables the data correctable error interrupt, the hardware interrupt processing logic checks whether the data correctable error interrupt flag in the interrupt flag register is valid, and if it is, the DSP processor enters the data correctable error service program; the correctable error state output Single_Error of the data error detection and correction module is connected to the data correctable error interrupt bit of the interrupt flag register; the interrupt flag register samples and records the data correctable error state signal; the hard interrupt service program area is entered from the entry specified by the interrupt vector table; when a data correctable error interrupt occurs and is enabled, the DSP processor jumps from the data correctable error interrupt entry in the interrupt vector table to the data correctable error interrupt service program.
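The interplay of the interrupt flag register, the interrupt enable register and the interrupt vector table can be sketched as follows. The bit assignment, the rule that a lower IRQ number means higher priority, and clearing the flag on entry are assumptions of this sketch; the text only states that the correctable error interrupt has a higher priority.

```python
CORRECTABLE_ERROR_IRQ = 0   # assumption: bit 0, lowest number = highest priority


class InterruptController:
    def __init__(self, vector_table):
        self.vector_table = vector_table   # IRQ number -> service routine entry
        self.flags = 0                     # interrupt flag register
        self.enables = 0                   # interrupt enable register

    def enable(self, irq):
        self.enables |= 1 << irq

    def sample(self, irq, request):
        """Latch a request line (e.g. Single_Error) into the flag register."""
        if request:
            self.flags |= 1 << irq

    def next_entry(self):
        """Return the entry of the highest-priority pending, enabled interrupt."""
        pending = self.flags & self.enables
        for irq in sorted(self.vector_table):
            if pending & (1 << irq):
                self.flags &= ~(1 << irq)   # assumption: flag is cleared on entry
                return self.vector_table[irq]
        return None


ctrl = InterruptController({CORRECTABLE_ERROR_IRQ: "correctable_error_isr", 3: "timer_isr"})
ctrl.enable(CORRECTABLE_ERROR_IRQ)
ctrl.enable(3)
ctrl.sample(3, True)                        # some other hardware interrupt
ctrl.sample(CORRECTABLE_ERROR_IRQ, True)    # Single_Error asserted
first = ctrl.next_entry()                   # the correctable-error routine is served first
```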
Preferably, the RSEC instruction decoding module accesses one entry in the circular Record queue, outputs the entry to the data memory write operation module, and updates the data and the check bits stored in the Record entry back to the data memory through the data memory write operation module.
The active fault-tolerant method for the DSP processor data memory comprises the following steps:
step 1, initializing the DSP processor and enabling the correctable error interrupt response, so that the DSP processor responds to the correctable error hardware interrupt when a correctable error occurs in the data memory;
step 2, accessing data from the data memory according to the LOAD instruction, and checking the error state of the data according to the extended LOAD instruction processing; during LOAD instruction execution, handling the correctable error and uncorrectable error states separately;
after correctable error processing, executing step 3; when an uncorrectable error occurs, generating an uncorrectable error interrupt signal, finishing the execution of the current instruction of the DSP processor, and processing other instructions;
step 3, when a correctable data error has occurred, judging whether a STORE instruction exists in the current processor pipeline or in the subsequent program instructions; if yes, continuing with step 4; otherwise, going to step 5;
step 4, executing the STORE instruction: according to the extended STORE instruction processing, writing the data of the STORE instruction into the memory and, at the same time, recording the control information and data of the STORE instruction in the circular Record queue;
step 5, the processor responds to the correctable error hardware interrupt: the correctable data error triggers a hardware interrupt of the processor, and the data correctable error interrupt service program is entered;
step 6, in the interrupt service program, calling the RSEC instruction in a loop to process the records in the circular Record queue until the queue has been read empty; the corrected data and the memory write data of the recorded STORE instructions refresh the data memory, which completes the write-back of corrected correctable errors to the data memory;
step 7, returning from the correctable error interrupt;
step 8, the DSP processor resumes normal operation.
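Steps 1 to 8 can be walked through with a small behavioural simulation. Everything here is a sketch under stated assumptions: the class `Dsp` and its methods are hypothetical, the queue is unbounded, and the ECC module is replaced by a table of injected bit flips so that a "corrected" value can be produced without a real code. The example also shows one plausible reason, not stated in the source, for recording STOREs while the status bit is set: replaying the queue in order keeps a later STORE from being overwritten by an older corrected value.

```python
from collections import deque


class Dsp:
    def __init__(self, memory):
        self.mem = dict(memory)   # data memory: address -> word
        self.upsets = {}          # address -> flipped-bit mask (stands in for ECC)
        self.regs = {}
        self.flag = False         # correctable error status register
        self.queue = deque()      # circular Record queue (unbounded in this sketch)

    def inject_upset(self, addr, mask):
        self.mem[addr] ^= mask
        self.upsets[addr] = mask

    def _write(self, addr, value):
        self.mem[addr] = value
        self.upsets.pop(addr, None)   # a fresh write also stores a fresh check code

    def load(self, reg, addr):        # step 2
        mask = self.upsets.get(addr, 0)
        value = self.mem[addr] ^ mask     # corrected on the pipeline, no flush
        self.regs[reg] = value
        if mask:
            self.flag = True
            self.queue.append((addr, value))

    def store(self, reg, addr):       # steps 3 and 4
        value = self.regs[reg]
        if self.flag:
            self.queue.append((addr, value))
        self._write(addr, value)

    def service_interrupt(self):      # steps 5 to 7
        replayed = []
        while self.queue:             # one RSEC per record
            addr, value = self.queue.popleft()
            self._write(addr, value)
            self.flag = False
            replayed.append((addr, value))
        return replayed


dsp = Dsp({0x10: 5, 0x20: 7})
dsp.inject_upset(0x10, 0b100)    # single-bit upset in the stored word
dsp.load("r1", 0x10)             # r1 receives the corrected value
dsp.regs["r2"] = 9
dsp.store("r2", 0x10)            # newer data for the same address, also recorded
replayed = dsp.service_interrupt()
```

After the interrupt service routine, address 0x10 holds the STORE data rather than the older corrected word, because the two records are replayed in program order.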
Preferably, the extended LOAD instruction processing in step 2 comprises:
step A-1, the DSP processor fetches an instruction;
step A-2, judging from the instruction opcode whether it is a LOAD instruction; if so, going to step A-3; otherwise, going to step A-12;
step A-3, decoding the LOAD instruction;
step A-4, accessing the data memory to obtain data;
step A-5, sending the data to the data error detection and correction module, which determines the correctable error Single_Error and the uncorrectable error Multiple_Error;
step A-6, judging whether an uncorrectable error Multiple_Error has occurred; if yes, going to step A-10; otherwise, going to step A-7;
step A-7, writing the data output by the data error detection and correction module into the destination register file according to the destination register index of the LOAD instruction;
step A-8, judging whether a correctable error Single_Error has occurred; if yes, setting the correctable error status register and going to step A-9; otherwise, going to step A-11;
step A-9, writing the corrected data output by the data error detection and correction module, the check code and the access control information on the LOAD instruction pipeline into the circular Record queue in sequence; going to step A-11;
step A-10, an uncorrectable multi-bit error has occurred; generating an uncorrectable error interrupt signal;
step A-11, finishing the current instruction execution of the DSP processor;
step A-12, the DSP processor performs other instruction processing.
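The LOAD path of steps A-4 to A-10 can be written as one function. This is a sketch: the function signature, the `state` dictionary keys and the stand-in `demo_ecc` are hypothetical, and the real error detection and correction module is passed in as a callable returning `(single_error, multiple_error, data, check)`.

```python
from collections import deque


def execute_load(memory, ecc, regs, state, queue, addr, dest):
    """Steps A-4 to A-10 for one decoded LOAD; returns the outcome as a string."""
    data, check = memory[addr]                          # A-4: read data and check code
    single, multiple, data, check = ecc(data, check)    # A-5: detect and correct
    if multiple:                                        # A-6
        state["uncorrectable_irq"] = True               # A-10
        return "uncorrectable"
    regs[dest] = data                                   # A-7
    if single:                                          # A-8
        state["correctable_flag"] = True
        queue.append((addr, data, check))               # A-9
        return "corrected"
    return "clean"


def demo_ecc(data, check):
    """Stand-in for the ECC module: the 'check code' here is just a reference
    copy of the good data, so any mismatch is reported as a repaired error."""
    return data != check, False, check, check


memory = {0x40: (0b0110, 0b0111)}   # stored word differs from its reference copy
regs = {}
state = {"correctable_flag": False, "uncorrectable_irq": False}
queue = deque()
outcome = execute_load(memory, demo_ecc, regs, state, queue, addr=0x40, dest="r1")
```

Note that the destination register is written before the queue entry is made, so the instruction stream continues with correct data and the memory repair is deferred to the interrupt service routine.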
Preferably, the extended STORE instruction processing in step 4 comprises:
step B-1, the DSP processor fetches an instruction;
step B-2, judging from the instruction opcode whether it is a STORE instruction; if so, going to step B-3; otherwise, going to step B-9;
step B-3, decoding the STORE instruction and reading the operand from the register file;
step B-4, generating the operand check code;
step B-5, judging whether the correctable error status register is currently set; if yes, going to step B-6; otherwise, going to step B-7;
step B-6, writing the operand data, the operand check code and the access control information on the STORE instruction pipeline into the circular Record queue in sequence;
step B-7, writing the operand data and the operand check code into the data memory;
step B-8, finishing the current instruction execution of the DSP processor;
step B-9, the DSP processor performs other instruction processing.
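The STORE path of steps B-3 to B-7 is shorter. In this sketch the function name, the `state` key and the even-parity check-code generator are placeholders, since the encoding method itself is not specified in this section.

```python
from collections import deque


def even_parity(data):
    """Placeholder check-code generator (the real encoder is not specified here)."""
    return bin(data).count("1") & 1


def execute_store(memory, regs, state, queue, src, addr, make_check=even_parity):
    """Steps B-3 to B-7 for one decoded STORE."""
    data = regs[src]                           # B-3: read the operand
    check = make_check(data)                   # B-4: generate the check code
    if state["correctable_flag"]:              # B-5
        queue.append((addr, data, check))      # B-6: also record the write
    memory[addr] = (data, check)               # B-7: the write itself always happens


memory = {}
regs = {"r2": 0b1011}
queue = deque()
execute_store(memory, regs, {"correctable_flag": False}, queue, "r2", 0x80)
execute_store(memory, regs, {"correctable_flag": True}, queue, "r2", 0x84)
```

Only the second call leaves a record in the queue, because only then is the correctable error status bit set.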
Further, in step A-9 and step B-6, records are written to the circular Record queue as follows:
step E-1, if Single_Error is true, writing the data and check code corrected by the error correction method, together with the read base address, parallelism and word/byte access mode of the data memory, to the tail of the queue, and going to step E-3; otherwise, going to step E-2;
step E-2, if the correctable error status register is set and the write-valid signal Write_Valid of the memory write instruction STORE is asserted, writing the data and the check code generated by the encoding method, together with the write base address, parallelism and word/byte access mode BW of the current data memory access, to the tail of the queue, and going to step E-3; otherwise, going to step E-3;
step E-3, detecting whether the queue has overflowed and setting the corresponding status bits.
Preferably, in step 6, when the RSEC instruction is called in a loop to process the records in the circular Record queue, each RSEC instruction reads one Record from the head of the circular Record queue and writes the data in the Record back to the data memory according to the access control information; the RSEC instruction operates as follows:
step C-1, fetching an instruction;
step C-2, judging from the instruction opcode whether it is an RSEC instruction; if so, going to step C-3; otherwise, going to step C-9;
step C-3, decoding the RSEC instruction and outputting a control signal to the queue access module;
step C-4, the queue access module reads the Record at the head of the circular Record queue, Record(head);
step C-5, sending the data and control information of Record(head) to the data memory write operation module;
step C-6, writing the operand data and the operand check code into the data memory;
step C-7, clearing the correctable error status register;
step C-8, finishing the instruction execution;
step C-9, other instruction processing.
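The RSEC steps and the loop of the interrupt service program can be sketched as two small functions. The names are hypothetical, records are reduced to `(address, data, check)` tuples, and returning `False` on an empty queue is an assumption, since the behaviour of RSEC on an empty queue is not described.

```python
from collections import deque


def execute_rsec(queue, memory, state):
    """Steps C-4 to C-7 for one RSEC; returns False if the queue was already empty."""
    if not queue:
        return False                         # assumption: nothing to do
    addr, data, check = queue.popleft()      # C-4: read Record(head)
    memory[addr] = (data, check)             # C-5, C-6: write back data and check code
    state["correctable_flag"] = False        # C-7: clear the status register
    return True


def correctable_error_isr(queue, memory, state):
    """Interrupt service routine: issue RSEC until the queue has been read empty."""
    written = 0
    while execute_rsec(queue, memory, state):
        written += 1
    return written


memory = {0x40: (0b0110, 0b1)}               # word with a correctable error
queue = deque([(0x40, 0b0111, 0b1), (0x44, 0b0001, 0b1)])
state = {"correctable_flag": True}
written = correctable_error_isr(queue, memory, state)
```

Because the loop lives in software, the service program can in principle decide when and how many records to drain, which matches the flexibility the text attributes to the combined hardware and software scheme.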
Compared with the prior art, the invention has the following beneficial technical effects:
The fault-tolerant device of the DSP processor data memory of the invention places data error detection and correction on the DSP processor core pipeline; with proper pipeline partitioning, the frequency performance of the DSP processor is essentially unaffected. Conventional approaches place data error detection and correction inside the memory system, which increases the access latency of memory data.
The fault-tolerant method for a DSP processor data memory provided by the invention combines hardware with the software of an interrupt service program. Through the service program, the strategy and timing with which the hardware handles fault tolerance can be controlled flexibly according to the reliability targets of the system, so that system reliability is met at lower cost. After a correctable error is detected in content read from the data memory, the DSP processor pipeline actively corrects the error, and the corrected data is refreshed and written back to the data memory in time. This avoids the situation in which erroneous data content causes a cache miss and the whole corresponding block must be reloaded from external memory or the next memory level, thereby guaranteeing the execution efficiency of the DSP under error exception conditions.
Drawings
FIG. 1 is a diagram of a DSP processor data storage fault tolerant device of the present invention;
FIG. 2 is a process flow of the data memory read operation instruction LOAD according to the present invention;
FIG. 3 is a flow chart illustrating the processing of a data memory write command STORE according to the present invention;
FIG. 4 is a process flow of a queue access instruction RSEC instruction of the present invention;
FIG. 5 is a diagram illustrating an entry queue for correctable errors and memory write operations according to the present invention;
FIG. 6 is a flow chart of a fault-tolerant processing method for a processor data storage according to the present invention.
Detailed Description
The present invention will now be described in further detail with reference to specific examples, which are intended to be illustrative, but not limiting, of the invention.
The invention provides a fault-tolerant device and a fault-tolerant method for a DSP processor data memory, which are oriented to the field of high-reliability processor design and provide hardware structure support and a software solution for the application-oriented processor reliability design.
The active fault-tolerant device of the DSP processor data memory disclosed by the invention is a circuit arranged between the DSP processor core pipeline and the in-core data memory and used for active fault-tolerant refreshing of the data memory. The device comprises the following circuit modules: a LOAD instruction decoding module for loading from the data memory, a STORE instruction decoding module for writing the data memory, a queue access module, a queue access write-back instruction decoding module (RSEC instruction decoding module), the data memory, a data error detection and correction module, a general register file, a correctable error status register that records correctable data memory errors, a circular Record queue, a data memory write operation module and an interrupt processing module for hard interrupt processing.
The modules are connected as follows. The LOAD instruction decoding module receives DSP processor program instructions, judges whether the current program instruction is a LOAD instruction, and outputs the decoding control logic of the LOAD instruction to the data memory. The STORE instruction decoding module receives DSP processor program instructions, judges whether the current program instruction is a STORE instruction, and outputs the decoding control logic of the STORE instruction to the data memory. The RSEC instruction decoding module receives DSP processor program instructions, judges whether the current program instruction is an RSEC instruction, and outputs the decoding control logic of the RSEC instruction to the circular Record queue. The input of the data memory is the data read control of the LOAD instruction, and its output is the data and data check code accessed under the control of the LOAD instruction; the data memory also takes as input the write operation logic of the data memory write operation module, which outputs the write control information and write data. The input of the data error detection and correction module is the data and data check code output by the data memory, and its output is the error state of the currently accessed data together with the corrected data and check code. The register file receives the corrected data output by the data error detection and correction module and stores the data accessed by the LOAD instruction after processing by that module; the register file also outputs the data that a STORE instruction is to write to the data memory. The correctable error status register samples the error state of the current data. The inputs to the circular Record queue are the corrected data and data memory access control information of a LOAD access, and the data memory access control information of a STORE instruction.
The data memory write operation module takes as input the memory write control signal decoded from a STORE instruction and the write control signal read out of the circular Record queue, and outputs the write control signal of the data memory. The input of the hard interrupt processing is a hardware interrupt request signal of the DSP processor.
Further, the data error detection and correction module receives parallel input data and check codes from the data memory and performs error detection and correction on the read data; it outputs the correctable-error state Single_Error, the uncorrectable-error state, the data after error detection and correction, and the check code. The Single_Error state is connected to the correctable error status register and the interrupt flag register; the data after error detection and correction is connected to the circular Record queue and the destination register file; the check code after error detection and correction is connected to the circular Record queue.
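The source does not name the code used by the error detection and correction module. As an illustration only, the sketch below assumes an extended Hamming (SECDED) code over a 4-bit data word: it corrects any single flipped bit and detects any double flip, producing the same four outputs as the module (single-error state, multiple-error state, corrected data, check code). The word width and all function names are assumptions.

```python
DATA_BIT_AT_POSITION = {3: 0, 5: 1, 6: 2, 7: 3}   # Hamming position -> data bit index


def _hamming_parity(data):
    d = [(data >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]   # covers codeword positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]   # covers codeword positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]   # covers codeword positions 4, 5, 6, 7
    return p1 | (p2 << 1) | (p4 << 2)


def _parity(value):
    return bin(value).count("1") & 1


def encode(data):
    """Return the 4-bit check code (3 Hamming bits + overall parity) for 4-bit data."""
    hamming = _hamming_parity(data)
    overall = _parity(data) ^ _parity(hamming)
    return hamming | (overall << 3)


def check_and_correct(data, check):
    """Return (single_error, multiple_error, data, check) after detection/correction."""
    syndrome = _hamming_parity(data) ^ (check & 0b111)
    overall_mismatch = _parity(data) ^ _parity(check)
    if syndrome == 0 and not overall_mismatch:
        return False, False, data, check           # no error
    if overall_mismatch:
        # Odd number of flips: treat as one flipped bit and repair it.
        if syndrome in DATA_BIT_AT_POSITION:
            data ^= 1 << DATA_BIT_AT_POSITION[syndrome]
        return True, False, data, encode(data)     # Single_Error
    return False, True, data, check                # double flip: uncorrectable


check = encode(0b1011)
result = check_and_correct(0b1011 ^ 0b0100, check)   # one data bit upset in memory
```

When the flipped bit lies in the check code rather than in the data, the data is left unchanged and a fresh check code is regenerated, so the record written to the queue is always a consistent pair.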
Further, the correctable error state register stores the correctable-error state and identifies whether the DSP processor has currently entered the data correctable-error state. The register receives the correctable-error state Single_Error from the data error correction and detection module and the correctable-error state clear control from the RSEC instruction decoding module. When Single_Error is asserted, the register is set; when the clear control signal is asserted, the register is cleared; when Single_Error and the clear control signal are asserted simultaneously, Single_Error has the higher priority and the register is set. Until the correctable-error state bit is cleared, successive correctable-error events set the register repeatedly.
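The set/clear behaviour of the correctable error state register can be sketched as a next-state function. This is only an illustrative model, not the patented circuit: the function name `next_state` is hypothetical, and the sketch assumes that "Single_Error has higher priority" means the register ends up set when both signals are asserted in the same cycle.

```python
def next_state(state: bool, single_error: bool, clear: bool) -> bool:
    """Next value of the correctable-error state bit (illustrative model)."""
    if single_error:
        # Assumption: a new correctable error wins over a simultaneous clear.
        return True
    if clear:
        # Clear control issued by RSEC instruction decoding.
        return False
    # Neither signal asserted: the bit holds its value.
    return state


# Walk through one cycle sequence: error, hold, clear, error+clear together.
state = False
trace = []
for single_error, clear in [(True, False), (False, False), (False, True), (True, True)]:
    state = next_state(state, single_error, clear)
    trace.append(state)
```

Giving the set input priority means an error that arrives while the service routine is clearing the bit is not lost.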
The circular Record queue comprises a Record queue and a queue status register. The queue has two inputs. The first input carries the corrected data and check code from the data error correction and detection module together with the control information Core_Read_Control decoded from the current LOAD instruction in the DSP processor pipeline. The second input carries the data and check code output by STORE instruction decoding together with the control information Core_Write_Control of the current memory write instruction. If the data error correction and detection module detects a correctable error in the currently accessed data, the first input is written into the circular Record queue, the correctable-error state is written into the correctable error state register, and the queue status register is updated. While the correctable error state register is set, if a subsequent instruction is a STORE instruction, the second input must be written into the circular Record queue and the queue status register updated.
The control information Core_Read_Control of the memory read instruction comprises: the read access parallelism Pon, the read base address ReadBaseAddr, the word/byte mode BW, and the read-valid flag Read_Valid. The control information Core_Write_Control of the memory write instruction STORE comprises: the write access parallelism Pon, the write base address WriteBaseAddr, the word/byte mode BW, and the write-valid flag Write_Valid.
Each Record of the circular Record queue contains: the access parallelism Pon, the parallel access base address BaseAddr, the word/byte mode BW, the corrected data or the write-back data, and the check code of the data.
Preferably, the circular Record queue stores each valid input record in the Record_k queue in first-in, first-out order.
Further, the circular Record queue comprises a group of queue status registers, which mark the current state of the queue, including whether the queue has overflowed and whether it has been processed to its end. The depth of the circular Record queue should be greater than the interrupt response latency plus the number of pipeline stages between the decode stage and the stage containing the data error correction and detection module.
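A minimal software model of the circular Record queue and its status flags may help make the description concrete. The class and field names below are hypothetical, the policy of dropping a record on overflow is an assumption (the text only says that overflow is flagged), and `min_depth` simply restates the depth rule as the smallest integer strictly greater than the sum.

```python
from collections import namedtuple

# Fields follow the Record contents listed in the text; names are illustrative.
Record = namedtuple("Record", "pon base_addr bw data check")


class CircularRecordQueue:
    """FIFO ring buffer with an overflow flag and an empty indication."""

    def __init__(self, depth):
        self.slots = [None] * depth
        self.head = 0          # next record to read
        self.tail = 0          # next free slot
        self.count = 0
        self.overflow = False  # status bit: a record could not be stored

    @property
    def empty(self):
        return self.count == 0

    def push(self, rec):
        if self.count == len(self.slots):
            self.overflow = True   # assumption: the new record is dropped
            return False
        self.slots[self.tail] = rec
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1
        return True

    def pop(self):
        if self.empty:
            return None            # stands in for the "empty head" indication
        rec = self.slots[self.head]
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return rec


def min_depth(interrupt_latency, stages_decode_to_edac):
    """Smallest depth strictly greater than latency plus stage count."""
    return interrupt_latency + stages_decode_to_edac + 1
```

The depth rule reflects that records keep arriving from instructions already in flight until the interrupt is actually taken.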
Preferably, STORE instruction decoding detects whether the currently decoded instruction is a data memory write instruction and outputs the decoded control information Core_Write_Control together with the data and check code of the STORE; when the correctable error state register is set, this control information, data, and check code are also input to the circular Record queue.
Furthermore, the RSEC instruction decoding module accesses a Record entry in the circular Record queue and outputs it to the data memory write operation module, which writes the data and check bits stored in the entry back to the data memory. Preferably, the RSEC instruction accesses the queue from its head and outputs the head record Record(head) to the data memory write operation module.
From the control information decoded from the STORE instruction and the control information output by the circular Record queue, the data memory write operation module generates the base address Addr, the access parallelism Pon, and the word/byte mode BW of the data memory write operation. It then either writes the data and check code of the STORE instruction to the data memory or writes the corrected data and check code held in a Record back to the data memory.
Further, the interrupt processing module assigns one of the hardware interrupt requests to the correctable-error interrupt of the currently accessed data, and the correctable-error interrupt of the data memory is given a higher interrupt priority.
Preferably, the interrupt handling module includes hardware interrupt handling logic, an interrupt flag register, an interrupt enable register, an interrupt vector table, and a hard interrupt service routine area.
The hardware interrupt processing logic interrupts the normally executing program in the DSP processor pipeline and jumps to the interrupt vector table to obtain the entry of the hardware interrupt service routine. Preferably, the interrupt flag register and the interrupt enable register cooperate with the hardware interrupt processing logic to enter interrupt service: when the interrupt enable register enables the data correctable-error interrupt, the hardware interrupt processing logic checks whether the data correctable-error interrupt flag in the interrupt flag register is set, and if so, the DSP processor enters the data correctable-error interrupt service routine. The correctable-error state output Single_Error of the data error correction and detection module is connected to the data correctable-error interrupt bit of the interrupt flag register, which samples and records this state signal.
Preferably, the hard interrupt service routine area is entered through the entry specified by the interrupt vector table. When a data correctable-error interrupt occurs and is enabled, the DSP processor jumps from the data correctable-error interrupt entry in the interrupt vector table to the correctable-error interrupt service routine.
The invention discloses an active fault-tolerant method for a DSP processor data memory, which comprises the following steps:
Step 1, the DSP processor is initialized and the correctable-error interrupt response is enabled, so that the DSP processor can respond to the correctable-error hardware interrupt when a correctable error occurs in the data memory;
Step 2, a data memory access instruction (LOAD) reads data from the data memory. The error state of the data is checked according to the extended operation of the LOAD instruction, and the correctable-error or uncorrectable-error state is handled during LOAD execution;
Step 3, when a correctable data error has occurred, determine whether a memory write (STORE) instruction is present in the current processor pipeline or in the subsequent instruction stream. If yes, continue with step 4; otherwise, go to step 5;
Step 4, the STORE instruction is executed; according to the extended operation of the STORE instruction, its data are written to the memory and, at the same time, its control information and data are recorded in the circular Record queue;
Step 5, the processor responds to the correctable-error hardware interrupt: the correctable data error triggers a hardware interrupt of the processor, which enters the data error correction interrupt service routine;
Step 6, in the interrupt service routine, the RSEC instruction is called in a loop to process the Records in the circular Record queue until the queue has been read empty. The corrected data are written back to the data memory, and after the correctable errors have been corrected, the data written by STORE instructions are refreshed back to the data memory;
Step 7, the correctable-error interrupt returns;
Step 8, the DSP processor resumes normal operation.
The active fault-tolerance control method for the processor data memory extends the processing of the processor's data memory LOAD instruction. The extended LOAD instruction is processed as follows:
Step A-1, the DSP processor fetches an instruction;
Step A-2, determine from the instruction opcode whether the instruction is a LOAD instruction. If it is, go to step A-3; otherwise, go to step A-12;
Step A-3, decode the LOAD instruction;
Step A-4, access the data memory to obtain the data;
Step A-5, send the data to the data error correction and detection module, which performs error correction and detection and determines the correctable Single_Error and uncorrectable Multiple_Error states;
Step A-6, determine whether an uncorrectable error has occurred. If yes, go to step A-10; otherwise, go to step A-7;
Step A-7, write the data output by the data error correction and detection module into the destination register file according to the destination register index of the LOAD instruction;
Step A-8, determine whether a correctable error Single_Error has occurred. If yes, set the correctable error state register and go to step A-9; otherwise, go to step A-11;
Step A-9, write the corrected data and check code output by the data error correction and detection module, together with the access control information in the LOAD instruction pipeline, into the circular Record queue in sequence; go to step A-11;
Step A-10, an uncorrectable multi-bit error has occurred; generate an uncorrectable-error interrupt signal;
Step A-11, instruction execution ends;
Step A-12, process other instructions.
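Steps A-4 to A-11 can be traced in a few lines of software. The sketch below is a behavioural illustration only: `execute_load`, the dictionary-based memory, register file, and status flags are hypothetical, and `toy_edac` is a deliberately simplistic stand-in whose "check code" is a reference copy of the word, not the patent's 8-bit code.

```python
def toy_edac(data, check):
    """Stand-in EDAC: 'check' is a reference copy of the word (illustration only).

    Returns (data, check, single_error, multiple_error).
    """
    flipped = bin(data ^ check).count("1")
    if flipped == 0:
        return data, check, False, False
    if flipped == 1:
        return check, check, True, False      # one bit differs: treat as corrected
    return data, check, False, True           # more than one bit: uncorrectable


def execute_load(mem, regs, queue, status, addr, dest, edac, pon=1, bw="word"):
    """Steps A-4 to A-11 for one decoded LOAD; returns a short outcome string."""
    data, check = mem[addr]                                    # A-4
    fixed, fixed_check, single, multiple = edac(data, check)   # A-5
    if multiple:                                               # A-6 -> A-10
        status["uncorrectable_irq"] = True
        return "uncorrectable"
    regs[dest] = fixed                                         # A-7
    if single:                                                 # A-8
        status["correctable"] = True
        queue.append({"data": fixed, "check": fixed_check,
                      "addr": addr, "pon": pon, "bw": bw})     # A-9
        return "corrected"
    return "ok"                                                # A-11


# Memory words: clean, one flipped bit, two flipped bits.
mem = {0: (5, 5), 4: (4, 5), 8: (6, 5)}
regs, queue, status = {}, [], {}
outcomes = [execute_load(mem, regs, queue, status, a, r, toy_edac)
            for a, r in [(0, "r1"), (4, "r2"), (8, "r3")]]
```

Note that the corrected value reaches the register in step A-7 while the memory word itself stays faulty until the queued record is written back later.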
In step A-5 of the LOAD instruction flow, the data error correction and detection circuit is divided into N groups according to the data parallelism; each group corresponds to one 32-bit word, and error detection and correction are performed word by word. The error detection and correction logic uses the 32 data bits
$D = d_{32}d_{31}d_{30}\cdots d_{2}d_{1}$, with $d_i \in \{0,1\}$, $1 \le i \le 32$,
and the 8 check bits $P' = p'_8p'_7p'_6p'_5p'_4p'_3p'_2p'_1$, with $p'_i \in \{0,1\}$, $1 \le i \le 8$,
to generate the 8 error identification bits $P = p_8p_7p_6p_5p_4p_3p_2p_1$, with $p_i \in \{0,1\}$, $1 \le i \le 8$.
The logic for generating the identification bits is:
Figure GDA0001540800580000101
where
Figure GDA0001540800580000102
denotes the exclusive-OR operation. Let
Figure GDA0001540800580000103
Error correction and detection according to the value of P proceed as follows:
(1) if P = "0000000", the data and the check code are error-free: Multiple_Error = false and Single_Error = false;
(2) if weight = '0' and $p_8p_7p_6p_5p_4p_3p_2p_1 \ne$ "0000000", two bits of the data D or the check code P' are in error and Multiple_Error = true; neither the data nor the check code is corrected;
(3) if weight = '1' and $p_8p_7p_6p_5p_4p_3p_2p_1 \ne$ "0000000", one bit of the data D or the check code P' is in error and Single_Error = true; the data bit error is corrected as follows:
Figure GDA0001540800580000104
where "·" is the bitwise AND operation, "+" is the bitwise OR operation, and
Figure GDA0001540800580000105
denotes the bitwise negation operation; $d'_i$ ($1 \le i \le 32$) is the data bit $d_i$ after correction of a correctable error; $k_i$ is calculated as follows:
Figure GDA0001540800580000111
The check bits are corrected as follows:
if $p_i \ne$ '0' ($1 \le i \le 8$), and
Figure GDA0001540800580000114
then a correctable error has occurred in $p'_j$, and
Figure GDA0001540800580000115
where $p'_j$ ($1 \le j \le 8$) on the right-hand side of the equation is the check bit after correction of the correctable error, and $p'_j$ ($1 \le j \le 8$) on the left-hand side is the check bit before correction.
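The three-way decision above (no error, two-bit error, one-bit error) is the usual behaviour of a single-error-correcting, double-error-detecting (SEC-DED) code. Because the patent's own generating equations are contained in figures that are not reproduced in this text, the sketch below swaps in a textbook extended Hamming code with 7 check bits per 32-bit word rather than the patent's 8-bit code. All names (`encode`, `decode`, `DATA_POS`, `PARITY_POS`) are illustrative assumptions; the overall parity here plays a role comparable to the `weight` test in cases (2) and (3).

```python
# Codeword positions 1..38: powers of two hold Hamming parity bits,
# the remaining 32 positions hold the data bits d1..d32.
PARITY_POS = [1, 2, 4, 8, 16, 32]
DATA_POS = [p for p in range(1, 39) if p & (p - 1)]


def _syndrome_and_parity(data, check):
    """XOR of the positions of all 1 bits, and the parity of all 39 bits."""
    syndrome, parity = 0, (check >> 6) & 1
    for i, pos in enumerate(DATA_POS):
        if (data >> i) & 1:
            syndrome ^= pos
            parity ^= 1
    for j, pos in enumerate(PARITY_POS):
        if (check >> j) & 1:
            syndrome ^= pos
            parity ^= 1
    return syndrome, parity


def encode(data):
    """Return a 7-bit check code: bits 0-5 Hamming parity, bit 6 overall parity."""
    syndrome, parity = _syndrome_and_parity(data, 0)
    # Choosing parity bit j equal to syndrome bit j zeroes the syndrome.
    overall = parity ^ (bin(syndrome).count("1") & 1)
    return syndrome | (overall << 6)


def decode(data, check):
    """Return (data, check, single_error, multiple_error) after correction."""
    syndrome, parity = _syndrome_and_parity(data, check)
    if syndrome == 0 and parity == 0:
        return data, check, False, False        # case (1): no error
    if parity == 0:
        return data, check, False, True         # case (2): even number of flips
    if syndrome == 0:                            # overall parity bit flipped
        check ^= 1 << 6
    elif syndrome in PARITY_POS:                 # a Hamming parity bit flipped
        check ^= 1 << PARITY_POS.index(syndrome)
    elif syndrome in DATA_POS:                   # a data bit flipped
        data ^= 1 << DATA_POS.index(syndrome)
    else:                                        # syndrome outside the codeword
        return data, check, False, True
    return data, check, True, False              # case (3): corrected


word = 0xDEADBEEF
code = encode(word)
```

As in the text, a single flipped bit is corrected whether it lies in the data or in the check code, while a two-bit error is only reported.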
Furthermore, for a processor supporting multiple degrees of parallelism, the above method performs error detection and correction word by word. The error detection and correction state of the current parallel access is obtained from the data parallelism Pon as follows, where $s_i$ ($0 \le i \le 3$) is the correctable-error state and $m_i$ ($0 \le i \le 3$) is the double-error state of word $i$ of the parallel data, and $\lor$ denotes the bitwise OR operation:
(1) if the data parallelism is 1, i.e. Pon = 1, then Single_Error = $s_0$ and Multiple_Error = $m_0$;
(2) if the data parallelism is 2, i.e. Pon = 2, then Single_Error = $s_0 \lor s_1$ and Multiple_Error = $m_0 \lor m_1$;
(3) if the data parallelism is 4, i.e. Pon = 4, then Single_Error = $s_0 \lor s_1 \lor s_2 \lor s_3$ and Multiple_Error = $m_0 \lor m_1 \lor m_2 \lor m_3$.
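The aggregation of per-word flags over the active lanes is a plain OR reduction. A short sketch follows; the function name `aggregate_error_state` and the list representation of the per-word flags are assumptions made for illustration.

```python
def aggregate_error_state(pon, s, m):
    """Combine per-word flags into (Single_Error, Multiple_Error).

    pon: data parallelism, one of 1, 2, 4
    s:   per-word correctable-error flags s0..s3
    m:   per-word double-error flags m0..m3
    Only the first `pon` words take part in the access.
    """
    if pon not in (1, 2, 4):
        raise ValueError("Pon must be 1, 2 or 4")
    return any(s[:pon]), any(m[:pon])


# Word 1 has a correctable error; word 3 has a double error.
s = [False, True, False, False]
m = [False, False, False, True]
results = {pon: aggregate_error_state(pon, s, m) for pon in (1, 2, 4)}
```

With these flags, a Pon = 1 access sees no error, a Pon = 2 access reports a correctable error, and a Pon = 4 access reports both states.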
The active fault-tolerance control method for the processor data memory also extends the processing of the processor's data memory write (STORE) instruction. The extended STORE instruction is processed as follows:
Step B-1, fetch an instruction;
Step B-2, determine from the instruction opcode whether the instruction is a STORE instruction. If it is, go to step B-3; otherwise, go to step B-9;
Step B-3, decode the STORE instruction and read the operand from the register file;
Step B-4, generate the operand check code;
Step B-5, determine whether the correctable error state register is currently set. If yes, go to step B-6; otherwise, go to step B-7;
Step B-6, write the operand data, the operand check code, and the access control information in the STORE instruction pipeline into the circular Record queue in sequence;
Step B-7, write the operand data and the operand check code into the data memory;
Step B-8, instruction execution ends;
Step B-9, process other instructions.
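Steps B-3 to B-7 can be illustrated in the same behavioural style. In this sketch `execute_store` and the dictionary-based machine state are hypothetical, and `parity_check` is a placeholder that produces a single parity bit instead of the patent's 8-bit check code.

```python
def parity_check(word):
    """Placeholder check-code generator: one parity bit (not the 8-bit code)."""
    return bin(word).count("1") & 1


def execute_store(mem, regs, queue, status, addr, src,
                  gen_check=parity_check, pon=1, bw="word"):
    """Steps B-3 to B-7 for one decoded STORE."""
    data = regs[src]                                   # B-3: read operand
    check = gen_check(data)                            # B-4: generate check code
    if status.get("correctable", False):               # B-5
        queue.append({"data": data, "check": check,
                      "addr": addr, "pon": pon, "bw": bw})   # B-6
    mem[addr] = (data, check)                          # B-7: normal write


mem, regs, queue = {}, {"r1": 7}, []
status = {"correctable": False}
execute_store(mem, regs, queue, status, 0, "r1")   # no pending error: not recorded
status["correctable"] = True
execute_store(mem, regs, queue, status, 4, "r1")   # pending error: also recorded
```

The write to memory in step B-7 always happens; recording in step B-6 is the only extra work while a correctable error is pending.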
In step B-4 of the extended STORE instruction flow, the logic for generating the operand check bits is:
Figure GDA0001540800580000121
where the data check code is generated for each 32-bit data word D:
$D = d_{32}d_{31}d_{30}\cdots d_{2}d_{1}$, with $d_i \in \{0,1\}$, $1 \le i \le 32$,
producing the 8 check bits $P' = p'_8p'_7p'_6p'_5p'_4p'_3p'_2p'_1$, with $p'_i \in \{0,1\}$, $1 \le i \le 8$.
Further, for a processor supporting multiple degrees of parallelism, the check code is generated word by word using this method, which gives the processor a parallel, word-based multi-bit error detection capability.
In steps A-9 and B-6, records are written to the circular Record queue as follows:
Step E-1, if Single_Error is true, write the data D and check code P' corrected by the error correction method, together with the read base address ReadBaseAddr, parallelism Pon, and word/byte access mode BW of the current data memory access, to the tail of the queue: Record(tail) = Record(Single_Error) = {D, P', ReadBaseAddr, Pon, BW}; go to step E-3. Otherwise, go to step E-2;
Step E-2, if the correctable error state register is currently set and the write-valid flag Write_Valid of the memory write instruction STORE is asserted, write the data D and the check code P' generated by the encoding method, together with the write base address WriteBaseAddr, parallelism Pon, and word/byte access mode BW of the current data memory access, to the tail of the queue: Record(tail) = Record(Write_Instruction) = {D, P', WriteBaseAddr, Pon, BW}; go to step E-3. Otherwise, go to step E-3;
Step E-3, detect whether the queue has overflowed and set the corresponding status bit.
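The selection between the two record types in steps E-1 to E-3 can be sketched as a single function. The name `record_step`, the tuple representation of a record, and the choice to drop a record when the queue is full are assumptions; the text only states that overflow is detected and flagged.

```python
def record_step(queue, depth, flags, single_error, load_rec, write_valid, store_rec):
    """One pass through steps E-1 to E-3; returns the record stored, or None."""
    rec = None
    if single_error:                                   # E-1: LOAD found a correctable error
        flags["correctable"] = True
        rec = ("Single_Error", load_rec)
    elif flags["correctable"] and write_valid:         # E-2: STORE after a pending error
        rec = ("Write_Instruction", store_rec)
    if rec is None:
        return None
    if len(queue) >= depth:                            # E-3: overflow detection
        flags["overflow"] = True                       # assumption: record is dropped
        return None
    queue.append(rec)
    return rec


flags = {"correctable": False, "overflow": False}
queue = []
log = [
    record_step(queue, 2, flags, False, None, True, "st0"),   # no pending error
    record_step(queue, 2, flags, True, "ld0", False, None),   # correctable LOAD
    record_step(queue, 2, flags, False, None, True, "st1"),   # STORE now recorded
    record_step(queue, 2, flags, False, None, True, "st2"),   # queue already full
]
```

The first STORE is not recorded because no correctable error is pending yet, which keeps the queue unused during normal, error-free execution.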
In step B-6 of the extended STORE instruction flow, once a correctable error has occurred in the data memory and the correctable error state register has been set, the control information Record(Write_Instruction) of each subsequent data memory write instruction is recorded in the circular Record queue in order. Recording Record(Write_Instruction) avoids a write-after-write hazard on the data memory when Record(Single_Error) is written back to the data memory.
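A small scenario shows the write-after-write hazard that this recording avoids. The addresses and values are invented for illustration, and `replay` is a hypothetical stand-in for the in-order write-back performed later by the interrupt service routine.

```python
def replay(memory, queue):
    """Write queued (address, value) records back in FIFO order."""
    for addr, value in queue:
        memory[addr] = value


def scenario(record_stores):
    """Return the final content of address 0x10 after the write-back."""
    memory = {0x10: 0xAB ^ 0x01}       # stored word has one flipped bit
    queue = [(0x10, 0xAB)]             # LOAD corrects it and records 0xAB
    memory[0x10] = 0xCD                # a later STORE to the same address
    if record_stores:
        queue.append((0x10, 0xCD))     # extended STORE also records itself
    replay(memory, queue)              # write-back in the service routine
    return memory[0x10]


without_recording = scenario(record_stores=False)
with_recording = scenario(record_stores=True)
```

Without the STORE record the write-back restores the older corrected value and the newer store is lost; with it, the FIFO replay leaves the newest value in memory.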
Further, in step 6, the RSEC instruction reads a Record from the head of the circular Record queue and writes the data in the Record back to the data memory according to the access control information. The RSEC instruction operates as follows:
Step C-1, fetch an instruction;
Step C-2, determine from the instruction opcode whether the instruction is an RSEC instruction. If it is, go to step C-3; otherwise, go to step C-9;
Step C-3, decode the RSEC instruction and output a control signal to the queue access module;
Step C-4, the queue access module reads a record Record(head) from the head of the circular Record queue;
Step C-5, send the data and control information of Record(head) to the memory write operation module;
Step C-6, write the operand data and the operand check code into the data memory;
Step C-7, clear the correctable error state register;
Step C-8, instruction execution ends;
Step C-9, process other instructions.
Further, the memory write operation module in step C-5 is the same hardware logic as step B-7 of the STORE instruction flow; it reuses the hardware logic of the STORE instruction pipeline in the processor pipeline.
In step C-4 of the RSEC instruction flow, an entry is read from the circular Record queue as follows:
Step F-1, detect whether the queue is currently empty and set the corresponding status bit; if it is not empty, go to step F-2; if it is empty, return an empty-queue-head indication;
Step F-2, read the head record Record(head) and advance the head pointer to the record following the current one.
Further, in step 6, the correctable-error interrupt service routine proceeds as follows:
Step D-1, determine whether the circular Record queue has been read empty. If yes, go to step D-3; otherwise, go to step D-2;
Step D-2, execute the RSEC instruction and go to step D-1;
Step D-3, end the interrupt service and return from the interrupt.
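The service routine loop of steps D-1 to D-3, with one RSEC per iteration, can be modelled as follows. The functions `rsec` and `correctable_error_isr` and the dictionary-based machine state are hypothetical; on the real device RSEC is a machine instruction, not a function call.

```python
from collections import deque


def rsec(queue, memory, status):
    """One RSEC: read the head record, write it back, clear the state bit."""
    rec = queue.popleft()                                  # C-4 / F-2
    memory[rec["addr"]] = (rec["data"], rec["check"])      # C-5, C-6
    status["correctable"] = False                          # C-7


def correctable_error_isr(queue, memory, status):
    """Steps D-1 to D-3; returns the number of RSEC instructions executed."""
    executed = 0
    while queue:                       # D-1: queue not yet read empty
        rsec(queue, memory, status)    # D-2
        executed += 1
    return executed                    # D-3: return from interrupt


memory = {4: (4, 5)}                   # faulty word still in memory
queue = deque([
    {"addr": 4, "data": 5, "check": 5},    # corrected word from the LOAD
    {"addr": 4, "data": 9, "check": 9},    # later STORE to the same address
])
status = {"correctable": True}
count = correctable_error_isr(queue, memory, status)
```

After the loop the queue is empty, the state bit is cleared, and memory holds the value of the most recent record for each address.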
In the preferred embodiment, the beneficial effects of the invention are illustrated with a high-performance DSP processor of SIMD architecture. The processor uses a Harvard architecture and integrates a 16 KB data memory and a 16 KB program memory. Access to the data memory supports the parallelism Pon ∈ {1, 2, 4}; that is, 1 ×, 2 ×, or 4 × 32-bit data can be accessed in parallel every clock cycle. The instruction that reads the data memory is the memory-to-register load instruction LOAD; the instruction that writes the data memory is the register-to-memory write instruction STORE.
FIG. 1 shows an example of the device of the invention supporting active fault tolerance of the data memory. In the active fault-tolerant device for a processor data memory, a circuit that actively refreshes the data memory for fault tolerance is placed between the processor core pipeline and the in-core data memory. The fault-tolerant device comprises the following circuit modules:
LOAD instruction decoding: receives the program instruction, determines whether the current instruction is a LOAD instruction, and outputs the decoded LOAD control logic to the data memory. STORE instruction decoding: receives the program instruction, determines whether the current instruction is a STORE instruction, and outputs the decoded STORE control logic to the data memory.
Data error correction and detection module: receives N channels of parallel input from the data memory, namely N × 32-bit data 111 and N × 8-bit check codes 112, and performs error correction and detection on the data read in. It outputs the Single_Error 101 and Multiple_Error 102 states of the current N × 32-bit data, together with the corrected data and check code 103. The correctable-error state Single_Error is connected to the correctable error state register and the interrupt flag register; the corrected data are connected to the circular queue and the destination register file; and the corrected check code is connected to the circular queue.
Correctable error state register: stores the correctable-error state and identifies whether the processor has currently entered the data correctable-error state. The register receives the correctable-error state Single_Error from the data error correction and detection module and the correctable-error state clear control 105 from the RSEC instruction decoding module. When Single_Error is asserted, the register is set; when the clear control signal is asserted, the register is cleared; when Single_Error and the clear control signal are asserted simultaneously, Single_Error has the higher priority and the register is set. Until the correctable-error state bit is cleared, successive correctable-error events set the register repeatedly.
Circular Record queue: comprises a Record queue and a queue status register. The queue has two inputs. The first input carries the corrected data and check code from the data error correction and detection module together with the control information Core_Read_Control 106 of the current memory read instruction in the processor pipeline. The second input carries the data and check code 107 output by the memory write instruction detection module together with the control information Core_Write_Control 108 of the current memory write instruction. If the current pipeline detects a correctable error in the data read in, the first input is written into the circular Record queue, the correctable error state register is written, and the queue status register is updated. While the correctable error state register is set, if a subsequent instruction is a data memory write instruction, the second input must be written into the circular Record queue and the queue status register updated. The output of the circular queue is connected to the queue access module.
The control information Core_Read_Control of the memory read instruction comprises: the read access parallelism Pon, the read base address ReadBaseAddr, the word/byte mode BW, and the read-valid flag Read_Valid. The control information Core_Write_Control of the memory write instruction STORE comprises: the write access parallelism Pon, the write base address WriteBaseAddr, the word/byte mode BW, and the write-valid flag Write_Valid. Each Record of the circular Record queue contains: the access parallelism Pon, the parallel access base address Read/WriteBaseAddr, the word/byte mode BW, the corrected data or the write-back data, and the check code of the data.
The circular Record queue stores each input record in the Record_k queue in first-in, first-out (FIFO) order.
Queue status register 113: marks the current state of the queue, including whether the queue has overflowed and whether it has been processed to its end. The queue depth should be greater than the interrupt response latency plus the number of pipeline stages between the decode stage and the stage containing the data error correction and detection module.
STORE instruction decoding: detects whether the currently decoded instruction is a data memory write instruction, outputs the decoded control information Core_Write_Control together with the data and check code of the STORE, and, when the correctable error state register is set, inputs the decoded control information Core_Write_Control into the circular Record queue.
Queue access module: takes as input the output control 109 from the RSEC instruction decoding module and the output 114 of the circular Record queue. The queue access module accesses the queue from the head of the circular Record queue and outputs the head record Record(head) to the memory write operation module.
Data memory write operation module: takes as input the output Record(head) 110 from the queue access module. According to the base address BaseAddr, the access parallelism Pon, and the word/byte mode BW held in the Record, the module writes the data and the error correction code in the Record into the corresponding data memory and error correction and detection memory.
RSEC instruction decoding module: takes as input the RSEC instruction 115 of the correctable-error interrupt service routine from the DSP processor program memory and outputs the correctable-error state clear control signal 105 and the queue access control signal 109.
The invention uses interrupt processing: the write-back of correctable errors to the data memory is completed by an interrupt service routine. Because the corrected data are recorded in the circular Record queue, a dedicated instruction, RSEC (Record access and Single Error Correction), is added to the DSP processor instruction set; it reads the Record at the head of the queue and completes the write-back to the data memory according to the recorded information. This example illustrates an RSEC instruction implemented on a DSP processor.
The correctable-error interrupt service routine is entered from the entry specified by the interrupt vector table of the processor interrupt processing module. The correctable-error state output Single_Error 101 of the data error correction and detection module is connected to the highest-priority interrupt bit of the interrupt flag register, which samples and records this state signal. When a data correctable-error interrupt occurs and is enabled, the processor jumps from the data correctable-error interrupt entry in the interrupt vector table to the correctable-error interrupt service routine.
With the device arranged in this way, the processing flows of the LOAD and STORE instructions are extended as shown in FIG. 2 and FIG. 3, respectively. The specific steps are as follows.
The processing flow of the read data memory instruction LOAD is:
Step A-1: fetch an instruction;
Step A-2: determine from the instruction opcode whether the instruction is a LOAD instruction. If it is, go to step A-3; otherwise, go to step A-12;
Step A-3: decode the LOAD instruction;
Step A-4: access the data memory to obtain the data;
Step A-5: send the data to the data error correction and detection module, which performs error correction and detection and determines the correctable Single_Error and uncorrectable Multiple_Error states;
Step A-6: determine whether an uncorrectable error has occurred. If yes, go to step A-10; otherwise, go to step A-7;
Step A-7: write the data output by the data error correction and detection module into the destination register file according to the destination register index of the LOAD instruction;
Step A-8: determine whether a correctable error Single_Error has occurred. If yes, set the correctable error state register and go to step A-9; otherwise, go to step A-11;
Step A-9: write the corrected data and check code output by the data error correction and detection module, together with the access control information in the LOAD instruction pipeline, into the circular Record queue in sequence; go to step A-11;
Step A-10: an uncorrectable multi-bit error has occurred; generate an uncorrectable-error interrupt signal;
Step A-11: instruction execution ends;
Step A-12: process other instructions.
The processing flow of the extended write memory instruction STORE is:
Step B-1: fetch an instruction;
Step B-2: determine from the instruction opcode whether the instruction is a STORE instruction. If it is, go to step B-3; otherwise, go to step B-9;
Step B-3: decode the STORE instruction and read the operand from the register file;
Step B-4: generate the operand check code;
Step B-5: determine whether the correctable error state register is currently set. If yes, go to step B-6; otherwise, go to step B-7;
Step B-6: write the operand data, the operand check code, and the access control information in the STORE instruction pipeline into the circular Record queue in sequence;
Step B-7: write the operand data and the operand check code into the data memory;
Step B-8: instruction execution ends;
Step B-9: process other instructions.
The processor of this embodiment supports three parallel data access modes: 1 ×, 2 ×, or 4 × 32 bits. Check code generation, error correction and detection, and correctable-error and double-error detection are all performed per 32-bit data word. The data error detection and correction module therefore requires four parallel channels of error detection and correction logic.
The data error detection and correction logic of the invention is implemented according to the following algorithm. The data information consists of the 32 data bits
$D = d_{32}d_{31}d_{30}\cdots d_{2}d_{1}$, with $d_i \in \{0,1\}$, $1 \le i \le 32$,
and the 8 check bits $P' = p'_8p'_7p'_6p'_5p'_4p'_3p'_2p'_1$, with $p'_i \in \{0,1\}$, $1 \le i \le 8$.
The logic for check bit generation in the STORE instruction flow is:
Figure GDA0001540800580000171
The error correction and detection logic in the LOAD instruction flow compares the 8-bit check code with the 8 error identification bits to generate the correctable Single_Error and uncorrectable Multiple_Error state signals.
The logic that generates the 8 error identification bits $P = p_8p_7p_6p_5p_4p_3p_2p_1$, with $p_i \in \{0,1\}$, $1 \le i \le 8$, is:
Figure GDA0001540800580000172
wherein,
Figure GDA0001540800580000173
is an exclusive or operation; order to
Figure GDA0001540800580000174
Error detection and correction according to the value of P proceeds as follows:
(1) if P = "00000000", the data and the check code are error-free: Multiple_Error = false and Single_Error = false;
(2) if weight = '0' and P ≠ "00000000", two bits of the data D or the check code P' are in error: Multiple_Error = true, and neither the data nor the check code is corrected;
(3) if weight = '1' and P ≠ "00000000", one bit of the data D or the check code P' is in error: Single_Error = true, and the data bit is corrected by the method described in the summary of the invention.
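The three cases can be written as a small decision function. The sketch below is in Python and takes both P and weight as inputs, because the equations that derive them are given only in figures; the function name `classify` and the tuple return shape are assumptions made for illustration.

```python
def classify(p, weight):
    """Return (single_error, multiple_error) for an 8-bit identifier p.

    p      -- error identification value P as an integer in 0..255
    weight -- the one-bit 'weight' term, supplied by the caller (0 or 1)
    """
    if not 0 <= p <= 0xFF:
        raise ValueError("p must be an 8-bit value")
    if weight not in (0, 1):
        raise ValueError("weight must be 0 or 1")
    if p == 0:
        return (False, False)  # case (1): no error
    if weight == 0:
        return (False, True)   # case (2): two-bit error, not corrected
    return (True, False)       # case (3): one-bit error, correctable
```

For example, `classify(0, 1)` reports no error, since case (1) depends only on P being zero.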
The processor of the invention supports parallel access to at most 4 × 32 = 128 bits of data. Because error detection and correction is performed word by word, the processor provides word-wise error detection and correction for 128-bit data. This also avoids the need for 64-bit or 128-bit error detection and correction logic, which requires more check bits per codeword and whose encoding and decoding logic costs more area and circuit delay than the 32-bit logic.
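Word-wise checking of a parallel access can be sketched as follows in Python. The per-word checker is passed in as a callable because its internals are not reproduced here; `check_access` is a hypothetical name, and combining the per-word Multiple_Error flags with a logical OR is an assumption (the text states the OR behaviour only for correctable errors).

```python
def check_access(words, check_word):
    """Apply a per-word checker to a 1-, 2- or 4-word parallel access.

    words      -- list of (data, check_code) pairs, one per 32-bit word
    check_word -- callable returning (corrected_data, single, multiple)
    """
    if len(words) not in (1, 2, 4):
        raise ValueError("parallelism must be 1, 2 or 4")
    results = [check_word(data, check) for data, check in words]
    corrected = [r[0] for r in results]
    # A correctable error in any word sets the access-level status.
    single_error = any(r[1] for r in results)
    # Assumption: an uncorrectable error in any word is reported the same way.
    multiple_error = any(r[2] for r in results)
    return corrected, single_error, multiple_error
```

Each word keeps its own 8-bit check code, so a 4 × 32-bit access carries four independent codewords rather than one wide codeword.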
Through a cooperative software and hardware mechanism, the invention refreshes corrected data back to the data memory under software control when a correctable error occurs in the data (typically a single-bit error caused by a single-event effect, which is common in electronic devices; in the rest of this embodiment, "correctable error" refers to such an error). The circular Record queue records the error-corrected data and its control information, which are later used to refresh the corrected data back to the memory. When a correctable error occurs during EDAC checking, the correctable error state register is set regardless of which of the parallel data groups contains the error, and a record containing the corrected data, the EDAC check code, and related information is written into the Record queue for write-back to the data memory during subsequent fault-tolerant processing.
In addition, so that the write-back of corrected data cannot overwrite data that a later write operation has stored to the same address, the DSP structure also records in the queue, in order, the data and write control information of every write instruction that follows the correctable error.
The process of logging into the queue is shown in Fig. 5. The processor in Fig. 5 has a 10-stage pipeline comprising PF, FE, DC, EX1 to EX4, EDAC, and EX6. Each pipeline stage contains different functional units according to the instruction flow. Both the recording of correctable error information and the recording of write instruction information take place in the EDAC stage of the pipeline. The following program example illustrates how records are logged into the queue:
Figure GDA0001540800580000181
In the above example, suppose the LOAD instruction reads the following from memory, where the first 32 bits of each item are data bits and the following 8 bits are check bits: 0x0000000080, AC and 0x0000000004, 4B. According to the error detection and correction logic, the first data word has a correctable error in bit $d_3$, and the corrected data is 0x0000000084, AC; the second data word has a correctable error in bit $d_{10}$, and the corrected data is 0x0000000204, 4B. Two errors occur in this access, but both are correctable, so the error detection and correction module asserts the correctable error status Single_Error = true. According to the access information of the current LOAD instruction, the following information must be recorded in the queue:
Data and check code: 0x0000000084, AC; 0x0000000204, 4B
Read base address: AR5
Degree of parallelism: 2
Word/byte mode: 1 ('1' for word mode; '0' for byte mode)
The first STORE instruction stores the contents of R6 at the address in AR5. Because the preceding LOAD instruction encountered a correctable error, the information of this STORE instruction must also be recorded in the queue:
Data and check code: 0x0000000288, 07
Write base address: AR5
Degree of parallelism: 1
Word/byte mode: 1
The second STORE instruction stores the contents of R3 at the address in AR6. Because the preceding LOAD instruction encountered a correctable error, the information of this STORE instruction must also be recorded in the queue:
Data and check code: 0x00000040, 4C
Write base address: AR6
Degree of parallelism: 1
Word/byte mode: 1
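The three records above also show why the STORE instructions have to be logged. The Python sketch below replays them against a model of the data memory, using the data values and check codes from the example. The concrete addresses chosen for AR5 and AR6, the helper names `flip_bit` and `replay`, and the assumption that the words of a parallel access sit at consecutive word addresses are all illustrative and not taken from the patent.

```python
AR5, AR6 = 0x100, 0x200  # hypothetical word addresses held in AR5 and AR6

# Queue contents in logging order; each word is (data, check code).
RECORDS = [
    {"kind": "LOAD", "base": AR5, "words": [(0x84, 0xAC), (0x204, 0x4B)]},
    {"kind": "STORE", "base": AR5, "words": [(0x288, 0x07)]},
    {"kind": "STORE", "base": AR6, "words": [(0x40, 0x4C)]},
]


def flip_bit(data, i):
    """Flip data bit d_i, where d_1 is the least significant bit."""
    return data ^ (1 << (i - 1))


def replay(memory, records):
    """Write records back in queue order; a later record wins on a shared address."""
    memory = dict(memory)
    for record in records:
        for offset, word in enumerate(record["words"]):
            memory[record["base"] + offset] = word
    return memory


# Memory when the interrupt is taken: both STOREs have completed, but the
# second word read by the LOAD still holds its uncorrected value.
before = {AR5: (0x288, 0x07), AR5 + 1: (0x004, 0x4B), AR6: (0x40, 0x4C)}

load_only = replay(before, RECORDS[:1])  # hazard: stale 0x84 replaces 0x288
in_order = replay(before, RECORDS)       # STORE records restore the newer data

print(hex(flip_bit(0x80, 3)), hex(flip_bit(0x004, 10)))  # prints: 0x84 0x204
print(hex(load_only[AR5][0]), hex(in_order[AR5][0]))     # prints: 0x84 0x288
```

Replaying only the LOAD record would put the corrected but outdated value 0x84 back at AR5; replaying the STORE records after it leaves the newer value 0x288 in place while the second word is still repaired to 0x204.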
The invention uses software to refresh the correctable error data detected in the program example back to the data memory. Because the information needed to refresh the data memory is held in the hardware circular Record queue, a dedicated instruction, the RSEC instruction, is provided to access the queue. The record is decoded, and its write-back data, error correcting code, and control information are sent to the memory write operation module, which completes the write-back.
The RSEC instruction is a 32-bit instruction. Because the access port of the queue can be regarded as a special register in the processor core structure, the RSEC instruction uses the register access instruction encoding format: it is encoded as an access to an unused register address, which generates the corresponding micro-operations. The processing flow of the RSEC instruction is shown in Fig. 4. The specific steps are as follows:
Step C-1: fetch an instruction;
Step C-2: determine from the instruction opcode whether it is an RSEC instruction. If it is, proceed to step C-3; otherwise, go to step C-9;
Step C-3: decode the RSEC instruction and output a control signal to the queue access module;
Step C-4: the queue access module reads one record, Record(head), from the head of the circular Record queue;
Step C-5: send the data and control information of Record(head) to the memory write operation module;
Step C-6: write the data and the check code of the record into the data memory;
Step C-7: clear the correctable error state register;
Step C-8: instruction execution ends.
Step C-9: process other instructions.
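Steps C-4 to C-7 can be modelled in a few lines of Python. The class name `RsecModel`, the record layout, and the consecutive-address rule for parallel words are hypothetical. The behaviour of RSEC on an empty queue is not described in the text, so the model assumes it writes nothing and still clears the state register.

```python
from collections import deque


class RsecModel:
    """Behavioural model of one RSEC execution (steps C-4 to C-7)."""

    def __init__(self):
        self.memory = {}                # address -> (data, check code)
        self.record_queue = deque()     # stands in for the circular Record queue
        self.correctable_error = False  # correctable error state register

    def rsec(self):
        if not self.record_queue:
            # Assumption: nothing is written when the queue is empty.
            self.correctable_error = False
            return None
        record = self.record_queue.popleft()                 # step C-4
        for offset, word in enumerate(record["words"]):      # steps C-5, C-6
            self.memory[record["base"] + offset] = word
        self.correctable_error = False                       # step C-7
        return record
```

Because step C-7 clears the state register, STORE instructions executed after an RSEC are no longer logged, which matches the extended STORE flow where logging depends on that register.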
The processing flow shown in Fig. 6 is the control method for active fault tolerance of the processor data memory provided by the invention. It comprises the following steps:
Step 1: initialize the DSP processor and enable the correctable error interrupt response, so that the DSP processor responds to the correctable error hardware interrupt when a correctable error occurs in the data memory;
Step 2: a data memory access (LOAD) instruction reads data from the data memory, and the error status of the data is checked by the extended LOAD instruction. The different error states arising during LOAD execution are handled as correctable or uncorrectable errors;
Step 3: when a correctable data error has occurred, determine whether a memory write (STORE) instruction is present in the current processor pipeline or in the subsequently executed program instructions. If so, continue with step 4; otherwise, go to step 5;
Step 4: execute the STORE instruction. According to the extended operation of the STORE instruction, the data of the STORE instruction is written into the memory and, at the same time, the control information and data of the STORE instruction are recorded in the circular Record queue;
Step 5: the processor responds to the correctable error hardware interrupt. The correctable data error triggers a processor hardware interrupt, and the processor enters the data error correction interrupt service routine;
Step 6: in the interrupt service routine, the RSEC instruction is executed in a loop to process the records in the circular Record queue until the queue is empty. The corrected data is written back to the data memory first, and the data written by the STORE instructions is then refreshed to the data memory;
Step 7: return from the correctable error interrupt;
Step 8: the DSP processor resumes normal operation.
In the embodiment, an example interrupt service routine is designed as follows:
Figure GDA0001540800580000211
In the program example, the loop count is set to the queue depth, i.e. the RSEC instruction is executed a fixed number of times. This avoids the data memory reads and writes that context saving in the interrupt service routine would otherwise cause.
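The fixed-count loop can be illustrated with a short Python model. It is not the routine from the figure: `QUEUE_DEPTH`, the function name, and the record shape `(base, words)` are hypothetical, and an RSEC-like step on an empty queue is assumed to have no effect.

```python
from collections import deque

QUEUE_DEPTH = 8  # hypothetical queue depth


def correctable_error_isr(record_queue, memory, depth=QUEUE_DEPTH):
    """Drain the record queue with a fixed number of RSEC-like steps."""
    for _ in range(depth):
        # Assumption: a step on an empty queue writes nothing.
        if record_queue:
            base, words = record_queue.popleft()
            for offset, word in enumerate(words):
                memory[base + offset] = word
    return memory
```

Because the trip count is a constant, the loop needs no data-dependent exit test in software, and as long as the depth is at least the number of logged records, every record is written back in queue order.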
The invention provides an active fault-tolerant device and method for the data memory of a DSP (digital signal processor). By combining software and hardware, data containing correctable errors is refreshed to the data memory in a timely manner, and the software and hardware fault-tolerance strategy can be controlled flexibly to meet the reliability targets of the system.

Claims (10)

  1. An active fault-tolerant device for a DSP processor data memory, characterized in that the device is arranged between the DSP processor core pipeline and the in-core data memory and is used for active fault-tolerant refreshing of the data memory; the device comprises a LOAD instruction decoding module for loading from the data memory, a STORE instruction decoding module for writing to the data memory, a queue access module, an RSEC instruction decoding module, the data memory, a data error detection and correction module, a general register file, a correctable error state register, a circular Record queue, a data memory write operation module, and an interrupt processing module for hardware interrupt processing;
    the LOAD instruction decoding module is used for receiving a DSP processor program instruction, determining whether the current program instruction is a LOAD instruction, and outputting the decoding control logic of the LOAD instruction to the data memory;
    the STORE instruction decoding module is used for receiving a DSP processor program instruction, determining whether the current program instruction is a STORE instruction, and outputting the decoding control logic of the STORE instruction to the data memory through the circular Record queue, the queue access module, and the data memory write operation module in sequence;
    the RSEC instruction decoding module receives a DSP processor program instruction, determines whether the current program instruction is an RSEC instruction, and outputs the decoding control logic of the RSEC instruction to the circular Record queue through the queue access module;
    the input of the data error detection and correction module is the data and data check codes output by the data memory, and its output is the error state of the currently accessed data together with the corrected data and check codes;
    the general register file is used for receiving the corrected data output by the data error detection and correction module and storing the data accessed by the LOAD instruction after processing by the data error detection and correction module;
    the correctable error state register is connected to the data error detection and correction module and samples the error state of the current data;
    the data memory write operation module outputs the write control signal of the data memory;
    the input of the interrupt processing module is a hardware interrupt request signal of the DSP processor, and its output is connected to the RSEC instruction decoding module through the data correctable error interrupt service routine.
  2. The active fault-tolerant device for a DSP processor data memory of claim 1, wherein the data error detection and correction module receives parallel input data and check codes from the data memory and performs error detection and correction on the data read in; it outputs the correctable error state information Single_Error, the uncorrectable error state information Multiple_Error, and the data and check code information after error detection and correction;
    the correctable error state Single_Error is connected to the correctable error state register and the interrupt flag register; the data after error detection and correction is connected to the circular Record queue and the destination register file; the check code after error detection and correction is connected to the circular Record queue;
    the correctable error state register stores the correctable error state information and indicates whether the DSP processor has currently entered the data correctable error state; the correctable error state register receives the correctable error state information Single_Error from the data error detection and correction module and the correctable error state clearing control from the RSEC instruction decoding module;
    when Single_Error is valid, the correctable error state register is set; when the correctable error state clearing control signal is valid, the correctable error state register is cleared; when Single_Error and the correctable error state clearing control signal are valid simultaneously, the Single_Error control signal has the higher priority and the correctable error state register is set; a plurality of successive data correctable error events that occur before the correctable error state bit is cleared set the correctable error state register repeatedly.
  3. The active fault-tolerant device for a DSP processor data memory of claim 1, wherein the circular Record queue comprises a Record queue and a queue status register;
    the queue status register marks the current status of the queue, including whether the queue has overflowed and whether processing has reached the tail of the queue; the depth of the circular Record queue is greater than the sum of the interrupt response latency and the number of pipeline stages between the pipeline decode stage and the pipeline stage containing the data error detection and correction module;
    the circular Record queue has two groups of inputs, wherein the first group comprises the error-corrected data and check code from the data error detection and correction module and the control information Core_Read_Control decoded from the current LOAD instruction on the DSP processor pipeline, and the second group comprises the data and check code output by STORE instruction decoding and the control information Core_Write_Control of the current memory write instruction;
    if the data error detection and correction module detects a correctable error in the currently accessed data, the first group of input information is written into the circular Record queue, the correctable error state information is input to the correctable error state register, and the queue status register is updated;
    while the correctable error state register is set, if a subsequent instruction is a STORE instruction, the second group of inputs is written into the circular Record queue and the queue status register is updated;
    each Record of the circular Record queue comprises the access parallelism Pon, the parallel access base address BaseAddr, the word/byte mode BW, and either the error-corrected data or the write-back data together with its check code;
    the data memory write operation module writes the data and check code of the STORE instruction to the data memory, or writes the corrected data and check code held in a Record back to the data memory.
  4. The active fault-tolerant device for a DSP processor data memory of claim 1, wherein the interrupt processing module connects one of the hardware interrupt requests to the correctable error state of the currently accessed data, and the data memory correctable error interrupt has a higher interrupt priority;
    the interrupt processing module comprises hardware interrupt processing logic, an interrupt flag register, an interrupt enable register, an interrupt vector table, and a hardware interrupt service routine area;
    the hardware interrupt processing logic interrupts the normally executing program in the current DSP processor pipeline and jumps to the interrupt vector table to obtain the entry of the hardware interrupt service routine;
    the interrupt flag register and the interrupt enable register cooperate with the hardware interrupt processing logic to enter interrupt service: when the interrupt enable register enables the data correctable error interrupt, the hardware interrupt processing logic determines whether the data correctable error interrupt flag in the interrupt flag register is valid, and when the flag is valid, the DSP processor enters the data correctable error service routine; the correctable error state output Single_Error of the data error detection and correction module is connected to the data correctable error interrupt bit of the interrupt flag register; the interrupt flag register samples and records the data correctable error state signal;
    the hardware interrupt service routine area is entered from the entry specified by the interrupt vector table for the hardware interrupt; when a data correctable error interrupt occurs and is enabled, the DSP processor jumps from the data correctable error interrupt entry in the interrupt vector table to the data correctable error interrupt service routine.
  5. The active fault-tolerant device for a DSP processor data memory of claim 1, wherein the RSEC instruction decoding module accesses an entry in the circular Record queue and outputs it to the data memory write operation module, and the data and check code stored in the Record entry are written back to the data memory through the data memory write operation module.
  6. An active fault tolerance method for a DSP processor data memory, comprising the steps of:
    step 1, initializing the DSP processor and enabling the correctable error interrupt response, so that the DSP processor responds to the correctable error hardware interrupt when a correctable error occurs in the data memory;
    step 2, accessing data from the data memory according to the LOAD instruction, and checking the error state of the data according to the extension of the LOAD instruction; handling the different error states arising during LOAD instruction execution as correctable or uncorrectable errors;
    after correctable error processing of the data, executing step 3; when an uncorrectable data error occurs, generating an uncorrectable error interrupt signal, ending execution of the current instruction of the DSP processor, and processing other instructions;
    step 3, when a correctable data error has occurred, determining whether a STORE instruction is present in the current processor pipeline or in the subsequently executed program instructions; if so, continuing with step 4; otherwise, going to step 5;
    step 4, executing the STORE instruction and, according to the extended operation of the STORE instruction, writing the data of the STORE instruction into the memory while recording the control information and data of the STORE instruction in the circular Record queue;
    step 5, the processor responding to the correctable error hardware interrupt; the correctable data error triggers a processor hardware interrupt, and the processor enters the data error correction interrupt service routine;
    step 6, in the interrupt service routine, executing the RSEC instruction in a loop to process the records in the circular Record queue until the queue has been read empty; after the correctable errors are corrected, the data written to memory by the STORE instructions is refreshed to the data memory, completing the write-back of the corrected data to the data memory;
    step 7, returning from the correctable error interrupt;
    step 8, the DSP processor resuming normal operation.
  7. The active fault tolerance method for a DSP processor data memory according to claim 6, wherein the processing method of the extended LOAD instruction in step 2 comprises:
    step A-1, the DSP processor fetching an instruction;
    step A-2, determining from the instruction opcode whether it is a LOAD instruction; if it is, proceeding to step A-3; otherwise, going to step A-12;
    step A-3, decoding the LOAD instruction;
    step A-4, accessing the data memory to obtain the data;
    step A-5, sending the data to the data error detection and correction module for error detection and correction, and determining the correctable error state Single_Error and the uncorrectable error state Multiple_Error;
    step A-6, determining whether an uncorrectable error has occurred; if so, going to step A-10; otherwise, going to step A-7;
    step A-7, writing the data output by the data error detection and correction module into the destination register file according to the destination register index of the LOAD instruction;
    step A-8, determining whether a correctable error Single_Error has occurred; if so, setting the correctable error state register and proceeding to step A-9; otherwise, going to step A-11;
    step A-9, writing the error-corrected data output by the data error detection and correction module, the check code, and the access control information on the LOAD instruction pipeline into the circular Record queue in sequence; going to step A-11;
    step A-10, an uncorrectable multi-bit error having occurred, generating an uncorrectable error interrupt signal;
    step A-11, ending execution of the current instruction of the DSP processor;
    step A-12, the DSP processor processing other instructions.
  8. The active fault tolerance method for a DSP processor data memory according to claim 6, wherein the processing method of the extended STORE instruction in step 4 comprises:
    step B-1, the DSP processor fetching an instruction;
    step B-2, determining from the instruction opcode whether it is a STORE instruction; if it is, proceeding to step B-3; otherwise, going to step B-9;
    step B-3, decoding the STORE instruction and reading the operand from the register file;
    step B-4, generating the operand check code;
    step B-5, determining whether the correctable error state register is currently set; if so, proceeding to step B-6; otherwise, going to step B-7;
    step B-6, writing the operand data, the operand check code, and the access control information on the STORE instruction pipeline into the circular Record queue in sequence;
    step B-7, writing the operand data and the operand check code into the data memory;
    step B-8, ending execution of the current instruction of the DSP processor;
    step B-9, the DSP processor processing other instructions.
  9. The active fault tolerance method for a DSP processor data memory according to claim 7 or 8, wherein in step A-9 or step B-6, the method of recording into the circular Record queue is:
    step E-1, if Single_Error is true, writing the data and check code corrected by the error correction method, together with the read base address, parallelism, and word/byte access mode of the data memory access, to the tail of the queue, and going to step E-3; otherwise, proceeding to step E-2;
    step E-2, if the correctable error state register is currently valid and the write data valid signal Write_Valid of the memory write instruction STORE is valid, writing the data and the check code generated by the coding method, together with the write base address, parallelism, and word/byte access mode BW of the current data memory access, to the tail of the queue, and going to step E-3; otherwise, going to step E-3;
    step E-3, detecting whether the queue has overflowed and setting the corresponding status bits.
  10. The active fault tolerance method for a DSP processor data memory according to claim 6,
    wherein in step 6, when the RSEC instruction is executed in a loop to process the records in the circular Record queue, each RSEC instruction reads one Record from the head of the circular Record queue, and the data in the Record is written to the data memory according to the access control information; the operation method of the RSEC instruction comprises the following steps:
    step C-1, fetching an instruction;
    step C-2, determining from the instruction opcode whether it is an RSEC instruction; if it is, proceeding to step C-3; otherwise, going to step C-9;
    step C-3, decoding the RSEC instruction and outputting a control signal to the queue access module;
    step C-4, the queue access module reading one record, Record(head), from the head of the circular Record queue;
    step C-5, sending the data and control information of Record(head) to the data memory write operation module;
    step C-6, writing the data and the check code of the record into the data memory;
    step C-7, clearing the correctable error state register;
    step C-8, ending instruction execution;
    step C-9, processing other instructions.
CN201711192783.8A 2017-11-24 2017-11-24 Active fault tolerance method and device for data storage of DSP (digital Signal processor) Active CN107992376B (en)

Publications (2)

Publication Number Publication Date
CN107992376A CN107992376A (en) 2018-05-04
CN107992376B true CN107992376B (en) 2020-10-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant