US11557365B2 - Combined ECC and transparent memory test for memory fault detection - Google Patents
Combined ECC and transparent memory test for memory fault detection
- Publication number
- US11557365B2 (application US16/542,776)
- Authority
- US
- United States
- Prior art keywords
- data
- bit
- memory
- xor
- input
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C29/00—Checking stores for correct operation ; Subsequent repair; Testing stores during standby or offline operation
- G11C29/04—Detection or location of defective memory elements, e.g. cell construction details, timing of test signals
- G11C29/08—Functional testing, e.g. testing during refresh, power-on self testing [POST] or distributed testing
- G11C29/12—Built-in arrangements for testing, e.g. built-in self testing [BIST] or interconnection details
- G11C29/38—Response verification devices
- G11C29/42—Response verification devices using error correcting codes [ECC] or parity check
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/08—Error detection or correction by redundancy in data representation, e.g. by using checking codes
- G06F11/10—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
- G06F11/1008—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices
- G06F11/1048—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices using arrangements adapted for a specific error detection or correction feature
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/08—Error detection or correction by redundancy in data representation, e.g. by using checking codes
- G06F11/10—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
- G06F11/1008—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices
- G06F11/1068—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices in sector programmable memories, e.g. flash disk
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C29/00—Checking stores for correct operation ; Subsequent repair; Testing stores during standby or offline operation
- G11C29/52—Protection of memory contents; Detection of errors in memory contents
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C11/00—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
- G11C11/21—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements
- G11C11/34—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices
- G11C11/40—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors
- G11C11/401—Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors forming cells needing refreshing or charge regeneration, i.e. dynamic cells
- G11C11/4063—Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing or timing
- G11C11/407—Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing or timing for memory cells of the field-effect type
- G11C11/409—Read-write [R-W] circuits
- G11C11/4096—Input/output [I/O] data management or control circuits, e.g. reading or writing circuits, I/O drivers or bit-line switches
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C29/00—Checking stores for correct operation ; Subsequent repair; Testing stores during standby or offline operation
- G11C29/04—Detection or location of defective memory elements, e.g. cell construction details, timing of test signals
- G11C2029/0409—Online test
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C29/00—Checking stores for correct operation ; Subsequent repair; Testing stores during standby or offline operation
- G11C29/56—External testing equipment for static stores, e.g. automatic test equipment [ATE]; Interfaces therefor
- G11C2029/5602—Interface to device under test
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C29/00—Checking stores for correct operation ; Subsequent repair; Testing stores during standby or offline operation
- G11C29/04—Detection or location of defective memory elements, e.g. cell construction details, timing of test signals
- G11C29/08—Functional testing, e.g. testing during refresh, power-on self testing [POST] or distributed testing
- G11C29/10—Test algorithms, e.g. memory scan [MScan] algorithms; Test patterns, e.g. checkerboard patterns
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C29/00—Checking stores for correct operation ; Subsequent repair; Testing stores during standby or offline operation
- G11C29/04—Detection or location of defective memory elements, e.g. cell construction details, timing of test signals
- G11C29/08—Functional testing, e.g. testing during refresh, power-on self testing [POST] or distributed testing
- G11C29/12—Built-in arrangements for testing, e.g. built-in self testing [BIST] or interconnection details
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C29/00—Checking stores for correct operation ; Subsequent repair; Testing stores during standby or offline operation
- G11C29/04—Detection or location of defective memory elements, e.g. cell construction details, timing of test signals
- G11C29/08—Functional testing, e.g. testing during refresh, power-on self testing [POST] or distributed testing
- G11C29/12—Built-in arrangements for testing, e.g. built-in self testing [BIST] or interconnection details
- G11C29/44—Indication or identification of errors, e.g. for repair
Definitions
- This technical field relates to memory fault detection.
- ECC: error correction codes
- ECC algorithms are standard mechanisms to cope with transient and permanent faults in memories. Cyclic redundancy codes (CRC) are a common ECC algorithm that adds bits (e.g., 8 bits of CRC) to each data word (e.g., 64 bits of data) to provide single-bit-error correction and dual-bit-error detection for a memory. ECC algorithms are also used in part to correct permanent faults that are detected in production test. This use of ECC algorithms in production increases yield; however, the correction and detection capability of the post-production ECC algorithms is then not completely accessible to correct errors occurring during operation in the field.
- CRC: cyclic redundancy codes
- Memory scrubbing, which refers to scanning all of the memory contents at regular time intervals, is used in combination with ECC to detect and correct faults, especially soft errors. Memory scrubbing is often used to prevent the accumulation of single, correctable errors that ultimately lead to uncorrectable multi-bit errors within a data word. Scrubbing rates vary widely depending upon memory sizes and solutions, from scanning the whole memory about every 10 seconds, to every 45 minutes, to once a day or more.
- Bit spreading of logically adjacent cells is another common technique. Because single event upsets often disturb geometrically neighboring memory cells, bit spreading is used to spread geometrically neighboring memory cells to different memory words. Thus, multiple faults in geometrically neighboring cells lead to correctable single-bit errors in different data words instead of uncorrectable multi-bit errors in one data word.
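- The effect of bit spreading can be illustrated with a minimal sketch. The interleaving rule used here (physical column modulo the number of words per row) and the names `physical_to_logical` and `upset_bits_per_word` are assumptions chosen for illustration; the text does not specify a particular mapping.

```python
# Hypothetical interleaving scheme: physical column c of a row belongs to
# logical word (c % words_per_row) at bit position (c // words_per_row).
def physical_to_logical(column, words_per_row):
    return column % words_per_row, column // words_per_row


def upset_bits_per_word(upset_columns, words_per_row):
    """Count how many bits of each logical word a multi-cell upset hits."""
    counts = {}
    for column in upset_columns:
        word, _bit = physical_to_logical(column, words_per_row)
        counts[word] = counts.get(word, 0) + 1
    return counts


burst = [8, 9, 10, 11]  # four geometrically adjacent cells disturbed together
spread = upset_bits_per_word(burst, words_per_row=4)    # interleaved layout
unspread = upset_bits_per_word(burst, words_per_row=1)  # no interleaving
```

- With four-way interleaving the burst touches one bit in each of four words, which a single-error-correcting code can repair; without interleaving all four bits land in the same word.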
- NBTI: negative bias temperature instability
- BTI: bias temperature instability
- the fault rate is also expected to rise due to increased aging and reduction of supply voltage and hence reduced noise margin.
- BIST: built-in self-test circuitry
- warnings can be generated and counter-measures can be initiated before undetectable multi-bit faults lead to a system failure.
- some prior memories have included memory built-in self-test (MBIST) that is executed in the field at regular time intervals to generate memory failure warnings.
- MBIST: memory built-in self-test
- TMBIST: transparent memory BIST
- TMBIST can include successive exclusive-OR (XOR) operations that are applied to maintain memory contents instead of overwriting the memory with a separate fixed pattern.
- FIG. 1 A provides an example embodiment 100 for a standard, non-transparent MBIST process with respect to example memory cells.
- the memory cells have an initial state 102 and are designated b11, b12, b13, b21, b22, b23, b31, b32, and b33.
- the “xy” numbers in the “bxy” designations used herein represent row and column numbers for the location of the cell in a cell array that makes up the memory.
- the memory cells are overwritten in different marches 112 by various background patterns.
- a checkerboard pattern of 1s and 0s is written to the memory cells in a first march 104 .
- an all-1 background pattern is written to the cells.
- the particular patterns are successively written to and read out of the memory cells.
- the final state 110 of the memory cells includes the last pattern written or all 1s in the example embodiment 100 .
- a “1” represents a high voltage logic level and a “0” as used herein represents a low voltage logic level.
- a “1” represents a low voltage logic level and a “0” represents a high voltage logic level. Either positive logic implementations or negative logic implementations can be used with respect to the techniques described herein.
- FIG. 1 B provides an example embodiment 150 for a TMBIST process with respect to example memory cells.
- in embodiment 150 , nine (9) adjacent memory cells are again shown.
- the memory cells are designated as b11, b12, b13, b21, b22, b23, b31, b32, and b33 and have an initial state 102 .
- the TMBIST process reads and processes memory contents through successive bit-wise XOR operations with a background pattern and results are written back to the cell.
- the memory cell contents are XOR'ed with a checkerboard pattern of 1s and 0s as represented by the “x” between the 1s and 0s and memory contents. The result of the XOR operation is written to each cell.
- in a second march 156 , an inverse checkerboard pattern of 0s and 1s is XOR'ed with the contents of the memory cells. The “x” again represents the XOR operation, and the result of the XOR operation is written to each cell.
- an all-1 background pattern is XOR'ed with the contents of the memory cells. The “x” again represents the XOR operation, and the result of the XOR operation is written to each cell.
- After finishing all marches of a TMBIST run, each memory cell has been inverted an even number of times and is therefore restored to its original state as represented by the final state 160 .
- the contents of the memory cells can be read out and subjected to error checking, such as through an ECC decoder 162 .
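- The restore property of the three XOR marches described for FIG. 1 B can be checked with a short sketch. The 3x3 cell values, the pattern polarity (a cell whose row and column numbers sum to an even value receives the 1 of the checkerboard), and the function names are assumptions made for illustration.

```python
def background(pattern, row, col):
    """Background bit for cell b<row><col>, using 1-based row/column numbers."""
    even = (row + col) % 2 == 0
    if pattern == "checkerboard":
        return 1 if even else 0
    if pattern == "inverse_checkerboard":
        return 0 if even else 1
    if pattern == "all_ones":
        return 1
    raise ValueError(pattern)


def run_marches(cells, patterns):
    """XOR every cell with each background in turn and return the new state."""
    state = [row[:] for row in cells]
    for pattern in patterns:
        for r in range(len(state)):
            for c in range(len(state[r])):
                state[r][c] ^= background(pattern, r + 1, c + 1)
    return state


initial = [[1, 0, 1],
           [0, 0, 1],
           [1, 1, 0]]
after_first = run_marches(initial, ["checkerboard"])
final = run_marches(initial, ["checkerboard", "inverse_checkerboard", "all_ones"])
```

- Every cell is XOR'ed with 1 exactly twice across the three marches, so `final` equals `initial`, while `after_first` shows that the contents are altered between marches.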
- This TMBIST technique is advantageous in cases where the memory needs to be tested concurrently with or intermittently to an application mode, and TMBIST has been shown to achieve at least the same test coverage as a standard MBIST.
- This prior art TMBIST process still changes the contents of the memory cells and causes concurrent testing to interfere with active application modes.
- FIG. 1 A (Prior Art) provides an example embodiment for a standard, non-transparent MBIST process with respect to example memory cells.
- FIG. 1 B (Prior Art) provides an example embodiment for a TMBIST process with respect to example memory cells.
- FIG. 2 is a block diagram of an example embodiment for an integrated circuit including a memory system that combines TMBIST and ECC with memory scrubbing to achieve improved memory fault detection and correction.
- FIG. 3 is a diagram of an example embodiment for a prior art solution where TMBIST marches are performed directly one after another without time gaps.
- FIG. 4 is a diagram of an example embodiment for distributed marching where TMBIST marches are distributed over the entire allotted run time for each TMBIST run.
- FIG. 5 is a flow diagram of an example embodiment where TMBIST and ECC processes are combined with memory scrubbing to achieve improved memory fault detection and correction.
- FIG. 6 is a block diagram of an example embodiment for an integrated circuit similar to FIG. 2 except that the TMBIST controller determines the even bit (M even ) and the odd bit (M odd ) used in XOR operations for each access.
- FIG. 7 A provides an example embodiment where increasing addresses are used for the TMBIST marches of FIG. 6 .
- FIG. 7 B provides an example embodiment that is the same as embodiment in FIG. 7 A except that decreasing addresses are used for the TMBIST marches.
- Embodiments are disclosed that combine ECC and transparent memory tests for memory fault detection.
- the disclosed embodiments combine TMBIST and ECC with memory scrubbing to achieve improved memory fault detection and correction that occurs concurrently with active application modes.
- Each read access of the memory is followed by an XOR operation as part of a TMBIST process.
- Two status bits for each data word are used to store the information about which bits have been XOR'ed, or these status bits are generated when a data word is accessed.
- a data word modified by the TMBIST process can be corrected in case it is read out or overwritten by a concurrently running application.
- TMBIST and ECC with memory scrubbing as described herein are all executed in parallel or quasi-parallel and concurrently to the application.
- FIG. 2 is a block diagram of an example embodiment for an integrated circuit 200 including a memory system 202 that combines TMBIST and ECC with memory scrubbing to achieve improved memory fault detection and correction.
- the integrated circuit 200 in part includes a processor 250 and a memory system 202 .
- the processor 250 executes an application 260 that accesses the memory 230 .
- the application 260 can be program instructions stored in a non-transitory data storage medium that when executed by the processor 250 cause the application functions to be performed.
- the application functions include reads from and writes to the memory 230 .
- the memory 230 within the memory system 202 includes memory cells that each store a bit of data (e.g., 1-high voltage logic level, 0-low voltage logic level).
- Each data word 231 includes payload data 228 (e.g., 64 bits) and ECC data 226 (e.g., 8 bits).
- each data word 231 also includes TMBIST data 224 (e.g., 2 bits).
- the TMBIST data can include an even bit (M even ) 244 and odd bit (M odd ) 245 .
- the even bit (M even ) 244 is used in XOR operations for cells in the memory 230 where the row and column numbers for the cell location add up to an even number.
- the odd bit (M odd ) 245 is used in XOR operations for cells in the memory 230 where the row and column numbers for the cell location add up to an odd number.
- data words 231 do not include the TMBIST data 224 , and the controller 204 determines the even bit (M even ) 244 and odd bit (M odd ) 245 when each data word 231 is accessed.
- the memory 230 can be accessed by a processor 250 for normal application mode operations and can be accessed by the controller 204 for TMBIST operations as described herein.
- the controller 204 in part includes an address counter 262 and a march counter 264 .
- Application data 208 to be written to memory 230 is provided to multiplexer 210 .
- Multiplexer 210 also receives test data 209 from controller 204 .
- the application data 208 and the test data 209 provide the payload data 228 for memory writes to data words 231 within the memory 230 .
- Application address and control signals 212 associated with memory accesses (e.g., writes or reads) by an application are provided to multiplexer 214 .
- Multiplexer 214 also receives test address and control signals 213 associated with memory accesses (e.g., writes or reads) for TMBIST operations from controller 204 .
- the processor 250 provides control signals 216 to multiplexers 210 and 214 to select application data/address/control signals 208 / 212 or to select test data/address/control signals 209 / 213 .
- the selected test data 209 or application data 208 is output to the ECC encoder 218 , and the selected test address/control signals 213 or application address/control signals 212 are output to the memory 230 .
- the memory 230 includes internal control circuits that use the test or application address/control signals 212 / 213 to select the data word 231 to access using the address signals and to determine the operation to perform (e.g., read, write) on the accessed data word 231 using the control signals.
- the ECC encoder 218 receives the data output from multiplexer 210 . As indicated above, this data output provides the payload data 228 for a data word 231 within memory 230 .
- the ECC encoder 218 generates the ECC data 226 for the data word 231 using one or more ECC algorithms. For example, the ECC encoder 218 processes the bits within a data word 231 and generates a code, such as an 8-bit code, that represents the payload data 228 within the data word 231 .
- the ECC decoder 236 uses the same ECC algorithm(s) to generate a check code that is compared to the ECC data 226 to detect and/or correct errors within the data word 231 .
- the ECC encoder 218 provides the ECC data 226 to the memory 230 through output 219 , and the ECC encoder 218 provides even cell data 242 and odd cell data 243 for the payload data 228 to input XOR circuits 220 and 222 , respectively.
- the XOR circuit 220 also receives the even bit (M even ) 244 for XOR operations from the TMBIST controller 204 , and the XOR circuit 222 also receives the odd bit (M odd ) 245 for XOR operations from the TMBIST controller 204 .
- the input XOR circuits 220 / 222 perform an XOR operation on their inputs and provide outputs to the even cells and odd cells within memory 230 , respectively.
- even cell data 246 for the payload 228 is provided to XOR circuit 232 , and odd cell data 247 for the payload 228 is provided to XOR circuit 234 .
- the output XOR circuits 232 and 234 also receive the even bit (M even ) 244 and the odd bit (M odd ) 245 from the TMBIST controller 204 .
- the output of the XOR circuits 232 / 234 is provided to ECC decoder 236 , and the ECC decoder 236 outputs this XOR data as output data 238 for the data word 231 being read.
- the ECC decoder 236 also receives the ECC data 226 from the memory 230 through an output 235 .
- the ECC decoder 236 generates a check code using the same ECC algorithm(s) used by the ECC encoder 218 . If the check code matches the ECC data 226 , then no error is indicated. If the check code does not match the ECC data 226 , then an error is detected and an error message 240 is generated and output to another circuit such as the processor 250 . For example, the ECC decoder 236 can output warnings in case of correctable errors and error messages in case of uncorrectable errors. Different or additional error messages can also be output.
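- The distinction between correctable and uncorrectable errors can be illustrated with a minimal sketch. It uses an extended Hamming (SEC-DED) code rather than the CRC named in the text, because it is a compact way to obtain 8 check bits for 64 data bits; the function names and status strings are assumptions and do not describe the ECC encoder 218 or ECC decoder 236 themselves.

```python
def secded_encode(data_bits):
    """Return a codeword list: index 0 is overall parity, 1..n are Hamming positions."""
    n = len(data_bits)
    r = 0
    while (1 << r) < n + r + 1:  # number of Hamming check bits
        r += 1
    total = n + r
    code = [0] * (total + 1)
    bits = iter(data_bits)
    for pos in range(1, total + 1):
        if pos & (pos - 1):  # not a power of two, so it carries data
            code[pos] = next(bits)
    for i in range(r):
        p = 1 << i
        parity = 0
        for pos in range(1, total + 1):
            if pos & p:
                parity ^= code[pos]
        code[p] = parity  # makes the parity of group i even
    code[0] = sum(code[1:]) % 2  # overall parity enables double-error detection
    return code


def secded_decode(code):
    """Classify a codeword as no_error, corrected, or uncorrectable."""
    syndrome = 0
    for pos in range(1, len(code)):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        return "no_error", list(code)
    if overall == 1:  # odd number of flips: treat as a single-bit error
        fixed = list(code)
        fixed[syndrome] ^= 1  # syndrome 0 points at the overall parity bit
        return "corrected", fixed
    return "uncorrectable", list(code)


data = [(0xDEADBEEFCAFEF00D >> i) & 1 for i in range(64)]
word = secded_encode(data)
single = list(word)
single[37] ^= 1
double = list(word)
double[5] ^= 1
double[60] ^= 1
```

- In this sketch a "corrected" result would correspond to a warning and an "uncorrectable" result to an error message.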
- the XOR operations performed by the XOR circuits 220 / 222 / 232 / 234 provide the following outputs with respect to two inputs (A, B) as shown in the XOR truth table below.
- XOR truth table: A=0, B=0 gives 0; A=0, B=1 gives 1; A=1, B=0 gives 1; A=1, B=1 gives 0.
- the A input is assumed to be the even bit (M even ) 244 or the odd bit (M odd ) 245 .
- the B input is assumed to be the even cell data 242 / 246 or odd cell data 243 / 247 .
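- Because XOR is its own inverse, applying the same even and odd bits on the write path and on the read path returns the original payload. The sketch below illustrates this under stated assumptions: an 8-bit word for brevity, the row number taken as the word address, the column number taken as the bit index (both zero-based), and hypothetical function names.

```python
WORD_BITS = 8  # small width for illustration; the text uses 64-bit payloads


def mask_for_word(address, m_even, m_odd, width=WORD_BITS):
    """Build the per-word XOR mask: even cells get m_even, odd cells get m_odd."""
    mask = 0
    for bit in range(width):
        m = m_even if (address + bit) % 2 == 0 else m_odd
        mask |= m << bit
    return mask


def write_path(payload, address, m_even, m_odd):
    """Model of the input XOR stage: payload XOR mask goes to the cells."""
    return payload ^ mask_for_word(address, m_even, m_odd)


def read_path(stored, address, m_even, m_odd):
    """Model of the output XOR stage: cell contents XOR mask goes to the decoder."""
    return stored ^ mask_for_word(address, m_even, m_odd)


payload = 0b10110010
round_trips = [
    read_path(write_path(payload, addr, e, o), addr, e, o)
    for addr in (0, 1)
    for e in (0, 1)
    for o in (0, 1)
]
```

- Every entry of `round_trips` equals `payload`, regardless of which background is active for the word.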
- the TMBIST controller 204 performs a memory test using multiple kinds of marches and related data patterns.
- the marches 206 are used.
- an all-0 pattern is used with the even bit (M even ) 244 set to 0 and the odd bit (M odd ) 245 set to 0.
- a checkerboard pattern is used with the even bit (M even ) 244 set to 1 and the odd bit (M odd ) 245 set to 0.
- an inverse checkerboard pattern is used with the even bit (M even ) 244 set to 0 and the odd bit (M odd ) 245 set to 1.
- an all-1 pattern is used with the even bit (M even ) 244 set to 1 and the odd bit (M odd ) 245 set to 1.
- M even: even bit
- M odd: odd bit
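- The four march backgrounds listed above can be tabulated together with their running XOR. The sketch assumes that each march XORs its background onto whatever a cell currently holds, as in the TMBIST description for FIG. 1 B; the names `MARCHES` and `cumulative_bits` are illustrative.

```python
# (name, M_even, M_odd) for each march, in the order given in the text.
MARCHES = [
    ("all_0", 0, 0),
    ("checkerboard", 1, 0),
    ("inverse_checkerboard", 0, 1),
    ("all_1", 1, 1),
]


def cumulative_bits(marches):
    """Return the accumulated (M_even, M_odd) after each completed march."""
    even = odd = 0
    history = []
    for name, m_even, m_odd in marches:
        even ^= m_even
        odd ^= m_odd
        history.append((name, even, odd))
    return history


history = cumulative_bits(MARCHES)
```

- Under that assumption the accumulated bits return to (0, 0) after the fourth march, so both even and odd cells end the run holding their original values.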
- all of the memory 230 is accessed one data word 231 at a time, and XOR operations are performed on the bits within these data words 231 by XOR circuits 220 / 222 / 232 / 234 .
- each of the marches 206 through the whole memory 230 is either an upward march with increasing addresses or a downward march with decreasing addresses.
- control signals 216 from the processor 250 control the selections made by the multiplexers 210 and 214 .
- the address and control signals 212 / 213 can include the memory address for the data word 231 being accessed in the memory 230 and also include control signals such as read commands, write commands, chip-enable commands, or other memory control signals.
- the only delay introduced is due to the multiplexers 210 / 214 that select between TMBIST test signals 209 / 213 and the application signals 208 / 212 .
- the even bit (M even ) 244 and the odd bit (M odd ) 245 are first read for the data word 231 being accessed, for example, under control of the processor 250 .
- bits 244 / 245 are then used for the XOR operations by XOR circuits 220 / 222 on the bits for the data word 231 to be written to the memory 230 . As such, the write access takes more time than the read access. It is further noted that for one embodiment described further below, the even bit (M even ) 244 and the odd bit (M odd ) 245 are determined by the controller 204 rather than being stored in and read from the data word 231 .
- processor 250 can be a microcontroller, a microprocessor, a programmable logic device, or other programmable circuit that executes program instructions stored in a non-volatile data storage device to carry out the functions described herein.
- controller 204 can also be implemented as a dedicated logic circuit, dedicated controller, or other hardware digital solution that implements the control actions and functions described herein.
- the controller 204 can also be implemented as a microcontroller, a microprocessor, a programmable logic device, or other programmable circuit that executes program instructions stored in a non-volatile data storage device to carry out the control actions and functions described herein. Other variations can also be implemented.
- Benefits provided by the disclosed embodiments include reduced requirements for chip area, power, and latency along with synergies of enhanced fault detection.
- Any correctable error or uncorrectable error found by the ECC decoder 236 leads to warnings or error indications through messages 240 .
- errors are detected and counteractions can be applied to accumulating hard errors in inactive memory regions, to single event upsets, to soft errors, and to other detected errors.
- the warnings or other error indications through messages 240 can result in flagging the system as unreliable, generating a system reset, generating a system reboot, outputting a warning message, initiating a system shutdown, or other action.
- the disclosed embodiments provide word-based TMBIST processing as compared to the prior TMBIST processing described above with respect to FIG. 1 B (Prior Art).
- each n-bit wide data word in memory is modified and re-written n times during each march.
- only one modification is required for each data word 231 in memory 230 in each march.
- the overhead with respect to execution time and power consumption is greatly reduced as compared to the prior solution.
- prior solutions execute all TMBIST marches directly one after each other without gaps as described below with respect to FIG. 3 (Prior Art).
- the disclosed embodiments preferably distribute the marches 206 over the allotted run time for one TMBIST run as described below with respect to FIG. 4 .
- This distributed marching provides advantages over the non-distributed marches of prior solutions. For example, the distributed marching tends to reverse performance degradations due to BTI. Further, the distributed marching can distribute the memory scrubbing and the related soft-error correction evenly over the allotted run time of one TMBIST run.
- although the disclosed embodiments can be implemented without the distributed marching, distributed marching is preferred due to these and other potential advantages.
- FIG. 3 is a diagram of an example embodiment 300 for a prior solution where TMBIST marches 306 are performed directly one after another without time gaps.
- the TMBIST runs 304 are performed within a time period 310 with a delay interval 308 between the end of a previous TMBIST run 304 and the next TMBIST run 304 .
- the marches 306 within each TMBIST run 304 are performed directly one after another without time gaps.
- a second march (MARCH 2 ) 314 begins at the end of the first march (MARCH 1 ) 312 .
- a third march (MARCH 3 ) 316 begins at the end of the second march (MARCH 2 ) 314 .
- MARCH n: n th march
- FIG. 4 is a diagram of an example embodiment 400 for distributed marching where TMBIST marches 206 are distributed over the entire allotted run time 405 for each TMBIST run 404 .
- the TMBIST runs 404 are performed within the run time 405 , and the marches 206 are distributed across this run time 405 with time gaps 414 between each march.
- a second march (MARCH 2 ) 408 begins after a time gap 414 from the end of the first march (MARCH 1 ) 406 .
- a third march (MARCH 3 ) 410 begins after a time gap 414 from the end of the second march (MARCH 2 ) 408 . This continues until an n th march (MARCH n) 412 is completed.
- Each of the marches 206 will finish relatively quickly (e.g., 2 to 10 milliseconds).
- each time gap 414 is preferably in the range of one or more seconds, one or more minutes, or one or more hours.
- the time gaps 414 between the marches 206 can be made the same so that the marches 206 are evenly spaced and distributed over the whole run time 405 .
- one or more different amounts of time can be used for time gaps 414 so that they are not the same and so that the marches 206 are distributed over the whole run time 405 but are not evenly spaced within the whole run time 405 .
- Other timing variations could also be implemented while still using relatively large time gaps 414 between each march within the marches 206 .
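- Evenly spaced distributed marching can be sketched as a simple schedule calculation. The sketch assumes that the first march starts at time 0, that the last march ends exactly at the end of the run time, and that all marches take the same time; the function name and the example values (a 60-second run, four marches of 6 milliseconds) are illustrative only.

```python
def march_schedule(run_time_ms, march_count, march_duration_ms):
    """Return (start times, gap) for marches spread evenly over one TMBIST run."""
    if march_count < 2:
        raise ValueError("need at least two marches to have a gap between them")
    total_gap = run_time_ms - march_count * march_duration_ms
    if total_gap < 0:
        raise ValueError("marches do not fit into the allotted run time")
    gap = total_gap / (march_count - 1)  # identical gap between consecutive marches
    starts = [i * (march_duration_ms + gap) for i in range(march_count)]
    return starts, gap


starts, gap = march_schedule(run_time_ms=60_000, march_count=4, march_duration_ms=6)
last_end = starts[-1] + 6
```

- Unevenly spaced variants would replace the single `gap` value with a list of per-march gaps that sum to the same total.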
- FIG. 5 is a flow diagram of an example embodiment 500 where TMBIST and ECC processes are combined with memory scrubbing to achieve improved memory fault detection and correction.
- the integrated circuit 200 including the memory system 202 is powered up.
- a march counter 264 is set to an initial state (e.g., count 0).
- an address counter 262 is set to an initial state (e.g., address 0); the march counter is incremented; and the even bit (M even ) 244 and the odd bit (M odd ) 245 are set based upon the current march.
- In block 508 , a determination is made whether an application is requesting a write access.
- If “YES,” then flow passes to block 510 where the data is corrected depending upon the current even bit (M even ) 244 and the odd bit (M odd ) 245 for the data word 231 being accessed. The data is then written to the data word 231 within the memory 230 . If “NO,” then flow passes to block 512 .
- In block 512 , a determination is made whether the application is requesting a read access. If “YES,” then flow passes to block 514 where data is read from the data word 231 being accessed within the memory 230 . The data is then corrected depending upon the current even bit (M even ) 244 and the odd bit (M odd ) 245 for the data word 231 being read from memory 230 . If “NO,” then flow passes to block 516 .
- Blocks 508 and 512 effectively prioritize accesses to the memory by the application over test accesses by the TMBIST controller 204 . If block 516 is reached, then the application is not currently requesting a write access or a read access.
- When block 516 is reached, the address counter 262 is incremented. In block 518 , ECC data 226 and payload data 228 are read from the data word 231 associated with the address in the address counter 262 .
- the ECC decoder 236 performs an ECC operation to generate a check code that is compared to the ECC data 226 from the data word 231 being read. In block 520 , a determination is made whether uncorrectable errors have been detected from this comparison. If “YES,” then block 522 is reached where one or more fault containment actions are taken. For example, error messages 240 can be sent to the processor 250 or other destinations to indicate that an uncorrectable error has been detected, and action can be taken in response.
- In block 524 , the XOR operations for the current TMBIST march are performed by XOR circuits 220 / 222 on the data, and this XOR'ed data is written to the memory 230 .
- In block 526 , a determination is made whether the address counter is at the end of the memory 230 such that the entire memory 230 has been processed. If “NO,” then flow passes back to block 508 . If “YES,” then flow passes to block 528 where the march counter 264 is incremented.
- In block 530 , a determination is made whether the march counter is at the last march such that the TMBIST process has completed. If “NO,” then flow passes back to block 506 . If “YES,” then flow passes back to block 504 .
- each march from block 504 is controlled in block 532 so that the marches are distributed across the allotted run time 405 for the TMBIST process.
- the processor 250 or the controller 204 provides this march distribution control represented by block 532 , although other control techniques could also be implemented. It is further noted that different or additional process steps could also be implemented with respect to embodiment 500 while still taking advantage of the techniques described herein.
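- The interplay of application accesses and TMBIST steps in the flow above can be sketched as a small simulation. It is a simplified model under stated assumptions, not the disclosed circuit: words are 8 bits wide, each word keeps two status bits recording the XOR accumulated so far, the row number is the word address and the column number is the bit index, and the ECC check and time gaps are omitted. The class and method names are hypothetical.

```python
WIDTH = 8
MARCHES = [(0, 0), (1, 0), (0, 1), (1, 1)]  # (M_even, M_odd) per march


def mask(address, m_even, m_odd, width=WIDTH):
    """Even cells (address + bit even) get m_even, odd cells get m_odd."""
    return sum((m_even if (address + b) % 2 == 0 else m_odd) << b for b in range(width))


class TransparentMemory:
    def __init__(self, words):
        self.cells = list(words)              # raw stored payloads
        self.status = [(0, 0)] * len(words)   # accumulated (M_even, M_odd) per word
        self.march = 0                        # march counter
        self.address = 0                      # address counter

    def app_read(self, addr):
        """Application read: undo the XOR currently applied to the word."""
        return self.cells[addr] ^ mask(addr, *self.status[addr])

    def app_write(self, addr, value):
        """Application write: store the value in the word's current XOR state."""
        self.cells[addr] = value ^ mask(addr, *self.status[addr])

    def tmbist_step(self):
        """Process one word of the current march; return False once the run is done."""
        if self.march >= len(MARCHES):
            return False
        m_even, m_odd = MARCHES[self.march]
        a = self.address
        self.cells[a] ^= mask(a, m_even, m_odd)
        s_even, s_odd = self.status[a]
        self.status[a] = (s_even ^ m_even, s_odd ^ m_odd)
        self.address += 1
        if self.address == len(self.cells):   # end of memory: next march
            self.address = 0
            self.march += 1
        return True


mem = TransparentMemory([0x12, 0x34, 0x56, 0x78])
logical = [0x12, 0x34, 0x56, 0x78]
steps = 0
consistent = True
while mem.tmbist_step():
    steps += 1
    if steps == 6:                            # application write in mid-march
        mem.app_write(2, 0xAB)
        logical[2] = 0xAB
    consistent = consistent and [mem.app_read(a) for a in range(4)] == logical
```

- In this model the application always reads back the logical data, including the value written in the middle of a march, and the raw cell contents match the logical data again once all four marches have completed.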
- embodiments can be implemented that do not store the TMBIST data 224 including the even bit (M even ) 244 and the odd bit (M odd ) 245 within each data word 231 .
- the even bit (M even ) 244 and the odd bit (M odd ) 245 are used for XOR operations with the data word 231 being written or read.
- the even bit (M even ) 244 and the odd bit (M odd ) 245 for the data word 231 being accessed are not read from the memory 230 but are determined by the TMBIST controller 204 based upon the location of the data being accessed and the state of the TMBIST marches.
- FIG. 6 is a block diagram of an example embodiment for an integrated circuit 600 similar to integrated circuit 200 of FIG. 2 except that the TMBIST controller 204 determines the even bit (M even ) 244 and the odd bit (M odd ) 245 for each access.
- the embodiment of FIG. 6 is the same as the embodiment of FIG. 2 except that the TMBIST data 224 is not stored with each data word 231 , and the controller 204 includes a logic engine 602 to determine the even bit (M even ) 244 and the odd bit (M odd ) 245 for each data word 231 being accessed.
- One advantage the embodiment of FIG. 6 has over the embodiment of FIG. 2 is that the memory 230 does not need to store the additional TMBIST data 224 .
- read accesses by the processor 250 are faster because the even bit (M even ) 244 and the odd bit (M odd ) 245 are not read from the memory 230 .
- the even bit (M even ) 244 and the odd bit (M odd ) 245 will change as soon as the data word 231 is addressed in the current march.
- The values depend on whether the address of the requested data word 231 is before or after the address within the address counter 262 of the TMBIST controller 204 . It is noted that “before” and “after” depend upon the direction of incrementing or decrementing the TMBIST address within the address counter 262 in the current march.
- the logic engine 602 makes the before/after determination by comparing the address of the data word 231 being accessed by the application with the address value within the address counter 262 .
- the logic engine 602 can take into account the direction of the TMBIST address counter 262 in the current march when making this comparison. To calculate the even bit (M even ) 244 and the odd bit (M odd ) 245 , the logic engine 602 uses the backgrounds of all previously completed marches for a particular memory cell, the direction of the address counter in the current march, and the relation of the address counter 262 with respect to the address requested by the processor 250 .
- FIG. 7 A provides an example embodiment 700 where increasing addresses are used for the TMBIST marches 206 of FIG. 6 .
- the memory cells are designated b11, b12, b13, b21, b22, b23, b31, b32, and b33 and have an initial state 702 .
- the “xy” numbers in the “bxy” designations used herein represent row and column numbers for the location of the cell in a cell array that makes up the memory.
- the first march 252 from FIG. 6 is not shown.
- a checkerboard pattern is used with the even bit (M even ) 244 set to 1 and the odd bit (M odd ) 245 set to 0.
- an inverse checkerboard pattern is used with the even bit (M even ) 244 set to 0 and the odd bit (M odd ) 245 set to 1.
- an all-1 pattern is used with the even bit (M even ) 244 set to 1 and the odd bit (M odd ) 245 set to 1.
- the current memory cell contents are XOR'ed with the even bit (M even ) 244 and the odd bit (M odd ) 245 for that march as represented by the “x” between the 1s and 0s and memory contents. The result of the XOR operation is written to each cell.
- After finishing all marches 206 of a TMBIST run, each memory cell has been inverted an even number of times and is therefore restored to its original state as shown in the final state 704 . Before and during these marches 206 , the contents of the memory cells are subjected to error checking and correction through the ECC decoder 236 as described above.
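The transparency property described above, that every cell is inverted an even number of times, can be checked with a short sketch. It assumes the three patterns named in this description (checkerboard, inverse checkerboard, all-1) and treats a cell as "even" when its linear index is even; that indexing convention and the function names are assumptions made for illustration only.

```python
# (M_even, M_odd) for each march: checkerboard, inverse checkerboard, all-1.
MARCHES = [(1, 0), (0, 1), (1, 1)]


def apply_march(cells: list[int], m_even: int, m_odd: int) -> list[int]:
    """XOR every cell with the background bit that matches its parity."""
    return [bit ^ (m_even if index % 2 == 0 else m_odd)
            for index, bit in enumerate(cells)]


def run_tmbist(cells: list[int]) -> list[int]:
    """Apply all marches in sequence and return the final cell contents."""
    for m_even, m_odd in MARCHES:
        cells = apply_march(cells, m_even, m_odd)
    return cells


# Nine cells standing in for b11 ... b33 with arbitrary initial contents.
initial = [1, 0, 1, 1, 0, 0, 0, 1, 1]
final = run_tmbist(initial)
```

Each even cell is XOR'ed with 1, 0, 1 and each odd cell with 0, 1, 1, so every cell is inverted exactly twice and `final` equals `initial`.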
- the even bit (M even ) 244 and the odd bit (M odd ) 245 for cell (b 21 ) 708 are determined from the backgrounds of the second march 254 and the third march 256 .
- the even bit (M even ) 244 and the odd bit (M odd ) 245 for cell (b 33 ) 720 are determined from the background of only the second march 254 .
- the even bit (M even ) 244 and the odd bit (M odd ) 245 for cells in any particular data word 231 are determined based upon the marches that have been completed or not completed with respect to the data word 231 being processed.
- FIG. 7 B provides an example embodiment 750 that is the same as embodiment 700 in FIG. 7 A except that decreasing addresses are used for one of the TMBIST marches 206 .
- For embodiment 750 , it is again assumed that the second march 254 has been completed, and the TMBIST process is in the middle of the third march 256 .
- This third march 256 is implemented with a decreasing address as represented by the direction of the arrows in line 756 . It is assumed that cells b 11 . . . b 31 have not yet been XOR'ed and that cells b 32 . . . b 33 have been XOR'ed for the third march 256 .
- these cells will have different values for the even bit (M even ) 244 and the odd bit (M odd ) 245 .
- cells associated with lower addresses than the current address in the address counter 262 will not have been processed by the current march.
- the even bit (M even ) 244 and the odd bit (M odd ) 245 for cell (b 21 ) 758 are determined from the background of only the second march 254 .
- the even bit (M even ) 244 and the odd bit (M odd ) 245 for cell (b 33 ) 760 are determined from the backgrounds of both the second march 254 and the third march 256 .
- the even bit (M even ) 244 and the odd bit (M odd ) 245 for cells in any particular data word 231 are determined based upon the marches that have been completed or not completed with respect to the data word 231 being processed.
- a system within an integrated circuit including a memory, an ECC encoder, input XOR circuits, output XOR circuits, an ECC decoder, a controller, and a processor.
- the memory has a plurality of data words, and each data word includes payload data and error correction code (ECC) data.
- the ECC encoder is coupled to receive input data and to provide the ECC data to the memory for each data word, and the ECC data is based upon the input data.
- the input XOR circuits are coupled to receive the input data from the ECC encoder and to output XOR'ed data as the payload data to the memory.
- the output XOR circuits are coupled to receive the payload data from the memory and to output XOR'ed data.
- the ECC decoder is coupled to receive the ECC data from the memory and the XOR'ed data from the output XOR circuits, and the ECC decoder has output data and error messages as an output.
- the controller is configured to run a transparent memory built-in self-test (TMBIST) process and has test data and test address and control signals as outputs.
- the processor is configured to execute an application to generate application data and application address and control signals.
- the system also includes a first multiplexer circuit coupled to select either the test data or the application data to provide as the input data to the ECC encoder, and a second multiplexer circuit coupled to select either the test address and control signals or the application address and control signals to provide to the memory. Further, access to the memory is provided to the processor and the controller during active operation of the application executed by the processor, and the controller is configured to use the TMBIST process to test the memory during the active operation of the application.
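The write and read paths of this system can be sketched as below. This is an illustration under loud assumptions: a single parity bit stands in for the ECC data (a real ECC would typically correct single-bit errors, which parity cannot), cell parity is taken from the bit index, and all function names are hypothetical. The point shown is only the ordering: the check code is computed on the unmodified input data, the XOR is applied before storage, and the XOR is undone before the decoder checks the word.

```python
def parity(bits: list[int]) -> int:
    """Stand-in check code (assumption): even parity over the data bits."""
    return sum(bits) % 2


def background(width: int, m_even: int, m_odd: int) -> list[int]:
    """Per-bit XOR mask built from the even and odd background bits."""
    return [m_even if i % 2 == 0 else m_odd for i in range(width)]


def write_word(data: list[int], m_even: int, m_odd: int) -> tuple[list[int], int]:
    """Encode, then XOR: returns (payload stored in memory, check code)."""
    check = parity(data)                       # encoder sees the input data
    mask = background(len(data), m_even, m_odd)
    payload = [b ^ m for b, m in zip(data, mask)]
    return payload, check


def read_word(payload: list[int], check: int,
              m_even: int, m_odd: int) -> tuple[list[int], bool]:
    """XOR, then decode: returns (output data, error flag)."""
    mask = background(len(payload), m_even, m_odd)
    data = [b ^ m for b, m in zip(payload, mask)]
    return data, parity(data) != check         # mismatch raises an error message


stored, code = write_word([1, 0, 1, 1], 1, 0)
clean = read_word(stored, code, 1, 0)
faulty = read_word([stored[0], stored[1], stored[2] ^ 1, stored[3]], code, 1, 0)
```

Because the same background bits are applied on write and on read, the application sees its original data regardless of which march pattern is currently stored, while a flipped cell still shows up as a mismatch at the decoder.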
- test address and control signals and the application address and control signals are configured to determine a data word to be accessed within the memory.
- the TMBIST process includes a plurality of marches with each march including a pattern to XOR with contents of cells within the memory.
- the pattern for each march includes an even bit for XOR operations with even cells in the memory and an odd bit for XOR operations with odd cells in the memory.
- the even bit and the odd bit for a current march are stored with each data word that is accessed.
- the even bit and the odd bit are determined by the controller when each data word is accessed.
- the marches include at least one march having a checkerboard pattern, at least one march having an inverse checkerboard pattern, and at least one march having all 1s.
- the marches are distributed over a run time for the TMBIST process with time gaps between the marches.
- each march takes 2 to 10 milliseconds and the run time is one or more seconds.
- the marches are distributed evenly over the run time for the TMBIST process.
- the error messages are based upon comparisons of the ECC data from the memory with check codes generated by the ECC decoder from the payload data from the memory. In further embodiments, the error messages include at least one of detection of an uncorrectable error or detection of a correctable error.
- a method for an integrated circuit including storing a plurality of data words in a memory with each data word including payload data and error correction code (ECC) data, receiving input data with an ECC encoder and outputting the ECC data to the memory for each data word with the ECC data being based upon the input data, performing with input XOR circuits an XOR operation on the input data from the ECC encoder and outputting the XOR'ed data as payload data to the memory, performing with output XOR circuits an XOR operation on the payload data from the memory and outputting XOR'ed data, and receiving the XOR'ed data from the output XOR circuits and the ECC data from the memory with an ECC decoder and outputting output data and error messages.
- the method also includes running a transparent memory built-in self-test (TMBIST) process with a controller and generating test data and test address and control signals as outputs, and executing an application with a processor and generating application data and application address and control signals. Further, the method includes selecting with a first multiplexer either the test data or the application data to provide as the input data to the ECC encoder, and selecting with a second multiplexer circuit either the test address and control signals or the application address and control signals to provide to the memory. In addition, the method includes allowing access to the memory by the processor and the controller during active operation of the application executed by the processor, and testing the memory with the TMBIST process during the active operation of the application.
- the TMBIST process includes a plurality of marches where each march includes a pattern to XOR with contents of cells within the memory, an even bit for XOR operations with even cells in the memory, and an odd bit for XOR operations with odd cells in the memory.
- the method includes storing the even bit and the odd bit for a current march with each data word that is accessed. In other further embodiments, the method includes determining with the controller the even bit and the odd bit when each data word is accessed.
- the marches include at least one march having a checkerboard pattern, at least one march having an inverse checkerboard pattern, and at least one march having all 1s.
- the marches are distributed over a run time for the TMBIST process. In other additional embodiments, the marches are distributed evenly over the run time for the TMBIST process.
- the method includes generating check codes with the ECC decoder from payload data for data words and generating the error messages based upon comparisons of the ECC data from the memory with the check codes.
- the functional blocks, components, systems, devices, or circuitry described herein can be implemented using hardware, software, or a combination of hardware and software along with analog circuitry as needed.
- the disclosed embodiments can be implemented using one or more programmed integrated circuits that are programmed to perform the functions, tasks, methods, actions, or other operational features described herein for the disclosed embodiments.
- the one or more programmed integrated circuits can include, for example, one or more processors or configurable logic devices (CLDs) or a combination thereof.
- the one or more processors can be, for example, one or more central processing units (CPUs), controllers, microcontrollers, microprocessors, hardware accelerators, ASICs (application specific integrated circuit), or other integrated processing devices.
- the one or more CLDs can be, for example, one or more CPLDs (complex programmable logic devices), FPGAs (field programmable gate arrays), PLAs (programmable logic array), reconfigurable logic circuits, or other integrated logic devices.
- the programmed integrated circuits, including the one or more processors can be programmed to execute software, firmware, code, or other program instructions that are embodied in one or more non-transitory tangible computer-readable mediums to perform the functions, tasks, methods, actions, or other operational features described herein for the disclosed embodiments.
- the programmed integrated circuits can also be programmed using logic code, logic definitions, hardware description languages, configuration files, or other logic instructions that are embodied in one or more non-transitory tangible computer-readable mediums to perform the functions, tasks, methods, actions, or other operational features described herein for the disclosed embodiments.
- the one or more non-transitory tangible computer-readable mediums can include, for example, one or more data storage devices, memory devices, flash memories, random access memories, read only memories, programmable memory devices, reprogrammable storage devices, hard drives, floppy disks, DVDs, CD-ROMs, or any other non-transitory tangible computer-readable mediums.
- Other variations can also be implemented while still taking advantage of the techniques described herein.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Techniques For Improving Reliability Of Storages (AREA)
XOR Truth Table
Input A | Input B | Output |
---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 0 |
Claims (24)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/542,776 US11557365B2 (en) | 2019-08-16 | 2019-08-16 | Combined ECC and transparent memory test for memory fault detection |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/542,776 US11557365B2 (en) | 2019-08-16 | 2019-08-16 | Combined ECC and transparent memory test for memory fault detection |
Publications (2)
Publication Number | Publication Date |
---|---|
US20210050068A1 (en) | 2021-02-18 |
US11557365B2 (en) | 2023-01-17 |
Family
ID=74568194
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/542,776 Active 2040-02-22 US11557365B2 (en) | 2019-08-16 | 2019-08-16 | Combined ECC and transparent memory test for memory fault detection |
Country Status (1)
Country | Link |
---|---|
US (1) | US11557365B2 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11314866B2 (en) * | 2019-11-25 | 2022-04-26 | Dell Products L.P. | System and method for runtime firmware verification, recovery, and repair in an information handling system |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5469445A (en) | 1992-03-05 | 1995-11-21 | Sofia Koloni Ltd. | Transparent testing of integrated circuits |
US6216251B1 (en) * | 1999-04-30 | 2001-04-10 | Motorola Inc | On-chip error detection and correction system for an embedded non-volatile memory array and method of operation |
US20030105999A1 (en) * | 2001-12-05 | 2003-06-05 | Koss Louise A. | Apparatus for random access memory array self-test |
US20030169634A1 (en) * | 2002-03-11 | 2003-09-11 | International Business Machines Corporation | Memory array system |
US20050044467A1 (en) * | 2001-11-14 | 2005-02-24 | Wingyu Leung | Transparent error correcting memory |
US20050204232A1 (en) * | 2004-02-27 | 2005-09-15 | Markus Seuring | Technique for combining scan test and memory built-in self test |
US20140095947A1 (en) * | 2012-09-29 | 2014-04-03 | Christopher P. Mozak | Functional memory array testing with a transaction-level test engine |
US9583216B2 (en) | 2015-03-13 | 2017-02-28 | Analog Devices, Inc. | MBIST device for use with ECC-protected memories |
US20170068607A1 (en) * | 2015-09-04 | 2017-03-09 | Dell Products L.P. | Systems and methods for detecting memory faults in real-time via smi tests |
Non-Patent Citations (9)
Title |
---|
Grigoryan et al., "Advanced ECC-Based FIT Rate Mitigation Technique For Automotive SoCs", International Test Conference, IEEE, 6 pgs. (2018). |
Hwang et al., "Cosmic Rays Don't Strike Twice: Understanding The Nature Of DRAM Errors And The Implications For System Design", ASPLOS, 12 pgs. (Mar. 2012). |
Kumar et al., "Impact Of NBTI On SRAM Read Stability And Design For Reliability", Proceedings Of The 7th International Symposium On Quality Electronic Design, 6 pgs. (2006). |
Nicolaidis, "Theory Of Transparent BIST For RAMs", IEEE Transactions On Computers, vol. 45, No. 10, 16 pgs. (Oct. 1996). |
Nicolaidis, "Transparent BIST For ECC-Based Memory Repair", IEEE, 8 pgs. (2013). |
Schroeder et al., "DRAM Errors In The Wild: A Large-Scale Field Study", Sigmetrics Performance, 12 pgs. (Jun. 2009). |
Schroeder et al., "Flash Reliability In Production: The Expected And The Unexpected", The Advanced Computing Systems Association, Proceedings Of The 14th USENIX Conference, 15 pgs. (Feb. 2016). |
Sridharan et al., "Feng Shui Of Supercomputer Memory", SC13, 11 pgs. (Nov. 2013). |
Thaller et al., "A Transparent Online Memory Test For Simultaneous Detection Of Functional Faults And Soft Errors In Memories", IEEE Transactions On Reliability, vol. 52, No. 4, 10 pgs. (Dec. 2003). |
Also Published As
Publication number | Publication date |
---|---|
US20210050068A1 (en) | 2021-02-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10019312B2 (en) | Error monitoring of a memory device containing embedded error correction | |
KR102154436B1 (en) | Semiconductor memory device | |
US7694198B2 (en) | Self-repairing of microprocessor array structures | |
US10403387B2 (en) | Repair circuit used in a memory device for performing error correction code operation and redundancy repair operation | |
US7353438B2 (en) | Transparent error correcting memory | |
US5200963A (en) | Self-checking on-line testable static ram | |
US20040237023A1 (en) | Memory device and memory error correction method | |
US20050036371A1 (en) | Semiconductor memory including error correction function | |
JPH04277848A (en) | Memory-fault mapping device, detection-error mapping method and multipath-memory-fault mapping device | |
US9042191B2 (en) | Self-repairing memory | |
US20110099459A1 (en) | Semiconductor memory device | |
US20110231718A1 (en) | Memory repair | |
US20100241900A1 (en) | System to determine fault tolerance in an integrated circuit and associated methods | |
US5535226A (en) | On-chip ECC status | |
JP2018156712A (en) | Semiconductor device and diagnostic method of semiconductor device | |
TWI655637B (en) | Memory device | |
US7949933B2 (en) | Semiconductor integrated circuit device | |
US7231582B2 (en) | Method and system to encode and decode wide data words | |
JP4627411B2 (en) | Memory device and memory error correction method | |
US8995217B2 (en) | Hybrid latch and fuse scheme for memory repair | |
US11557365B2 (en) | Combined ECC and transparent memory test for memory fault detection | |
KR20180061445A (en) | Memory device | |
CN111831486B (en) | Semiconductor device and semiconductor system including the same | |
US8352781B2 (en) | System and method for efficient detection and restoration of data storage array defects | |
JP7107696B2 (en) | Failure detection method for semiconductor device and semiconductor memory |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AS | Assignment | Owner name: NXP B.V., NETHERLANDS. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SCHAT, JAN-PETER;REEL/FRAME:050074/0974. Effective date: 20190813 |
 | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: ADVISORY ACTION MAILED |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
 | STCF | Information on status: patent grant | Free format text: PATENTED CASE |