US20030023922A1 - Fault tolerant magnetoresistive solid-state storage device - Google Patents

Publication number
US20030023922A1
Authority
US
United States
Prior art keywords
block
encoded data
ecc
storage cells
ecc encoded
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
US09/915,179
Inventor
James Davis
Kenneth Eldredge
Jonathan Jedwab
Dominic McCarthy
Stephen Morley
Kenneth Paterson
Frederick Perner
Kenneth Smith
Stewart Wyatt
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hewlett Packard Development Co LP
Original Assignee
Hewlett Packard Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to US09/915,179 (US20030023922A1)
Application filed by Hewlett Packard Co
Assigned to HEWLETT-PACKARD COMPANY reassignment HEWLETT-PACKARD COMPANY ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: ELDREDGE, KENNETH J., SMITH, KENNETH K., WYATT, STEWART R., DAVIS, JAMES A., PERNER, FREDERICK A.
Assigned to HEWLETT-PACKARD COMPANY reassignment HEWLETT-PACKARD COMPANY ASSIGNMENT BY OPERATION OF LAW Assignors: HEWLETT-PACKARD LIMITED, JEDWAB, JONATHAN, MCCARTHY, DOMINIC P., MORLEY, STEPHEN, PATERSON, KENNETH GRAHAM
Priority to US09/997,199 (US7149948B2)
Priority to US10/093,851 (US7107508B2)
Priority to EP02254716A (EP1286360A3)
Priority to GB0215468A (GB2380572B)
Priority to JP2002216151A (JP2003115196A)
Priority to JP2002216150A (JP2003115195A)
Publication of US20030023922A1
Assigned to HEWLETT-PACKARD DEVELOPMENT COMPANY L.P. reassignment HEWLETT-PACKARD DEVELOPMENT COMPANY L.P. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: HEWLETT-PACKARD COMPANY

Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C11/00 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
    • G11C11/02 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using magnetic elements
    • G11C11/16 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using magnetic elements using elements in which the storage effect is based on magnetic spin effect
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1008 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices
    • G06F11/1048 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices using arrangements adapted for a specific error detection or correction feature
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C29/00 Checking stores for correct operation; Subsequent repair; Testing stores during standby or offline operation
    • G11C29/04 Detection or location of defective memory elements, e.g. cell construction details, timing of test signals
    • G11C29/08 Functional testing, e.g. testing during refresh, power-on self testing [POST] or distributed testing
    • G11C29/12 Built-in arrangements for testing, e.g. built-in self testing [BIST] or interconnection details
    • G11C29/38 Response verification devices
    • G11C29/42 Response verification devices using error correcting codes [ECC] or parity check
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C29/00 Checking stores for correct operation; Subsequent repair; Testing stores during standby or offline operation
    • G11C29/04 Detection or location of defective memory elements, e.g. cell construction details, timing of test signals
    • G11C29/08 Functional testing, e.g. testing during refresh, power-on self testing [POST] or distributed testing
    • G11C29/12 Built-in arrangements for testing, e.g. built-in self testing [BIST] or interconnection details
    • G11C29/44 Indication or identification of errors, e.g. for repair

Definitions

  • The present invention relates in general to a magnetoresistive solid-state storage device and to a method for controlling a magnetoresistive solid-state storage device.
  • The invention relates to a magnetoresistive solid-state storage device employing error correction coding.
  • A typical solid-state storage device comprises one or more arrays of storage cells for storing data.
  • Existing semiconductor technologies provide volatile solid-state storage devices suitable for relatively short term storage of data, such as dynamic random access memory (DRAM), or devices for relatively longer term storage of data such as static random access memory (SRAM) or non-volatile flash and EEPROM devices.
  • DRAM: dynamic random access memory
  • SRAM: static random access memory
  • Many other technologies are known or are being developed.
  • MRAM: magnetic random access memory
  • MRAM devices are subject to physical failure, which can result in an unacceptable loss of stored data.
  • Currently available manufacturing techniques for MRAM devices are subject to limitations and as a result manufacturing yields of commercially acceptable MRAM devices are relatively low. Although better manufacturing techniques are being developed, these tend to increase manufacturing complexity and cost. Hence, it is desired to apply lower cost manufacturing techniques whilst increasing device yield. Further, it is desired to increase cell density formed on a substrate such as silicon, but as the density increases manufacturing tolerances become increasingly difficult to control, again leading to higher failure rates and lower device yields. Since MRAM devices are at a relatively early stage in development, it is desired to allow large scale manufacturing of commercially acceptable devices, whilst tolerating the limitations of current manufacturing techniques.
  • An aim of the present invention is to provide a magnetoresistive solid-state storage device which is tolerant of at least some failures. Another aim is to provide a method for controlling a magnetoresistive solid-state storage device to tolerate at least some failures.
  • a preferred aim is to provide a magnetoresistive solid-state storage device and a method for controlling such a device which is tolerant of both systematic and random failures.
  • Other preferred aims are to provide a magnetoresistive solid-state storage device and a method for controlling such a device, which allows at least some failures to be tolerated without any loss of stored data, preferably which is efficient to implement, preferably which allows lower cost manufacturing techniques to be employed, and preferably which allows device yield to be increased.
  • a method for controlling a magnetoresistive solid-state storage device having a plurality of storage cells for storing a block of ECC encoded data comprising the steps of: accessing a set of the plurality of storage cells; and determining whether information is unrecoverable from a block of ECC encoded data stored in the accessed storage cells.
  • determination of whether information is unrecoverable from the stored block of ECC encoded data is made by attempting to perform ECC decoding. If the ECC decoding successfully recovers information from the block of ECC encoded data, then use of that set of storage cells can continue in future read and write access cycles. However, if the ECC decoding fails to recover information from the block of ECC encoded data, then preferably remedial action is taken concerning the set of storage cells. For example, the remedial action involves discarding that set of storage cells such that the set is not available in future read and write cycles.
  • the method comprises identifying failed symbols in the block of ECC encoded data, as an output from the ECC decoding step, and comparing the identified number of failed symbols against a threshold value.
  • the threshold value suitably represents a safety margin, such as 50% to 95% of the maximum number of failed symbols which can be corrected by ECC decoding the block of ECC encoded data.
  • the safety margin represents the situation where, although a relatively high proportion of failed symbols have been identified in the block of ECC encoded data, it is reasonable to continue using that set of storage cells in future. Even though further systematic or random failures might be encountered in a future read operation, it is reasonable to expect that the number of failed symbols will still be correctable by ECC decoding the block of ECC encoded data.
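  • The safety margin described above can be illustrated with a small calculation. This is a minimal sketch only: the function names `failure_threshold` and `needs_remedial_action`, the default margin of 0.75 (picked from within the stated 50% to 95% range) and the example figure of sixteen correctable symbols are all assumptions made for illustration.

```python
import math


def failure_threshold(max_correctable: int, margin: float = 0.75) -> int:
    """Return the largest failed-symbol count that is still tolerated.

    max_correctable: maximum number of failed symbols the ECC can correct.
    margin: fraction of that maximum used as the safety margin (assumed value).
    """
    if max_correctable < 0:
        raise ValueError("max_correctable must be non-negative")
    if not 0.5 <= margin <= 0.95:
        raise ValueError("margin is expected to lie between 0.5 and 0.95")
    return math.floor(max_correctable * margin)


def needs_remedial_action(failed_symbols: int, threshold: int) -> bool:
    """Remedial action is indicated when the count exceeds the threshold."""
    return failed_symbols > threshold


# Example: a code able to correct 16 failed symbols, with a 75% margin.
threshold = failure_threshold(16)
print(threshold, needs_remedial_action(13, threshold))
```

  • With these assumed values, a block showing up to twelve failed symbols stays in use, which leaves room for up to four further failures on a later read before the correction limit is reached.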
  • the accessed set of storage cells is evaluated based on parametric values, prior to attempting ECC decoding of the block of ECC encoded data.
  • the method comprises determining whether original information is expected to be unrecoverable from the block of ECC encoded data stored in the accessed set of storage cells. In particular, it is determined whether original information is expected to be unrecoverable because the probability of failing to correctly perform ECC decoding is unacceptably high. Where original information is not expected to be unrecoverable, then use of the set of storage cells may continue.
  • the first and second embodiments are preferably combined, such that a decision to continue use of the set of storage cells, or take remedial action, is made either after performing a parametric based test as in the second embodiment, or after performing ECC decoding as in the first embodiment, or a decision can be made at either stage.
  • the method comprises determining, from accessing the set of storage cells, failed symbols in the block of ECC encoded data that have been affected by a physical failure.
  • a determination is made whether there are more failed symbols in the block of ECC encoded data than can be corrected by error correction decoding the block of ECC encoded data.
  • ECC decoding the block of ECC encoded data may well fail to recover the original information. In other words, there is an unacceptable probability that decoding the block of ECC encoded data will not correctly recover original information.
  • accessing the set of storage cells comprises obtaining parametric values, which are compared against one or more ranges.
  • a logical bit value is derived, but some of the storage cells can be identified as being affected by a physical failure.
  • a failure count is determined based on the identified failed cells. The failure count can simply represent the number of failed cells, but preferably the failure count is based on failed symbols of the block of ECC encoded data affected by the identified failed cells.
  • the failure count is compared against a threshold value.
  • the threshold value represents the total number of failed symbols which can be corrected by ECC decoding the block of ECC encoded data.
  • the threshold value represents a safety margin less than the total number of failed symbols correctable by ECC decoding, such as between about 50% to 95% of the total number.
  • the threshold value is particularly useful in that only some types of physical failures in MRAM devices can be readily identified from the obtained parametric values, and the threshold value is set such that, given the identified number of failures, it is still reasonable to perform ECC decoding, whilst allowing for an additional number of as yet unidentified failures to affect the block of ECC encoded data.
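  • The difference between counting failed cells and counting failed symbols can be sketched as follows. The layout assumed here (cell index `i` within the block belongs to symbol `i // 8`, with eight cells per symbol) and the helper names are hypothetical; they serve only to show why several failed cells inside one symbol count once.

```python
from typing import Iterable, Set

BITS_PER_SYMBOL = 8  # assumed symbol width, one storage cell per bit


def failed_symbols(failed_cells: Iterable[int],
                   bits_per_symbol: int = BITS_PER_SYMBOL) -> Set[int]:
    """Map indices of failed cells to the indices of the symbols they affect."""
    return {cell // bits_per_symbol for cell in failed_cells}


def expected_unrecoverable(failed_cells: Iterable[int], threshold: int,
                           bits_per_symbol: int = BITS_PER_SYMBOL) -> bool:
    """True when the symbol-based failure count exceeds the threshold."""
    return len(failed_symbols(failed_cells, bits_per_symbol)) > threshold


# Four failed cells that all sit in symbol 0 give a failure count of one.
print(len(failed_symbols({0, 1, 2, 3})))
```

  • Counting symbols rather than cells matches what the ECC decoder has to correct, so a cluster of failed cells within one symbol does not exhaust the margin left for failures that have not yet been identified.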
  • original information is received for storing in the MRAM device in units of a sector, such as 512 bytes.
  • the original information sector is error correction encoded to form one or more blocks of ECC encoded data.
  • a linear ECC scheme such as a Reed-Solomon code is employed.
  • each sector of original information is encoded to form a sector of ECC encoded data comprising four codewords. Each codeword suitably forms the block of ECC encoded data mentioned above.
  • a method for controlling a magnetoresistive solid-state storage device comprising the steps of: receiving original information which it is desired to store; error correction encoding the original information to form a block of ECC encoded data; storing the block of ECC encoded data in a set of magnetoresistive storage cells arranged in at least one array; accessing the set of storage cells; forming logical symbol values of the block of ECC encoded data from the accessed set of storage cells; error correction decoding the block of ECC encoded data to provide recovered information; if the decoding step provided recovered information then outputting the recovered information and continuing use of the set of storage cells, or else if the decoding step did not provide recovered information then taking remedial action in respect of the set of storage cells.
  • the method comprises identifying, from the ECC decoding, zero or more failed symbols in the block of ECC encoded data; comparing the identified number of failed symbols against a threshold value; and, if the ECC decoding did not recover original information, or if the identified number of failed symbols is greater than the threshold value, then taking remedial action concerning the accessed set of storage cells.
  • a method for controlling a magnetoresistive solid-state storage device comprising the steps of: receiving original information which it is desired to store; error correction encoding the original information to form a block of ECC encoded data; storing the block of ECC encoded data in a set of magnetoresistive storage cells arranged in at least one array; accessing the set of storage cells; comparing parametric values obtained by accessing the set of storage cells against one or more ranges; identifying failed cells amongst the accessed set of cells; forming a failure count based on the identified failed cells; comparing the failure count against a threshold value; and determining whether the original information is expected to be unrecoverable from the block of ECC encoded data stored in the accessed set of storage cells.
  • A magnetoresistive solid-state storage device comprising: at least one array of magnetoresistive storage cells; an ECC encoding unit for forming a block of ECC encoded data from a unit of original information; and a controller arranged to store the block of ECC encoded data in a set of the storage cells, access the set of storage cells, and determine whether the original information is unrecoverable from the block of ECC encoded data stored in the accessed set of storage cells.
  • FIG. 1 is a schematic diagram showing a preferred MRAM device including an array of storage cells
  • FIG. 2 shows a preferred logical data structure
  • FIG. 3 shows an overview of a preferred method for controlling an MRAM device
  • FIG. 4 shows a first preferred method for controlling an MRAM device
  • FIG. 5 shows a second preferred method for controlling an MRAM device
  • FIG. 6 is a graph illustrating a parametric value obtained from a storage cell of an MRAM device.
  • An example MRAM device will first be described with reference to FIG. 1, including a description of the failure mechanisms found in MRAM devices. The preferred methods for controlling such MRAM devices will then be described with reference to FIGS. 2 to 6.
  • FIG. 1 shows a simplified magnetoresistive solid-state storage device 1 comprising an array 10 of storage cells 16.
  • The array 10 is coupled to a controller 20 which, amongst other control elements, includes an ECC coding and decoding unit 22.
  • The controller 20 and the array 10 can be formed on a single substrate, or can be arranged separately.
  • The array 10 comprises of the order of 1024 by 1024 storage cells, just a few of which are illustrated.
  • The cells 16 are each formed at an intersection between control lines 12 and 14.
  • Control lines 12 are arranged in rows, and control lines 14 are arranged in columns.
  • One row 12 and one or more columns 14 are selected to access the required storage cell or cells 16 (or conversely one column and several rows, depending upon the orientation of the array).
  • The row and column lines are coupled to control circuits 18, which include a plurality of read/write control circuits.
  • Depending on the implementation, one read/write control circuit is provided per column, or read/write control circuits are multiplexed or shared between columns.
  • The control lines 12 and 14 are generally orthogonal, but other more complicated lattice structures are also possible.
  • In a read operation, a single row line 12 and several column lines 14 are activated in the array 10 by the control circuits 18, and a set of data is read from those activated cells.
  • This operation is termed a slice.
  • The row in this example has a length l of 1024 storage cells, and the accessed storage cells 16 are separated by a minimum reading distance m, such as sixty-four cells, to minimise cross-cell interference in the read process.
  • m: minimum reading distance
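  • The effect of the minimum reading distance on a slice can be sketched as below. The even spacing of exactly m cells and the `offset` parameter are assumptions for illustration; the text only requires that accessed cells be at least m cells apart.

```python
from typing import List


def slice_columns(row_length: int = 1024, min_distance: int = 64,
                  offset: int = 0) -> List[int]:
    """Return column indices read in one slice of a single row.

    Cells are taken every `min_distance` columns, starting at `offset`,
    so that no two accessed cells are closer than the minimum distance.
    """
    if min_distance <= 0:
        raise ValueError("min_distance must be positive")
    if not 0 <= offset < min_distance:
        raise ValueError("offset must lie in [0, min_distance)")
    return list(range(offset, row_length, min_distance))


columns = slice_columns()
print(len(columns), columns[:3])
```

  • Under these assumptions a row of 1024 cells read at a spacing of sixty-four cells yields sixteen cells per slice, and sixty-four different offsets cover the whole row.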
  • a plurality of independently addressable arrays 10 are arranged to form a macro-array.
  • A small plurality of arrays 10 (typically four) are layered to form a stack, and plural stacks are arranged together, such as in a 16 × 16 layout.
  • Each macro-array has a 16 × 18 × 4 or 16 × 20 × 4 layout (expressed as width × height × stack layers).
  • the MRAM device comprises more than one macro-array. In the currently preferred MRAM device only one of the four arrays in each stack can be accessed at any one time.
  • a slice from a macro-array reads a set of cells from one row of a subset of the plurality of arrays 10 , the subset preferably being one array within each stack.
  • Each storage cell 16 stores one bit of data suitably representing a numerical value and preferably a binary value, i.e. one or zero.
  • each storage cell includes two films which assume one of two stable magnetisation orientations, known as parallel and anti-parallel.
  • the magnetisation orientation affects the resistance of the storage cell.
  • When the storage cell is in the anti-parallel state, the resistance is at its highest, and when the storage cell is in the parallel state, the resistance is at its lowest.
  • the anti-parallel state defines a zero logic state, and the parallel state defines a one logic state, or vice versa.
  • EP-A-0 918 334 discloses one example of a magnetoresistive solid-state storage device which is suitable for use in preferred embodiments of the present invention.
  • failures can occur which affect the ability of the device to store data reliably in the storage cells 16 .
  • Physical failures within a MRAM device can result from many causes including manufacturing imperfections, internal effects such as noise in a read process, environmental effects such as temperature and surrounding electromagnetic noise, or ageing of the device in use.
  • failures can be classified as either systematic failures or random failures.
  • Systematic failures consistently affect a particular storage cell or a particular group of storage cells. Random failures occur transiently and are not consistently repeatable. Typically, systematic failures arise as a result of manufacturing imperfections and ageing, whilst random failures occur in response to internal effects and to external environmental effects.
  • Failures are highly undesirable and mean that at least some storage cells in the device cannot be written to or read from reliably.
  • a cell affected by a failure can become unreadable, in which case no logical value can be read from the cell, or can become unreliable, in which case the logical value read from the cell is not necessarily the same as the value written to the cell (e.g. a “1” is written but a “0” is read).
  • the storage capacity and reliability of the device can be severely affected and in the worst case the entire device becomes unusable.
  • Shorted bits: the resistance of the storage cell is much lower than expected. Shorted bits tend to affect all storage cells lying in the same row and the same column.
  • Open bits: the resistance of the storage cell is much higher than expected. Open bit failures can, but do not always, affect all storage cells lying in the same row or column, or both.
  • Half-select bits: writing to a storage cell in a particular row or column causes another storage cell in the same row or column to change state. A cell which is vulnerable to half select will therefore possibly change state in response to a write access to any storage cell in the same row or column, resulting in unreliable stored data.
  • These failure mechanisms are each systematic, in that the same storage cell or cells are consistently affected. Where the failure mechanism affects only one cell, this can be termed an isolated failure. Where the failure mechanism affects a group of cells, this can be termed a grouped failure.
  • Whilst the storage cells of the MRAM device can be used to store data according to any suitable logical layout, data is preferably organised into basic data units (e.g. bytes) which in turn are grouped into larger logical data units (e.g. sectors).
  • a physical failure, and in particular a grouped failure affecting many cells, can affect many bytes and possibly many sectors. It has been found that keeping information about cells, bytes or even sectors affected by physical failures is not efficient, due to the quantity of data involved. That is, attempts to produce a list of all logical data units rendered unusable due to at least one physical failure, tend to generate a quantity of management data which is too large to handle efficiently.
  • a single physical failure can potentially affect a large number of logical data units, such that avoiding use of all bytes, sectors or other units affected by a failure substantially reduces the storage capacity of the device.
  • a grouped failure such as a shorted bit failure in just one storage cell affects many other storage cells, which lie in the same row or the same column.
  • In a 1024 by 1024 array, a single shorted bit failure can affect 1023 other cells lying in the same row and 1023 other cells lying in the same column, a total of 2047 affected cells including the shorted cell itself. These 2047 affected cells may form part of many bytes, and many sectors, each of which would be rendered unusable by the single grouped failure.
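  • The reach of a single shorted bit can be checked by enumerating the cells that share its row or column. This is a sketch for a 1024 by 1024 array; the function name and the coordinates used are hypothetical.

```python
from typing import Set, Tuple


def cells_affected_by_short(row: int, col: int,
                            rows: int = 1024,
                            cols: int = 1024) -> Set[Tuple[int, int]]:
    """Return all cells in the same row or the same column as a shorted cell."""
    same_row = {(row, c) for c in range(cols)}
    same_col = {(r, col) for r in range(rows)}
    # The shorted cell appears in both sets and is counted once by the union.
    return same_row | same_col


affected = cells_affected_by_short(5, 7)
print(len(affected))
```

  • The union holds 1023 other cells in the row, 1023 other cells in the column and the shorted cell itself, 2047 cells in all.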
  • the preferred embodiments of the present invention employ error correction coding to provide a magnetoresistive solid-state storage device which is error tolerant, preferably to tolerate and recover from both random failures and systematic failures.
  • error correction coding involves receiving original information which it is desired to store and forming encoded data which allows errors to be identified and ideally corrected. The encoded data is stored in the solid-state storage device. At read time, the original information is recovered by error correction decoding the encoded stored data.
  • ECC: error correction coding
  • Suitable ECC schemes include both schemes with single-bit symbols (e.g. BCH) and schemes with multiple-bit symbols (e.g. Reed-Solomon).
  • A reference text on the Reed-Solomon codes used in the preferred embodiments of the present invention is: “Reed-Solomon Codes and their Applications”, ed. S. B. Wicker and V. K. Bhargava, IEEE Press, New York, 1994.
  • FIG. 2 shows an example logical data structure used in preferred embodiments of the present invention.
  • Original information 200 is received in predetermined units such as a sector comprising 512 bytes.
  • Error correction coding is performed to produce a block of encoded data 202 , in this case an encoded sector.
  • the encoded sector 202 comprises a plurality of symbols 206 which can be a single bit (e.g. a BCH code with single-bit symbols) or can comprise multiple bits (e.g. a Reed-Solomon code using multi-bit symbols).
  • Each symbol 206 conveniently comprises eight bits. As shown in FIG. 2, the encoded sector 202 comprises four codewords 204, each comprising of the order of 144 to 160 symbols.
  • the eight bits corresponding to each symbol are conveniently stored in eight storage cells 16 .
  • a physical failure which affects any of these eight storage cells can result in one or more of the bits being unreliable (i.e. the wrong value is read) or unreadable (i.e. no value can be obtained), giving a failed symbol.
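  • Forming one symbol from eight cell reads can be sketched as follows. Representing an unreadable cell as `None`, the most-significant-bit-first ordering and the name `form_symbol` are assumptions made for this illustration.

```python
from typing import Optional, Sequence


def form_symbol(bits: Sequence[Optional[int]]) -> Optional[int]:
    """Pack eight cell reads (most significant bit first) into a symbol value.

    Returns None when any cell is unreadable, marking the symbol as failed.
    """
    if len(bits) != 8:
        raise ValueError("a symbol is formed from exactly eight cells")
    value = 0
    for bit in bits:
        if bit is None:          # unreadable cell: no value can be obtained
            return None
        if bit not in (0, 1):
            raise ValueError("each readable cell yields 0 or 1")
        value = (value << 1) | bit
    return value


print(form_symbol([1, 0, 0, 0, 0, 0, 0, 1]))
```

  • An unreliable cell, where a wrong value is read, cannot be spotted at this stage because the packed value still looks valid; such failed symbols only come to light during ECC decoding.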
  • Error correction decoding the encoded data 202 allows failed symbols 206 to be identified and corrected.
  • the preferred Reed-Solomon scheme is an example of a linear error correcting code, which mathematically identifies and corrects completely up to a predetermined maximum number of failed symbols 206 , depending upon the power of the code.
  • For example, a [160,128,33] Reed-Solomon code, having one hundred and sixty 8-bit symbols corresponding to one hundred and twenty-eight original information bytes and a minimum distance of thirty-three symbols, can locate and correct up to sixteen failed symbols.
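  • The figures for the [160,128,33] code follow from standard Reed-Solomon relations, which the sketch below works through. The function names are hypothetical, and the cell count assumes four codewords per encoded sector with one storage cell per bit.

```python
from typing import Dict


def rs_parameters(n: int, k: int, d: int) -> Dict[str, int]:
    """Derive basic properties of an [n, k, d] Reed-Solomon code."""
    if d != n - k + 1:
        raise ValueError("a Reed-Solomon code has minimum distance n - k + 1")
    return {
        "parity_symbols": n - k,              # redundancy added per codeword
        "correctable_symbols": (d - 1) // 2,  # failed symbols located and corrected
    }


def cells_per_encoded_sector(n: int, codewords: int = 4,
                             bits_per_symbol: int = 8) -> int:
    """Storage cells needed for one encoded sector, one cell per bit."""
    return codewords * n * bits_per_symbol


params = rs_parameters(160, 128, 33)
print(params, cells_per_encoded_sector(160))
```

  • Each codeword therefore carries thirty-two parity symbols and tolerates sixteen failed symbols, and a 512-byte sector encoded as four such codewords occupies 5120 storage cells.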
  • the ECC scheme employed is selected with a power sufficient to recover original information 200 from the encoded data 202 in substantially all cases.
  • FIG. 3 shows a simplified overview of a preferred method for controlling the MRAM device 1 of FIG. 1.
  • Step 301 comprises accessing a plurality of the storage cells 16 of the MRAM device.
  • The plurality of storage cells correspond to a block of encoded data, such as a codeword 204, or an encoded sector 202.
  • A plurality of read operations are performed by accessing the plurality of cells 16 using the row and column control lines 12 and 14.
  • The read operations provide logical bit values which are used to form the symbols 206, and the symbols in turn are built into a complete logical block of data such as the codeword 204.
  • four codewords 204 together form a complete encoded sector 202 , from which the original information sector 200 can be recovered.
  • Step 302 comprises determining whether original information is unrecoverable from the block of encoded data. That is, the step 302 comprises determining whether decoding the block of encoded data is expected not to be able to produce recovered information, or determining whether attempting to decode the block of encoded data does not produce recovered information.
  • the determining step can be performed by ECC decoding the block of encoded data as a logical evaluation technique, or can be performed using physical evaluation techniques, and preferably a combination of both logical and physical techniques are employed as will be described in more detail below.
  • If step 302 determines that ECC decoding has not produced recovered information, or is not expected to produce recovered information, then remedial action is taken in step 304. Otherwise, use of the cells continues in step 303.
  • the remedial action in step 304 may take any suitable form, to manage future activity in the storage cells 16 .
  • In one example, the access of step 301 is immediately repeated, in the hope of avoiding some random errors and this time obtaining symbol values for the encoded data from which the original data can be recovered by ECC decoding.
  • In another example, the set of storage cells 16 corresponding to a failed codeword 204 or to a complete encoded sector 202 are identified and discarded, in order to avoid possible loss of data in future. In the currently preferred embodiments it is most convenient to use or discard sets of storage cells corresponding to a sector 202, although greater or lesser granularity can be applied as desired.
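  • The overall flow of steps 301 to 304 can be sketched as below. The callables `read_block` and `try_decode`, the single retry and the returned action strings are all hypothetical stand-ins; combining the repeated access with a final discard is one possible arrangement, not a statement of the preferred embodiment.

```python
from typing import Callable, Optional, Sequence, Tuple


def control_access(read_block: Callable[[], Sequence[int]],
                   try_decode: Callable[[Sequence[int]], Optional[bytes]],
                   max_attempts: int = 2) -> Tuple[Optional[bytes], str]:
    """Access a set of cells and decide whether its use can continue."""
    for _ in range(max_attempts):
        symbols = read_block()            # step 301: access the storage cells
        recovered = try_decode(symbols)   # step 302: is information recoverable?
        if recovered is not None:
            return recovered, "continue"  # step 303: keep using the cells
    return None, "discard"                # step 304: remedial action


# Stub example: the first read hits a random failure, the repeat succeeds.
reads = iter([[0], [1]])
print(control_access(lambda: next(reads),
                     lambda s: b"ok" if s == [1] else None))
```

  • Repeating the access first gives random failures a chance to clear, so that a set of cells is only discarded when the failure persists.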
  • FIG. 4 shows a more detailed preferred method for controlling the MRAM device, using logical evaluation of the accessed set of storage cells 16 corresponding to a block of encoded data such as a codeword 204 or an encoded sector 202 .
  • Step 401 comprises accessing the set of storage cells 16 , equivalent to step 301 above.
  • Step 402 comprises performing ECC decoding of the block of encoded data obtained by accessing the storage cells in step 401 .
  • Step 403 comprises determining whether the ECC decoding of step 402 was not successful, in the sense that the ECC decoding has not produced recovered information from the block of data. Where ECC decoding is not successful, it is not possible to recover the original data 200 from the accessed storage cells 16 , and remedial action can be taken as in step 304 .
  • the method includes the step 404 of determining the number of failed symbols identified by the ECC decoding of step 402 , and comparing the identified number of failures against a threshold value.
  • a physical failure in any of the accessed set of storage cells can result in a failed symbol.
  • the threshold value selected for the comparison is preferably in the range of between about 50% and 95% of the maximum number of failures that can be corrected by performing the ECC decoding of step 402 .
  • the threshold value in step 404 is selected on the basis that although a number of failures have been identified in this particular block of data, it is still reasonable to continue using the selected set of storage cells with the expectation of still being able to successfully perform ECC decoding next time those cells are accessed.
  • the threshold value in step 404 provides a safety margin allowing a further failure or failures to occur in the next access, whilst still allowing a successful ECC decoding to be performed.
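  • The two checks of steps 403 and 404 can be sketched as a single decision function. The `DecodeResult` structure is a hypothetical stand-in for whatever an ECC decoder reports, and the action strings are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DecodeResult:
    recovered: Optional[bytes]  # None when ECC decoding was not successful
    failed_symbols: int         # failed symbols identified by the decoder


def evaluate_decode(result: DecodeResult, threshold: int) -> str:
    """Decide between continued use and remedial action for a set of cells."""
    if result.recovered is None:           # step 403: decoding failed
        return "remedial"
    if result.failed_symbols > threshold:  # step 404: safety margin used up
        return "remedial"
    return "continue"                      # information is output in step 405


print(evaluate_decode(DecodeResult(b"data", 3), threshold=12))
```

  • A block that decodes correctly can still trigger remedial action here, because a failure count above the threshold leaves too little margin for the next access.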
  • the ECC scheme employed is sufficiently powerful to provide recovered information equivalent to the original information sector 200 .
  • the original information 200 is output from the MRAM device in step 405 .
  • the method of FIG. 4 is conveniently employed whilst the MRAM device is in use.
  • the method of FIG. 4 is applied whilst the device stores variable user data, allowing dynamic management of data storage in the device. For example, it is possible that the number of systematic errors will increase as the device ages. A small number of sets of storage cells such as sectors 202 will become unreliable and should be removed from active use as a remedial action. However, it is expected that most sectors will continue in use reliably, by employing a suitable ECC scheme.
  • the method of FIG. 4 is conveniently applied when the MRAM device is first manufactured, or is first installed, or at power up, or at convenient times subsequently such as a periodic check.
  • a sample of test data is applied to a block such as a sector, and the test method of FIG. 4 performed to establish the suitability of that sector for future use.
  • FIG. 5 shows a second preferred method for controlling the MRAM device 1 .
  • the method is intended for use with a logical block of data such as codeword 204 or an encoded sector 202 .
  • In step 501 the set of storage cells corresponding to the block of data is accessed, preferably in a set of read operations.
  • Step 502 comprises obtaining a plurality of parametric values associated with the accessed set of storage cells from the access of step 501 .
  • a read voltage is applied along the row and column control lines 12 , 14 causing a sense current to flow through selected storage cells 16 , which have a resistance determined by parallel or anti-parallel alignment of the two magnetic films.
  • the resistance of a particular cell is determined according to a phenomenon known as spin tunnelling and the cells are often referred to as magnetic tunnel junction storage cells.
  • the condition of the storage cell is determined by measuring the sense current (which, for a given read voltage, varies inversely with resistance) or a related parameter such as response time to discharge a known capacitance.
  • Step 503 comprises comparing the obtained parametric values to one or more predicted ranges.
  • the comparison of step 503 in almost all cases allows a logical value (e.g. one or zero) to be established for each cell.
  • the comparison also conveniently allows at least some forms of physical failure to be identified. For example, it has been determined that a shorted bit failure leads to a very low resistance value in all cells of a particular row and a particular column. Also, open-bit failures can cause a very high resistance value for all cells of a particular row and column.
  • By comparing the obtained parametric values against predicted ranges, cells affected by failures such as shorted-bit and open-bit failures can be identified with a high degree of certainty.
  • FIG. 6 is a graph as an illustrative example of the probability (p) that a particular cell will have a certain parametric value, in this case resistance (r), corresponding to a logical “0” in the left-hand curve, or a logical “1” in the right-hand curve.
  • Range 603 represents a medium resistance value where a logical value cannot be ascertained with any degree of certainty.
  • Range 604 is a high resistance range representing a logical “1”.
  • Range 605 is a very high resistance value where an open-bit failure can be predicted with a high degree of certainty.
  • the ranges shown in FIG. 6 are purely for illustration, and many other possibilities are available depending upon the physical construction of the MRAM device 1 , the manner in which the storage cells are accessed, and the parametric values obtained. The range or ranges are suitably calibrated depending, for example, on environmental factors such as temperature, factors affecting a particular cell or cells and their position within the array, or the nature of the cells themselves and the type of access employed.
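  • The comparison of step 503 can be pictured as a lookup of each measured resistance against a set of range boundaries. The following is a minimal sketch only: the boundary values in `BOUNDS`, the arbitrary resistance units, and the names `classify_cell` and `LABELS` are hypothetical, and a practical device would calibrate its ranges as described above.

```python
from bisect import bisect_right

# Hypothetical boundaries (arbitrary resistance units) separating five ranges,
# from very low resistance up to very high resistance.
BOUNDS = [10.0, 40.0, 60.0, 90.0]

# Labels for the five ranges, lowest resistance first. "uncertain", "one" and
# "open" correspond to ranges 603, 604 and 605; "shorted" and "zero" are the
# very low and low resistance ranges.
LABELS = ["shorted", "zero", "uncertain", "one", "open"]


def classify_cell(resistance: float) -> str:
    """Map a measured resistance onto one of the five ranges."""
    return LABELS[bisect_right(BOUNDS, resistance)]
```

  • A cell classified as "shorted" or "open" is treated as a physical failure, whilst a cell classified as "uncertain" yields no reliable logical value.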
  • Step 504 comprises counting a number of physical failures, as identified in the comparison of step 503 .
  • the count of parametric failures in step 504 is performed on the basis of the number of symbols 206 (each containing one or more bits) which are affected by the identified physical failures.
  • Step 505 comprises comparing the number of parametric failures, i.e. the number of failed symbols identified by parametric testing, against a predetermined threshold value.
  • the number of physical failures can be represented in any suitable form. Depending upon the nature of the ECC scheme employed, some types of failure can be weighted differently to other types of failure. Since the data stored in the storage cells represents encoded data, it is expected that ECC decoding will not be able to recover the original data, where the number of parametric failures is greater than the maximum power of the ECC scheme.
  • the threshold value is suitably selected to represent a value which is equal to or less than the maximum number of failures which the ECC scheme employed is able to correct.
  • the threshold value in step 505 is selected to be substantially less than the maximum power of the ECC decoding scheme, suitably of the order of 50% to 95% of the maximum power.
  • the threshold value in step 505 is selected to represent about 50% to 75% and suitably about 60% of the maximum power of the employed ECC scheme.
  • the step 505 comprises determining the number of parametric failures to be greater than the threshold value, such that performing ECC decoding is expected (with a sufficiently high probability) not to be able to recover information from the encoded data. That is, where the number of parametric failures is greater than the threshold value, there is a greater than acceptable probability that information is unrecoverable from the encoded data.
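  • Steps 504 and 505 might be sketched as below, counting each symbol once however many of its cells have failed. The cell labels "shorted" and "open", the function names, the eight cells per symbol, the maximum of sixteen correctable symbols and the 60% margin are all assumptions made for the illustration.

```python
# Hypothetical labels marking cells identified as physically failed.
FAILED = {"shorted", "open"}


def count_failed_symbols(cell_states, bits_per_symbol=8):
    """Count symbols containing at least one physically failed cell.

    cell_states holds one label per storage cell, in codeword order.
    """
    failed = 0
    for start in range(0, len(cell_states), bits_per_symbol):
        symbol_cells = cell_states[start:start + bits_per_symbol]
        if any(state in FAILED for state in symbol_cells):
            failed += 1
    return failed


def expected_unrecoverable(cell_states, max_correctable=16, margin=0.6):
    """True when the parametric failure count exceeds the threshold (step 505)."""
    threshold = int(max_correctable * margin)
    return count_failed_symbols(cell_states) > threshold
```

  • With the assumed values the threshold is nine failed symbols, which leaves headroom for failures that the parametric test cannot detect.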
  • Step 506 comprises determining whether or not to continue use of the set of cells corresponding to the accessed block of data, in view of the number of parametric failures which have been identified. If desired, remedial action can be taken as outlined in step 304 .
  • the physical evaluation of FIG. 5 is particularly useful as a test procedure immediately following manufacture of the device, or at installation, or at power up, or at any convenient time subsequently.
  • the test procedure of FIG. 5 is performed by writing a test set of data to the device and then reading from the device, or by any other suitable parametric testing.
  • each sector comprises four codewords, and a sector is made redundant where any one of its four codewords contains a number of parametric failures which is greater than the threshold value of step 505 .
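  • The sector-level rule can be expressed in a single test, sketched here with a hypothetical function name and with the per-codeword failure counts supplied as a plain list.

```python
def sector_is_usable(failures_per_codeword, threshold):
    """A sector stays in use only if every codeword is within the threshold."""
    return all(count <= threshold for count in failures_per_codeword)
```

  • For example, counts of 0, 2, 9 and 1 against a threshold of nine keep the sector in use, whereas a single codeword with ten failures makes the whole sector redundant.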
  • a block of data such as an encoded sector 202 having a number of failed symbols greater than the threshold value is not used at all in the subsequent life span of the device, because the probability of unrecoverable data errors would be too high.
  • the threshold value used in the test procedure is set such that at least one and preferably several failures occurring subsequently will be tolerated. In particular, the threshold value is set to allow further systematic failures to be tolerated together with at least one and preferably several random failures, in a block of data.
  • the parametric evaluation of FIG. 5 is particularly useful in determining shorted-bit and/or open-bit failures in MRAM devices.
  • a systematic failure such as a half select, or some forms of isolated bit failure, is not so easily detectable using parametric tests, but is more readily discovered by logical evaluation using ECC decoding as in FIG. 4. Therefore, in particularly preferred embodiments of the present invention the logical evaluation of FIG. 4 is combined with the parametric evaluation of FIG. 5 to provide a practical device which is able to take advantage of the considerable benefits offered by the new MRAM technology whilst minimising the limitations of currently available manufacturing techniques.
  • the MRAM device described herein is ideally suited for use in place of any prior solid-state storage device.
  • the MRAM device is ideally suited for use both as a short-term storage device (e.g. cache memory) and as a longer-term storage device (e.g. a solid-state hard disk).
  • An MRAM device can be employed for both short term storage and longer term storage within a single apparatus, such as a computing platform.
  • a magnetoresistive solid-state storage device and methods for controlling such a device have been described.
  • the storage device is able to tolerate a relatively large number of errors, including both systematic failures and transient failures, whilst successfully remaining in operation with no loss of original data.
  • Simpler and lower cost manufacturing techniques are employed and/or device yield and device density are increased.
  • overhead of the employed ECC scheme can be reduced.
  • error correction coding and decoding allows blocks of data, e.g. sectors or codewords, to remain in use, where otherwise the whole block must be discarded if only one failure occurs. Therefore, the preferred embodiments of the present invention avoid large scale discarding of logical blocks and reduce or even eliminate completely the need for inefficient control methods such as large-scale data mapping management or physical sparing.

Abstract

A magnetoresistive solid-state storage device (MRAM) performs error correction coding (ECC) of stored information. At manufacture or during use, each logical block of ECC encoded data and/or the corresponding set of storage cells are evaluated to determine suitability for continued use, or whether remedial action is necessary. In a first preferred method ECC decoding is attempted to determine whether information is unrecoverable from the block of ECC encoded data. In a second preferred method a parametric evaluation is made prior to attempting ECC decoding.

Description

  • The present invention relates in general to a magnetoresistive solid-state storage device and to a method for controlling a magnetoresistive solid-state storage device. In particular, but not exclusively, the invention relates to a magnetoresistive solid-state storage device employing error correction coding. [0001]
  • A typical solid-state storage device comprises one or more arrays of storage cells for storing data. Existing semiconductor technologies provide volatile solid-state storage devices suitable for relatively short term storage of data, such as dynamic random access memory (DRAM), or devices for relatively longer term storage of data such as static random access memory (SRAM) or non-volatile flash and EEPROM devices. However, many other technologies are known or are being developed. [0002]
  • Recently, a magnetoresistive storage device has been developed as a new type of non-volatile solid-state storage device (see, for example, EP-A-0918334 Hewlett-Packard). The magnetoresistive solid-state storage device is also known as a magnetic random access memory (MRAM) device. MRAM devices have relatively low power consumption and relatively fast access times, particularly for data write operations, which renders MRAM devices ideally suitable for both short term and long term storage applications. [0003]
  • A problem arises in that MRAM devices are subject to physical failure, which can result in an unacceptable loss of stored data. Currently available manufacturing techniques for MRAM devices are subject to limitations and as a result manufacturing yields of commercially acceptable MRAM devices are relatively low. Although better manufacturing techniques are being developed, these tend to increase manufacturing complexity and cost. Hence, it is desired to apply lower cost manufacturing techniques whilst increasing device yield. Further, it is desired to increase cell density formed on a substrate such as silicon, but as the density increases manufacturing tolerances become increasingly difficult to control, again leading to higher failure rates and lower device yields. Since the MRAM devices are at a relatively early stage in development, it is desired to allow large scale manufacturing of commercially acceptable devices, whilst tolerating the limitations of current manufacturing techniques. [0004]
  • An aim of the present invention is to provide a magnetoresistive solid-state storage device which is tolerant of at least some failures. Another aim is to provide a method for controlling a magnetoresistive solid-state storage device to tolerate at least some failures. [0005]
  • A preferred aim is to provide a magnetoresistive solid-state storage device and a method for controlling such a device which is tolerant of both systematic and random failures. Other preferred aims are to provide a magnetoresistive solid-state storage device and a method for controlling such a device, which allows at least some failures to be tolerated without any loss of stored data, preferably which is efficient to implement, preferably which allows lower cost manufacturing techniques to be employed, and preferably which allows device yield to be increased. [0006]
  • According to a first aspect of the present invention there is provided a method for controlling a magnetoresistive solid-state storage device having a plurality of storage cells for storing a block of ECC encoded data, the method comprising the steps of: accessing a set of the plurality of storage cells; and determining whether information is unrecoverable from a block of ECC encoded data stored in the accessed storage cells. [0007]
  • In a first preferred embodiment, determination of whether information is unrecoverable from the stored block of ECC encoded data is made by attempting to perform ECC decoding. If the ECC decoding successfully recovers information from the block of ECC encoded data, then use of that set of storage cells can continue in future read and write access cycles. However, if the ECC decoding fails to recover information from the block of ECC encoded data, then preferably remedial action is taken concerning the set of storage cells. For example, the remedial action involves discarding that set of storage cells such that the set is not available in future read and write cycles. [0008]
  • Optionally, the method comprises identifying failed symbols in the block of ECC encoded data, as an output from the ECC decoding step, and comparing the identified number of failed symbols against a threshold value. The threshold value suitably represents a safety margin, such as 50% to 95% of the maximum number of failed symbols which can be corrected by ECC decoding the block of ECC encoded data. The safety margin represents the situation where, although a relatively high proportion of failed symbols have been identified in the block of ECC encoded data, it is reasonable to continue using that set of storage cells in future. Even though further systematic or random failures might be encountered in a future read operation, it is reasonable to expect that the number of failed symbols will still be correctable by ECC decoding the block of ECC encoded data. [0009]
  • In a second preferred embodiment of the present invention, the accessed set of storage cells is evaluated based on parametric values, prior to attempting ECC decoding of the block of ECC encoded data. Preferably, the method comprises determining whether original information is expected to be unrecoverable from the block of ECC encoded data stored in the accessed set of storage cells. In particular, it is determined whether original information is expected to be unrecoverable because the probability of failing to correctly perform ECC decoding is unacceptably high. Where original information is not expected to be unrecoverable, then use of the set of storage cells may continue. The first and second embodiments are preferably combined, such that a decision to continue use of the set of storage cells, or take remedial action, is made either after performing a parametric based test as in the second embodiment, or after performing ECC decoding as in the first embodiment, or a decision can be made at either stage. [0010]
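  • The combination of the two embodiments, in which a decision may be reached at either stage, might be sketched as follows. The function name `check_block`, the `decode` callable and the return convention are hypothetical; the sketch shows only the order of the two tests, not an actual decoder.

```python
def check_block(parametric_failures, decode, threshold):
    """Combine the parametric test with ECC decoding.

    parametric_failures: failed-symbol count from the parametric test.
    decode: hypothetical callable returning recovered bytes, or None on failure.
    threshold: largest failure count for which decoding is still attempted.
    Returns (decision, recovered), where decision is 'continue' or 'remedial'.
    """
    if parametric_failures > threshold:
        return "remedial", None          # decided before decoding is attempted
    recovered = decode()
    if recovered is None:
        return "remedial", None          # decided after decoding has failed
    return "continue", recovered
```

  • The decoder is invoked only when the parametric test has not already indicated that the information is expected to be unrecoverable.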
  • Preferably, in the second embodiment, the method comprises determining, from accessing the set of storage cells, failed symbols in the block of ECC encoded data that have been affected by a physical failure. Suitably, a determination is made whether there are more failed symbols in the block of ECC encoded data than can be corrected by error correction decoding the block of ECC encoded data. Here, a situation is identified where, due to physical failures, ECC decoding the block of ECC encoded data may well fail to recover the original information. In other words, there is an unacceptable probability that decoding the block of ECC encoded data will not correctly recover original information. [0011]
  • Preferably, accessing the set of storage cells comprises obtaining parametric values, which are compared against one or more ranges. Suitably, for most of the accessed set of storage cells, a logical bit value is derived, but some of the storage cells can be identified as being affected by a physical failure. Suitably, a failure count is determined based on the identified failed cells. The failure count can simply represent the number of failed cells, but preferably the failure count is based on failed symbols of the block of ECC encoded data affected by the identified failed cells. Preferably, the failure count is compared against a threshold value. As one option, the threshold value represents the total number of failed symbols which can be corrected by ECC decoding the block of ECC encoded data. As a second option, the threshold value represents a safety margin less than the total number of failed symbols correctable by ECC decoding, such as between about 50% and 95% of the total number. In this situation the threshold value is particularly useful in that only some types of physical failures in MRAM devices can be readily identified from the obtained parametric values, and the threshold value is set such that, given the identified number of failures, it is still reasonable to perform ECC decoding, whilst allowing for an additional number of as yet unidentified failures to affect the block of ECC encoded data. [0012]
  • Conveniently, original information is received for storing in the MRAM device in units of a sector, such as 512 bytes. The original information sector is error correction encoded to form one or more blocks of ECC encoded data. In the preferred embodiment a linear ECC scheme such as a Reed-Solomon code is employed. Conveniently, each sector of original information is encoded to form a sector of ECC encoded data comprising four codewords. Each codeword suitably forms the block of ECC encoded data mentioned above. [0013]
  • According to a second aspect of the present invention there is provided a method for controlling a magnetoresistive solid-state storage device, comprising the steps of: receiving original information which it is desired to store; error correction encoding the original information to form a block of ECC encoded data; storing the block of ECC encoded data in a set of magnetoresistive storage cells arranged in at least one array; accessing the set of storage cells; forming logical symbol values of the block of ECC encoded data from the accessed set of storage cells; error correction decoding the block of ECC encoded data to provide recovered information; if the decoding step provided recovered information then outputting the recovered information and continuing use of the set of storage cells, or else if the decoding step did not provide recovered information then taking remedial action in respect of the set of storage cells. [0014]
  • Preferably, the method comprises identifying, from the ECC decoding, zero or more failed symbols in the block of ECC encoded data; comparing the identified number of failed symbols against a threshold value; and, if the ECC decoding did not recover original information, or if the identified number of failed symbols is greater than the threshold value, then taking remedial action concerning the accessed set of storage cells. [0015]
  • According to a third aspect of the present invention there is provided a method for controlling a magnetoresistive solid-state storage device, comprising the steps of: receiving original information which it is desired to store; error correction encoding the original information to form a block of ECC encoded data; storing the block of ECC encoded data in a set of magnetoresistive storage cells arranged in at least one array; accessing the set of storage cells; comparing parametric values obtained by accessing the set of storage cells against one or more ranges; identifying failed cells amongst the accessed set of cells; forming a failure count based on the identified failed cells; comparing the failure count against a threshold value; and determining whether the original information is expected to be unrecoverable from the block of ECC encoded data stored in the accessed set of storage cells. [0016]
  • According to a fourth aspect of the present invention there is provided a magnetoresistive solid-state storage device, comprising: at least one array of magnetoresistive storage cells; an ECC encoding unit for forming a block of ECC encoded data from a unit of original information; and a controller arranged to store the block of ECC encoded data in a set of the storage cells, access the set of storage cells, and determine whether the original information is unrecoverable from the block of ECC encoded data stored in the accessed set of storage cells. [0017]
  • For a better understanding of the invention, and to show how embodiments of the same may be carried into effect, reference will now be made, by way of example, to the accompanying diagrammatic drawings in which: [0018]
  • FIG. 1 is a schematic diagram showing a preferred MRAM device including an array of storage cells; [0019]
  • FIG. 2 shows a preferred logical data structure; [0020]
  • FIG. 3 shows an overview of a preferred method for controlling an MRAM device; [0021]
  • FIG. 4 shows a first preferred method for controlling an MRAM device; [0022]
  • FIG. 5 shows a second preferred method for controlling an MRAM device; and [0023]
  • FIG. 6 is a graph illustrating a parametric value obtained from a storage cell of an MRAM device. [0024]
  • To assist a complete understanding of the present invention, an example MRAM device will first be described with reference to FIG. 1, including a description of the failure mechanisms found in MRAM devices. The preferred methods for controlling such MRAM devices will then be described with reference to FIGS. 2 to 6. [0025]
  • FIG. 1 shows a simplified magnetoresistive solid-state storage device 1 comprising an array 10 of storage cells 16. The array 10 is coupled to a controller 20 which, amongst other control elements, includes an ECC coding and decoding unit 22. The controller 20 and the array 10 can be formed on a single substrate, or can be arranged separately. [0026]
  • In one preferred embodiment, the array 10 comprises of the order of 1024 by 1024 storage cells, just a few of which are illustrated. The cells 16 are each formed at an intersection between control lines 12 and 14. In this example control lines 12 are arranged in rows, and control lines 14 are arranged in columns. One row 12 and one or more columns 14 are selected to access the required storage cell or cells 16 (or conversely one column and several rows, depending upon the orientation of the array). Suitably, the row and column lines are coupled to control circuits 18, which include a plurality of read/write control circuits. Depending upon the implementation, one read/write control circuit is provided per column, or read/write control circuits are multiplexed or shared between columns. In this example the control lines 12 and 14 are generally orthogonal, but other more complicated lattice structures are also possible. [0027]
  • In a read operation of the currently preferred MRAM device, a single row line 12 and several column lines 14 (represented by thicker lines in FIG. 1) are activated in the array 10 by the control circuits 18, and a set of data is read from those activated cells. This operation is termed a slice. The row in this example has a length l of 1024 storage cells, and the accessed storage cells 16 are separated by a minimum reading distance m, such as sixty-four cells, to minimise cross-cell interference in the read process. Hence, each slice provides up to l/m=1024/64=16 bits from the accessed array. [0028]
  • To provide an MRAM device of a desired storage capacity, preferably a plurality of independently addressable arrays 10 are arranged to form a macro-array. Conveniently, a small plurality of arrays 10 (typically four) are layered to form a stack, and plural stacks are arranged together, such as in a 16×16 layout. Preferably, each macro-array has a 16×18×4 or 16×20×4 layout (expressed as width×height×stack layers). Optionally, the MRAM device comprises more than one macro-array. In the currently preferred MRAM device only one of the four arrays in each stack can be accessed at any one time. Hence, a slice from a macro-array reads a set of cells from one row of a subset of the plurality of arrays 10, the subset preferably being one array within each stack. [0029]
  • Each storage cell 16 stores one bit of data suitably representing a numerical value and preferably a binary value, i.e. one or zero. Suitably, each storage cell includes two films which assume one of two stable magnetisation orientations, known as parallel and anti-parallel. The magnetisation orientation affects the resistance of the storage cell. When the storage cell 16 is in the anti-parallel state, the resistance is at its highest, and when the magnetic storage cell is in the parallel state, the resistance is at its lowest. Suitably, the anti-parallel state defines a zero logic state, and the parallel state defines a one logic state, or vice versa. As further background information, EP-A-0 918 334 (Hewlett-Packard) discloses one example of a magnetoresistive solid-state storage device which is suitable for use in preferred embodiments of the present invention. [0030]
  • Although generally reliable, it has been found that failures can occur which affect the ability of the device to store data reliably in the storage cells 16. Physical failures within an MRAM device can result from many causes including manufacturing imperfections, internal effects such as noise in a read process, environmental effects such as temperature and surrounding electromagnetic noise, or ageing of the device in use. In general, failures can be classified as either systematic failures or random failures. Systematic failures consistently affect a particular storage cell or a particular group of storage cells. Random failures occur transiently and are not consistently repeatable. Typically, systematic failures arise as a result of manufacturing imperfections and ageing, whilst random failures occur in response to internal effects and to external environmental effects. [0031]
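  • The arithmetic of a slice can be illustrated with a short sketch. The function name `slice_columns` and the `offset` parameter, which selects where within one spacing interval the slice starts, are assumptions introduced for the example.

```python
def slice_columns(row_length=1024, min_distance=64, offset=0):
    """Column indices read in one slice of a single row.

    Cells are taken every min_distance columns so that no two accessed
    cells are closer than the minimum reading distance.
    """
    if not 0 <= offset < min_distance:
        raise ValueError("offset must lie within one spacing interval")
    return list(range(offset, row_length, min_distance))
```

  • With the example values a slice contains sixteen column indices, beginning 0, 64, 128 when the offset is zero.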
  • Failures are highly undesirable and mean that at least some storage cells in the device cannot be written to or read from reliably. A cell affected by a failure can become unreadable, in which case no logical value can be read from the cell, or can become unreliable, in which case the logical value read from the cell is not necessarily the same as the value written to the cell (e.g. a “1” is written but a “0” is read). The storage capacity and reliability of the device can be severely affected and in the worst case the entire device becomes unusable. [0032]
  • Failure mechanisms take many forms, and the following examples are amongst those identified: [0033]
  • 1. Shorted bits—where the resistance of the storage cell is much lower than expected. Shorted bits tend to affect all storage cells lying in the same row and the same column. [0034]
  • 2. Open bits—where the resistance of the storage cell is much higher than expected. Open bit failures can, but do not always, affect all storage cells lying in the same row or column, or both. [0035]
  • 3. Half-select bits—where writing to a storage cell in a particular row or column causes another storage cell in the same row or column to change state. A cell which is vulnerable to half select will therefore possibly change state in response to a write access to any storage cell in the same row or column, resulting in unreliable stored data. [0036]
  • 4. Single failed bits—where a particular storage cell fails (e.g. is stuck always as a “0”), but does not affect other storage cells and is not affected by activity in other storage cells. [0037]
  • These four example failure mechanisms are each systematic, in that the same storage cell or cells are consistently affected. Where the failure mechanism affects only one cell, this can be termed an isolated failure. Where the failure mechanism affects a group of cells, this can be termed a grouped failure. [0038]
  • Whilst the storage cells of the MRAM device can be used to store data according to any suitable logical layout, data is preferably organised into basic data units (e.g. bytes) which in turn are grouped into larger logical data units (e.g. sectors). A physical failure, and in particular a grouped failure affecting many cells, can affect many bytes and possibly many sectors. It has been found that keeping information about cells, bytes or even sectors affected by physical failures is not efficient, due to the quantity of data involved. That is, attempts to produce a list of all logical data units rendered unusable due to at least one physical failure, tend to generate a quantity of management data which is too large to handle efficiently. Further, depending on how the data is organised on the device, a single physical failure can potentially affect a large number of logical data units, such that avoiding use of all bytes, sectors or other units affected by a failure substantially reduces the storage capacity of the device. For example, a grouped failure such as a shorted bit failure in just one storage cell affects many other storage cells, which lie in the same row or the same column. Thus, a single shorted bit failure can affect 1023 other cells lying in the same row, and 1023 other cells lying in the same column, a total of 2046 affected cells. These 2046 affected cells may form part of many bytes, and many sectors, each of which would be rendered unusable by the single grouped failure. [0039]
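  • The reach of a single grouped failure in a 1024 by 1024 array can be checked with a short calculation. The function name is hypothetical, and the sketch assumes that every other cell in the same row and the same column is affected.

```python
def cells_affected_by_short(rows=1024, cols=1024):
    """Other cells sharing a row or a column with one shorted cell."""
    same_row = cols - 1   # every other cell in the shorted cell's row
    same_col = rows - 1   # every other cell in the shorted cell's column
    return same_row + same_col


print(cells_affected_by_short())  # 1023 + 1023 = 2046
```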
  • Some improvements have been made in manufacturing processes and device construction to reduce the number of manufacturing failures and improve device longevity, but this usually involves increased manufacturing costs and complexity, and reduced device yields. Hence, techniques are being developed which respond to failures and avoid future loss of data. One example technique is the use of sparing. A row identified as containing failures is made redundant (spared) and replaced by one of a set of unused additional spare rows, and similarly for columns. However, either a physical replacement is required (i.e. routing connections from the failed row or column to instead reach the spare row or column), or else additional control overhead is required to map logical addresses to physical row and column lines. Only a limited sparing capacity can be provided, since enlarging the device to include spare rows and columns reduces device density for a fixed area of substrate and increases manufacturing complexity. Therefore, where failures are relatively common, sparing is unable to cope leading to possible loss of data. Also, sparing is not useful in handling random failures, and involves additional management overhead to determine deployment of sparing capacity. [0040]
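  • The control overhead of sparing by address mapping can be pictured with a toy model. The class `RowSparing` and its methods are hypothetical and handle rows only; the sketch is intended simply to show that the spare pool is finite and that each access incurs a lookup.

```python
class RowSparing:
    """Toy logical-to-physical row map with a fixed pool of spare rows."""

    def __init__(self, spare_rows):
        self._free = list(spare_rows)   # unused physical spare rows
        self._remap = {}                # failed row -> spare row

    def retire(self, row):
        """Replace a failed row; return False when no spare remains."""
        if row in self._remap:
            return True
        if not self._free:
            return False                # sparing capacity exhausted
        self._remap[row] = self._free.pop(0)
        return True

    def physical(self, row):
        """Translate a logical row into the physical row actually used."""
        return self._remap.get(row, row)
```

  • Once the pool is empty, further failed rows cannot be replaced, which is the situation in which sparing alone is unable to cope.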
  • The preferred embodiments of the present invention employ error correction coding to provide a magnetoresistive solid-state storage device which is error tolerant, preferably to tolerate and recover from both random failures and systematic failures. Typically, error correction coding involves receiving original information which it is desired to store and forming encoded data which allows errors to be identified and ideally corrected. The encoded data is stored in the solid-state storage device. At read time, the original information is recovered by error correction decoding the encoded stored data. A wide range of error correction coding (ECC) schemes are available and can be employed alone or in combination. Suitable ECC schemes include both schemes with single-bit symbols (e.g. BCH) and schemes with multiple-bit symbols (e.g. Reed-Solomon). [0041]
  • [0042] As general background information concerning error correction coding, reference is made to the following publication: W. W. Peterson and E. J. Weldon, Jr., “Error-Correcting Codes”, 2nd edition, 12th printing, 1994, MIT Press, Cambridge, Mass.
  • [0043] A more specific reference concerning Reed-Solomon codes used in the preferred embodiments of the present invention is: “Reed-Solomon Codes and their Applications”, ed. S. B. Wicker and V. K. Bhargava, IEEE Press, New York, 1994.
  • [0044] FIG. 2 shows an example logical data structure used in preferred embodiments of the present invention. Original information 200 is received in predetermined units such as a sector comprising 512 bytes. Error correction coding is performed to produce a block of encoded data 202, in this case an encoded sector. The encoded sector 202 comprises a plurality of symbols 206 which can be a single bit (e.g. a BCH code with single-bit symbols) or can comprise multiple bits (e.g. a Reed-Solomon code using multi-bit symbols). In the preferred Reed-Solomon encoding scheme, each symbol 206 conveniently comprises eight bits. As shown in FIG. 2, the encoded sector 202 comprises four codewords 204, each comprising approximately 144 to 160 symbols. The eight bits corresponding to each symbol are conveniently stored in eight storage cells 16. A physical failure which affects any of these eight storage cells can result in one or more of the bits being unreliable (i.e. the wrong value is read) or unreadable (i.e. no value can be obtained), giving a failed symbol.
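The sizes implied by this data structure can be worked through in a short sketch. The sector size, the four codewords per sector, and the eight bits per symbol come from the description above. The even split of 128 data bytes per codeword and the function name `sector_layout` are assumptions made for illustration only.

```python
# Illustrative layout arithmetic for the logical data structure of FIG. 2.
# Assumption: the 512-byte sector is split evenly across the four codewords.

SECTOR_BYTES = 512          # original information 200
CODEWORDS_PER_SECTOR = 4    # codewords 204 per encoded sector 202
BITS_PER_SYMBOL = 8         # one symbol 206 occupies eight storage cells 16


def sector_layout(symbols_per_codeword: int = 160) -> dict:
    """Return derived sizes for one encoded sector (hypothetical helper)."""
    data_symbols = SECTOR_BYTES // CODEWORDS_PER_SECTOR
    parity_symbols = symbols_per_codeword - data_symbols
    cells = CODEWORDS_PER_SECTOR * symbols_per_codeword * BITS_PER_SYMBOL
    return {
        "data_symbols_per_codeword": data_symbols,
        "parity_symbols_per_codeword": parity_symbols,
        "storage_cells_per_sector": cells,
        "overhead_ratio": parity_symbols / data_symbols,
    }
```

Under these assumptions, a codeword length of 160 symbols gives 32 parity symbols per codeword and 5120 storage cells per encoded sector, while 144 symbols gives 16 parity symbols and 4608 cells.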
  • [0045] Error correction decoding the encoded data 202 allows failed symbols 206 to be identified and corrected. The preferred Reed-Solomon scheme is an example of a linear error correcting code, which mathematically identifies and corrects completely up to a predetermined maximum number of failed symbols 206, depending upon the power of the code. For example, a [160,128,33] Reed-Solomon code having one hundred and sixty 8-bit symbols corresponding to one hundred and twenty-eight original information bytes and a minimum distance of thirty-three symbols can locate and correct up to sixteen failed symbols. Suitably, the ECC scheme employed is selected with a power sufficient to recover original information 200 from the encoded data 202 in substantially all cases. Very rarely, a block of encoded data 202 is encountered which is affected by so many failures that the original information 200 is unrecoverable. Also, very rarely the failures result in a mis-correct, where information recovered from the encoded data 202 is not equivalent to the original information 200. Even though the recovered information does not correspond to the original information, a mis-correct is not readily determined and means that the original information is unrecoverable.
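The figures quoted for the [160,128,33] code follow from two standard properties of Reed-Solomon codes: the minimum distance is d = n - k + 1, and up to floor((d - 1) / 2) symbols at unknown positions can be corrected. A minimal sketch of that arithmetic, with illustrative function names:

```python
def min_distance(n: int, k: int) -> int:
    """Minimum distance of an [n, k] Reed-Solomon code: d = n - k + 1."""
    if not 0 < k <= n:
        raise ValueError("require 0 < k <= n")
    return n - k + 1


def max_correctable_symbols(n: int, k: int) -> int:
    """Failed symbols at unknown positions that can be located and corrected."""
    return (min_distance(n, k) - 1) // 2
```

For n = 160 and k = 128 this gives d = 33 and a correction capability of 16 symbols, matching the example in the text.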
  • [0046] In the current MRAM devices, grouped failures tend to affect a large group of storage cells, lying in the same row or column. This provides an environment which is unlike prior storage devices. The preferred embodiments of the present invention employ an ECC scheme with multi-bit symbols. Where manufacturing processes and device design change over time, it may become more appropriate to organise storage locations expecting bit-based errors and then apply an ECC scheme using single-bit symbols, and at least some of the following embodiments can be applied to single-bit symbols.
  • [0047] FIG. 3 shows a simplified overview of a preferred method for controlling the MRAM device 1 of FIG. 1.
  • [0048] Step 301 comprises accessing a plurality of the storage cells 16 of the MRAM device. Preferably, the plurality of storage cells correspond to a block of encoded data, such as a codeword 204, or an encoded sector 202. Suitably, a plurality of read operations are performed by accessing the plurality of cells 16 using the row and column control lines 12 and 14. The read operations provide logical bit values which are used to form the symbols 206, and the symbols in turn are built into a complete logical block of data such as the codeword 204. In this example, four codewords 204 together form a complete encoded sector 202, from which the original information sector 200 can be recovered.
  • [0049] Step 302 comprises determining whether original information is unrecoverable from the block of encoded data. That is, the step 302 comprises determining whether decoding the block of encoded data is expected not to be able to produce recovered information, or determining whether attempting to decode the block of encoded data does not produce recovered information. The determining step can be performed by ECC decoding the block of encoded data as a logical evaluation technique, or can be performed using physical evaluation techniques, and preferably a combination of both logical and physical techniques are employed as will be described in more detail below.
  • [0050] Where step 302 determines that ECC decoding has not produced recovered information, or is not expected to produce recovered information, then remedial action is taken in step 304. Otherwise, use of the cells continues in step 303.
  • [0051] The remedial action in step 304 may take any suitable form, to manage future activity in the storage cells 16. As one example, the access of step 301 is immediately repeated, in the hope of avoiding some random errors and this time obtaining symbol values for the encoded data from which the original data can be recovered by ECC decoding. As a second example, the set of storage cells 16 corresponding to a failed codeword 204 or to a complete encoded sector 202 are identified and discarded, in order to avoid possible loss of data in future. In the currently preferred embodiments it is most convenient to use or discard sets of storage cells corresponding to a sector 202, although greater or lesser granularity can be applied as desired.
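The flow of FIG. 3, together with the two example remedial actions (re-reading, then discarding), can be sketched as follows. The callables `read_block` and `decode`, the retry count, and the returned action strings are hypothetical stand-ins and are not taken from the specification.

```python
from typing import Callable, Optional, Sequence, Tuple


def control_cycle(
    read_block: Callable[[], Sequence[int]],
    decode: Callable[[Sequence[int]], Optional[bytes]],
    max_retries: int = 1,
) -> Tuple[Optional[bytes], str]:
    """Access cells, test recoverability, then continue use or take remedial action.

    `decode` is assumed to return None when no information can be recovered.
    """
    for _ in range(max_retries + 1):
        symbols = read_block()          # step 301: access the storage cells
        recovered = decode(symbols)     # step 302: is information recoverable?
        if recovered is not None:
            return recovered, "continue"    # step 303: keep using the cells
        # First remedial example (step 304): repeat the access immediately.
    # Second remedial example (step 304): retire the set of cells.
    return None, "discard"
```

A caller would typically record a "discard" result against the sector concerned so that it is not selected for later writes.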
  • [0052] FIG. 4 shows a more detailed preferred method for controlling the MRAM device, using logical evaluation of the accessed set of storage cells 16 corresponding to a block of encoded data such as a codeword 204 or an encoded sector 202.
  • [0053] Step 401 comprises accessing the set of storage cells 16, equivalent to step 301 above.
  • [0054] Step 402 comprises performing ECC decoding of the block of encoded data obtained by accessing the storage cells in step 401.
  • [0055] Step 403 comprises determining whether the ECC decoding of step 402 was not successful, in the sense that the ECC decoding has not produced recovered information from the block of data. Where ECC decoding is not successful, it is not possible to recover the original data 200 from the accessed storage cells 16, and remedial action can be taken as in step 304.
  • [0056] Optionally, the method includes the step 404 of determining the number of failed symbols identified by the ECC decoding of step 402, and comparing the identified number of failures against a threshold value. A physical failure in any of the accessed set of storage cells can result in a failed symbol. The threshold value selected for the comparison is preferably in the range of about 50% to 95% of the maximum number of failures that can be corrected by performing the ECC decoding of step 402. The threshold value in step 404 is selected on the basis that although a number of failures have been identified in this particular block of data, it is still reasonable to continue using the selected set of storage cells with the expectation of still being able to successfully perform ECC decoding next time those cells are accessed. The threshold value in step 404 provides a safety margin allowing a further failure or failures to occur in the next access, whilst still allowing a successful ECC decoding to be performed.
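The decision made in steps 403 and 404 can be sketched as below. The default fraction of 0.75 is an arbitrary point inside the stated range of about 50% to 95%, and the function name and return strings are illustrative assumptions.

```python
import math


def evaluate_decode_result(
    decode_ok: bool,
    failed_symbols: int,
    max_correctable: int,
    threshold_fraction: float = 0.75,   # assumed value within about 0.50 to 0.95
) -> str:
    """Return "continue" or "remedial" for one decoded block."""
    if not decode_ok:
        # Step 403: decoding produced no recovered information.
        return "remedial"
    # Step 404: compare the failed-symbol count reported by the decoder
    # against a threshold below the code's full correction capability.
    threshold = math.floor(max_correctable * threshold_fraction)
    return "remedial" if failed_symbols > threshold else "continue"
```

With a correction capability of 16 symbols and a fraction of 0.75, the threshold is 12, leaving a margin of four further symbol failures before decoding would be expected to fail.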
  • [0057] In almost all practical cases, the ECC scheme employed is sufficiently powerful to provide recovered information equivalent to the original information sector 200. The original information 200 is output from the MRAM device in step 405.
  • [0058] The method of FIG. 4 is conveniently employed whilst the MRAM device is in use. Suitably, the method of FIG. 4 is applied whilst the device stores variable user data, allowing dynamic management of data storage in the device. For example, it is possible that the number of systematic errors will increase as the device ages. A small number of sets of storage cells such as sectors 202 will become unreliable and should be removed from active use as a remedial action. However, it is expected that most sectors will continue in use reliably, by employing a suitable ECC scheme.
  • [0059] Additionally or alternatively, the method of FIG. 4 is conveniently applied when the MRAM device is first manufactured, or is first installed, or at power up, or at convenient times subsequently such as a periodic check. Suitably, a sample of test data is applied to a block such as a sector, and the test method of FIG. 4 is performed to establish the suitability of that sector for future use.
  • [0060] FIG. 5 shows a second preferred method for controlling the MRAM device 1. As in FIGS. 3 and 4, the method is intended for use with a logical block of data such as a codeword 204 or an encoded sector 202.
  • [0061] In step 501 the set of storage cells corresponding to the block of data is accessed, preferably in a set of read operations.
  • [0062] Step 502 comprises obtaining a plurality of parametric values associated with the accessed set of storage cells from the access of step 501. Suitably, a read voltage is applied along the row and column control lines 12, 14 causing a sense current to flow through selected storage cells 16, which have a resistance determined by parallel or anti-parallel alignment of the two magnetic films. The resistance of a particular cell is determined according to a phenomenon known as spin tunnelling and the cells are often referred to as magnetic tunnel junction storage cells. The condition of the storage cell is determined by measuring the sense current (proportional to resistance) or a related parameter such as response time to discharge a known capacitance.
  • [0063] Step 503 comprises comparing the obtained parametric values to one or more predicted ranges. The comparison of step 503 in almost all cases allows a logical value (e.g. one or zero) to be established for each cell. However, the comparison also conveniently allows at least some forms of physical failure to be identified. For example, it has been determined that a shorted bit failure leads to a very low resistance value in all cells of a particular row and a particular column. Also, open-bit failures can cause a very high resistance value for all cells of a particular row and column. By comparing the obtained parametric values against predicted ranges, cells affected by failures such as shorted-bit and open-bit failures can be identified with a high degree of certainty.
  • [0064] FIG. 6 is a graph as an illustrative example of the probability (p) that a particular cell will have a certain parametric value, in this case resistance (r), corresponding to a logical “0” in the left-hand curve, or a logical “1” in the right-hand curve. As an arbitrary scale, probability has been given between 0 and 1, whilst resistance is plotted between 0 and 100%. The resistance scale has been divided into five ranges. In range 601, the resistance value is very low and the predicted range represents a shorted-bit failure with a reasonable degree of certainty. Range 602 represents a low resistance value within expected boundaries, which in this example is determined as equivalent to a logical “0”. Range 603 represents a medium resistance value where a logical value cannot be ascertained with any degree of certainty. Range 604 is a high resistance range representing a logical “1”. Range 605 is a very high resistance value where an open-bit failure can be predicted with a high degree of certainty. The ranges shown in FIG. 6 are purely for illustration, and many other possibilities are available depending upon the physical construction of the MRAM device 1, the manner in which the storage cells are accessed, and the parametric values obtained. The range or ranges are suitably calibrated depending, for example, on environmental factors such as temperature, factors affecting a particular cell or cells and their position within the array, or the nature of the cells themselves and the type of access employed.
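The five-range comparison of FIG. 6 can be sketched as a simple lookup. The boundary values below are invented for illustration, since the text gives no numeric boundaries and notes that the ranges would be calibrated per device. The names `CellState` and `classify_resistance` are likewise assumptions.

```python
import bisect
from enum import Enum


class CellState(Enum):
    SHORTED = "shorted-bit failure (range 601)"
    ZERO = "logical 0 (range 602)"
    UNCERTAIN = "indeterminate (range 603)"
    ONE = "logical 1 (range 604)"
    OPEN = "open-bit failure (range 605)"


# Assumed upper boundaries of ranges 601 to 604, in percent of the
# resistance scale. These numbers are placeholders, not calibrated values.
_BOUNDS = (10.0, 40.0, 60.0, 90.0)
_ORDER = (CellState.SHORTED, CellState.ZERO, CellState.UNCERTAIN,
          CellState.ONE, CellState.OPEN)


def classify_resistance(resistance_pct: float, bounds=_BOUNDS) -> CellState:
    """Map a resistance reading (0 to 100 percent) onto one of the five ranges."""
    if not 0.0 <= resistance_pct <= 100.0:
        raise ValueError("resistance must lie on the 0-100% scale")
    return _ORDER[bisect.bisect_right(bounds, resistance_pct)]
```

Readings classified as `SHORTED`, `OPEN`, or `UNCERTAIN` would be treated as parametric failures of the cell, while `ZERO` and `ONE` supply the logical bit value.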
  • [0065] Referring again to FIG. 5, step 504 comprises counting a number of physical failures, as identified in the comparison of step 503. Suitably, the count of parametric failures in step 504 is performed on the basis of the number of symbols 206 (each containing one or more bits) which are affected by the identified physical failures.
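Counting failures per symbol rather than per cell can be sketched as follows, assuming the cells of a block are listed in order so that each consecutive group of eight cells holds one symbol. That ordering and the function name are assumptions for illustration.

```python
from typing import Sequence


def count_failed_symbols(cell_failed: Sequence[bool], bits_per_symbol: int = 8) -> int:
    """Count symbols that contain at least one cell flagged as physically failed."""
    if len(cell_failed) % bits_per_symbol != 0:
        raise ValueError("cell count must be a whole number of symbols")
    return sum(
        any(cell_failed[i:i + bits_per_symbol])
        for i in range(0, len(cell_failed), bits_per_symbol)
    )
```

Two failed cells within the same symbol therefore count once, which matches the way a symbol-based code such as Reed-Solomon spends its correction capability.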
  • [0066] Step 505 comprises comparing the number of parametric failures, i.e. the number of failed symbols identified by parametric testing, against a predetermined threshold value. The number of physical failures can be represented in any suitable form. Depending upon the nature of the ECC scheme employed, some types of failure can be weighted differently to other types of failure. Since the data stored in the storage cells represents encoded data, it is expected that ECC decoding will not be able to recover the original data, where the number of parametric failures is greater than the maximum power of the ECC scheme. Hence, the threshold value is suitably selected to represent a value which is equal to or less than the maximum number of failures which the ECC scheme employed is able to correct. Preferably, the threshold value in step 505 is selected to be substantially less than the maximum power of the ECC decoding scheme, suitably of the order of 50% to 95% of the maximum power. In a particular preferred embodiment the threshold value in step 505 is selected to represent about 50% to 75% and suitably about 60% of the maximum power of the employed ECC scheme. Preferably, the step 505 comprises determining the number of parametric failures to be greater than the threshold value, such that performing ECC decoding is expected (with a sufficiently high probability) not to be able to recover information from the encoded data. That is, where the number of parametric failures is greater than the threshold value, there is a greater than acceptable probability that information is unrecoverable from the encoded data.
  • [0067] Step 506 comprises determining whether or not to continue use of the set of cells corresponding to the accessed block of data, in view of the number of parametric failures which have been identified. If desired, remedial action can be taken as outlined in step 304.
  • [0068] The physical evaluation of FIG. 5 is particularly useful as a test procedure immediately following manufacture of the device, or at installation, or at power up, or at any convenient time subsequently. In one example, the test procedure of FIG. 5 is performed by writing a test set of data to the device and then reading from the device, or by any other suitable parametric testing. In particular, it is useful to apply the method of FIG. 5 to identify areas of the MRAM device which are severely affected by systematic errors caused by manufacturing imperfections, and remedial action can then be taken before the device is put into active use storing variable user data. In the preferred embodiment, each sector comprises four codewords, and a sector is made redundant where any one of its four codewords contains a number of parametric failures which is greater than the threshold value of step 505. A block of data such as an encoded sector 202 having a number of failed symbols greater than the threshold value is not used at all in the subsequent life span of the device, because the probability of unrecoverable data errors would be too high. The threshold value used in the test procedure is set such that at least one and preferably several failures occurring subsequently will be tolerated. In particular, the threshold value is set to allow further systematic failures to be tolerated together with at least one and preferably several random failures, in a block of data.
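The sector-level rule described here (retire the sector if any one of its codewords exceeds the threshold) can be sketched as below. The correction capability of 16 symbols and the fraction of 0.6 are taken from the examples in the description, while the function name and the use of rounding down are assumptions.

```python
import math
from typing import Sequence


def sector_usable(
    parametric_failures_per_codeword: Sequence[int],
    max_correctable: int = 16,
    threshold_fraction: float = 0.6,
) -> bool:
    """Return False if any codeword of the sector exceeds the step 505 threshold."""
    threshold = math.floor(max_correctable * threshold_fraction)
    return all(count <= threshold for count in parametric_failures_per_codeword)
```

With these values the threshold is 9 failed symbols, so a sector accepted at test time retains a margin of at least seven correctable symbols per codeword for failures that arise later.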
  • [0069] The parametric evaluation of FIG. 5 is particularly useful in determining shorted-bit and/or open-bit failures in MRAM devices. A systematic failure, such as a half select or some forms of isolated bit failure, is not so easily detectable using parametric tests, but is more readily discovered by logical evaluation using ECC decoding as in FIG. 4. Therefore, in particularly preferred embodiments of the present invention the logical evaluation of FIG. 4 is combined with the parametric evaluation of FIG. 5 to provide a practical device which is able to take advantage of the considerable benefits offered by the new MRAM technology whilst minimising the limitations of currently available manufacturing techniques.
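One way the two evaluations might be combined is to apply the parametric check first and attempt ECC decoding only when the block is not already expected to be unrecoverable. This ordering is one possible arrangement, not the only one the description permits, and the `decode` callable and return strings are hypothetical.

```python
from typing import Callable, Optional, Tuple


def evaluate_block(
    parametric_failures: int,
    threshold: int,
    decode: Callable[[], Optional[bytes]],
) -> Tuple[Optional[bytes], str]:
    """Combine parametric (FIG. 5) and logical (FIG. 4) evaluation of one block.

    `decode` is assumed to return None when decoding recovers no information.
    """
    if parametric_failures > threshold:
        # Expected to be unrecoverable: skip decoding and take remedial action.
        return None, "remedial"
    recovered = decode()
    if recovered is None:
        # Failures not visible parametrically (e.g. half-select) defeated the ECC.
        return None, "remedial"
    return recovered, "continue"
```

Skipping the decode for blocks that fail the parametric check avoids spending decoder effort on data that is unlikely to be recovered.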
  • [0070] The MRAM device described herein is ideally suited for use in place of any prior solid-state storage device. In particular, the MRAM device is ideally suited for use either as a short-term storage device (e.g. cache memory) or as a longer-term storage device (e.g. a solid-state hard disk). An MRAM device can be employed for both short term storage and longer term storage within a single apparatus, such as a computing platform.
  • [0071] A magnetoresistive solid-state storage device and methods for controlling such a device have been described. Advantageously, the storage device is able to tolerate a relatively large number of errors, including both systematic failures and transient failures, whilst successfully remaining in operation with no loss of original data. Simpler and lower cost manufacturing techniques are employed and/or device yield and device density are increased. As manufacturing processes improve, overhead of the employed ECC scheme can be reduced. However, error correction coding and decoding allows blocks of data, e.g. sectors or codewords, to remain in use, where otherwise the whole block must be discarded if only one failure occurs. Therefore, the preferred embodiments of the present invention avoid large scale discarding of logical blocks and reduce or even eliminate completely the need for inefficient control methods such as large-scale data mapping management or physical sparing.

Claims (33)

1. A method for controlling a magnetoresistive solid-state storage device having a plurality of storage cells for storing a block of ECC encoded data, the method comprising the steps of:
accessing a set of the plurality of storage cells; and
determining whether information is unrecoverable from a block of ECC encoded data stored in the accessed storage cells.
2. The method of claim 1, comprising determining whether information is unrecoverable, by attempting to perform ECC decoding of the block of ECC encoded data.
3. The method of claim 2, comprising continuing use of the set of storage cells, if the ECC decoding recovers information from the block of ECC encoded data.
4. The method of claim 2, comprising taking remedial action concerning the set of storage cells, if the ECC decoding does not recover information from the block of ECC encoded data.
5. The method of claim 2, comprising identifying, from the ECC decoding, zero or more failed symbols in the block of ECC encoded data; and comparing the identified number of failed symbols against a threshold value.
6. The method of claim 1, comprising determining whether original information is expected to be unrecoverable from a block of ECC encoded data stored in the accessed set of storage cells.
7. The method of claim 6, wherein original information is expected to be unrecoverable because a probability of failing to correctly perform ECC decoding of the block of ECC encoded data is unacceptably high.
8. The method of claim 6, comprising continuing use of the set of storage cells, when original information is not expected to be unrecoverable from the block of ECC encoded data stored in the accessed storage cells.
9. The method of claim 8, comprising taking remedial action concerning the set of storage cells, when original information is expected to be unrecoverable from a block of ECC encoded data stored in the accessed storage cells.
10. The method of claim 6, comprising determining, from accessing the set of storage cells, failed symbols in the block of ECC encoded data that have been affected by a physical failure.
11. The method of claim 10, comprising determining that there are more failed symbols in the block of ECC encoded data than can be corrected by error correction decoding the block of ECC encoded data.
12. The method of claim 10, comprising determining that due to failed symbols in the block of ECC encoded data, there is an unacceptable probability that decoding the block of ECC encoded data will not correctly recover original information.
13. The method of claim 6, comprising obtaining a parametric value for each of the set of storage cells, and comparing each parametric value against a range or ranges.
14. The method of claim 13, comprising deriving a logical bit value for each storage cell, as a result of comparing each parametric value against a range or ranges.
15. The method of claim 13, comprising identifying a cell or cells, amongst the set of storage cells, as being affected by a physical failure.
16. The method of claim 15, wherein the determining step comprises comparing a failure count based on the identified cells against a threshold value.
17. The method of claim 16, wherein the threshold value represents a number of failed symbols equal to or less than a total number of failed symbols which can be corrected by error correction decoding the block of ECC encoded data.
18. The method of claim 15, comprising using the identified cells to determine failed symbols, and comparing a count of the failed symbols against the threshold value.
19. The method of claim 18, wherein the threshold value is set to be in the range of about 50% to about 95% of the maximum number of failed symbols which can be corrected by error correction decoding the block of ECC encoded data.
20. The method of claim 6, comprising selectively ECC decoding the block of ECC encoded data in response to the determining step.
21. The method of claim 1, wherein the block of encoded data corresponds to a sector of original information.
22. The method of claim 1, wherein the block of ECC encoded data is a codeword, and wherein a plurality of codewords are grouped to form an encoded sector corresponding to a sector of original information.
23. The method of claim 1, performed prior to use of the storage device.
24. The method of claim 1, performed during use of the storage device.
25. A method for controlling a magnetoresistive solid-state storage device, comprising the steps of:
receiving original information which it is desired to store;
error correction encoding the original information to form a block of ECC encoded data;
storing the block of ECC encoded data in a set of magnetoresistive storage cells arranged in at least one array;
accessing the set of storage cells;
forming logical symbol values of the block of ECC encoded data from the accessed set of storage cells;
error correction decoding the block of ECC encoded data to provide recovered information;
if the decoding step provides recovered information then outputting the recovered information and continuing use of the set of storage cells, or else if the decoding step did not provide recovered information then taking remedial action in respect of the set of storage cells.
26. The method of claim 25, comprising:
identifying, from the ECC decoding, zero or more failed symbols in the block of ECC encoded data;
comparing the identified number of failed symbols against a threshold value; and
if the ECC decoding did not recover original information, or if the identified number of failed symbols is greater than the threshold value, then taking remedial action concerning the accessed set of storage cells.
27. A method for controlling a magnetoresistive solid-state storage device, comprising the steps of:
receiving original information which it is desired to store;
error correction encoding the original information to form a block of ECC encoded data;
storing the block of ECC encoded data in a set of magnetoresistive storage cells arranged in at least one array;
accessing the set of storage cells;
comparing parametric values obtained by accessing the set of storage cells against one or more ranges;
identifying failed cells amongst the accessed set of cells;
forming a failure count based on the identified failed cells;
comparing the failure count against a threshold value; and
determining whether the original information is expected to be unrecoverable from the block of ECC encoded data stored in the accessed set of storage cells.
28. The method of claim 27, comprising selectively attempting error correction decoding of the block of ECC encoded data, when original information is not expected to be unrecoverable, or else taking remedial action for the accessed set of storage cells where original information is expected to be unrecoverable.
29. The method of claim 28, wherein comparing the failure count against the threshold value indicates a probability of failing to correctly perform ECC decoding on the block of ECC encoded data as acceptable or unacceptable.
30. The method of claim 27, wherein the failure count is based on a number of failed symbols in the block of ECC encoded data, the failed symbols being identified with reference to the failed cells.
31. The method of claim 27, wherein the threshold value represents about 50% to about 95% of the maximum number of failed symbols which can be corrected by error correction decoding the block of ECC encoded data.
32. A magnetoresistive solid-state storage device, comprising:
at least one array of magnetoresistive storage cells;
an ECC encoding unit for forming a block of ECC encoded data from a unit of original information; and
a controller arranged to store the block of ECC encoded data in a set of the storage cells, access the set of storage cells, and determine whether the original information is unrecoverable from the block of ECC encoded data stored in the accessed set of storage cells.
33. An apparatus comprising the magnetoresistive solid-state storage device of claim 32.

Priority Applications (7)

Application Number Priority Date Filing Date Title
US09/915,179 US20030023922A1 (en) 2001-07-25 2001-07-25 Fault tolerant magnetoresistive solid-state storage device
US09/997,199 US7149948B2 (en) 2001-07-25 2001-11-28 Manufacturing test for a fault tolerant magnetoresistive solid-state storage device
US10/093,851 US7107508B2 (en) 2001-07-25 2002-03-08 Manufacturing test for a fault tolerant magnetoresistive solid-state storage device
EP02254716A EP1286360A3 (en) 2001-07-25 2002-07-04 Manufacturing test for a fault tolerant magnetoresistive solid-state storage device
GB0215468A GB2380572B (en) 2001-07-25 2002-07-04 Fault tolerant magnetoresistive solid-state storage device
JP2002216151A JP2003115196A (en) 2001-07-25 2002-07-25 Manufacturing test for fault tolerant magnetoresistive solid-state storage device
JP2002216150A JP2003115195A (en) 2001-07-25 2002-07-25 Fault tolerant magnetoresistive solid-state storage device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US09/915,179 US20030023922A1 (en) 2001-07-25 2001-07-25 Fault tolerant magnetoresistive solid-state storage device

Related Child Applications (2)

Application Number Title Priority Date Filing Date
US09/997,199 Continuation-In-Part US7149948B2 (en) 2001-07-25 2001-11-28 Manufacturing test for a fault tolerant magnetoresistive solid-state storage device
US10/093,851 Continuation-In-Part US7107508B2 (en) 2001-07-25 2002-03-08 Manufacturing test for a fault tolerant magnetoresistive solid-state storage device

Publications (1)

Publication Number Publication Date
US20030023922A1 true US20030023922A1 (en) 2003-01-30

Family Applications (3)

Application Number Title Priority Date Filing Date
US09/915,179 Pending US20030023922A1 (en) 2001-07-25 2001-07-25 Fault tolerant magnetoresistive solid-state storage device
US09/997,199 Expired - Lifetime US7149948B2 (en) 2001-07-25 2001-11-28 Manufacturing test for a fault tolerant magnetoresistive solid-state storage device
US10/093,851 Expired - Lifetime US7107508B2 (en) 2001-07-25 2002-03-08 Manufacturing test for a fault tolerant magnetoresistive solid-state storage device

Country Status (3)

Country Link
US (3) US20030023922A1 (en)
JP (1) JP2003115195A (en)
GB (1) GB2380572B (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030023927A1 (en) * 2001-07-25 2003-01-30 Jonathan Jedwab Method for error correction decoding in a magnetoresistive solid-state storage device
US20030023924A1 (en) * 2001-07-25 2003-01-30 Davis James A. Data storage method for use in a magnetoresistive solid-state storage device
US20030046493A1 (en) * 2001-08-31 2003-03-06 Coulson Richard L. Hardware updated metadata for non-volatile mass storage cache
WO2004112048A2 (en) * 2003-06-12 2004-12-23 Infineon Technologies Ag Error detection and correction method and apparatus in a magneto-resistive random access memory
US20050055621A1 (en) * 2003-09-10 2005-03-10 Adelmann Todd Christopher Magnetic memory with error correction coding
US20050094459A1 (en) * 2003-11-03 2005-05-05 Robert Sesek Magnetic memory
US20050138495A1 (en) * 2003-11-26 2005-06-23 Jonathan Jedwab Magnetic memory which compares compressed fault maps
US20050144551A1 (en) * 2003-12-16 2005-06-30 Nahas Joseph J. MRAM having error correction code circuitry and method therefor
US20050172179A1 (en) * 2004-01-29 2005-08-04 Brandenberger Sarah M. System and method for configuring a solid-state storage device with error correction coding
US6973604B2 (en) 2002-03-08 2005-12-06 Hewlett-Packard Development Company, L.P. Allocation of sparing resources in a magnetoresistive solid-state storage device
US6999366B2 (en) 2003-12-03 2006-02-14 Hewlett-Packard Development Company, Lp. Magnetic memory including a sense result category between logic states
US20090141544A1 (en) * 2005-10-18 2009-06-04 Nec Corporation Mram and Operation Method of the Same
US20110060966A1 (en) * 2009-09-10 2011-03-10 Robustflash Technologies Ltd. Data programming method and system thereof
US20110173513A1 (en) * 2010-01-08 2011-07-14 International Business Machines Corporation Reference cells for spin torque based memory device
US20110179318A1 (en) * 2010-01-20 2011-07-21 Nec Corporation Apparatus, a method and a program thereof
US9250997B2 (en) 2012-11-27 2016-02-02 Samsung Electronics Co., Ltd. Semiconductor memory device including non-volatile memory, cache memory, and computer system
CN113849347A (en) * 2021-09-27 2021-12-28 深圳大学 Data recovery device, method, system and storage medium

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6894938B2 (en) * 2003-10-03 2005-05-17 Hewlett-Packard Development Company, L.P. System and method of calibrating a read circuit in a magnetic memory
FR2875352B1 (en) * 2004-09-10 2007-05-11 St Microelectronics Sa METHOD FOR DETECTING AND CORRECTING ERRORS FOR A MEMORY AND CORRESPONDING INTEGRATED CIRCUIT
US20070011513A1 (en) * 2005-06-13 2007-01-11 Intel Corporation Selective activation of error mitigation based on bit level error count
JP4905839B2 (en) * 2005-10-18 2012-03-28 日本電気株式会社 Operation method of MRAM
US8396041B2 (en) 2005-11-08 2013-03-12 Microsoft Corporation Adapting a communication network to varying conditions
US8381047B2 (en) 2005-11-30 2013-02-19 Microsoft Corporation Predicting degradation of a communication channel below a threshold based on data transmission errors
JP4692843B2 (en) * 2006-12-28 2011-06-01 Tdk株式会社 Memory controller, flash memory system, and flash memory control method
JP4905866B2 (en) * 2007-04-17 2012-03-28 日本電気株式会社 Semiconductor memory device and operation method thereof
US8120353B2 (en) 2008-04-28 2012-02-21 International Business Machines Corporation Methods for detecting damage to magnetoresistive sensors
US8626463B2 (en) * 2009-12-23 2014-01-07 Western Digital Technologies, Inc. Data storage device tester
US8458526B2 (en) * 2009-12-23 2013-06-04 Western Digital Technologies, Inc. Data storage device tester
JP2011198133A (en) * 2010-03-19 2011-10-06 Toshiba Corp Memory system and controller
US8639993B2 (en) * 2010-11-11 2014-01-28 Microsoft Corporation Encoding data to enable it to be stored in a storage block that includes at least one storage failure
KR20140026889A (en) * 2012-08-23 2014-03-06 삼성전자주식회사 Resistive memory device performing selective refresh and method of refreshing the resistive memory device
US9164832B2 (en) * 2013-02-27 2015-10-20 Seagate Technology Llc ECC management for variable resistance memory cells
US10679718B2 (en) * 2017-10-04 2020-06-09 Western Digital Technologies, Inc. Error reducing matrix generation
US10922025B2 (en) * 2019-07-17 2021-02-16 Samsung Electronics Co., Ltd. Nonvolatile memory bad row management

Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4069970A (en) * 1976-06-24 1978-01-24 Bell Telephone Laboratories, Incorporated Data access circuit for a memory array
US4209846A (en) * 1977-12-02 1980-06-24 Sperry Corporation Memory error logger which sorts transient errors from solid errors
US4216541A (en) * 1978-10-05 1980-08-05 Intel Magnetics Inc. Error repairing method and apparatus for bubble memories
US4458349A (en) * 1982-06-16 1984-07-03 International Business Machines Corporation Method for storing data words in fault tolerant memory to recover uncorrectable errors
US4933940A (en) * 1987-04-15 1990-06-12 Allied-Signal Inc. Operations controller for a fault tolerant multiple node processing system
US4939694A (en) * 1986-11-03 1990-07-03 Hewlett-Packard Company Defect tolerant self-testing self-repairing memory system
US5459742A (en) * 1992-06-11 1995-10-17 Quantum Corporation Solid state disk memory using storage devices with defects
US5502728A (en) * 1992-02-14 1996-03-26 International Business Machines Corporation Large, fault-tolerant, non-volatile, multiported memory
US5504760A (en) * 1991-03-15 1996-04-02 Sandisk Corporation Mixed data encoding EEPROM system
US5745673A (en) * 1994-09-21 1998-04-28 Texas Instruments Incorporated Memory architecture for solid state discs
US5848076A (en) * 1996-06-10 1998-12-08 Mitsubishi Denki Kabushiki Kaisha Memory card with capability of error correction and error correction method therefore
US5852574A (en) * 1997-12-24 1998-12-22 Motorola, Inc. High density magnetoresistive random access memory device and operating method thereof
US5887270A (en) * 1995-11-21 1999-03-23 Emc Corporation Fault tolerant controller system and method
US5987573A (en) * 1996-02-06 1999-11-16 Tokyo Electron Limited Memory apparatus and memory control method
US6166944A (en) * 1998-04-20 2000-12-26 Kyoto University Data storing apparatus including integrated magnetic memory cells and semiconductor devices
US6279133B1 (en) * 1997-12-31 2001-08-21 Kawasaki Steel Corporation Method and apparatus for significantly improving the reliability of multilevel memory architecture
US6430702B1 (en) * 1997-09-30 2002-08-06 Compaq Computer Corporation Fault tolerant memory
US6456525B1 (en) * 2000-09-15 2002-09-24 Hewlett-Packard Company Short-tolerant resistive cross point array

Family Cites Families (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4718042A (en) * 1985-12-23 1988-01-05 Ncr Corporation Non-destructive method and circuit to determine the programmability of a one time programmable device
US4845714A (en) * 1987-06-08 1989-07-04 Exabyte Corporation Multiple pass error correction process and apparatus for product codes
CA2019351A1 (en) * 1989-07-06 1991-01-06 Francis H. Reiff Fault tolerant memory
JPH03244218A (en) 1990-02-21 1991-10-31 Nec Corp Block code decoder and method for evaluating reliability of received word
US5233614A (en) 1991-01-07 1993-08-03 International Business Machines Corporation Fault mapping apparatus for memory
US5263030A (en) * 1991-02-13 1993-11-16 Digital Equipment Corporation Method and apparatus for encoding data for storage on magnetic tape
US5321703A (en) * 1992-03-13 1994-06-14 Digital Equipment Corporation Data recovery after error correction failure
US5590306A (en) * 1992-09-08 1996-12-31 Fuji Photo Film Co., Ltd. Memory card management system for writing data with usage and recording codes made significant
US5428630A (en) * 1993-07-01 1995-06-27 Quantum Corp. System and method for verifying the integrity of data written to a memory
US5488691A (en) * 1993-11-17 1996-01-30 International Business Machines Corporation Memory card, computer system and method of operation for differentiating the use of read-modify-write cycles in operating and initialization modes
ATE216096T1 (en) * 1994-02-22 2002-04-15 Siemens Ag FLEXIBLE ERROR CORRECTION CODE/PARITY BIT ARCHITECTURE
US5621690A (en) * 1995-04-28 1997-04-15 Intel Corporation Nonvolatile memory blocking architecture and redundancy
US5953351A (en) * 1995-09-15 1999-09-14 International Business Machines Corporation Method and apparatus for indicating uncorrectable data errors
US6112324A (en) * 1996-02-02 2000-08-29 The Arizona Board Of Regents Acting On Behalf Of The University Of Arizona Direct access compact disc, writing and reading method and device for same
US5864569A (en) * 1996-10-18 1999-01-26 Micron Technology, Inc. Method and apparatus for performing error correction on data read from a multistate memory
US5793795A (en) * 1996-12-04 1998-08-11 Motorola, Inc. Method for correcting errors from a jamming signal in a frequency hopped spread spectrum communication system
US5852874A (en) * 1997-02-19 1998-12-29 Walker; Henry F. Carton cutting device having a pivotal guard member
JPH10261043A (en) 1997-03-19 1998-09-29 Toshiba Corp Decoding method, decoder, and bar code processing system
JP3867862B2 (en) * 1997-04-16 2007-01-17 株式会社ルネサステクノロジ Semiconductor integrated circuit and memory inspection method
US6009550A (en) * 1997-05-20 1999-12-28 Seagate Technology, Inc. PBA recovery apparatus and method for interleaved reed-solomon codes
US6275965B1 (en) * 1997-11-17 2001-08-14 International Business Machines Corporation Method and apparatus for efficient error detection and correction in long byte strings using generalized, integrated, interleaved reed-solomon codewords
US6169686B1 (en) 1997-11-20 2001-01-02 Hewlett-Packard Company Solid-state memory with magnetic storage cells
EP0936743A1 (en) * 1998-02-17 1999-08-18 Koninklijke Philips Electronics N.V. Iterative decoding for binary block codes
US6408401B1 (en) * 1998-11-13 2002-06-18 Compaq Information Technologies Group, L.P. Embedded RAM with self-test and self-repair with spare rows and columns
US6381726B1 (en) * 1999-01-04 2002-04-30 Maxtor Corporation Architecture for soft decision decoding of linear block error correcting codes
US7219368B2 (en) * 1999-02-11 2007-05-15 Rsa Security Inc. Robust visual passwords
US6249475B1 (en) * 1999-04-05 2001-06-19 Madrone Solutions, Inc. Method for designing a tiled memory
US6584589B1 (en) 2000-02-04 2003-06-24 Hewlett-Packard Development Company, L.P. Self-testing of magneto-resistive memory arrays
US6856572B2 (en) * 2000-04-28 2005-02-15 Matrix Semiconductor, Inc. Multi-headed decoder structure utilizing memory array line driver with dual purpose driver device
US6483740B2 (en) * 2000-07-11 2002-11-19 Integrated Magnetoelectronics Corporation All metal giant magnetoresistive memory
US6400600B1 (en) * 2000-09-30 2002-06-04 Hewlett-Packard Company Method of repairing defective tunnel junctions
US6684353B1 (en) * 2000-12-07 2004-01-27 Advanced Micro Devices, Inc. Reliability monitor for a memory array
US6407953B1 (en) * 2001-02-02 2002-06-18 Matrix Semiconductor, Inc. Memory array organization and related test method particularly well suited for integrated circuits having write-once memory arrays
US6504779B2 (en) * 2001-05-14 2003-01-07 Hewlett-Packard Company Resistive cross point memory with on-chip sense amplifier calibration method and apparatus
US6633497B2 (en) * 2001-06-22 2003-10-14 Hewlett-Packard Development Company, L.P. Resistive cross point array of short-tolerant memory cells
US7036068B2 (en) * 2001-07-25 2006-04-25 Hewlett-Packard Development Company, L.P. Error correction coding and decoding in a solid-state storage device
US6801471B2 (en) * 2002-02-19 2004-10-05 Infineon Technologies Ag Fuse concept and method of operation
US20030172339A1 (en) * 2002-03-08 2003-09-11 Davis James Andrew Method for error correction decoding in a magnetoresistive solid-state storage device

Cited By (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7149949B2 (en) 2001-07-25 2006-12-12 Hewlett-Packard Development Company, L.P. Method for error correction decoding in a magnetoresistive solid-state storage device
US7036068B2 (en) 2001-07-25 2006-04-25 Hewlett-Packard Development Company, L.P. Error correction coding and decoding in a solid-state storage device
US7107507B2 (en) 2001-07-25 2006-09-12 Hewlett-Packard Development Company, L.P. Magnetoresistive solid-state storage device and data storage methods for use therewith
US20030023911A1 (en) * 2001-07-25 2003-01-30 Davis James Andrew Method for error correction decoding in an MRAM device (historical erasures)
US20030023927A1 (en) * 2001-07-25 2003-01-30 Jonathan Jedwab Method for error correction decoding in a magnetoresistive solid-state storage device
US20030023923A1 (en) * 2001-07-25 2003-01-30 Davis James Andrew Error correction coding and decoding in a solid-state storage device
US6990622B2 (en) 2001-07-25 2006-01-24 Hewlett-Packard Development Company, L.P. Method for error correction decoding in an MRAM device (historical erasures)
US6981196B2 (en) 2001-07-25 2005-12-27 Hewlett-Packard Development Company, L.P. Data storage method for use in a magnetoresistive solid-state storage device
US20030023924A1 (en) * 2001-07-25 2003-01-30 Davis James A. Data storage method for use in a magnetoresistive solid-state storage device
US20030023926A1 (en) * 2001-07-25 2003-01-30 Davis James Andrew Magnetoresistive solid-state storage device and data storage methods for use therewith
US20030046493A1 (en) * 2001-08-31 2003-03-06 Coulson Richard L. Hardware updated metadata for non-volatile mass storage cache
US7275135B2 (en) * 2001-08-31 2007-09-25 Intel Corporation Hardware updated metadata for non-volatile mass storage cache
US6973604B2 (en) 2002-03-08 2005-12-06 Hewlett-Packard Development Company, L.P. Allocation of sparing resources in a magnetoresistive solid-state storage device
WO2004112048A2 (en) * 2003-06-12 2004-12-23 Infineon Technologies Ag Error detection and correction method and apparatus in a magneto-resistive random access memory
WO2004112048A3 (en) * 2003-06-12 2005-04-07 Infineon Technologies Ag Error detection and correction method and apparatus in a magneto-resistive random access memory
US7191379B2 (en) 2003-09-10 2007-03-13 Hewlett-Packard Development Company, L.P. Magnetic memory with error correction coding
US20050055621A1 (en) * 2003-09-10 2005-03-10 Adelmann Todd Christopher Magnetic memory with error correction coding
US20050094459A1 (en) * 2003-11-03 2005-05-05 Robert Sesek Magnetic memory
US7325157B2 (en) 2003-11-03 2008-01-29 Samsung Electronics Co., Ltd Magnetic memory devices having selective error encoding capability based on fault probabilities
US20050138495A1 (en) * 2003-11-26 2005-06-23 Jonathan Jedwab Magnetic memory which compares compressed fault maps
US7472330B2 (en) * 2003-11-26 2008-12-30 Samsung Electronics Co., Ltd. Magnetic memory which compares compressed fault maps
US6999366B2 (en) 2003-12-03 2006-02-14 Hewlett-Packard Development Company, Lp. Magnetic memory including a sense result category between logic states
US20050144551A1 (en) * 2003-12-16 2005-06-30 Nahas Joseph J. MRAM having error correction code circuitry and method therefor
US7370260B2 (en) * 2003-12-16 2008-05-06 Freescale Semiconductor, Inc. MRAM having error correction code circuitry and method therefor
US20050172179A1 (en) * 2004-01-29 2005-08-04 Brandenberger Sarah M. System and method for configuring a solid-state storage device with error correction coding
US7210077B2 (en) 2004-01-29 2007-04-24 Hewlett-Packard Development Company, L.P. System and method for configuring a solid-state storage device with error correction coding
US20090141544A1 (en) * 2005-10-18 2009-06-04 Nec Corporation Mram and Operation Method of the Same
US7688617B2 (en) 2005-10-18 2010-03-30 Nec Corporation MRAM and operation method of the same
US20110060966A1 (en) * 2009-09-10 2011-03-10 Robustflash Technologies Ltd. Data programming method and system thereof
US20110173513A1 (en) * 2010-01-08 2011-07-14 International Business Machines Corporation Reference cells for spin torque based memory device
WO2011084905A2 (en) * 2010-01-08 2011-07-14 International Business Machines Corp. Reference cells for spin torque based memory device
WO2011084905A3 (en) * 2010-01-08 2012-05-03 International Business Machines Corp. Reference cells for spin torque based memory device
GB2491495A (en) * 2010-01-08 2012-12-05 Ibm Reference cells for spin torque based memory device
US8370714B2 (en) 2010-01-08 2013-02-05 International Business Machines Corporation Reference cells for spin torque based memory device
US20110179318A1 (en) * 2010-01-20 2011-07-21 Nec Corporation Apparatus, a method and a program thereof
US8261137B2 (en) * 2010-01-20 2012-09-04 Nec Corporation Apparatus, a method and a program thereof
US9250997B2 (en) 2012-11-27 2016-02-02 Samsung Electronics Co., Ltd. Semiconductor memory device including non-volatile memory, cache memory, and computer system
US9552256B2 (en) 2012-11-27 2017-01-24 Samsung Electronics Co., Ltd. Semiconductor memory device including non-volatile memory, cache memory, and computer system
CN113849347A (en) * 2021-09-27 2021-12-28 深圳大学 Data recovery device, method, system and storage medium

Also Published As

Publication number Publication date
GB2380572A (en) 2003-04-09
US7107508B2 (en) 2006-09-12
US20030023928A1 (en) 2003-01-30
GB0215468D0 (en) 2002-08-14
US7149948B2 (en) 2006-12-12
GB2380572B (en) 2005-05-18
JP2003115195A (en) 2003-04-18
US20030023925A1 (en) 2003-01-30

Similar Documents

Publication Publication Date Title
US7149948B2 (en) Manufacturing test for a fault tolerant magnetoresistive solid-state storage device
US7036068B2 (en) Error correction coding and decoding in a solid-state storage device
US7210077B2 (en) System and method for configuring a solid-state storage device with error correction coding
US6981196B2 (en) Data storage method for use in a magnetoresistive solid-state storage device
US10108509B2 (en) Dynamic enabling of redundant memory cells during operating life
US7191379B2 (en) Magnetic memory with error correction coding
US6973604B2 (en) Allocation of sparing resources in a magnetoresistive solid-state storage device
JP4905839B2 (en) Operation method of MRAM
US10659081B2 (en) Preprogrammed data recovery
KR20040083525A (en) Fuse concept and method of operation
US20040088614A1 (en) Management system for defective memory
US7325157B2 (en) Magnetic memory devices having selective error encoding capability based on fault probabilities
CN113393889A (en) Memory system
US20050138537A1 (en) Method and system to encode and decode wide data words
US6643195B2 (en) Self-healing MRAM
US20030172339A1 (en) Method for error correction decoding in a magnetoresistive solid-state storage device
EP1286360A2 (en) Manufacturing test for a fault tolerant magnetoresistive solid-state storage device
US11093322B1 (en) Memory error recovery using shared structural element error correlations
US20040141389A1 (en) Solid state storage device and data storage method
CN113495677B (en) Read-write method and memory device

Legal Events

Date Code Title Description
AS Assignment

Owner name: HEWLETT-PACKARD COMPANY, CALIFORNIA

Free format text: ASSIGNMENT BY OPERATION OF LAW;ASSIGNORS:HEWLETT-PACKARD LIMITED;JEDWAB, JONATHAN;MCCARTHY, DOMINIC P.;AND OTHERS;REEL/FRAME:012491/0462

Effective date: 20011105

Owner name: HEWLETT-PACKARD COMPANY, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:DAVIS, JAMES A.;ELDREDGE, KENNETH J.;PERNER, FREDERICK A.;AND OTHERS;REEL/FRAME:012491/0492;SIGNING DATES FROM 20010926 TO 20011102

AS Assignment

Owner name: HEWLETT-PACKARD DEVELOPMENT COMPANY L.P., TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HEWLETT-PACKARD COMPANY;REEL/FRAME:014061/0492

Effective date: 20030926

STPP Information on status: patent application and granting procedure in general

Free format text: ADVISORY ACTION MAILED