US20110283135A1 - Managing memory faults - Google Patents

Managing memory faults

Info

Publication number
US20110283135A1
US20110283135A1 (application US12/780,931)
Authority
US
United States
Prior art keywords
memory
fault
physical
physical memory
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US12/780,931
Other versions
US8201024B2 (en)
Inventor
Doug Burger
Jim Larus
Karin Strauss
Jeremy Condit
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Corp
Priority to US12/780,931 (US8201024B2)
Assigned to MICROSOFT CORPORATION (assignors: BURGER, DOUG; CONDIT, JEREMY; LARUS, JAMES; STRAUSS, KARIN)
Publication of US20110283135A1
Priority to US13/465,602 (US8386836B2)
Application granted
Publication of US8201024B2
Assigned to MICROSOFT TECHNOLOGY LICENSING, LLC (assignor: MICROSOFT CORPORATION)
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00: Error detection; Error correction; Monitoring
    • G06F 11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/0703: Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F 11/0766: Error or fault reporting or storing
    • G06F 11/0706: Error or fault processing taking place on a specific hardware platform or in a specific software environment
    • G06F 11/073: Error or fault processing taking place in a memory management context, e.g. virtual memory or cache management
    • G06F 11/0793: Remedial or corrective actions

Abstract

Embodiments are described for managing memory faults. An example system can include a memory controller module to manage memory cells and report memory faults. An error buffer module can store memory fault information received from the memory controller. A notification module can be in communication with the error buffer module. The notification module may generate a notification of a memory fault in a memory access operation. A system software module can provide services and manage executing programs on a processor. In addition, the system software module can receive the notifications of the memory fault for the memory access operation. A notification handler may be activated by an interrupt when the notification of the memory fault in the memory access operation is received.

Description

    BACKGROUND
  • Current and future off-the-shelf computing memory technologies are subject to memory cell failures that can prevent memory cells from reliably storing data. The failure of just a few memory cells can result in data loss and the decommissioning of a large region of memory. Examples of memory that may have a higher rate of memory cell failure than other types of memory include phase change memory and flash memory. However, the use of these types of memory is increasing due to a desire to replace the mechanical hard drives and volatile memory in many types of electronic products.
  • Memory hardware can hide memory cell failures from the software layers executing on the hardware by using hardware correction techniques. Examples of hardware corrections include cell remapping, where spare cells are used to replace faulty cells, and error correcting code techniques, where extra cells store redundant information that can be used to re-compute the original data when a limited number of cells are in error. However, once the memory hardware runs out of spare cells and/or the errors exceed what the redundancy of the error correcting codes can correct, the memory system has to decommission a large region of memory. These large regions are typically a memory page of the memory hardware, which can range from approximately one kilobyte (1 KB) up to two megabytes (2 MB) in size. Decommissioning an entire memory page is inefficient because the majority of memory cells on the page still work, yet the whole page cannot be used after it has been decommissioned.
  • SUMMARY
  • This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. While certain disadvantages of prior technologies are noted above, the claimed subject matter is not to be limited to implementations that solve any or all of the noted disadvantages of the prior technologies.
  • Various embodiments are described for managing memory faults. An example of a system can include a memory controller module to manage memory cells and report memory faults. An error buffer module can store memory fault information received from the memory controller. The error buffer can store information about data used in a write attempt, the size of the data being written, a physical location address where a write failure occurred, a write operation failure, and other relevant data. A notification module can be in communication with the error buffer module. The notification module may generate a notification of a memory fault in a memory access operation. A system software module can provide services and manage executing programs on a processing device or processor. In addition, the system software module can receive the notifications of the memory fault for the memory access operation. A notification handler may be activated by an interrupt when the notification of the memory fault in the memory access operation is received.
  • In an additional embodiment of the technology, a memory fault map can allow the system software module to record or add faulty memory regions to the memory fault map at a defined level of memory granularity. The memory fault map can provide a detailed database about which memory cells are available to be re-used in a memory page that contains a memory fault. A memory allocator for the system software module may be configured to access the memory fault map to avoid allocating faulty regions of memory.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a block diagram illustrating an embodiment of a system for managing memory faults.
  • FIG. 2 is a block diagram of an embodiment of a system for managing memory faults with a memory fault map.
  • FIG. 3 is a flowchart illustrating an embodiment of a method for managing memory faults.
  • DETAILED DESCRIPTION
  • Reference will now be made to the exemplary embodiments illustrated in the drawings, and specific language will be used herein to describe the same. It will nevertheless be understood that no limitation of the scope of the technology is thereby intended. Alterations and further modifications of the features illustrated herein, and additional applications of the embodiments as illustrated herein, which would occur to one skilled in the relevant art and having possession of this disclosure, are to be considered within the scope of the description.
  • A technology is provided for a hardware and/or software interface that exposes memory faults and other memory error messages to system software. An example of system software is an operating system. This memory interface enables access to memory faults to enable the software to deal with hardware failures more flexibly than a hardware-centric solution or hardware correction alone. The technology includes modules and operations that the system software can activate to reclaim and use working memory cells that would otherwise be decommissioned because the working memory cells are in the vicinity of faulty cells.
  • A system is provided for managing memory faults. The system may include a memory controller module 102 or memory controller configured to manage memory cells and report memory faults. The term memory fault can include memory faults for individual memory cells, other memory faults, and/or messages generated by the memory. The memory cells can be located in a hardware memory module 104. The memory cells in the memory module may be a phase-change type of memory, flash memory, or another type of hardware memory. The memory controller can be computer hardware located on a printed circuit board with the memory, on the same substrate with the memory, or on a chip substrate with the processors or processing cores. In one embodiment, the memory controller is embedded in firmware or in an FPGA (field programmable gate array). In another example configuration, the memory controller module may be an ASIC (application-specific integrated circuit) associated with the memory. In another instance, the memory controller module may be a firmware device driver associated with the memory and memory cells.
  • An error buffer module 106a can be configured to store memory fault information received from the memory module 104 by the memory controller 102. The error buffer module can include a storage buffer or another type of storage area that serves as temporary storage for information exchange about memory faults. The error buffer module can contain a table 106b to store memory errors and memory cell faults. The error buffer module can be configured to store information including: whether a write operation failed, the data used in a write attempt 120c, the size of the data that was being written 120b, an address of a physical location where a write failure occurred 120a, whether a memory address is currently in use 120d, and other relevant fault data. The memory controller module may use the contents of this table to manage read and/or write requests to locations that have suffered faults. This means information in the error buffer can be retrieved by the memory controller to communicate to the system software that a write operation has failed. The data used in the write attempt and the physical (and/or optionally virtual) location the hardware was attempting to write can be communicated to the system software. The error buffer module may also set a mark in the storage buffer to indicate that a memory region with faults has been permanently decommissioned by the system software module. Thus, the system software can receive information from the memory controller about memory faults or about memory regions with faults that have been permanently decommissioned.
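  • As a non-authoritative illustration of the error buffer just described, the C sketch below models one plausible layout of the table 106b and its per-fault fields. The struct layout, field names, and 32-entry capacity are assumptions for illustration only; the patent does not define a concrete format.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ERROR_BUFFER_ENTRIES 32   /* assumed capacity of the hardware table */

/* One recorded fault: mirrors the fields listed above (120a-120d). */
struct error_buffer_entry {
    uint64_t fault_addr;      /* physical location where the write failed (120a) */
    uint8_t  data[64];        /* data used in the write attempt (120c) */
    uint8_t  data_size;       /* size of the data being written (120b) */
    bool     write_failed;    /* whether the write operation failed */
    bool     addr_in_use;     /* whether the address is currently in use (120d) */
    bool     decommissioned;  /* region permanently decommissioned by software */
};

struct error_buffer {
    struct error_buffer_entry entries[ERROR_BUFFER_ENTRIES];
    size_t head, count;       /* simple FIFO occupancy tracking */
};

/* Record a failed write; returns false when the table is full, the condition
 * under which the controller would escalate to the system software. */
static bool error_buffer_record(struct error_buffer *eb,
                                const struct error_buffer_entry *e)
{
    if (eb->count == ERROR_BUFFER_ENTRIES)
        return false;
    eb->entries[(eb->head + eb->count) % ERROR_BUFFER_ENTRIES] = *e;
    eb->count++;
    return true;
}
```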
  • A notification module 108 can be in communication with the error buffer module and the memory controller. The notification module can be configured to generate a notification 114 of a fault, or notification message, for a memory access operation. In one example, the notification module can generate a memory fault interrupt for a read or write of a memory cell or memory region which failed, or for a memory region that has been permanently decommissioned by the system software module. The fault interrupt may be a hardware fault interrupt that is received from the notification module. Alternatively, the fault interrupt may be a software interrupt, a software trap, a hardware trap, or another kind of notification that the memory hardware sends to the system software or to a processor to redirect execution to a handler that processes an event needing attention. Examples of such events or interrupts may be the occurrence of a fault on a write operation, or an interrupt representing that the amount of temporary storage used has reached a certain threshold such that faults can be processed in batch by the system software.
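  • The sketch below illustrates the two notification triggers described above: an immediate interrupt on a write fault, or a deferred notification once the buffered fault count crosses a fill threshold so faults can be handled in batch. The enum names, threshold value, and batch-mode flag are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define BATCH_THRESHOLD 24        /* assumed fill level that triggers batching */

enum notification_kind {
    NOTIFY_NONE,                  /* nothing to report yet */
    NOTIFY_WRITE_FAULT,           /* a single fault on a write operation */
    NOTIFY_BUFFER_THRESHOLD,      /* buffer nearly full: process faults in batch */
};

/* Decide which notification 114 (if any) the notification module raises. */
static enum notification_kind
notification_for(size_t buffered_faults, bool batch_mode)
{
    if (buffered_faults == 0)
        return NOTIFY_NONE;
    if (batch_mode)
        return buffered_faults >= BATCH_THRESHOLD ? NOTIFY_BUFFER_THRESHOLD
                                                  : NOTIFY_NONE;
    return NOTIFY_WRITE_FAULT;
}
```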
  • While some modules such as the notification module and the error buffer module are shown as being located inside other modules (e.g., the memory controller) these depictions are simply for illustrative purposes. The modules illustrated in the disclosure as being inside other modules may also be located independently or separately from other modules and simply be in communication with the related modules.
  • A system software module 110 can be configured to provide services and manage executing programs on a processing device or processor. The system software module can be configured to receive the notifications 114 of memory faults for the memory access operations. In some software architectures, the system software module can also include other run-time systems or run-time executables that can be considered part of the system software module or the run-time systems may interface with the system software module. In these cases, the interrupt can be delivered to a run-time system linked with an application rather than the operating system or the interrupt may be delivered to either the operating system or the run-time system which can notify other components in the system software module.
  • The system software module can be an operating system. Examples of such operating systems may include Microsoft Windows™, UNIX based operating systems, mobile phone operating systems, real-time operating systems, or other types of existing operating systems. In some embodiments, the system software may include a runtime executable environment or a virtual machine.
  • The system software can execute using one or more CPUs (central processing units) or processing cores 116 a-d. These processing cores may also have a memory page table 118 for caching memory pages currently being accessed for the system software, the processes executing on the system software, or runtime executable environments (e.g. virtual machines) executing on the system software.
  • A notification handler 112 can be activated by an interrupt when the notification of the fault in the memory access operation is received. The notification handler can be an interrupt handler or device driver that can be activated to record memory faults for the system software after the memory faults happen and to manage the system software's ability to work around the failed memory cells. In one example, the system software may configure the notification handler in an interrupt table to be automatically looked up by hardware when a hardware interrupt is received. Alternatively, the system software may receive an interrupt and trigger the notification handler.
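  • A minimal sketch of the activation path described above, assuming a flat interrupt table indexed by vector number. The vector number, table layout, and handler steps are illustrative and are not drawn from any particular architecture.

```c
#include <stddef.h>

#define MEMORY_FAULT_VECTOR 0x30            /* assumed interrupt vector */
#define MAX_VECTORS         256

typedef void (*interrupt_handler_t)(void *context);

static interrupt_handler_t interrupt_table[MAX_VECTORS];

/* The notification handler 112: record the fault, then work around the
 * failed cells (drain the error buffer, update the fault map, remap data). */
static void memory_fault_handler(void *context)
{
    (void)context;
    /* 1. drain pending entries from the error buffer
     * 2. record faulty regions in the memory fault map
     * 3. move unwritten data to a working location and remap it */
}

/* System software configures the handler so hardware can look it up. */
static void install_memory_fault_handler(void)
{
    interrupt_table[MEMORY_FAULT_VECTOR] = memory_fault_handler;
}
```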
  • FIG. 2 illustrates another example of a system for managing physical memory faults. This system for managing physical memory includes many components similar to those illustrated in FIG. 1, and like components in the figures have like numbers. Specifically, this system includes a memory controller 102 configured to store information in memory cells of a hardware memory module 104 and report memory faults or memory cell faults. An error buffer 106a-b can store dynamic information about memory faults, memory cell faults, and memory cell decommissions received from the system software module.
  • The system can include a notification module 108 to generate notifications of faults for memory access operations and a system software module 110 with a notification handler 112 configured to be activated by an interrupt when a notification of a fault on the memory access operation is received.
  • An additional structure provided by the system in FIG. 2 is a memory fault map module 200 that can allow the system software module to record or add faulty memory regions in a memory fault map at a defined level of memory granularity. This memory fault map can later be used to avoid mistakenly trying to use or allocate faulty memory. The memory fault map can provide a detailed database for the system software module, and the memory fault map can identify which memory cells may be re-used in a memory region or memory page even though the memory region has a memory cell fault.
  • The memory fault map can track faults at arbitrary granularity. For example, the memory fault map can track memory cells at the byte, word, memory row, tenth of a memory page, or another defined level of granularity. However, a trade-off exists between the granularity of the memory fault map and the memory fault map's size. The finer the granularity of the memory fault map, the larger this memory fault map will be. In one embodiment, the memory fault map may contain multiple levels of granularity in one memory fault map, where different levels of granularity are tracked for different physical memory types. For instance, one granularity can be provided for SDRAM (synchronous dynamic random access memory) and a second granularity can be provided for phase change memory in the same memory fault map.
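  • One plausible realization of a single-granularity fault map is a bitmap with one bit per fixed-size chunk of physical memory, sketched below. The 64-byte chunk size and 1 GB physical memory size are arbitrary assumptions; halving the chunk size would double the map, which is the size/granularity trade-off noted above.

```c
#include <stdbool.h>
#include <stdint.h>

#define CHUNK_SHIFT    6                    /* track faults per 64-byte chunk */
#define PHYS_MEM_BYTES (1ULL << 30)         /* assumed 1 GB of physical memory */
#define MAP_BITS       (PHYS_MEM_BYTES >> CHUNK_SHIFT)

static uint64_t fault_map[MAP_BITS / 64];   /* one bit per chunk: 2 MB total */

/* Record a faulty region: set the bit for the chunk containing the address. */
void fault_map_mark(uint64_t phys_addr)
{
    uint64_t chunk = phys_addr >> CHUNK_SHIFT;
    fault_map[chunk / 64] |= 1ULL << (chunk % 64);
}

/* Query used by the allocator to avoid faulty regions. */
bool fault_map_is_faulty(uint64_t phys_addr)
{
    uint64_t chunk = phys_addr >> CHUNK_SHIFT;
    return (fault_map[chunk / 64] >> (chunk % 64)) & 1;
}
```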
  • This memory fault map allows the system software to reclaim and re-use memory cells that would otherwise be decommissioned with a faulty memory page. Being able to reclaim working memory cells in proximity to or near to the non-working memory cells enables the memory to be used longer before an overall memory module fails and is not usable. In addition, reclaiming memory enables more of the working memory to be used even as individual memory cells begin to fail over time. Cell failure can be defined as complete failure of the cell at one time where the cell cannot be written to or read from or failure may be repeated intermittent failures during use over time.
  • Once the system software has a map of memory faults, the system software has detailed knowledge about which locations can or cannot be reused in a memory region that contains a fault. Since this map records faulty memory regions at defined levels of granularity, working memory near or adjacent to the faulty region can still be used normally for data storage. The ability to reuse fine-grained working memory regions as regular data storage despite nearby or adjacent faulty memory cells is a result of building the memory fault map at a fine granularity. As a result, a continuous virtual memory space can be provided using the memory fault map and a virtual to physical address translation table.
  • The use of the memory map configuration of FIG. 2 can also provide other results. Since the memory controller has limited storage for error reporting, the memory map can store a more complete record of the reported faults, while the memory controller caches details regarding the more recent fault entries. Access to the memory controller's storage structures can be slow, since access to a memory controller's control structures may be limited unless a processor is in supervisor mode. As a result, using a memory map that is more quickly accessible to the system software, regardless of a processor's mode, can make fault handling faster. In addition, the granularity at which a fault or error is reported from the memory controller may not be the same granularity as stored in the memory map. For instance, the software memory map granularity may be the same as or coarser than that of the memory controller faults recorded.
  • A memory allocator 202 for the system software module may access the memory fault map to avoid allocating faulty regions of memory. By exposing memory faults to the memory allocator, the failed memory cells can be avoided when memory is allocated by the system software. For example, a sequential allocator that allocates memory contiguously within a memory region can avoid allocating faulty regions by checking whether the memory fault map indicates any fault in the memory region the allocator intends to allocate on behalf of a requester (assuming the allocation is done for areas backed by physical memory). Using a memory fault map to avoid allocating data to faulty regions of memory means that requests for failed data cells can be avoided and memory allocation may be more efficient. The memory allocator may include a fractional mapping module to map whole virtual memory pages into partial physical memory pages with memory faults.
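  • A minimal sketch of such a sequential (bump) allocator, assuming the hypothetical fault_map_is_faulty() helper from the fault-map sketch above: every chunk a candidate region would occupy is checked against the map, and the cursor skips past any faulty chunk before retrying.

```c
#include <stdbool.h>
#include <stdint.h>

#define CHUNK_SHIFT    6                    /* must match the fault map */
#define PHYS_MEM_BYTES (1ULL << 30)

bool fault_map_is_faulty(uint64_t phys_addr);   /* from the fault-map sketch */

static uint64_t alloc_cursor;               /* next candidate physical address */

/* Return the start of a fault-free region of `size` bytes, or UINT64_MAX. */
uint64_t fault_aware_alloc(uint64_t size)
{
    while (alloc_cursor + size <= PHYS_MEM_BYTES) {
        uint64_t first = alloc_cursor >> CHUNK_SHIFT;
        uint64_t last  = (alloc_cursor + size - 1) >> CHUNK_SHIFT;
        uint64_t bad   = UINT64_MAX;
        for (uint64_t c = first; c <= last; c++) {
            if (fault_map_is_faulty(c << CHUNK_SHIFT)) {
                bad = c;
                break;
            }
        }
        if (bad == UINT64_MAX) {            /* region is clean: hand it out */
            uint64_t start = alloc_cursor;
            alloc_cursor += size;
            return start;
        }
        alloc_cursor = (bad + 1) << CHUNK_SHIFT;  /* skip the faulty chunk */
    }
    return UINT64_MAX;                      /* no fault-free region available */
}
```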
  • System software may further expose memory fault information (maps and unwritten data due to a fault in a memory write) to higher level software. An example of software that can benefit from accessing the memory fault information can include software such as managed runtimes that handle their own memory management (i.e., memory allocation and garbage collection). Since no pointers are allowed in these managed runtimes, the managed runtimes have more flexibility on data placement in memory. On a memory fault, managed runtimes can copy the data to another working area of memory. In addition, managed runtimes can allocate memory around failures without the need of a continuous virtual address space. Regions of memory that are faulty can then be treated as special allocated objects that are not allocated for application data by the memory allocator, yet are not moved by a garbage collector process or module. If the memory space becomes fragmented by faults, the managed runtime can allocate small objects to small regions of working memory or break objects into smaller, hierarchically organized object segments. This is a software level solution to the pervasive memory fault problem.
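  • The sketch below illustrates one way a managed runtime could represent a faulty range as the kind of special allocated object described above: flagged so the allocator never hands it to application data and so the garbage collector neither moves nor frees it. The header fields are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified heap object header for a managed runtime (illustrative only). */
struct heap_object {
    uint64_t start;            /* address range the object occupies */
    uint64_t size;
    bool     is_fault_object;  /* covers a faulty range; never given out */
    bool     pinned;           /* garbage collector must not move or free it */
};

/* Wrap a faulty range reported by the fault map in a pinned placeholder
 * object, so normal allocation and collection simply flow around it. */
struct heap_object make_fault_object(uint64_t start, uint64_t size)
{
    struct heap_object o = { start, size, true, true };
    return o;
}
```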
  • Allowing the managed runtime environments to know about the memory faults can be valuable for modules such as virtual machines, which may desire to allocate memory from the system software or the main system's hardware memory. In addition, separate smaller managed runtimes can use the memory fault management technology for discrete tasks. For example, the managed runtimes may handle concerns like heap management, security, class loading, garbage collection, and memory allocation, which may free software developers to concentrate on the business logic specific to the high-level software applications. When the managed runtimes are provided with memory fault information, the managed runtimes' memory management can seamlessly tolerate faults as the memory faults happen. In addition, the managed runtimes can reuse the working memory without the need of any support for a continuous virtual address space.
  • FIG. 3 illustrates a method for managing physical memory faults using system software. Specifically, FIG. 3 describes operations for the system software to handle one or more memory faults after the memory faults happen. The method can include the operation of notifying the system software that a memory fault has occurred for memory locations during a write operation, as in block 310. A working location in physical memory can then be found to enable the transfer of data from the memory locations not written due to the memory fault, as in block 320. In other words, once the system software is notified that a fault happened, the system software can find a working location in physical memory to move the data that was previously intended to be written when the fault happened. In some cases, some surrounding data (e.g. data in the same memory page) may also be moved to maintain data continuity.
  • The data from failed memory locations can then be transferred or moved to working memory locations for storage, as in block 330. The virtual addresses for the failed memory locations can then be remapped to addresses of the working physical memory locations, as in block 340. In other words, the virtual addresses are re-mapped to the selected working memory location(s). The present technology provides arbitrary granularity for the virtual to physical memory translations which can result in a continuous virtual addressing space in the presence of memory faults.
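  • The flow of blocks 310 through 350 can be summarized as a single handler routine, sketched below. Each helper function is a hypothetical stand-in for the corresponding block rather than an implementation prescribed by the patent.

```c
#include <stdint.h>

uint64_t find_working_location(uint64_t size);                /* block 320 */
void     move_data(uint64_t from, uint64_t to, uint64_t n);   /* block 330 */
void     remap_virtual(uint64_t old_phys, uint64_t new_phys); /* block 340 */
void     reclaim_working_cells(uint64_t failed_phys);         /* block 350 */

/* Invoked after the system software is notified of a write fault (block 310). */
void handle_memory_fault(uint64_t failed_phys, uint64_t region_size)
{
    uint64_t dest = find_working_location(region_size);
    move_data(failed_phys, dest, region_size);  /* may include surrounding data */
    remap_virtual(failed_phys, dest);
    reclaim_working_cells(failed_phys);         /* mark survivors in fault map */
}
```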
  • To remap virtual addresses to the new memory cell locations, the virtual pages that map to the physical page being moved can be found, but this search for virtual pages may be a relatively expensive operation. This search operation can be implemented in a number of ways. For instance, the system page tables can be scanned to find the address of the physical memory page to be moved; scanning the page tables can incur long latencies, but no storage space overhead is incurred. Alternatively, a reverse translation table of physical to virtual page addresses can be maintained in order to find the address of the physical page to be moved. While maintaining a reverse translation table incurs the overhead of table maintenance and significant storage space, it may result in less apparent delay to the operating system and user than undertaking a scan of the system page tables. Additional techniques, such as a hash table or other summarization structures, may be used to identify virtual pages that need remapping, at an intermediate resource cost. A further technique that can increase the efficiency of the re-mapping process is to buffer memory faults and process the memory faults in batches to amortize the cost of searching for virtual pages to be remapped.
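  • Of the options above, the reverse translation table is the simplest to sketch: a physical-to-virtual table maintained on every mapping change, giving constant-time lookup of the virtual page to remap in exchange for table storage. The flat array below is purely illustrative; a denser structure (or the hash table mentioned above) would trade space against lookup cost.

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define PHYS_PAGES (1ULL << 18)       /* assumed 1 GB of 4 KB physical pages */
#define NO_MAPPING UINT64_MAX

/* phys_to_virt[p] holds the virtual page currently mapped to physical page p;
 * the table is updated on every map and unmap operation. */
static uint64_t phys_to_virt[PHYS_PAGES];

void reverse_map_set(uint64_t phys_page, uint64_t virt_page)
{
    phys_to_virt[phys_page] = virt_page;
}

void reverse_map_clear(uint64_t phys_page)
{
    phys_to_virt[phys_page] = NO_MAPPING;
}

/* O(1) lookup of the virtual page to remap, instead of a page-table scan. */
uint64_t reverse_map_lookup(uint64_t phys_page)
{
    return phys_to_virt[phys_page];
}
```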
  • In one example configuration, the fault management interface may enable the hardware or memory controller to continue processing memory requests to failing memory locations for a limited period of time while the data being sent to the memory locations is being moved. The error buffer module that records faults can be capable of recording more than one failure simultaneously. As such, even if one write attempt to memory fails, other memory requests to other locations can continue to make progress. Eventually, if the recorded faults are not processed, the error buffer module can fill up. In this case, the system software may prevent any further write requests to memory until the data in the error buffer module has been processed or emptied.
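  • A short sketch of this backpressure behavior: writes proceed while the error buffer has free entries, and once it fills, further writes are held until the system software drains the recorded faults. Both helper functions are assumed interfaces, not defined by the patent.

```c
#include <stdbool.h>

bool error_buffer_full(void);    /* assumed query of the hardware error buffer */
void drain_error_buffer(void);   /* system software processes pending faults */

/* Hold new writes while the error buffer is full, then issue the write. */
void issue_write_with_backpressure(void (*do_write)(void))
{
    while (error_buffer_full())
        drain_error_buffer();
    do_write();
}
```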
  • The working memory cells near or adjacent to the failed memory locations which would otherwise be decommissioned can be reclaimed, as in block 350. Once the failed write data has been moved, the working memory cells can be marked as being available in a memory fault map to allow the system software module to find working memory cells at a defined level of memory granularity. Then later write operations can use these working memory cells. This enables the remaining working physical cells in a memory page to be used as opposed to being decommissioned. As described previously, the system software module may also use a memory allocator and a memory fault map to avoid allocating faulty regions of memory.
  • Pervasive memory faults, as are anticipated to be common in phase-change memory, for example, are inconvenient because they fragment the memory's physical address space. In existing systems, the system software has to decommission the entire physical page where a fault has occurred to preserve virtual address continuity. In the example of the x86 architecture, memory pages may be from 4 KB to 2 MB in size, so disabling a memory page can disable a significant area of memory. Once the hardware error correction mechanisms are exhausted, even if only one additional bit is faulty, the entire page may be marked as decommissioned or unavailable because a page is typically the minimum unit to maintain a continuous address space. This results in a waste of working memory. The finer granularity translations of the present technology enable system software to provide a continuous virtual memory space despite pervasive memory faults. Using memory fault maps associated with fine granularity virtual to physical address translation can provide a continuous virtual address space, even when memory faults fragment a physical memory space.
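  • The fractional mapping idea (see the fractional mapping module discussed with FIG. 2) can be illustrated as one whole virtual page backed by working fragments of partially faulty physical pages. The two-fragment layout below is an assumption for brevity; the patent does not fix the number of fragments.

```c
#include <stdint.h>

/* A working piece of an otherwise partially faulty physical page. */
struct fragment {
    uint64_t phys_addr;   /* start of the working region */
    uint32_t length;      /* bytes of usable memory in this fragment */
};

/* One whole virtual page backed by two partial physical pages. */
struct fractional_mapping {
    uint64_t virt_page;
    struct fragment parts[2];
};

/* Translate an offset within the virtual page to its physical address,
 * preserving a continuous virtual view over fragmented physical memory. */
uint64_t translate(const struct fractional_mapping *m, uint32_t offset)
{
    if (offset < m->parts[0].length)
        return m->parts[0].phys_addr + offset;
    return m->parts[1].phys_addr + (offset - m->parts[0].length);
}
```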
  • The operations described allow the system software to be involved in handling memory faults and to avoid the use of memory fault regions during execution. This management of memory faults is also provided at a finer granularity. In contrast, prior systems have performed memory tests at machine initialization time and disabled memory regions at a coarse granularity (at least a physical page at a time). This results in relatively large areas of memory becoming unusable in memory system types where memory cells regularly fail.
  • This technology provides an explicit interface for managing memory faults, and the explicit interface communicates information about the memory faults to the system software or higher level application software. Previously, permanent memory fault information was either not exposed, exposed via error signals, or exposed indirectly via parity bits to the system software. In addition, memory regions were not decommissioned at granularities finer than a page. This memory fault management interface enables software to directly identify regions with failures at a finer granularity than previously available and in a more direct manner. In addition, the memory fault management interface may enable hardware to continue processing requests to failing memory locations for a limited period of time while the data being sent to those memory locations is being moved.
  • Some of the functional units described in this specification have been labeled as modules, in order to more particularly emphasize their implementation independence. For example, a module may be implemented as a hardware circuit comprising custom VLSI circuits or gate arrays, off-the-shelf semiconductors such as logic chips, transistors, or other discrete components. A module may also be implemented in programmable hardware devices such as field programmable gate arrays, programmable array logic, programmable logic devices or the like.
  • Modules may also be implemented in software for execution by various types of processors. An identified module of executable code may, for instance, comprise one or more blocks of computer instructions, which may be organized as an object, procedure, or function. Nevertheless, the executables of an identified module need not be physically located together, but may comprise disparate instructions stored in different locations which, when joined logically together, comprise the module and achieve its stated purpose.
  • Indeed, a module of executable code may be a single instruction, or many instructions, and may even be distributed over several different code segments, among different programs, and across several memory devices. Similarly, operational data may be identified and illustrated herein within modules, and may be embodied in any suitable form and organized within any suitable type of data structure. The operational data may be collected as a single data set, or may be distributed over different locations including over different storage devices. The modules may be passive or active, including agents operable to perform desired functions.
  • Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the preceding description, numerous specific details were provided, such as examples of various configurations, to provide a thorough understanding of embodiments of the described technology. One skilled in the relevant art will recognize, however, that the technology can be practiced without one or more of the specific details, or with other methods, components, devices, etc. In other instances, well-known structures or operations are not shown or described in detail to avoid obscuring aspects of the technology. Although the subject matter has been described in language specific to structural features and/or operations, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features and operations described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims. Numerous modifications and alternative arrangements can be devised without departing from the spirit and scope of the described technology.

Claims (21)

1. A system for managing memory faults of a hardware memory module, the system comprising:
a memory controller configured to manage memory cells of the hardware memory module, the memory controller comprising:
an error buffer module configured to store memory fault information received from the hardware memory module by the memory controller, the memory fault information reflecting a memory fault by an individual cell or memory region within a physical page of memory of the hardware memory module;
a notification module in communication with the error buffer module configured to generate a notification of the memory fault;
a system software module configured to provide services and manage executing programs on a processor, wherein the system software module is configured to receive the notification of the memory fault and re-use other memory cells or memory regions in the physical page of memory having the memory fault; and
a notification handler configured to be activated by an interrupt when the notification of the memory fault is received.
2. The system as in claim 1, wherein the error buffer module stores information including: whether a write operation failed, data used in a write attempt, and an address of a physical location where a write failure occurred.
3. The system as in claim 1, wherein the error buffer module is configured to place marks in a storage buffer to indicate that individual memory regions with faults have been permanently decommissioned by the memory controller.
4. The system as in claim 1, wherein the notification module is configured to generate fault interrupts.
5. The system as in claim 4, wherein an individual fault interrupt is a hardware fault interrupt indicating that an individual memory region has been permanently decommissioned.
6. The system as in claim 1, wherein the notification module is configured to generate a fault interrupt indicating that an amount of temporary storage used has reached a certain threshold and a plurality of memory faults, including the memory fault by the individual cell or memory region, are available to be batch processed by the system software.
7. The system as in claim 1, wherein the memory cells are a non-volatile type of memory.
8. A method for managing physical memory faults using system software, the method comprising:
receiving, by the system software, a notification that a memory cell fault has occurred for a failed physical memory location of a physical memory, the memory cell fault occurring during a write operation to a physical memory page that includes the failed physical memory location;
finding a working physical memory location in the physical memory to move failed write data of the write operation;
moving the failed write data of the write operation to the working physical memory location of the physical memory for storage;
remapping a virtual address from the failed physical memory location having the memory cell fault to an address of the working physical memory location; and
reclaiming working memory cells in the physical memory page having the failed physical memory location.
9. The method as in claim 8, wherein remapping a virtual address further comprises finding a plurality of virtual pages that map to the physical memory page that includes the failed physical memory location.
10. The method as in claim 9, wherein finding the plurality of virtual pages further comprises scanning system page tables.
11. The method as in claim 9, wherein finding the plurality of virtual pages further comprises maintaining a reverse translation table of physical to virtual page addresses.
12. The method as in claim 8, wherein reclaiming the working memory cells further comprises marking the working memory cells as being available in a memory fault map to allow the system software to find the working memory cells at a defined level of memory granularity.
13. The method as in claim 8, wherein the system software module uses a memory allocator and a memory fault map to avoid allocating the failed physical memory location.
14-20. (canceled)
21. The system according to claim 1, further comprising:
a memory allocator configured to map a whole virtual memory page to partial physical memory pages having memory faults.
22. A system comprising:
a memory fault map configured to:
track faulty physical memory regions or faulty physical memory cells of a physical memory; and
identify working physical memory cells in a physical memory page that includes one or more of the faulty physical memory regions or one or more of the faulty physical memory cells;
a system software module configured to reclaim the working physical memory cells from the physical memory page that includes the one or more faulty physical memory regions or the one or more faulty physical memory cells; and
one or more processors configured to execute the system software module.
23. The system according to claim 22, further comprising:
a fractional mapping module configured to map a whole virtual page into partial physical memory pages.
24. The system according to claim 23, the partial physical memory pages including the physical memory page that includes the one or more faulty physical memory regions or the one or more faulty physical memory cells.
25. The system according to claim 24, the whole virtual page being partially mapped to the working physical memory cells of the physical memory page and partially mapped to other working physical memory cells of another physical memory page.
26. The system according to claim 22, further comprising:
a notification module configured to provide a hardware interrupt fault to the system software module when a memory access operation attempts to access the one or more of the faulty physical memory regions or the one or more of the faulty physical memory cells in the physical page.
27. The system according to claim 26, the memory access operation comprising a write operation.
US12/780,931 2010-05-17 2010-05-17 Managing memory faults Active US8201024B2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US12/780,931 US8201024B2 (en) 2010-05-17 2010-05-17 Managing memory faults
US13/465,602 US8386836B2 (en) 2010-05-17 2012-05-07 Managing memory faults

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US12/780,931 US8201024B2 (en) 2010-05-17 2010-05-17 Managing memory faults

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US13/465,602 Continuation US8386836B2 (en) 2010-05-17 2012-05-07 Managing memory faults

Publications (2)

Publication Number Publication Date
US20110283135A1 true US20110283135A1 (en) 2011-11-17
US8201024B2 US8201024B2 (en) 2012-06-12

Family

ID=44912784

Family Applications (2)

Application Number Title Priority Date Filing Date
US12/780,931 Active US8201024B2 (en) 2010-05-17 2010-05-17 Managing memory faults
US13/465,602 Active US8386836B2 (en) 2010-05-17 2012-05-07 Managing memory faults

Family Applications After (1)

Application Number Title Priority Date Filing Date
US13/465,602 Active US8386836B2 (en) 2010-05-17 2012-05-07 Managing memory faults

Country Status (1)

Country Link
US (2) US8201024B2 (en)

Families Citing this family (80)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10698859B2 (en) 2009-09-18 2020-06-30 The Board Of Regents Of The University Of Texas System Data multicasting with router replication and target instruction identification in a distributed multi-core processing architecture
US9021241B2 (en) 2010-06-18 2015-04-28 The Board Of Regents Of The University Of Texas System Combined branch target and predicate prediction for instruction blocks
US8621324B2 (en) * 2010-12-10 2013-12-31 Qualcomm Incorporated Embedded DRAM having low power self-correction capability
US20140006712A1 (en) * 2011-03-16 2014-01-02 Joseph A. Tucek Systems and methods for fine granularity memory sparing
KR20130078973A (en) * 2012-01-02 2013-07-10 삼성전자주식회사 Method for managing bed storage space in memory device and storage device using method thereof
US8849731B2 (en) * 2012-02-23 2014-09-30 Microsoft Corporation Content pre-fetching for computing devices
US8972649B2 (en) 2012-10-05 2015-03-03 Microsoft Technology Licensing, Llc Writing memory blocks using codewords
US9032244B2 (en) 2012-11-16 2015-05-12 Microsoft Technology Licensing, Llc Memory segment remapping to address fragmentation
US9280417B2 (en) 2013-05-21 2016-03-08 Microsoft Technology Licensing, Llc Message storage in memory blocks using codewords
US9372750B2 (en) * 2013-11-01 2016-06-21 Qualcomm Incorporated Method and apparatus for non-volatile RAM error re-mapping
US9287005B2 (en) 2013-12-13 2016-03-15 International Business Machines Corporation Detecting missing write to cache/memory operations
WO2015153645A1 (en) * 2014-03-31 2015-10-08 Oracle International Corporation Memory migration in presence of live memory traffic
US9583219B2 (en) 2014-09-27 2017-02-28 Qualcomm Incorporated Method and apparatus for in-system repair of memory in burst refresh
US10141955B2 (en) * 2015-04-11 2018-11-27 International Business Machines Corporation Method and apparatus for selective and power-aware memory error protection and memory management
US9812222B2 (en) 2015-04-20 2017-11-07 Qualcomm Incorporated Method and apparatus for in-system management and repair of semi-conductor memory failure
US9940136B2 (en) 2015-06-26 2018-04-10 Microsoft Technology Licensing, Llc Reuse of decoded instructions
US9946548B2 (en) 2015-06-26 2018-04-17 Microsoft Technology Licensing, Llc Age-based management of instruction blocks in a processor instruction window
US9952867B2 (en) 2015-06-26 2018-04-24 Microsoft Technology Licensing, Llc Mapping instruction blocks based on block size
US10175988B2 (en) 2015-06-26 2019-01-08 Microsoft Technology Licensing, Llc Explicit instruction scheduler state information for a processor
US10169044B2 (en) 2015-06-26 2019-01-01 Microsoft Technology Licensing, Llc Processing an encoding format field to interpret header information regarding a group of instructions
US10409599B2 (en) 2015-06-26 2019-09-10 Microsoft Technology Licensing, Llc Decoding information about a group of instructions including a size of the group of instructions
US10409606B2 (en) 2015-06-26 2019-09-10 Microsoft Technology Licensing, Llc Verifying branch targets
US10346168B2 (en) 2015-06-26 2019-07-09 Microsoft Technology Licensing, Llc Decoupled processor instruction window and operand buffer
US11755484B2 (en) 2015-06-26 2023-09-12 Microsoft Technology Licensing, Llc Instruction block allocation
US10191747B2 (en) 2015-06-26 2019-01-29 Microsoft Technology Licensing, Llc Locking operand values for groups of instructions executed atomically
US10163479B2 (en) 2015-08-14 2018-12-25 Spin Transfer Technologies, Inc. Method and apparatus for bipolar memory write-verify
US11126433B2 (en) 2015-09-19 2021-09-21 Microsoft Technology Licensing, Llc Block-based processor core composition register
US10776115B2 (en) 2015-09-19 2020-09-15 Microsoft Technology Licensing, Llc Debug support for block-based processor
US11681531B2 (en) 2015-09-19 2023-06-20 Microsoft Technology Licensing, Llc Generation and use of memory access instruction order encodings
US10768936B2 (en) 2015-09-19 2020-09-08 Microsoft Technology Licensing, Llc Block-based processor including topology and control registers to indicate resource sharing and size of logical processor
US10095519B2 (en) 2015-09-19 2018-10-09 Microsoft Technology Licensing, Llc Instruction block address register
US10061584B2 (en) 2015-09-19 2018-08-28 Microsoft Technology Licensing, Llc Store nullification in the target field
US10936316B2 (en) 2015-09-19 2021-03-02 Microsoft Technology Licensing, Llc Dense read encoding for dataflow ISA
US10180840B2 (en) 2015-09-19 2019-01-15 Microsoft Technology Licensing, Llc Dynamic generation of null instructions
US10031756B2 (en) 2015-09-19 2018-07-24 Microsoft Technology Licensing, Llc Multi-nullification
US11016770B2 (en) 2015-09-19 2021-05-25 Microsoft Technology Licensing, Llc Distinct system registers for logical processors
US10198263B2 (en) 2015-09-19 2019-02-05 Microsoft Technology Licensing, Llc Write nullification
US20170083327A1 (en) 2015-09-19 2017-03-23 Microsoft Technology Licensing, Llc Implicit program order
US10678544B2 (en) 2015-09-19 2020-06-09 Microsoft Technology Licensing, Llc Initiating instruction block execution using a register access instruction
US10719321B2 (en) 2015-09-19 2020-07-21 Microsoft Technology Licensing, Llc Prefetching instruction blocks
US10452399B2 (en) 2015-09-19 2019-10-22 Microsoft Technology Licensing, Llc Broadcast channel architectures for block-based processors
US10871967B2 (en) 2015-09-19 2020-12-22 Microsoft Technology Licensing, Llc Register read/write ordering
US11687345B2 (en) 2016-04-28 2023-06-27 Microsoft Technology Licensing, Llc Out-of-order block-based processors and instruction schedulers using ready state data indexed by instruction position identifiers
US10445200B2 (en) 2016-05-02 2019-10-15 Samsung Electronics Co., Ltd. Storage device having various recovery methods and recovery modes
KR102628239B1 (en) 2016-05-02 2024-01-24 삼성전자주식회사 Storage device, operating method of storage device and operating method of computing system including storage device and host device
US10360964B2 (en) 2016-09-27 2019-07-23 Spin Memory, Inc. Method of writing contents in memory during a power up sequence using a dynamic redundancy register in a memory device
US10546625B2 (en) 2016-09-27 2020-01-28 Spin Memory, Inc. Method of optimizing write voltage based on error buffer occupancy
US10437491B2 (en) 2016-09-27 2019-10-08 Spin Memory, Inc. Method of processing incomplete memory operations in a memory device during a power up sequence and a power down sequence using a dynamic redundancy register
US10366774B2 (en) 2016-09-27 2019-07-30 Spin Memory, Inc. Device with dynamic redundancy registers
US10437723B2 (en) 2016-09-27 2019-10-08 Spin Memory, Inc. Method of flushing the contents of a dynamic redundancy register to a secure storage area during a power down in a memory device
US10818331B2 (en) 2016-09-27 2020-10-27 Spin Memory, Inc. Multi-chip module for MRAM devices with levels of dynamic redundancy registers
US11531552B2 (en) 2017-02-06 2022-12-20 Microsoft Technology Licensing, Llc Executing multiple programs simultaneously on a processor core
US10481976B2 (en) 2017-10-24 2019-11-19 Spin Memory, Inc. Forcing bits as bad to widen the window between the distributions of acceptable high and low resistive bits thereby lowering the margin and increasing the speed of the sense amplifiers
US10529439B2 (en) 2017-10-24 2020-01-07 Spin Memory, Inc. On-the-fly bit failure detection and bit redundancy remapping techniques to correct for fixed bit defects
US10489245B2 (en) 2017-10-24 2019-11-26 Spin Memory, Inc. Forcing stuck bits, waterfall bits, shunt bits and low TMR bits to short during testing and using on-the-fly bit failure detection and bit redundancy remapping techniques to correct them
US10656994B2 (en) 2017-10-24 2020-05-19 Spin Memory, Inc. Over-voltage write operation of tunnel magnet-resistance (“TMR”) memory device and correcting failure bits therefrom by using on-the-fly bit failure detection and bit redundancy remapping techniques
US10811594B2 (en) 2017-12-28 2020-10-20 Spin Memory, Inc. Process for hard mask development for MRAM pillar formation using photolithography
US10395712B2 (en) 2017-12-28 2019-08-27 Spin Memory, Inc. Memory array with horizontal source line and sacrificial bitline per virtual source
US10360962B1 (en) 2017-12-28 2019-07-23 Spin Memory, Inc. Memory array with individually trimmable sense amplifiers
US10891997B2 (en) 2017-12-28 2021-01-12 Spin Memory, Inc. Memory array with horizontal source line and a virtual source line
US10395711B2 (en) 2017-12-28 2019-08-27 Spin Memory, Inc. Perpendicular source and bit lines for an MRAM array
US10840439B2 (en) 2017-12-29 2020-11-17 Spin Memory, Inc. Magnetic tunnel junction (MTJ) fabrication methods and systems
US10546624B2 (en) 2017-12-29 2020-01-28 Spin Memory, Inc. Multi-port random access memory
US10886330B2 (en) 2017-12-29 2021-01-05 Spin Memory, Inc. Memory device having overlapping magnetic tunnel junctions in compliance with a reference pitch
US10963379B2 (en) 2018-01-30 2021-03-30 Microsoft Technology Licensing, Llc Coupling wide memory interface to wide write back paths
US10446744B2 (en) 2018-03-08 2019-10-15 Spin Memory, Inc. Magnetic tunnel junction wafer adaptor used in magnetic annealing furnace and method of using the same
US11107974B2 (en) 2018-03-23 2021-08-31 Spin Memory, Inc. Magnetic tunnel junction devices including a free magnetic trench layer and a planar reference magnetic layer
US11107978B2 (en) 2018-03-23 2021-08-31 Spin Memory, Inc. Methods of manufacturing three-dimensional arrays with MTJ devices including a free magnetic trench layer and a planar reference magnetic layer
US10784437B2 (en) 2018-03-23 2020-09-22 Spin Memory, Inc. Three-dimensional arrays with MTJ devices including a free magnetic trench layer and a planar reference magnetic layer
US10411185B1 (en) 2018-05-30 2019-09-10 Spin Memory, Inc. Process for creating a high density magnetic tunnel junction array test platform
US10600478B2 (en) 2018-07-06 2020-03-24 Spin Memory, Inc. Multi-bit cell read-out techniques for MRAM cells with mixed pinned magnetization orientations
US10593396B2 (en) 2018-07-06 2020-03-17 Spin Memory, Inc. Multi-bit cell read-out techniques for MRAM cells with mixed pinned magnetization orientations
US10650875B2 (en) 2018-08-21 2020-05-12 Spin Memory, Inc. System for a wide temperature range nonvolatile memory
WO2020051921A1 (en) * 2018-09-15 2020-03-19 Intel Corporation Runtime cell row replacement in a memory
US10824429B2 (en) 2018-09-19 2020-11-03 Microsoft Technology Licensing, Llc Commit logic and precise exceptions in explicit dataflow graph execution architectures
US11621293B2 (en) 2018-10-01 2023-04-04 Integrated Silicon Solution, (Cayman) Inc. Multi terminal device stack systems and methods
US10971680B2 (en) 2018-10-01 2021-04-06 Spin Memory, Inc. Multi terminal device stack formation methods
US11107979B2 (en) 2018-12-28 2021-08-31 Spin Memory, Inc. Patterned silicide structures and methods of manufacture
US11513705B2 (en) * 2021-04-19 2022-11-29 EMC IP Holding Company, LLC System and method for volume polarization across multiple storage systems
US20230401120A1 (en) * 2022-05-18 2023-12-14 Samsung Electronics Co., Ltd. Systems and methods for expandable memory error handling

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4479214A (en) 1982-06-16 1984-10-23 International Business Machines Corporation System for updating error map of fault tolerant memory
US5822784A (en) * 1993-03-19 1998-10-13 Intel Corporation Mechanism supporting execute in place read only memory applications located on removable computer cards
GB9305801D0 (en) 1993-03-19 1993-05-05 Deans Alexander R Semiconductor memory system
US5754817A (en) * 1994-09-29 1998-05-19 Intel Corporation Execution in place of a file stored non-contiguously in a non-volatile memory
US6425058B1 (en) * 1999-09-07 2002-07-23 International Business Machines Corporation Cache management mechanism to enable information-type dependent cache policies
TW200615963A (en) 2004-10-07 2006-05-16 Amic Technology Corp Memory structure which having repair function and its repair method
US20060253682A1 (en) * 2005-05-05 2006-11-09 International Business Machines Corporation Managing computer memory in a computing environment with dynamic logical partitioning
US7610523B1 (en) 2006-02-09 2009-10-27 Sun Microsystems, Inc. Method and template for physical-memory allocation for implementing an in-system memory test
US7653778B2 (en) * 2006-05-08 2010-01-26 Siliconsystems, Inc. Systems and methods for measuring the useful life of solid-state storage devices
JP4821426B2 (en) * 2006-05-11 2011-11-24 富士ゼロックス株式会社 Error recovery program, error recovery device, and computer system
JP4893746B2 (en) 2006-10-27 2012-03-07 富士通株式会社 Address line fault processing apparatus, address line fault processing method, address line fault processing program, information processing apparatus, and memory controller
JPWO2008099786A1 (en) * 2007-02-13 2010-05-27 日本電気株式会社 Memory failure recovery method, information processing apparatus, and program
US20090013148A1 (en) * 2007-07-03 2009-01-08 Micron Technology, Inc. Block addressing for parallel memory arrays
US7675776B2 (en) * 2007-12-21 2010-03-09 Spansion, Llc Bit map control of erase block defect list in a memory
US8195981B2 (en) * 2008-06-03 2012-06-05 International Business Machines Corporation Memory metadata used to handle memory errors without process termination
JP2010009383A (en) * 2008-06-27 2010-01-14 Fujitsu Ltd Memory device and information processing system
US20100037102A1 (en) 2008-08-08 2010-02-11 Seagate Technology Llc Fault-tolerant non-volatile buddy memory structure
US20100251013A1 (en) * 2009-03-26 2010-09-30 Inventec Corporation Method for processing bad block in redundant array of independent disks
JP5377182B2 (en) * 2009-09-10 2013-12-25 株式会社東芝 Control device
US8201024B2 (en) * 2010-05-17 2012-06-12 Microsoft Corporation Managing memory faults

Patent Citations (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5598553A (en) * 1994-03-08 1997-01-28 Exponential Technology, Inc. Program watchpoint checking using paging with sub-page validity
US6112286A (en) * 1997-09-19 2000-08-29 Silicon Graphics, Inc. Reverse mapping page frame data structures to page table entries
US6477612B1 (en) * 2000-02-08 2002-11-05 Microsoft Corporation Providing access to physical memory allocated to a process by selectively mapping pages of the physical memory with virtual memory allocated to the process
US6581142B1 (en) * 2000-09-01 2003-06-17 International Business Machines Corporation Computer program product and method for partial paging and eviction of microprocessor instructions in an embedded computer
US7434100B2 (en) * 2000-09-22 2008-10-07 Microsoft Corporation Systems and methods for replicating virtual memory on a host computer and debugging using replicated memory
US20070226450A1 (en) * 2006-02-07 2007-09-27 International Business Machines Corporation Method and system for unifying memory access for CPU and IO operations
US20080052015A1 (en) * 2006-04-06 2008-02-28 Advantest Corporation Test apparatus and test method
US20110202709A1 (en) * 2008-03-19 2011-08-18 Rambus Inc. Optimizing storage of common patterns in flash memory
US20090287902A1 (en) * 2008-05-15 2009-11-19 Smooth-Stone, Inc. C/O Barry Evans Distributed computing system with universal address system and method
US20090323417A1 (en) * 2008-06-30 2009-12-31 Tomoji Takada Semiconductor memory repairing a defective bit and semiconductor memory system
US20100100715A1 (en) * 2008-10-21 2010-04-22 Thomas Michael Gooding Handling debugger breakpoints in a shared instruction system
US20100223447A1 (en) * 2009-02-27 2010-09-02 Serebrin Benjamin C Translate and Verify Instruction for a Processor
US20100269000A1 (en) * 2009-04-21 2010-10-21 Samsung Electronics Co., Ltd. Methods and apparatuses for managing bad memory cell
US20100281202A1 (en) * 2009-04-30 2010-11-04 International Business Machines Corporation Wear-leveling and bad block management of limited lifetime memory devices
US20110055623A1 (en) * 2009-09-03 2011-03-03 Hynix Semiconductor Inc. Solid state storage system with improved data merging efficiency and control method thereof
US20110231713A1 (en) * 2009-11-04 2011-09-22 Hitachi, Ltd. Flash memory module
US20110119538A1 (en) * 2009-11-18 2011-05-19 Microsoft Corporation Dynamically Replicated Memory
US20110145632A1 (en) * 2009-12-11 2011-06-16 Vmware, Inc. Transparent recovery from hardware memory errors
US20110199845A1 (en) * 2010-02-12 2011-08-18 Taiwan Semiconductor Manufacturing Company, Ltd. Redundancy circuits and operating methods thereof
US20110209028A1 (en) * 2010-02-24 2011-08-25 Apple Inc. Codeword remapping schemes for non-volatile memories

Cited By (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8386836B2 (en) 2010-05-17 2013-02-26 Microsoft Corporation Managing memory faults
US9442833B1 (en) * 2010-07-20 2016-09-13 Qualcomm Incorporated Managing device identity
US8700834B2 (en) 2011-09-06 2014-04-15 Western Digital Technologies, Inc. Systems and methods for an enhanced controller architecture in data storage systems
US8707104B1 (en) 2011-09-06 2014-04-22 Western Digital Technologies, Inc. Systems and methods for error injection in data storage systems
US8713357B1 (en) * 2011-09-06 2014-04-29 Western Digital Technologies, Inc. Systems and methods for detailed error reporting in data storage systems
US9021168B1 (en) * 2011-09-06 2015-04-28 Western Digital Technologies, Inc. Systems and methods for an enhanced controller architecture in data storage systems
US9058261B1 (en) * 2011-09-06 2015-06-16 Western Digital Technologies, Inc. Systems and methods for detailed error reporting in data storage systems
US9195530B1 (en) 2011-09-06 2015-11-24 Western Digital Technologies, Inc. Systems and methods for improved data management in data storage systems
US9542287B1 (en) 2011-09-06 2017-01-10 Western Digital Technologies, Inc. Systems and methods for error injection in data storage systems
US9053008B1 (en) 2012-03-26 2015-06-09 Western Digital Technologies, Inc. Systems and methods for providing inline parameter service in data storage devices
US9003223B2 (en) 2012-09-27 2015-04-07 International Business Machines Corporation Physical memory fault mitigation in a computing environment
US10180866B2 (en) 2012-09-27 2019-01-15 International Business Machines Corporation Physical memory fault mitigation in a computing environment
CN105589810A (en) * 2014-11-11 2016-05-18 爱思开海力士有限公司 Data storage device and operating method thereof
US9501373B2 (en) * 2014-11-11 2016-11-22 SK Hynix Inc. Data storage device and operating method thereof
KR20160056446A (en) * 2014-11-11 2016-05-20 에스케이하이닉스 주식회사 Data storage device and operating method thereof
TWI654520B (en) 2014-11-11 2019-03-21 韓商愛思開海力士有限公司 Data storage device and operating method thereof
US20160132406A1 (en) * 2014-11-11 2016-05-12 SK Hynix Inc. Data storage device and operating method thereof
KR102264757B1 (en) 2014-11-11 2021-06-16 에스케이하이닉스 주식회사 Data storage device and operating method thereof
CN105589810B (en) * 2014-11-11 2020-10-30 爱思开海力士有限公司 Data storage device and method of operating the same
US10115446B1 (en) * 2015-04-21 2018-10-30 Spin Transfer Technologies, Inc. Spin transfer torque MRAM device with error buffer
US10446210B2 (en) 2016-09-27 2019-10-15 Spin Memory, Inc. Memory instruction pipeline with a pre-read stage for a write operation for reducing power consumption in a memory device that uses dynamic redundancy registers
US10460781B2 (en) 2016-09-27 2019-10-29 Spin Memory, Inc. Memory device with a dual Y-multiplexer structure for performing two simultaneous operations on the same row of a memory bank
US10424726B2 (en) 2017-12-28 2019-09-24 Spin Memory, Inc. Process for improving photoresist pillar adhesion during MRAM fabrication
US10367139B2 (en) 2017-12-29 2019-07-30 Spin Memory, Inc. Methods of manufacturing magnetic tunnel junction devices
US10424723B2 (en) 2017-12-29 2019-09-24 Spin Memory, Inc. Magnetic tunnel junction devices including an optimization layer
US10784439B2 (en) 2017-12-29 2020-09-22 Spin Memory, Inc. Precessional spin current magnetic tunnel junction devices and methods of manufacture
US10840436B2 (en) 2017-12-29 2020-11-17 Spin Memory, Inc. Perpendicular magnetic anisotropy interface tunnel junction devices and methods of manufacture
US10438996B2 (en) 2018-01-08 2019-10-08 Spin Memory, Inc. Methods of fabricating magnetic tunnel junctions integrated with selectors
US10438995B2 (en) 2018-01-08 2019-10-08 Spin Memory, Inc. Devices including magnetic tunnel junctions integrated with selectors
US10734573B2 (en) 2018-03-23 2020-08-04 Spin Memory, Inc. Three-dimensional arrays with magnetic tunnel junction devices including an annular discontinued free magnetic layer and a planar reference magnetic layer
US10559338B2 (en) 2018-07-06 2020-02-11 Spin Memory, Inc. Multi-bit cell read-out techniques
US10692569B2 (en) 2018-07-06 2020-06-23 Spin Memory, Inc. Read-out techniques for multi-bit cells
US10699761B2 (en) 2018-09-18 2020-06-30 Spin Memory, Inc. Word line decoder memory architecture
US11183267B2 (en) * 2019-07-12 2021-11-23 Micron Technology, Inc. Recovery management of retired super management units
US11929138B2 (en) 2019-07-12 2024-03-12 Micron Technology, Inc. Recovery management of retired super management units
US11295830B2 (en) * 2019-10-01 2022-04-05 SK Hynix Inc. Memory system and operating method of the memory system
CN115686901A (en) * 2022-10-25 2023-02-03 超聚变数字技术有限公司 Memory fault analysis method and computer equipment
CN116126581A (en) * 2023-04-10 2023-05-16 阿里云计算有限公司 Memory fault processing method, device, system, equipment and storage medium

Also Published As

Publication number Publication date
US8201024B2 (en) 2012-06-12
US20120221905A1 (en) 2012-08-30
US8386836B2 (en) 2013-02-26

Similar Documents

Publication Publication Date Title
US8201024B2 (en) Managing memory faults
US8458514B2 (en) Memory management to accommodate non-maskable failures
JP6518191B2 (en) Memory segment remapping to address fragmentation
US8407439B2 (en) Managing memory systems containing components with asymmetric characteristics
US8024546B2 (en) Opportunistic page largification
CN100456266C (en) Demand paging apparatus and method for embedded system
JP6882662B2 (en) Migration program, information processing device and migration method
US10789019B2 (en) Storage device capable of managing jobs without intervention of a processor
US20050132249A1 (en) Apparatus method and system for fault tolerant virtual memory management
US11656985B2 (en) External memory as an extension to local primary memory
US20060294339A1 (en) Abstracted dynamic addressing
CN1551243A (en) Apparatus and method for managing bad blocks in a flash memory
US7240177B2 (en) System and method for improving performance of dynamic memory removals by reducing file cache size
CN112667422A (en) Memory fault processing method and device, computing equipment and storage medium
US9015535B2 (en) Information processing apparatus having memory dump function, memory dump method, and recording medium
US20230297257A1 (en) Resiliency and performance for cluster memory
US11907065B2 (en) Resiliency and performance for cluster memory
US20200250104A1 (en) Apparatus and method for transmitting map information in a memory system
WO2022193768A1 (en) Method for executing memory read-write instruction, and computing device
US20090300290A1 (en) Memory Metadata Used to Handle Memory Errors Without Process Termination
US20090031100A1 (en) Memory reallocation in a computing environment
US11687251B2 (en) Dynamic repartition of memory physical address mapping
US11500720B2 (en) Apparatus and method for controlling input/output throughput of a memory system
US20230017804A1 (en) Copy and restore of page in byte-addressable chunks of cluster memory

Legal Events

Date Code Title Description
AS Assignment

Owner name: MICROSOFT CORPORATION, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BURGER, DOUG;LARUS, JAMES;STRAUSS, KARIN;AND OTHERS;SIGNING DATES FROM 20100510 TO 20100513;REEL/FRAME:024391/0050

STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:034544/0001

Effective date: 20141014

FPAY Fee payment

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY