US20210357279A1 - Handling operation system (os) in system for predicting and managing faulty memories based on page faults - Google Patents
- Publication number
- US20210357279A1 (application No. US 17/198,979)
- Authority
- US
- United States
- Prior art keywords
- address
- faulty
- memory
- physical addresses
- row
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/1016—Error in accessing a memory location, i.e. addressing error
- G06F12/1036—Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB], for multiple virtual address spaces, e.g. segmentation
- G06F11/1064—Adding special bits or symbols to the coded information, e.g. parity check, in individual solid state devices in cache or content addressable memories
- G06F11/073—Error or fault processing not based on redundancy, the processing taking place in a memory management context, e.g. virtual memory or cache management
- G06F11/0772—Means for error signaling, e.g. using interrupts, exception flags, dedicated error registers
- G06F11/0793—Remedial or corrective actions
- G06F11/1048—Adding special bits or symbols to the coded information, e.g. parity check, in individual solid state devices using arrangements adapted for a specific error detection or correction feature
- G06F12/0223—User address space allocation, e.g. contiguous or non-contiguous base addressing
- G06F3/0653—Monitoring storage devices or systems
- G06F3/0662—Virtualisation aspects
- G06F9/45558—Hypervisor-specific management and integration aspects
- G06F2009/45575—Starting, stopping, suspending or resuming virtual machine instances
- G06F2009/45583—Memory management, e.g. access or allocation
Definitions
- the inventive concept relates to a data processing system, and more particularly, to a system for predicting a faulty memory based on a page fault and managing the predicted faulty memory and a method of handling an operating system (OS) of the system.
- a data processing system like a data center hosts many companies and their computer systems.
- the data center is used to distribute hosted applications and/or transactions and includes networked computer resources that are often referred to as clouds, e.g., servers, disks, virtual machines, etc.
- companies are clients of the data center.
- Data centers offer clients a number of advantages, including reduced cost, easy expansion, and reduced management load.
- a page fault may occur due to a fault in the memory. Given the availability constraints of a data center (e.g., a started operation should not be aborted and restarted), such faults need to be handled with minimal disruption.
- the page fault may arise from a single-bit failure or a failure of two or more bits. Therefore, when a faulty memory can be identified based on a page fault, it becomes possible to predict a memory fault due to the faulty memory. Also, managing a predicted memory fault in advance is desirable for maintaining the availability of a data center.
- the inventive concept provides a system for predicting a memory fault based on a page fault and managing the predicted memory fault and a method of handling an operating system of the system.
- a method of operating a system running a virtual machine that executes an application and an operating system (OS) includes: performing, by the OS, first address translation from a plurality of first virtual addresses processed by the application to a plurality of first physical addresses for accessing a memory; identifying, by the OS, a plurality of faulty physical addresses among the plurality of first physical addresses, wherein each of the plurality of faulty physical addresses corresponds to a corresponding first physical address, among the plurality of first physical addresses, associated with a faulty memory cell of the memory; analyzing, by the OS, a row address and a column address of each of the plurality of faulty physical addresses; specifying, by the OS, a fault type of the plurality of faulty physical addresses based on the analyzing of the row address and the column address of each of the plurality of faulty physical addresses, wherein the fault type includes a row failure, a column failure, or a block failure; and performing, by the OS, second address translation from a plurality of second virtual addresses to a plurality of second physical addresses based on a faulty address, thereby excluding the faulty address from the plurality of second physical addresses.
- a non-transitory computer-readable recording medium storing computer-executable instructions for performing a method of operating a system running a virtual machine that executes an application and an operating system (OS) includes executing the application using a processor and a memory of the system, performing first address translation from a plurality of first virtual addresses allocated to the application to a plurality of first physical addresses for accessing the memory, identifying, during a time when the application is executed, a plurality of faulty physical addresses among the plurality of first physical addresses translated from the plurality of first virtual addresses, specifying a fault type of the plurality of faulty physical addresses, wherein the fault type includes a row failure, a column failure, or a block failure, and performing second address translation from a plurality of second virtual addresses to a plurality of second physical addresses based on a faulty address, thereby excluding the faulty address from the plurality of second physical addresses.
- the faulty address corresponds to the fault type of the plurality of faulty physical addresses, and includes a faulty row address of the row failure, a faulty column address of the column failure, or a faulty block address of the block failure.
- a system operating in a virtual machine environment includes a memory, and a processor operatively coupled to the memory.
- the processor executes an application in cooperation with the memory, performs first address translation from a plurality of first virtual addresses processed by the application to a plurality of first physical addresses for accessing the memory, identifies a plurality of faulty physical addresses among the plurality of first physical addresses, wherein each of the plurality of faulty physical addresses corresponds to a corresponding first physical address, among the plurality of first physical addresses, associated with a faulty memory cell of the memory, specifies a fault type of the plurality of faulty physical addresses of the memory, wherein the fault type includes a row failure, a column failure, or a block failure, and performs second address translation from a plurality of second virtual addresses to a plurality of second physical addresses based on a faulty address to prevent the faulty address from being used for the second address translation.
- the faulty address corresponds to the fault type of the plurality of faulty physical addresses, and includes a faulty row address of the row failure, a faulty column address of the column failure or a faulty block address of the block failure.
- the processor is further configured to, without causing the system to be rebooted, specify the fault type, store the faulty address, and perform the second address translation.
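The claimed runtime sequence — identify faulty physical addresses, classify the fault type from their row/column structure, then keep the faulty addresses out of the second address translation — might be sketched as follows. The function names, the accessor callbacks, and the classification heuristic are illustrative assumptions, not the patent's implementation.

```python
# Hypothetical sketch of the claimed OS-side flow: classify faulty
# physical addresses as a row, column, or block failure, then filter
# them out of the pool used for second address translation.

def classify_fault(faulty_pas, row_of, col_of):
    """Return 'row', 'column', or 'block' for a set of faulty PAs."""
    rows = {row_of(pa) for pa in faulty_pas}
    cols = {col_of(pa) for pa in faulty_pas}
    if len(rows) == 1 and len(cols) > 1:
        return "row"       # common row address -> row failure
    if len(cols) == 1 and len(rows) > 1:
        return "column"    # common column address -> column failure
    return "block"         # neither shared -> treat as block failure

def exclude_faulty(candidate_pas, faulty_pas, fault_type, row_of, col_of):
    """Drop every candidate PA that falls in the faulty row/column/block."""
    bad_rows = {row_of(pa) for pa in faulty_pas}
    bad_cols = {col_of(pa) for pa in faulty_pas}
    if fault_type == "row":
        return [pa for pa in candidate_pas if row_of(pa) not in bad_rows]
    if fault_type == "column":
        return [pa for pa in candidate_pas if col_of(pa) not in bad_cols]
    return [pa for pa in candidate_pas
            if not (row_of(pa) in bad_rows and col_of(pa) in bad_cols)]
```

Notably, both steps run on already-translated physical addresses, which is consistent with the claim that the fault type is specified and the second translation performed without rebooting the system.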
- a memory device includes a memory cell array comprising a plurality of memory cells, and a repair control circuit configured to repair a plurality of faulty memory cells from among the plurality of memory cells by using a plurality of redundancy memory cells in the memory cell array.
- the repair control circuit is configured to receive, during rebooting of the memory device, a source address of the plurality of faulty memory cells from a processor to which the memory device is operatively coupled and repair the source address with a destination address of the plurality of redundancy memory cells.
- the source address of the plurality of faulty memory cells corresponds to a faulty address including a common row address of the plurality of faulty memory cells, a common column address of the plurality of faulty memory cells, or a block address of the plurality of faulty memory cells.
- the plurality of faulty memory cells are identified during execution of a virtual machine running on the processor.
- the faulty address of the plurality of faulty memory cells is included in a plurality of physical addresses for accessing the memory device by a system.
- the plurality of physical addresses are translated from a plurality of virtual addresses used by the processor.
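The repair operation described above — during reboot, the repair control circuit receives the source address of the faulty cells and repairs it with a destination address among the redundancy cells — can be modeled as an address remap. The class and method names below are assumptions for illustration, not the device's actual interface.

```python
# Illustrative model of the repair control circuit: at (re)boot it
# records a mapping from the faulty source address to a redundancy
# destination address, and later accesses are steered through it.

class RepairControl:
    def __init__(self):
        self.remap = {}  # faulty source row -> redundancy destination row

    def repair(self, source_row, dest_row):
        """Called during reboot with the faulty row reported by the processor."""
        self.remap[source_row] = dest_row

    def resolve(self, row):
        """Steer an access to the redundancy row if the row was repaired."""
        return self.remap.get(row, row)
```

In this model, a row identified at runtime (e.g., the common row address of several faulty cells) is handed to `repair()` at the next boot, after which `resolve()` transparently substitutes the redundancy row for every access.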
- FIG. 1 is a block diagram conceptually showing a system according to embodiments of the inventive concept
- FIG. 2 is a diagram for describing an example of address mapping for address translation performed by a processor of FIG. 1 ;
- FIG. 3 is a diagram showing an example of an address mapping table of FIG. 2 ;
- FIG. 4 is a diagram for describing an example of page table entries of the address mapping table of FIG. 3 ;
- FIG. 5 is a diagram for describing the row-based fault attribute shown in the address mapping table of FIG. 3 ;
- FIG. 6 is a diagram for describing the column-based fault attribute shown in the address mapping table of FIG. 3 ;
- FIG. 7 is a diagram for describing the block-based fault attribute shown in the address mapping table of FIG. 3 ;
- FIG. 8 is a flowchart of a method of handling a runtime OS of a system according to an embodiment of the inventive concept
- FIG. 9 is a conceptual diagram for describing a repair operation performed when a system of FIG. 1 is booted
- FIGS. 10 to 12 are diagrams for describing a repair operation performed in a memory of FIG. 1 ;
- FIG. 13 is a flowchart of a method of booting a system according to an embodiment of the inventive concept.
- FIG. 1 is a block diagram conceptually illustrating an example system that may be used to predict failed memories based on page faults and manage predicted failed memories according to embodiments of the inventive concept.
- a system 100 may be a data center including dozens of host machines or servers for running hundreds of virtual machines VM. Although various hardware components of the system 100 to be described below are shown in FIG. 1 , the inventive concept is not limited thereto, and other components may be employed.
- the system 100 may include a processor 110 , a memory 120 , and a basic input/output system (BIOS) memory 130 .
- the processor 110 may be communicatively connected to the memory 120 through a memory interface 140 .
- the processor 110 may be connected to the BIOS memory 130 through an interface 150 of various types like a serial peripheral interface (SPI) or a low pin count (LPC) bus.
- the memory 120 and the BIOS memory 130 connected to the processor 110 may be referred to as system memories.
- the system 100 may be, for example, a computing device like a laptop computer, a desktop computer, a server computer, a workstation, a portable communication terminal, a personal digital assistant (PDA), a portable multimedia player (PMP), a smart phone, or any other suitable computer, or a VM or virtual computing device thereof.
- the system 100 may be one of components included in a computing system, e.g., a graphics card.
- the processor 110 is a functional block for performing general computer operations in the system 100 and may be a processor like a central processing unit (CPU), a digital signal processor (DSP), a network processor, an application processor (AP), or any other device for executing codes.
- the processor 110 may be configured to execute instructions, software, firmware, or pieces of combinations thereof that may be executed by one or more machines.
- the processor 110 may include any number of processor cores.
- the processor 110 may include a single-core or a multi-core like dual-core, quad-core, and hexa-core.
- although FIG. 1 shows the system 100 including one processor 110 , according to an embodiment, the system 100 may include two or more processors.
- the processor 110 may execute software in a virtualized environment.
- a virtual machine VM in the system 100 may include an application APP and an operating system (OS). Since virtual machines VM may be dynamically initiated and stopped by a user during use, the number of virtual machines VM executed on the processor 110 may vary.
- software entities such as an OS consider the processor 110 as a logic processor or a processing element capable of executing virtual machines VM simultaneously. For example, n OSs (where n is a natural number) may consider the processor 110 as n logic processors or n processing elements.
- each application APP uses a virtual address space, and thus a virtual address VA (i.e., an address that may be used by software) is used.
- the OS in each virtual machine VM may control a time point at which a particular application APP accesses a given memory 120 and may control addresses that are accessed by the application APP to at least a certain degree.
- the OS in the virtual machine VM may perform and manage mapping between virtual addresses VA and physical addresses PA in the virtual machine VM.
- Physical addresses PA generated by the OS are system physical addresses PA (i.e., physical addresses that may be used by the memory controller 112 to access the memory 120 ) of a physical address space throughout the memory 120 of the system 100 .
- the OS may perform address translation (e.g., address mapping) between virtual addresses VA and system physical addresses PA.
- FIG. 1 shows two virtual machines 160 and 161 .
- Each of the virtual machines 160 and 161 includes an OS and at least one application APP.
- a plurality of virtual machines VM may be executed, and the processor 110 may execute and implement a large number of applications APP and/or transactions, in terms of time and memory footprints, through a combination of hardware acceleration and software.
- an application 170 and an OS 180 accessing the memory 120 will be described under an assumption that the system 100 runs a first virtual machine 160 .
- the BIOS memory 130 may store a BIOS code for booting the system 100 .
- the BIOS memory 130 may be implemented by a non-volatile memory device.
- the non-volatile memory devices may be implemented by an electrically erasable programmable read-only memory (EEPROM), a flash memory, a resistive RAM (RRAM), a magnetic RAM (MRAM), a phase change RAM (PRAM), a ferroelectric RAM (FRAM), a nano floating gate memory (NFGM), a polymer RAM (PoRAM), or a similar memory.
- the BIOS code may include a power-on-self-test (POST) code, or a part of the POST code, for detecting hardware components of the system 100 like a system board, the memory 120 , a disc drive, and input/output (I/O) devices, and checking whether the hardware components are working normally.
- the BIOS code may include various algorithms that are configured to allow the processor 110 to normally interoperate with the memory 120 .
- the memory interface 140 is shown as a single signal line between the processor 110 and the memory 120 for simplicity of illustration, but may actually include a plurality of signal lines.
- the memory interface 140 includes connectors for connecting the memory controller 112 and the memory 120 to each other.
- the connectors may be implemented as pins, balls, signal lines, or other hardware components.
- clocks, commands, addresses, data, etc. may be exchanged between the memory controller 112 and the memory 120 through the memory interface 140 .
- the memory interface 140 may be implemented as one channel including a plurality of signal lines or may be implemented as a plurality of channels. Also, one or more memories 120 may be connected to a corresponding channel of the plurality of channels.
- the processor 110 may include a memory controller 112 that controls data transmission/reception to/from the memory 120 .
- the memory controller 112 may access the memory 120 according to a memory request of the processor 110 , and a system physical address may be provided to access the memory 120 .
- the memory controller 112 may include a memory physical layer interface, that is, a memory PHY for memory interfacing like selecting a row and a column corresponding to a memory cell, writing data to a memory cell, or reading written data.
- the memory controller 112 performing the functions stated above may be implemented in various forms.
- memory controller 112 may be implemented by one or more hardware components (e.g., analog circuits, logic circuits) and program codes of software and/or firmware.
- the memory controller 112 may be integrated into the processor 110 , such that the memory 120 may be accessed by the processor 110 at high speed and/or low power consumption.
- Data used for the operation of the system 100 may be stored in or loaded from the memory 120 .
- Data processed or to be processed by the processor 110 may be stored in or read from the memory 120 .
- the memory 120 may include a volatile memory like a static random access memory (SRAM) and a dynamic random access memory (DRAM) and/or a non-volatile memory like a flash memory, an RRAM, an MRAM, a PRAM, and a FRAM.
- the memory 120 may include memory cells for storing data.
- a memory cell may be accessed using an address.
- Write data may be written to a memory cell indicated by an address, and read data may be loaded from a memory cell indicated by an address.
- one memory region in the memory 120 may include a memory cell array with a plurality of memory cells which are accessed using a plurality of addresses.
- the memory 120 may be configured to repair a faulty cell with a redundancy cell.
- the memory 120 is capable of performing post package repair (PPR), which repairs, with redundancy cells, faulty cells that additionally occur due to continued use.
- the processor 110 may provide addresses to the memory 120 to exchange data which are read from the memory 120 and stored in the memory 120 during execution of the application 170 .
- the memory 120 may store or read data based on requests (e.g., commands and addresses) received from the processor 110 .
- an address processed by the application 170 may be referred to as a virtual address VA, and an address for accessing the memory 120 may be referred to as a system physical address PA (i.e., a physical address).
- the OS 180 may perform address translation between a virtual address VA processed by an application APP and a system physical address PA for the memory 120 .
- the application 170 processed by the processor 110 may operate with reference to the virtual addresses VA, and when accessing the memory, may use the system physical addresses PA translated from the virtual addresses VA.
- FIG. 2 is a diagram for describing an example of address mapping for address translation performed by the processor 110 of FIG. 1 .
- the memory 120 has a system physical address PA range from an address zero (0) to an upper level.
- the application 170 may have a virtual address VA range starting from the upper level of the system physical address PA range of the memory 120 .
- Each address Pa of virtual addresses VA may be mapped to an address Pg (or an address space) of system physical addresses PA of the memory 120 .
- the OS 180 may allocate a page requested for memory access by the application 170 to a page of the memory 120 .
- a reference designated from a virtual address Pa to a corresponding system physical address Pg may be stored in an address mapping table 200 as a page table entry PTE.
- a page may be a unit of address translation. In other words, addresses in a virtual page may be translated into addresses in a corresponding physical page. Pages may have various sizes, ranging from 4 kilobytes up to megabytes or even gigabytes.
- the addresses shown in FIG. 2 are merely examples and are not necessarily actual memory addresses. Also, the example memory shown in FIG. 2 does not represent or imply limitations on the inventive concept.
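Since a page is the unit of translation, the page-number bits of a virtual address select a page table entry while the in-page offset carries over unchanged. A minimal sketch, assuming 4 KB pages and a flat table (real page tables are hierarchical, as described later):

```python
PAGE_SHIFT = 12                       # 4 KB pages (assumption)
PAGE_MASK = (1 << PAGE_SHIFT) - 1

def translate(page_table, va):
    """Translate a virtual address to a system physical address.

    page_table maps a virtual page number to a physical page number;
    the byte offset within the page is preserved by the translation.
    """
    vpn = va >> PAGE_SHIFT            # virtual page number
    offset = va & PAGE_MASK           # offset within the page
    if vpn not in page_table:
        raise KeyError("page fault")  # no mapping -> page fault
    return (page_table[vpn] << PAGE_SHIFT) | offset
```

A missing entry here is what raises the page fault that the inventive concept uses as its fault-prediction signal.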
- FIG. 3 is a diagram showing an example of the address mapping table 200 of FIG. 2 .
- the OS 180 may manage the address mapping table 200 .
- page table entries PTE of the address mapping table 200 may include information about a mapping relationship between virtual addresses VA and system physical addresses PA.
- the address mapping table 200 may be implemented in the form of a look-up table.
- the OS 180 may translate the virtual addresses VA into the system physical addresses PA by referring to the page table entries PTE of the address mapping table 200 .
- a virtual address Va 1 may correspond to a system physical address Pa 1 .
- the OS 180 may map the virtual address Va 1 to the system physical address Pa 1 .
- the OS 180 may process a request received from the application 170 together with the virtual address Va 1 in association with a memory cell indicated by the system physical address Pa 1 .
- the OS 180 may map virtual addresses Va 2 and Va 3 to system physical addresses Pa 2 and Pa 3 , map virtual addresses Vb 1 , Vb 2 , and Vb 3 to system physical addresses Pb 1 , Pb 2 , Pb 3 , and map virtual addresses Vc 1 , Vc 2 , Vc 3 , Vc 4 , and Vc 5 to system physical addresses Pc 1 , Pc 2 , Pc 3 , Pc 4 , and Pc 5 .
- the OS 180 may process a request from the application 170 in association with memory cells indicated by system physical addresses Pa 2 , Pa 3 , Pb 1 , Pb 2 , Pb 3 , Pc 1 , Pc 2 , Pc 3 , Pc 4 , and Pc 5 corresponding to virtual addresses Va 2 , Va 3 , Vb 1 , Vb 2 , Vb 3 , Vc 1 , Vc 2 , Vc 3 , Vc 4 , and Vc 5 .
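The address mapping table 200 described above behaves as a look-up table keyed by virtual address. A tiny model using the figure's symbolic Va/Pa names as plain strings (the names are the figure's labels, not real addresses):

```python
# The address mapping table 200 modeled as a look-up table whose
# entries mirror FIG. 3: each virtual address maps to one system
# physical address.
address_mapping_table = {
    "Va1": "Pa1", "Va2": "Pa2", "Va3": "Pa3",
    "Vb1": "Pb1", "Vb2": "Pb2", "Vb3": "Pb3",
    "Vc1": "Pc1", "Vc2": "Pc2", "Vc3": "Pc3",
    "Vc4": "Pc4", "Vc5": "Pc5",
}

def lookup(va):
    """Return the system physical address mapped to a virtual address."""
    return address_mapping_table[va]
```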
- FIG. 4 is a diagram for describing an example of the page table entries PTE of the address mapping table 200 of FIG. 3 .
- the page table entries PTE of the address mapping table 200 may be a table of translation data that may be used to translate a virtual address VA to a system physical address PA.
- the translation table may store translation data in any way. For example, depending on a translation level, various subsets of virtual address VA bits or system physical address PA bits may be used to index the levels of the translation table. Also, each level may be at the end of a translation (i.e., storing an actual page number for the translation) or may point to another table (indexed by another set of address bits) in a hierarchical manner.
- the page table entries PTE may include pointers to other page tables in a hierarchical manner.
- the page table entry PTE may indicate a level in a page table layer structure, e.g., page map levels 2, 3, or 4, at which translation needs to be started for requests mapped to the corresponding page table entry PTE. Therefore, the page table entries PTE of a table of a page map level 2, 3, or 4 may include any number of bit entries.
- the page table entries PTE shown in FIG. 4 are a first level translation table.
- in the first level translation table, fields related to address translation are provided to map a virtual address to a system physical address.
- the present invention is not limited thereto.
- the present invention may apply to a second level address translation table where a single bit field and/or a plurality of bit fields may be provided for translation level identification, depths of tables, indication of translation invalid/valid, etc. Further action may occur with reference to the bit field or bit fields to complete address translation.
- the page table entry PTE relates to a translation of virtual page address bits into actual system physical page address bits and is a 64-bit entry, for example.
- the page table entry PTE may include a virtual address VA field and a system physical address PA field.
- the virtual address VA field is configured to increase a virtual address space to be used by the application 170
- the system physical address PA field indicates an address of the memory 120 corresponding to the virtual address VA.
- the system physical address PA field may include PTE[11:0] bits
- the virtual address VA field may include PTE[63:12] bits.
- the system physical address PA field may include row address R[5:0] bits and column address C[5:0] bits.
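The 64-bit page table entry layout described above can be sketched as follows. This is an illustrative model only: the patent states that PTE[11:0] holds the system physical address PA field (row bits R[5:0] and column bits C[5:0]) and PTE[63:12] holds the virtual address VA field, but the ordering of the row and column sub-fields within PTE[11:0] is an assumption made here for illustration.

```python
# Model of the 64-bit PTE: PTE[63:12] = VA field, PTE[11:0] = PA field.
# Assumption: row bits R[5:0] occupy PTE[11:6] and column bits C[5:0]
# occupy PTE[5:0]; the patent does not fix this ordering.

def pack_pte(va_field: int, row: int, col: int) -> int:
    """Pack the VA field and the 6-bit row/column addresses into a PTE."""
    assert 0 <= row < 64 and 0 <= col < 64  # R[5:0], C[5:0] are 6 bits each
    return (va_field << 12) | (row << 6) | col

def unpack_pte(pte: int):
    """Return (va_field, row R[5:0], column C[5:0]) from a 64-bit PTE."""
    col = pte & 0x3F
    row = (pte >> 6) & 0x3F
    va_field = pte >> 12
    return va_field, row, col

# Round-trip example using the address Pa 1 of FIG. 5.
pte = pack_pte(va_field=0xABC, row=0b011000, col=0b000001)
assert unpack_pte(pte) == (0xABC, 0b011000, 0b000001)
```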
- FIG. 5 is a diagram for describing a row-based fault attribute shown in the address mapping table of FIG. 3 .
- the OS 180 may provide system physical addresses Pa 1 , Pa 2 , and Pa 3 corresponding to the virtual addresses Va 1 , Va 2 , and Va 3 for accessing the memory 120 .
- a system physical address Pa 1 corresponding to a virtual address Va 1 may be provided as a row address R[5:0] bits of 011000 and a column address C[5:0] bits of 000001.
- the processor 110 may execute the application 170 by accessing a memory cell indicated by the row address R[5:0] bits of 011000 and the column address C[5:0] bits of 000001 of the system physical address Pa 1 through the memory controller 112 .
- a system physical address Pa 2 corresponding to a virtual address Va 2 may be provided as a row address R[5:0] bits of 011000 and a column address C[5:0] bits of 000010
- a system physical address Pa 3 corresponding to a virtual address Va 3 may be provided as a row address R[5:0] bits of 011000 and a column address C[5:0] bits of 000100.
- the processor 110 may execute the application 170 by accessing a memory cell indicated by the row address R[5:0] bits of 011000 and the column address C[5:0] bits of 000010 of the system physical address Pa 2 and a memory cell indicated by the row address R[5:0] bits of 011000 and the column address C[5:0] bits of 000100 of the system physical address Pa 3 through the memory controller 112 .
- execution associated with the system physical address Pa 2 of the memory 120 does not operate properly.
- execution associated with the system physical address Pa 3 of the memory 120 does not operate properly.
- a page fault may occur.
- an error may occur in an execution path regarding the memory 120 , for example.
- memory errors may occur at system physical addresses Pa 2 and Pa 3 .
- One of the major causes of such memory errors is when memory cells addressed by the system physical addresses Pa 2 and Pa 3 fail, that is, when a hardware exception event is detected.
- when such exception events frequently occur, the system 100 , which is pending, may be stopped and attempts for resuming the system 100 (i.e., rebooting of the system 100 ) may be made. Such a solution is unable to maintain the availability of the system 100 .
- the OS 180 may perform controls to process exception events without stopping the system 100 .
- the OS 180 may continue operating the system 100 by combining hardware support from the processor 110 with OS codes. The mechanism of the OS 180 for this function is described in more detail below.
- the OS 180 may become aware of (i.e., identify) faulty pages (i.e., faulty physical addresses) of the system physical addresses Pa 2 and Pa 3 .
- the faulty addresses may refer to physical addresses associated with faulty memory cells.
- the OS 180 may observe (i.e., analyze) bits of the row address R[5:0] and bits of the column address C[5:0] identified at the system physical addresses Pa 2 and Pa 3 , thereby determining that the system physical addresses Pa 2 and Pa 3 have the same row address R[5:0] bits of 011000.
- the OS 180 may expect that there is a high possibility that the memory cells accessed with the row address are faulty. Therefore, the OS 180 may predict or consider the fault type of memory cells accessed with the row address R[5:0] bits of 011000 in the memory region of the memory 120 as a possible row-based fault.
- the row address R[5:0] bits of 011000 may be referred to as a faulty row address FRA of the fault type.
- the OS 180 may be given a privilege to specify memory cells accessed with the row address R[5:0] bits of 011000 as row-based fault.
- when the OS 180 translates a system physical address PA corresponding to a virtual address VA at which the application 170 is being executed on the processor 110 , the OS 180 does not provide a faulty row address FRA as the system physical address PA, such that row-based faulty memory cells are not selected.
- the OS 180 may store the faulty row address FRA in the BIOS memory 130 ( FIG. 1 ).
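The row-based fault prediction described above can be sketched as a small check: if all observed faulty physical addresses share one row address R[5:0], the shared value is recorded as the faulty row address FRA. The function name and signature are illustrative, not from the patent; the analogous column-based check would compare column addresses C[5:0] instead.

```python
# Sketch (assumed helper, not the patent's implementation): predict a
# possible row-based fault when every faulty page shares one row address.

def predict_row_fault(faulty_addrs, min_pages=2):
    """faulty_addrs: list of (row, col) tuples of faulty physical addresses.
    Returns the shared row address (the FRA) or None if no row-based
    fault is predicted."""
    if len(faulty_addrs) < min_pages:  # reference value from the patent
        return None
    rows = {row for row, _col in faulty_addrs}
    return rows.pop() if len(rows) == 1 else None

# Pa 2 and Pa 3 of FIG. 5: same row 011000, different columns.
fra = predict_row_fault([(0b011000, 0b000010), (0b011000, 0b000100)])
assert fra == 0b011000
```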
- FIG. 6 is a diagram for describing a column-based fault attribute shown in the address mapping table of FIG. 3 .
- the OS 180 may provide system physical addresses Pb 1 , Pb 2 , and Pb 3 corresponding to the virtual addresses Vb 1 , Vb 2 , and Vb 3 for accessing the memory 120 .
- a system physical address Pb 1 corresponding to a virtual address Vb 1 may be provided as a row address R[5:0] bits 100000 and a column address C[5:0] bits of 000011.
- the processor 110 may execute the application 170 by accessing a memory cell indicated by the row address R[5:0] bits 100000 and the column address C[5:0] bits of 000011 of the system physical address Pb 1 through the memory controller 112 .
- a system physical address Pb 2 corresponding to a virtual address Vb 2 may be provided as a row address R[5:0] bits of 010000 and a column address C[5:0] bits of 000011
- a system physical address Pb 3 corresponding to a virtual address Vb 3 may be provided as a row address R[5:0] bits of 001000 and a column address C[5:0] bits of 000011.
- the processor 110 may execute the application 170 by accessing a memory cell indicated by the row address R[5:0] bits of 010000 and the column address C[5:0] bits of 000011 of the system physical address Pb 2 and a memory cell indicated by the row address R[5:0] bits of 001000 and the column address C[5:0] bits of 000011 of the system physical address Pb 3 through the memory controller 112 .
- execution associated with the system physical address Pb 2 and execution associated with the system physical address Pb 3 of the memory 120 do not operate properly.
- memory errors may occur at system physical addresses Pb 2 and Pb 3 .
- the OS 180 may become aware of faulty pages of the system physical addresses Pb 2 and Pb 3 .
- the OS 180 may observe bits of the row address R[5:0] and bits of the column address C[5:0] identified at the system physical addresses Pb 2 and Pb 3 .
- the OS 180 may determine that the system physical addresses Pb 2 and Pb 3 have the same column address C[5:0] bits of 000011.
- the OS 180 may expect that there is a high possibility that the memory cells accessed with the column address are faulty. Therefore, the OS 180 may predict or consider the fault type of memory cells accessed with the column address C[5:0] bits of 000011 in the memory region of the memory 120 as a possible column-based fault.
- the column address C[5:0] bits of 000011 may be referred to as a faulty column address FCA of the fault type.
- the OS 180 may specify memory cells accessed with the column address C[5:0] bits of 000011 as column-based fault.
- when the OS 180 translates a system physical address PA corresponding to a virtual address VA at which the application 170 is being executed on the processor 110 , the OS 180 does not provide a faulty column address FCA as the system physical address PA, such that column-based faulty memory cells are not selected.
- the OS 180 may store the faulty column address FCA in the BIOS memory 130 of FIG. 1 .
- the OS 180 may specify the two faulty pages, as described above, as a row-based fault or a column-based fault.
- the present invention is not limited thereto.
- the OS 180 may specify three or more faulty pages as a row-based fault or a column-based fault when the number of the faulty pages exceeds a reference value.
- the reference value may be set to n (n is a natural number equal to or greater than 2). According to other embodiments, the reference value may be set differently and may also be changed.
- FIG. 7 is a diagram for describing a block-based fault attribute shown in the address mapping table of FIG. 3 .
- the OS 180 may provide system physical addresses Pc 1 , Pc 2 , Pc 3 , Pc 4 , and Pc 5 corresponding to the virtual addresses Vc 1 , Vc 2 , Vc 3 , Vc 4 , and Vc 5 for accessing the memory 120 .
- a system physical address Pc 1 corresponding to a virtual address Vc 1 may be provided as a row address R[5:0] bits of 110001 and a column address C[5:0] bits of 111000.
- a system physical address Pc 2 corresponding to a virtual address Vc 2 may be provided as a row address R[5:0] bits of 110010 and a column address C[5:0] bits of 111010
- a system physical address Pc 3 corresponding to a virtual address Vc 3 may be provided as a row address R[5:0] bits of 110100 and a column address C[5:0] bits of 110000
- a system physical address Pc 4 corresponding to a virtual address Vc 4 may be provided as a row address R[5:0] bits of 111000 and a column address C[5:0] bits of 110010
- a system physical address Pc 5 corresponding to a virtual address Vc 5 may be provided as a row address R[5:0] bits of 111111 and a column address C[5:0] bits of 110100.
- the processor 110 may access the memory cells indicated by the system physical addresses Pc 1 , Pc 2 , Pc 3 , Pc 4 , and Pc 5
- execution associated with the system physical addresses Pc 1 , Pc 2 , Pc 3 , Pc 4 , and Pc 5 of the memory 120 does not operate properly.
- memory errors may occur at the system physical addresses Pc 1 , Pc 2 , Pc 3 , Pc 4 , and Pc 5 .
- the OS 180 may become aware of faulty pages of the system physical addresses Pc 1 , Pc 2 , Pc 3 , Pc 4 , and Pc 5 .
- the OS 180 may observe bits of the row address R[5:0] and bits of the column address C[5:0] identified at the system physical addresses Pc 1 , Pc 2 , Pc 3 , Pc 4 , and Pc 5 .
- the OS 180 may determine that the two uppermost bits of the row addresses R[5:0] of the system physical addresses Pc 1 , Pc 2 , Pc 3 , Pc 4 , and Pc 5 (i.e., bits R[5:4] of 11) are the same and the two uppermost bits of the column addresses C[5:0] (i.e., bits C[5:4] of 11) are the same.
- the memory 120 may be configured to sequentially decode row address bits by using a row decoder, generate a decoded row address signal, and activate a word line corresponding to the decoded row address signal.
- decoded row address signal lines may be arranged in a row-wise direction from the bottom or the top of a memory region, wherein upper bits of a row address may serve as a signal for addressing a particular region on the upper side or the lower side based on the center of the memory region.
- the memory 120 is configured to sequentially decode column address bits by using a column decoder, generate a decoded column address signal, and activate bit lines corresponding to a decoded column address signal.
- decoded column address signal lines may be arranged in a column-wise direction from the left side or the right side of a memory region, wherein upper bits of a column address may serve as a signal for addressing a particular region on the left side or the right side based on the center of the memory region.
- the OS 180 may expect that there is a high possibility that the memory cells accessed with the same upper row address bits and the same upper column address bits are faulty. Therefore, the OS 180 may predict or consider the fault type of memory cells accessed with the upper row address bits R[5:4] of 11 and the upper column address bits C[5:4] of 11 in the memory region of the memory 120 as a possible block-based fault.
- the upper row address bits R[5:4] of 11 and the upper column address bits C[5:4] of 11 may be referred to as a faulty block address (FBA) of the fault type.
- the faulty row address, the faulty column address, and the faulty block address may each be referred to as a faulty address, which may be stored in the BIOS memory and, in the post package repair, may correspond to a source address to be replaced or repaired with a destination address of redundancy cells.
- the OS 180 may specify memory cells accessed with the upper row address bits R[5:4] of 11 and the upper column address bits C[5:4] of 11 as a block-based fault.
- when the OS 180 translates a system physical address PA corresponding to a virtual address VA at which the application 170 is being executed on the processor 110 , the OS 180 does not provide a faulty block address FBA as the system physical address PA, such that block-based faulty memory cells are not selected.
- the OS 180 may store the faulty block address FBA in the BIOS memory 130 ( FIG. 1 ).
- a privilege by which the OS 180 handles a block-based fault by referring to five faulty pages has been described above, but such a privilege may also be given when the number of faulty pages exceeds a reference value.
- the reference value may be set to n (n is a natural number equal to or greater than 5). According to other embodiments, the reference value may be set differently and may also be changed.
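The block-based prediction above can be sketched in the same style: a block-based fault is predicted when the faulty addresses agree on their upper row bits R[5:4] and upper column bits C[5:4], and the shared upper bits form the faulty block address FBA. The function name, signature, and the choice of `min_pages=5` (matching the five faulty pages of FIG. 7) are illustrative assumptions.

```python
# Sketch (assumed helper): predict a possible block-based fault when all
# faulty pages share the upper row bits R[5:4] and upper column bits C[5:4].

def predict_block_fault(faulty_addrs, min_pages=5):
    """faulty_addrs: list of (row, col) tuples of faulty physical addresses.
    Returns (R[5:4], C[5:4]) as the FBA, or None."""
    if len(faulty_addrs) < min_pages:  # reference value from the patent
        return None
    blocks = {(row >> 4, col >> 4) for row, col in faulty_addrs}
    return blocks.pop() if len(blocks) == 1 else None

# Pc 1 .. Pc 5 of FIG. 7: rows 110001..111111 and columns 111000..110100
# all share R[5:4] = 11 and C[5:4] = 11.
addrs = [(0b110001, 0b111000), (0b110010, 0b111010),
         (0b110100, 0b110000), (0b111000, 0b110010),
         (0b111111, 0b110100)]
assert predict_block_fault(addrs) == (0b11, 0b11)
```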
- FIG. 8 is a flowchart of a method of handling a runtime OS of the system 100 according to an embodiment of the inventive concept.
- the OS 180 may be executed while providing machine virtualization to execute the application 170 in cooperation with the processor 110 (operation S 810 ).
- the OS 180 in a virtual machine VM may perform first address translation between virtual addresses VA to be processed by the application 170 and system physical addresses PA for the memory 120 .
- the processor 110 (i.e., the OS 180 operated in the virtual machine VM executed by the processor 110 ) may execute the application 170 using the virtual addresses VA, and when accessing the memory 120 , may use the system physical addresses PA translated from the virtual addresses VA.
- the OS 180 may become aware of (i.e., identify) faulty pages among the system physical addresses PA (operation S 812 ).
- the OS 180 may count the faulty pages and determine whether the number of the faulty pages exceeds a reference value (operation S 813 ). When it is determined that the number of the faulty pages exceeds the reference value, the OS 180 may observe (i.e., analyze) bits of row addresses RA and bits of column addresses CA identified from the system physical addresses PA of the faulty pages (operation S 814 ). When it is determined that the number of the faulty pages does not exceed the reference value, the OS 180 may continue to operate the application 170 and proceed to operation S 812 .
- the OS 180 may predict a possible faulty address attribute from the same bad address bits appearing in the system physical addresses PA of the faulty pages (operation S 815 ).
- the OS 180 may specify the possible faulty address attribute (i.e., a fault type) of the system physical addresses PA as a row-based fault, a column-based fault, or a block-based fault.
- when the OS 180 performs second address translation between virtual addresses and system physical addresses for the memory 120 , the OS 180 does not provide faulty system physical addresses of a particular fault type (e.g., a row-based fault or a row failure, a column-based fault or a column failure, or a block-based fault or a block failure) as the system physical addresses (operation S 816 ).
- an OS of another virtual machine may perform address translation based on the faulty system physical addresses to prevent the faulty system physical addresses from being used in the translation.
- the faulty system physical addresses may be stored as a faulty address in a local system memory of the processor 110 and may be referenced by at least one virtual machine; if previously stored in the BIOS memory 130 , the faulty address may be uploaded to the local system memory from the BIOS memory 130 .
- the OS 180 may store the faulty system physical addresses as a faulty address in the BIOS memory 130 .
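During the second address translation (operation S 816 ), the OS skips any candidate physical address whose row, column, or block matches a recorded faulty address. A minimal sketch follows; the helper names, the candidate-list interface, and the use of the upper bits R[5:4] and C[5:4] for the block comparison are assumptions for illustration.

```python
# Sketch (assumed helpers): exclude faulty addresses from translation.

def is_faulty(row, col, fra=None, fca=None, fba=None):
    """True if (row, col) is covered by a recorded faulty address."""
    if fra is not None and row == fra:
        return True                          # row-based fault
    if fca is not None and col == fca:
        return True                          # column-based fault
    if fba is not None and (row >> 4, col >> 4) == fba:
        return True                          # block-based fault
    return False

def allocate_page(candidates, fra=None, fca=None, fba=None):
    """Return the first candidate (row, col) not covered by a faulty address."""
    for row, col in candidates:
        if not is_faulty(row, col, fra, fca, fba):
            return row, col
    raise MemoryError("no fault-free physical page available")

# With FRA = 011000 recorded, the first candidate is skipped.
page = allocate_page([(0b011000, 0b000010), (0b000001, 0b000010)],
                     fra=0b011000)
assert page == (0b000001, 0b000010)
```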
- FIG. 9 is a conceptual diagram for describing a repair operation performed when the system 100 of FIG. 1 is booted.
- the BIOS memory 130 may store a BIOS code for booting the system 100 . Also, the BIOS memory 130 may store faulty addresses specified by the OS 180 as a faulty row address FRA, a faulty column address FCA, and/or a faulty block address FBA. The faulty row address FRA, the faulty column address FCA, and/or the faulty block address FBA may be stored in a non-volatile memory unit 930 .
- the non-volatile memory unit 930 is a part of a non-volatile memory device constituting the BIOS memory 130 .
- the system 100 may execute boot operations that execute a part of the BIOS code of the BIOS memory 130 by the processor 110 as the system 100 is powered on.
- Memory training for the memory 120 may be included in boot operations for executing the BIOS code by the processor 110 .
- the memory training may be performed for the memory controller 112 to determine the optimal parameters for core parameters and/or peripheral circuit parameters of the memory 120 .
- the memory 120 will be collectively referred to as a dynamic random access memory (DRAM) 120 .
- the DRAM 120 may be any one of a synchronous DRAM (SDRAM), a double data rate SDRAM (DDR SDRAM), a low power double data rate SDRAM (LPDDR SDRAM), a graphics double data rate SDRAM (GDDR SDRAM), a DDR2 SDRAM, a DDR3 SDRAM, a DDR4 SDRAM, a DDR5 SDRAM, a wide I/O DRAM, a high bandwidth memory (HBM), and a hybrid memory cube (HMC).
- the memory controller 112 may initialize the DRAM 120 according to an algorithm set in a register control word (RCW) and perform memory training for the DRAM 120 .
- the memory training may be performed by using a memory PHY provided for signals, frequencies, timings, drivings, detailed operation parameters, and functionality needed for an efficient communication between the memory controller 112 and the DRAM 120 .
- the memory controller 112 may provide the faulty row address FRA, the faulty column address FCA, and/or the faulty block address FBA stored in the non-volatile memory unit 930 of the BIOS memory 130 to the DRAM 120 after the memory training of the DRAM 120 .
- the DRAM 120 may repair faulty cells showing fault characteristics in a memory cell array.
- the memory cell array may include a plurality of word lines, a plurality of bit lines, and a plurality of memory cells formed at points where the word lines and the bit lines intersect each other.
- the DRAM 120 may include a repair control circuit 920 configured to repair faulty cells with redundancy cells.
- the repair control circuit 920 may repair faulty cells detected through a test after a semiconductor manufacturing process of the DRAM 120 .
- the repair control circuit 920 may perform a post package repair (PPR) that repairs faulty cells occurring during continuous use of the DRAM 120 with redundancy cells.
- the repair control circuit 920 may perform PPR to replace the faulty row address FRA, the faulty column address FCA, and/or the faulty block address FBA with a redundancy row address RRA, a redundancy column address RCA, and/or a redundancy block address RBA, respectively.
- the repair control circuit 920 may store information about destination addresses D_ADDR (i.e., the redundancy row address RRA, the redundancy column address RCA, and/or the redundancy block address RBA) that replaced source addresses S_ADDR that needed to be repaired (i.e., the faulty row address FRA, the faulty column address FCA, and/or the faulty block address FBA) in an address storage table 921 (i.e., an address storage circuit).
- the address storage table 921 may be included in the repair control circuit 920 or in the memory 120 .
- the address storage table 921 may include, for example, an anti-fuse array or a content addressable memory (CAM).
- the anti-fuse is a resistive fuse element with electrical characteristics opposite to those of a fuse element: it has a high resistance value in a non-programmed state and a low resistance value in a programmed state.
- the CAM is a special memory structure that simultaneously compares an input address with the source addresses S_ADDR stored in the respective CAM entries, and the output of the CAM indicates the destination address D_ADDR corresponding to a matching source address S_ADDR, if any.
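The CAM-style lookup of the address storage table 921 can be modeled as follows. A dictionary stands in for the parallel compare of all CAM entries; this is an illustrative software model of the behavior, not the circuit itself, and the class and method names are assumptions.

```python
# Model of the address storage table 921: each entry maps a repaired
# source address S_ADDR to its redundancy destination address D_ADDR.
# A dict lookup stands in for the CAM's simultaneous compare.

class AddressStorageTable:
    def __init__(self):
        self._entries = {}                 # S_ADDR -> D_ADDR

    def program(self, s_addr, d_addr):
        """Record that s_addr was repaired with redundancy address d_addr."""
        self._entries[s_addr] = d_addr

    def lookup(self, addr):
        """Return the redundancy address if addr was repaired, else addr."""
        return self._entries.get(addr, addr)

table = AddressStorageTable()
table.program(s_addr=0b011000, d_addr=0b111110)  # FRA -> RRA (example values)
assert table.lookup(0b011000) == 0b111110        # repaired row is redirected
assert table.lookup(0b000001) == 0b000001        # normal row is unchanged
```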
- the repair control circuit 920 may provide the address storage table 921 to the memory controller 112 .
- the memory controller 112 may access the address storage table 921 to update the information stored in the address storage table 921 or read the information from the address storage table 921 .
- the memory controller 112 may store the address storage table 921 information as memory management information 910 for consistent access of the DRAM 120 by at least one processor 110 .
- the address storage table 921 information may be shared by at least one processor 110 . When at least one processor 110 performs an operation for memory allocation during execution of the applications 170 , the memory allocation operation is performed based on information in the address storage table 921 .
- At least one processor 110 may perform functions commonly known as those of a memory manager, that is, managing an address space of the OS 180 in the DRAM 120 and evenly distributing memory regions to other virtual machines VM using the DRAM 120 , by using the memory management information 910 .
- FIGS. 10 to 12 are diagrams for describing a repair operation performed in the memory 120 of FIG. 1 .
- a memory cell array 1000 a may include a normal cell array NMCA and a redundancy cell array RMCA.
- the normal cell array NMCA may include memory cells connected to word lines and bit lines
- the redundancy cell array RMCA may include memory cells connected to redundancy word lines and redundancy bit lines.
- the repair control circuit 920 may include a row repairer 922 (i.e., a row repair circuit) that determines a redundancy row address RRA, such that redundancy resources for repairing a faulty row address FRA do not overlap with one another.
- the row repairer 922 may perform a repair operation, such that the redundancy row address RRA is selected instead of the faulty row address FRA.
- redundancy cells corresponding to the redundancy row address RRA of the redundancy cell array RMCA are selected.
- the row repairer 922 deactivates a word line corresponding to the faulty row address FRA and activates a redundancy word line corresponding to the redundancy row address RRA instead. Therefore, redundancy cells corresponding to the redundancy row address RRA are selected instead of memory cells corresponding to the faulty row address FRA.
- a memory cell array 1000 b may include the normal cell array NMCA and the redundancy cell array RMCA.
- the normal cell array NMCA may include memory cells connected to word lines and bit lines
- the redundancy cell array RMCA may include memory cells connected to the word lines and redundancy bit lines.
- the repair control circuit 920 may include a column repairer 924 that determines a redundancy column address RCA, such that redundancy resources for repairing a faulty column address FCA do not overlap with one another.
- the column repairer 924 may perform a repair operation, such that the redundancy column address RCA is selected instead of the faulty column address FCA.
- redundancy cells corresponding to the redundancy column address RCA of the redundancy cell array RMCA are selected.
- the column repairer 924 prevents a bit line corresponding to the faulty column address FCA from being selected and selects a redundancy bit line corresponding to the redundancy column address RCA instead. Therefore, redundancy cells corresponding to the redundancy column address RCA are selected instead of memory cells corresponding to the faulty column address FCA.
- a memory cell array 1000 c may include the normal cell array NMCA and the redundancy cell array RMCA.
- the normal cell array NMCA may include memory cells connected to word lines and bit lines
- the redundancy cell array RMCA may include memory cells connected to redundancy word lines and redundancy bit lines.
- the repair control circuit 920 may include a block repairer 926 that determines a redundancy block address RBA, such that redundancy resources for repairing a faulty block address FBA do not overlap with one another.
- the block repairer 926 may perform a repair operation, such that the redundancy block address RBA is selected instead of the faulty block address FBA.
- when an access row address and an access column address applied to the memory designate a faulty block address FBA indicating a certain region of the normal cell array NMCA, redundancy cell regions corresponding to the redundancy block address RBA of the redundancy cell array RMCA are selected.
- the block repairer 926 prevents memory cells in a memory region corresponding to the faulty block address FBA from being selected and selects redundancy cells in a memory region corresponding to the redundancy block address RBA instead.
- FIG. 13 is a flowchart of a method of booting the system 100 according to an embodiment of the inventive concept.
- the system 100 may execute boot operations for executing a part of the BIOS code of the BIOS memory 130 through the processor 110 .
- Memory training for the memory 120 may be performed from among the boot operations for executing the BIOS code through the processor 110 (operation S 1313 ).
- the memory training may be performed for the memory controller 112 to determine the optimal parameters for core parameters and/or peripheral circuit parameters of the memory 120 .
- faulty system physical addresses (e.g., a faulty row address FRA, a faulty column address FCA, and/or a faulty block address FBA) stored in the BIOS memory 130 may be transmitted to the memory 120 (operation S 1320 ).
- the memory 120 may perform an operation for repairing faulty system physical addresses (operation S 1314 ). As described above, the memory 120 may repair the faulty row address FRA with a redundancy row address RRA. The memory may repair the faulty column address FCA with a redundancy column address RCA. The memory may repair the faulty block address FBA with a redundancy block address RBA. The memory 120 may repair faulty system physical addresses, thereby using the resources of the memory 120 with the maximum efficiency.
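The repair step of operation S 1314 can be sketched as pairing each faulty source address received from the BIOS memory with a redundancy destination address, as in the PPR described for FIG. 9. The function name and the flat address lists are illustrative assumptions; in the device, the pairing is recorded in the address storage table 921.

```python
# Sketch (assumed helper): pair faulty source addresses with redundancy
# destination addresses, as in the post package repair of operation S1314.

def perform_ppr(faulty_addrs, redundancy_addrs):
    """faulty_addrs: source addresses S_ADDR to be repaired (FRA/FCA/FBA).
    redundancy_addrs: available destination addresses D_ADDR (RRA/RCA/RBA).
    Returns the S_ADDR -> D_ADDR repair map."""
    if len(redundancy_addrs) < len(faulty_addrs):
        raise MemoryError("not enough redundancy resources for repair")
    return dict(zip(faulty_addrs, redundancy_addrs))

# Example values: FRA 011000 repaired with RRA 111110,
# FCA 000011 repaired with RCA 111101.
mapping = perform_ppr([0b011000, 0b000011], [0b111110, 0b111101])
assert mapping[0b011000] == 0b111110
```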
- Embodiments of the inventive concept may be implemented in many different types of systems. Furthermore, embodiments of the inventive concept may be implemented in code and stored in an article comprising a non-transitory machine-readable storage medium storing instructions that may be used to program a system to execute the instructions.
- the non-transitory storage medium may include, but is not limited to, any type of disc including a floppy disc, an optical disc, a solid state drive (SSD), a compact disc read only memory (CD-ROM), a compact disc rewritable (CD-RW), and a magneto-optical disc, a ROM, a random access memory (RAM) like a dynamic random access memory (DRAM) and a static random access memory (SRAM), a semiconductor type device like an erasable and programmable read-only memory (EPROM), a flash memory, and an electrically erasable and programmable read-only memory (EEPROM), a magnetic or optical card, or any other types of media suitable for storing electronic instructions.
Description
- This application claims the benefit of Korean Patent Application No. 10-2020-0058448, filed on May 15, 2020, in the Korean Intellectual Property Office, the disclosure of which is incorporated by reference herein in its entirety.
- The inventive concept relates to a data processing system, and more particularly, to a system for predicting a faulty memory based on a page fault and managing the predicted faulty memory and a method of handling an operating system (OS) of the system.
- A data processing system such as a data center hosts the computer systems of many companies. The data center is used to distribute hosted applications and/or transactions and includes networked computer resources that are often referred to as clouds, e.g., servers, disks, virtual machines, etc. In this configuration, companies are clients of the data center. Data centers offer clients a number of advantages, including reduced cost, easy expansion, and reduced management load.
- Demands for high-capacity memories for data centers for stable and fast real-time processing of large amounts of data have increased. However, the performance quality of a memory may change over time. For example, when an application and/or a transaction is allocated and uses a memory, a page fault may occur due to a fault in the memory. When such a page fault occurs frequently, the normal flow of execution of instructions may be disrupted, and thus availability constraints of a data center, e.g., aborting and restarting a started operation, may occur. The page fault may arise from a single bit failure or from two or more bit failures. Therefore, if a faulty memory can be identified based on a page fault, it becomes possible to predict a memory fault due to the faulty memory. Also, managing a predicted memory fault in advance is desirable for maintaining the availability of a data center.
- The inventive concept provides a system for predicting a memory fault based on a page fault and managing the predicted memory fault and a method of handling an operating system of the system.
- According to an exemplary embodiment of the present invention, a method of operating a system running a virtual machine that executes an application and an operating system (OS) includes performing, by the OS, first address translation from a plurality of first virtual addresses processed by the application to a plurality of first physical addresses for accessing a memory, identifying, by the OS, a plurality of faulty physical addresses among the plurality of first physical addresses, wherein each of the plurality of faulty physical addresses corresponds to a corresponding first physical address, among the plurality of first physical addresses, associated with a faulty memory cell of the memory, analyzing, by the OS, a row address and a column address of each of the plurality of faulty physical addresses and specifying, by the OS, a fault type of the plurality of faulty physical addresses based on the analyzing of the row address and the column address of each of the plurality of faulty physical addresses, wherein the fault type includes a row failure, a column failure or a block failure, and performing, by the OS, second address translation from a plurality of second virtual addresses to a plurality of second physical addresses based on a faulty address, thereby excluding the faulty address from the plurality of second physical addresses. The faulty address corresponds to the fault type of the plurality of faulty physical addresses, and includes a faulty row address of the row failure, a faulty column address of the column failure, or a faulty block address of the block failure.
- According to an exemplary embodiment of the present invention, a non-transitory computer-readable recording medium storing computer-executable instructions for performing a method of operating a system running a virtual machine that executes an application and an operating system (OS) includes executing the application using a processor and a memory of the system, performing first address translation from a plurality of first virtual addresses allocated to the application to a plurality of first physical addresses for accessing the memory, identifying, during a time when the application is executed, a plurality of faulty physical addresses among the plurality of first physical addresses translated from the plurality of first virtual addresses, specifying a fault type of the plurality of faulty physical addresses, wherein the fault type includes a row failure, a column failure, or a block failure, and performing second address translation from a plurality of second virtual addresses to a plurality of second physical addresses based on a faulty address, thereby excluding the faulty address from the plurality of second physical addresses. The faulty address corresponds to the fault type of the plurality of faulty physical addresses, and includes a faulty row address of the row failure, a faulty column address of the column failure or a faulty block address of the block failure.
- According to an exemplary embodiment of the present invention, a system operating in a virtual machine environment includes a memory, and a processor operatively coupled to the memory. The processor executes an application in cooperation with the memory, performs first address translation from a plurality of first virtual addresses processed by the application to a plurality of first physical addresses for accessing the memory, identifies a plurality of faulty physical addresses among the plurality of first physical addresses, wherein each of the plurality of faulty physical addresses corresponds to a corresponding first physical address, among the plurality of first physical addresses, associated with a faulty memory cell of the memory, specifies a fault type of the plurality of faulty physical addresses of the memory, wherein the fault type includes a row failure, a column failure, or a block failure, and performs second address translation from a plurality of second virtual addresses to a plurality of second physical addresses based on a faulty address to prevent the faulty address from being used for the second address translation. The faulty address corresponds to the fault type of the plurality of faulty physical addresses, and includes a faulty row address of the row failure, a faulty column address of the column failure or a faulty block address of the block failure. The processor is further configured to, without causing the system to be rebooted, specify the fault type, store the faulty address, and perform the second address translation.
- According to an exemplary embodiment of the present invention, a memory device includes a memory cell array comprising a plurality of memory cells, and a repair control circuit configured to repair a plurality of faulty memory cells from among the plurality of memory cells by using a plurality of redundancy memory cells in the memory cell array. The repair control circuit is configured to receive, during rebooting of the memory device, a source address of the plurality of faulty memory cells from a processor to which the memory device is operatively coupled and repair the source address with a destination address of the plurality of redundancy memory cells. The source address of the plurality of faulty memory cells corresponds to a faulty address including a common row address of the plurality of faulty memory cells, a common column address of the plurality of faulty memory cells, or a block address of the plurality of faulty memory cells. The plurality of faulty memory cells are identified during execution of a virtual machine running on the processor. The faulty address of the plurality of faulty memory cells is included in a plurality of physical addresses for accessing the memory device by a system. The plurality of physical addresses are translated from a plurality of virtual addresses used by the processor.
- Embodiments of the inventive concept will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings in which:
-
FIG. 1 is a block diagram conceptually showing a system according to embodiments of the inventive concept; -
FIG. 2 is a diagram for describing an example of address mapping for address translation performed by a processor of FIG. 1; -
FIG. 3 is a diagram showing an example of an address mapping table of FIG. 2; -
FIG. 4 is a diagram for describing an example of page table entries of the address mapping table of FIG. 3; -
FIG. 5 is a diagram for describing the row-based fault attribute shown in the address mapping table of FIG. 3; -
FIG. 6 is a diagram for describing the column-based fault attribute shown in the address mapping table of FIG. 3; -
FIG. 7 is a diagram for describing the block-based fault attribute shown in the address mapping table of FIG. 3; -
FIG. 8 is a flowchart of a method of handling a runtime OS of a system according to an embodiment of the inventive concept; -
FIG. 9 is a conceptual diagram for describing a repair operation performed when a system of FIG. 1 is booted; -
FIGS. 10 to 12 are diagrams for describing a repair operation performed in a memory of FIG. 1; and -
FIG. 13 is a flowchart of a method of booting a system according to an embodiment of the inventive concept. -
FIG. 1 is a block diagram conceptually illustrating an example system that may be used to predict faulty memories based on page faults and manage the predicted faulty memories according to embodiments of the inventive concept. - Referring to
FIG. 1, a system 100 may be a data center including dozens of host machines or servers for running hundreds of virtual machines VM. Although various hardware components of the system 100 to be described below are shown in FIG. 1, the inventive concept is not limited thereto, and other components may be employed. The system 100 may include a processor 110, a memory 120, and a basic input/output system (BIOS) memory 130. - The
processor 110 may be communicatively connected to the memory 120 through a memory interface 140. The processor 110 may be connected to the BIOS memory 130 through an interface 150 of various types like a serial peripheral interface (SPI) or a low pin count (LPC) bus. The memory 120 and the BIOS memory 130 connected to the processor 110 may be referred to as system memories. - Some examples may be described by using the expressions "connected" and/or "coupled" together with their derivatives. These terms are not necessarily intended as synonyms for each other. For example, descriptions using the terms "connected" and/or "coupled" may indicate that two or more elements are in direct physical or electrical contact with each other. In addition, the terms "connected" and/or "coupled" may also mean that two or more elements are not in direct contact with each other but still cooperate or interact with each other.
- According to some embodiments, the
system 100 may be, for example, a computing device like a laptop computer, a desktop computer, a server computer, a workstation, a portable communication terminal, a personal digital assistant (PDA), a portable multimedia player (PMP), a smart phone, or any other suitable computer, or a VM or a virtual computing device thereof. Alternatively, the system 100 may be one of the components included in a computing system, e.g., a graphics card. - The
processor 110 is a functional block for performing general computer operations in the system 100 and may be a processor like a central processing unit (CPU), a digital signal processor (DSP), a network processor, an application processor (AP), or any other device for executing code. - The
processor 110 may be configured to execute instructions, software, firmware, or combinations thereof that may be executed by one or more machines. The processor 110 may include any number of processor cores. For example, the processor 110 may include a single core or multiple cores, like dual-core, quad-core, and hexa-core. Although FIG. 1 shows the system 100 including one processor 110, according to an embodiment, the system 100 may include two or more processors. - The
processor 110 may execute software in a virtualized environment. Accordingly, a virtual machine VM in the system 100 may include an application APP and an operating system (OS). Since virtual machines VM may be dynamically started and stopped by a user during their use, the number of virtual machines VM executed on the processor 110 may vary. Software entities such as an OS consider the processor 110 as a logic processor or a processing element capable of executing virtual machines VM simultaneously. For example, n (n is a natural number) OSs may consider the processor 110 as n logic processors or n processing elements. - In the virtual machines VM, each application APP uses a virtual address space, and thus a virtual address VA (i.e., an address that may be used by software) is used. The OS in each virtual machine VM may control a time point at which a particular application APP accesses a given
memory 120 and may control addresses that are accessed by the application APP to at least a certain degree. The OS in the virtual machine VM may perform and manage mapping between virtual addresses VA and physical addresses PA in the virtual machine VM. Physical addresses PA generated by the OS are system physical addresses PA (i.e., physical addresses that may be used by the memory controller 112 to access the memory 120) of a physical address space throughout the memory 120 of the system 100. The OS may perform address translation (e.g., address mapping) between virtual addresses VA and system physical addresses PA. - To briefly illustrate machine virtualization in the
system 100, FIG. 1 shows two virtual machines. Through the virtual machines, the processor 110 may execute and implement a large number of applications APP and/or transactions in terms of time and memory footprints through a combination of hardware acceleration and software. Hereinafter, for convenience of explanation, an application 170 and an OS 180 accessing the memory 120 will be described under an assumption that the system 100 runs a first virtual machine 160. - The
BIOS memory 130 may store a BIOS code for booting the system 100. The BIOS memory 130 may be implemented by a non-volatile memory device. The non-volatile memory device may be implemented by an electrically erasable programmable read-only memory (EEPROM), a flash memory, a resistive RAM (RRAM), a magnetic RAM (MRAM), a phase change RAM (PRAM), a ferroelectric RAM (FRAM), a nano floating gate memory (NFGM), a polymer RAM (PoRAM), or a similar memory. - The BIOS code may include a power-on-self-test (POST) code for detecting hardware components of the
system 100, like a system board, the memory 120, a disc drive, and input/output (I/O) devices, and for checking whether the hardware components are working normally, and/or a part of the POST code. The BIOS code may include various algorithms that are configured to allow the processor 110 to normally interoperate with the memory 120. - The
memory interface 140 is shown as a single signal line connected between the processor 110 and the memory 120 for simplicity of illustration, but may actually include a plurality of signal lines. The memory interface 140 includes connectors for connecting the memory controller 112 and the memory 120 to each other. In an example embodiment, the connectors may be implemented as pins, balls, signal lines, or other hardware components. For example, clocks, commands, addresses, data, etc. may be exchanged between the memory controller 112 and the memory 120 through the memory interface 140. The memory interface 140 may be implemented as one channel including a plurality of signal lines or may be implemented as a plurality of channels. Also, one or more memories 120 may be connected to a corresponding channel of the plurality of channels. - The
processor 110 may include a memory controller 112 that controls data transmission/reception to/from the memory 120. The memory controller 112 may access the memory 120 according to a memory request of the processor 110, and a system physical address may be provided to access the memory 120. The memory controller 112 may include a memory physical layer interface, that is, a memory PHY, for memory interfacing, like selecting a row and a column corresponding to a memory cell, writing data to a memory cell, or reading written data. The memory controller 112 performing the functions stated above may be implemented in various forms. For example, the memory controller 112 may be implemented by one or more hardware components (e.g., analog circuits, logic circuits) and program codes of software and/or firmware. The memory controller 112 may be integrated into the processor 110, such that the memory 120 may be accessed by the processor 110 at high speed and/or with low power consumption. - Data used for the operation of the
system 100 may be stored in or loaded from the memory 120. Data processed or to be processed by the processor 110 may be stored in or read from the memory 120. The memory 120 may include a volatile memory like a static random access memory (SRAM) and a dynamic random access memory (DRAM) and/or a non-volatile memory like a flash memory, an RRAM, an MRAM, a PRAM, and a FRAM. - The
memory 120 may include memory cells for storing data. A memory cell may be accessed using an address. Write data may be written to a memory cell indicated by an address, and read data may be loaded from a memory cell indicated by an address. In the present disclosure, one memory region in the memory 120 may include a memory cell array with a plurality of memory cells which are accessed using a plurality of addresses. - When a memory cell in the memory region fails, the
memory 120 may be configured to repair the faulty cell with a redundancy cell. The memory 120 is capable of performing post package repair (PPR), which repairs, with redundancy cells, faulty cells that additionally occur due to continuous use. - The
processor 110 may provide addresses to the memory 120 to exchange data which are read from the memory 120 and stored in the memory 120 during execution of the application 170. The memory 120 may store or read data based on requests (e.g., commands and addresses) received from the processor 110. - Meanwhile, an address processed by the
application 170 may be referred to as a virtual address VA, and an address for accessing the memory 120 may be referred to as a system physical address PA (i.e., a physical address). The OS 180 may perform address translation between a virtual address VA processed by an application APP and a system physical address PA for the memory 120. In an example embodiment, the application 170 processed by the processor 110 may operate with reference to the virtual addresses VA and, when accessing the memory, may use the system physical addresses PA translated from the virtual addresses VA. -
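The post package repair (PPR) capability described earlier can be viewed, at the system level, as bookkeeping that remaps a faulty source address to a destination address of redundancy cells. The following Python sketch is purely illustrative: the class and method names are hypothetical, and an actual DRAM performs the remapping internally with fuse-programmed redundancy rows.

```python
# Hypothetical sketch of post package repair (PPR) bookkeeping: a faulty
# source row address is remapped to a spare redundancy row (destination).
# This models only the bookkeeping, not the DRAM-internal fuse mechanism.

class RepairControl:
    def __init__(self, redundancy_rows):
        self.free_spares = list(redundancy_rows)  # unused redundancy rows
        self.remap = {}                           # source row -> spare row

    def repair(self, source_row):
        """Replace a faulty source row with the next free redundancy row."""
        if source_row in self.remap:
            return self.remap[source_row]         # already repaired
        if not self.free_spares:
            raise RuntimeError("no redundancy rows left")
        spare = self.free_spares.pop(0)
        self.remap[source_row] = spare
        return spare

    def resolve(self, row):
        """Return the row actually accessed after any repair."""
        return self.remap.get(row, row)

ctrl = RepairControl(redundancy_rows=[64, 65, 66])
ctrl.repair(0b011000)  # repair faulty row 011000 with the first spare row
```

Accesses to a repaired row are then transparently redirected to its spare, while unrepaired rows resolve to themselves.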
FIG. 2 is a diagram for describing an example of address mapping for address translation performed by the processor 110 of FIG. 1. - Referring to
FIGS. 1 and 2, the memory 120 has a system physical address PA range from an address zero (0) to an upper level. The application 170 may have a virtual address VA range starting from the upper level of the system physical address PA range of the memory 120. Each address Pa of the virtual addresses VA may be mapped to an address Pg (or an address space) of the system physical addresses PA of the memory 120. The OS 180 may allocate a page requested for memory access by the application 170 to a page of the memory 120. At this time, a reference designated from a virtual address Pa to a corresponding system physical address Pg may be stored in an address mapping table 200 as a page table entry PTE. A page may be a unit of address translation. In other words, addresses in a virtual page may be translated into addresses in a corresponding physical page. Pages may have various sizes, ranging from 4 kilobytes up to megabytes or even gigabytes. -
FIG. 2 are merely examples, and are not necessarily those of actual memory addressed. Also, the example memory shown inFIG. 2 does not represent or imply limitations on the inventive concept. -
FIG. 3 is a diagram showing an example of the address mapping table 200 of FIG. 2. - Referring to
FIGS. 2 and 3, the OS 180 may manage the address mapping table 200. Page table entries PTE of the address mapping table 200 may include information about a mapping relationship between virtual addresses VA and system physical addresses PA. For example, the address mapping table 200 may be implemented in the form of a look-up table. The OS 180 may translate the virtual addresses VA into the system physical addresses PA by referring to the page table entries PTE of the address mapping table 200. - For example, a virtual address Va1 may correspond to a system physical address Pa1. When the
OS 180 receives the virtual address Va1 from the application 170, the OS 180 may map the virtual address Va1 to the system physical address Pa1. The OS 180 may process a request received from the application 170 together with the virtual address Va1 in association with a memory cell indicated by the system physical address Pa1. - Similarly, according to corresponding information in the address mapping table 200, the
OS 180 may map virtual addresses Va2 and Va3 to system physical addresses Pa2 and Pa3, map virtual addresses Vb1, Vb2, and Vb3 to system physical addresses Pb1, Pb2, and Pb3, and map virtual addresses Vc1, Vc2, Vc3, Vc4, and Vc5 to system physical addresses Pc1, Pc2, Pc3, Pc4, and Pc5. The OS 180 may process a request from the application 170 in association with memory cells indicated by the system physical addresses Pa2, Pa3, Pb1, Pb2, Pb3, Pc1, Pc2, Pc3, Pc4, and Pc5 corresponding to the virtual addresses Va2, Va3, Vb1, Vb2, Vb3, Vc1, Vc2, Vc3, Vc4, and Vc5. -
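As an illustrative sketch (not part of the patent disclosure), the address mapping table 200 of FIG. 3 can be modeled as a look-up table from virtual pages to system physical pages, where a missing page table entry corresponds to a page fault:

```python
# Illustrative model of the address mapping table 200: a look-up table from
# virtual pages to system physical pages (page names follow FIG. 3).

address_mapping_table = {
    "Va1": "Pa1", "Va2": "Pa2", "Va3": "Pa3",
    "Vb1": "Pb1", "Vb2": "Pb2", "Vb3": "Pb3",
    "Vc1": "Pc1", "Vc2": "Pc2", "Vc3": "Pc3", "Vc4": "Pc4", "Vc5": "Pc5",
}

def translate(virtual_address):
    """Translate a virtual address into a system physical address."""
    try:
        return address_mapping_table[virtual_address]
    except KeyError:
        # A missing page table entry corresponds to a page fault.
        raise KeyError(f"page fault: no PTE for {virtual_address}") from None
```

For example, `translate("Va1")` yields "Pa1", while an unmapped virtual address raises the page-fault error.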
FIG. 4 is a diagram for describing an example of the page table entries PTE of the address mapping table 200 of FIG. 3. - Referring to
FIG. 4 , the page table entries PTE of the address mapping table 200 may be a table of translation data that may be used to translate a virtual address VA to a system physical address PA. The translation table may store translation data in any way. For example, depending on a translation level, various subsets of virtual address VA bits or system physical addresses PA may be used to index the levels of the translation table. Also, each level may be at the end of a translation (i.e., storing an actual page number for translation) or may point another table (indexed by another set of address bits) in a hierarchical manner. - The page table entries PTE may include pointers to other page tables in a hierarchical manner. The page table entry PTE may indicate a level in a page table layer structure, e.g., page map levels 2, 3, or 4, at which translation needs to be started for requests mapped to the corresponding page table entry PTE. Therefore, the page table entries PTE of a table of a page map level 2, 3, or 4 may include any number of bit entries.
- The page table entries PTE shown in
FIG. 4 are a first level translation table. In the first level translation table, fields related to address translation are provided to map a virtual address to a system physical address. However, the present invention is not limited thereto. In an example embodiment, the present invention may apply to a second level address translation table where a single bit field and/or a plurality of bit fields may be provided for translation level identification, depths of tables, indication of translation invalid/valid, etc. Further action may occur with reference to the bit field or bit fields to complete address translation. In the present embodiment, the page table entry PTE relates to a translation of virtual page address bits into actual system physical page address bits and is a 64-bit entry, for example. - The page table entry PTE may include a virtual address VA field and a system physical address PA field. The virtual address VA field is configured to increase a virtual address space to be used by the
application 170, and the system physical address PA field indicates an address of the memory 120 corresponding to the virtual address VA. For example, the system physical address PA field may include PTE[11:0] bits, and the virtual address VA field may include PTE[63:12] bits. The system physical address PA field may include row address R[5:0] bits and column address C[5:0] bits. -
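The 64-bit entry layout described above (the virtual address VA field in PTE[63:12] and the system physical address PA field in PTE[11:0], with row R[5:0] bits and column C[5:0] bits) can be sketched as follows. Placing the row bits in PTE[11:6] and the column bits in PTE[5:0] is an assumption made for illustration only:

```python
# Sketch of the 64-bit page table entry: PTE[63:12] holds the virtual page,
# and PTE[11:0] holds the system physical address as row and column bits.
# The exact bit order within PTE[11:0] is assumed, not taken from the patent.

def pack_pte(virtual_page, row, column):
    assert 0 <= row < 64 and 0 <= column < 64  # R[5:0], C[5:0]
    return (virtual_page << 12) | (row << 6) | column

def unpack_pte(pte):
    return {
        "virtual_page": pte >> 12,   # PTE[63:12]
        "row": (pte >> 6) & 0x3F,    # row address R[5:0]
        "column": pte & 0x3F,        # column address C[5:0]
    }
```

Round-tripping an entry through `pack_pte` and `unpack_pte` recovers the virtual page number and the row/column bits of the system physical address.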
FIG. 5 is a diagram for describing a row-based fault attribute shown in the address mapping table of FIG. 3. - Referring to
FIG. 5, according to virtual addresses Va1, Va2, and Va3 at which the application 170 is being executed on the processor 110, the OS 180 may provide system physical addresses Pa1, Pa2, and Pa3 corresponding to the virtual addresses Va1, Va2, and Va3 for accessing the memory 120. For example, a system physical address Pa1 corresponding to a virtual address Va1 may be provided as a row address R[5:0] bits of 011000 and a column address C[5:0] bits of 000001. The processor 110 may execute the application 170 by accessing a memory cell indicated by the row address R[5:0] bits of 011000 and the column address C[5:0] bits of 000001 of the system physical address Pa1 through the memory controller 112. - Similarly, a system physical address Pa2 corresponding to a virtual address Va2 may be provided as a row address R[5:0] bits of 011000 and a column address C[5:0] bits of 000010, and a system physical address Pa3 corresponding to a virtual address Va3 may be provided as a row address R[5:0] bits of 011000 and a column address C[5:0] bits of 000100. The
processor 110 may execute the application 170 by accessing a memory cell indicated by the row address R[5:0] bits of 011000 and the column address C[5:0] bits of 000010 of the system physical address Pa2 and a memory cell indicated by the row address R[5:0] bits of 011000 and the column address C[5:0] bits of 000100 of the system physical address Pa3 through the memory controller 112. - However, execution associated with the system physical address Pa2 of the
memory 120 does not operate properly. Also, execution associated with the system physical address Pa3 of the memory 120 does not operate properly. In other words, a page fault may occur. For example, from among a plurality of execution paths for the application 170, an error may occur in an execution path regarding the memory 120. In detail, memory errors may occur at the system physical addresses Pa2 and Pa3. One of the major causes of such memory errors is when memory cells addressed by the system physical addresses Pa2 and Pa3 fail, that is, when a hardware exception event is detected. - Generally, when such exception events frequently occur, the
system 100, which is pending, may be stopped and attempts to resume the system 100 (i.e., rebooting of the system 100) may be made. Such a solution is unable to achieve acceleration of the system 100. The OS 180 may perform controls to process exception events without stopping the system 100. The OS 180 may continue operating the system 100 by combining hardware support from the processor 110 with OS codes. As described in more detail below, the mechanism of the OS 180 for this function may be provided. - The
OS 180 may become aware of (i.e., identify) faulty pages (i.e., faulty physical addresses) of the system physical addresses Pa2 and Pa3. In an example embodiment, the faulty addresses may refer to physical addresses associated with faulty memory cells. The OS 180 may observe (i.e., analyze) the bits of the row address R[5:0] and the bits of the column address C[5:0] identified at the system physical addresses Pa2 and Pa3, thereby determining that the system physical addresses Pa2 and Pa3 have the same row address R[5:0] bits of 011000. Since the memory cells addressed by the system physical addresses Pa2 and Pa3 have the same row address (i.e., a common row address), the OS 180 may expect that there is a high possibility that the memory cells accessed with the row address are faulty. Therefore, the OS 180 may predict or consider the fault type of the memory cells accessed with the row address R[5:0] bits of 011000 in the memory region of the memory 120 as a possible row-based fault. Hereinafter, the row address R[5:0] bits of 011000 may be referred to as a faulty row address FRA of the fault type. - Although memory cells accessed with the row address R[5:0] bits of 011000 of the system physical address Pa1 of the
memory 120 do not fail, the OS 180 may be given a privilege to specify the memory cells accessed with the row address R[5:0] bits of 011000 as a row-based fault. When the OS 180 translates a system physical address PA corresponding to a virtual address VA at which the application 170 is being executed on the processor 110, the OS 180 does not provide the faulty row address FRA as the system physical address PA, such that row-based faulty memory cells are not selected. Also, the OS 180 may store the faulty row address FRA in the BIOS memory 130 (FIG. 1). -
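The row-based prediction described above amounts to checking whether the identified faulty physical addresses all share one row address. A minimal sketch, with a hypothetical function name and the reference value fixed at two faulty pages:

```python
# Minimal sketch of the row-based fault check: given the (row, column) bits
# of the faulty physical addresses, predict a row failure when all of them
# share one row address. Function name and threshold are illustrative.

def find_faulty_row(faulty_addresses, reference_value=2):
    """Return the common row address bits, or None if no row-based fault."""
    rows = {row for row, _col in faulty_addresses}
    if len(faulty_addresses) >= reference_value and len(rows) == 1:
        return rows.pop()  # faulty row address FRA
    return None

# Pa2 and Pa3 from FIG. 5: same row bits 011000, different column bits.
fra = find_faulty_row([(0b011000, 0b000010), (0b011000, 0b000100)])
```

Here `fra` is the faulty row address 011000, which the OS would then exclude from subsequent address translation.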
FIG. 6 is a diagram for describing a column-based fault attribute shown in the address mapping table of FIG. 3. - Referring to
FIG. 6, according to virtual addresses Vb1, Vb2, and Vb3 at which the application 170 is being executed on the processor 110, the OS 180 may provide system physical addresses Pb1, Pb2, and Pb3 corresponding to the virtual addresses Vb1, Vb2, and Vb3 for accessing the memory 120. For example, a system physical address Pb1 corresponding to a virtual address Vb1 may be provided as a row address R[5:0] bits of 100000 and a column address C[5:0] bits of 000011. The processor 110 may execute the application 170 by accessing a memory cell indicated by the row address R[5:0] bits of 100000 and the column address C[5:0] bits of 000011 of the system physical address Pb1 through the memory controller 112. - Similarly, a system physical address Pb2 corresponding to a virtual address Vb2 may be provided as a row address R[5:0] bits of 010000 and a column address C[5:0] bits of 000011, and a system physical address Pb3 corresponding to a virtual address Vb3 may be provided as a row address R[5:0] bits of 001000 and a column address C[5:0] bits of 000011. The
processor 110 may execute the application 170 by accessing a memory cell indicated by the row address R[5:0] bits of 010000 and the column address C[5:0] bits of 000011 of the system physical address Pb2 and a memory cell indicated by the row address R[5:0] bits of 001000 and the column address C[5:0] bits of 000011 of the system physical address Pb3 through the memory controller 112. - However, execution associated with the system physical address Pb2 and execution associated with the system physical address Pb3 of the
memory 120 does not operate properly. During the execution of the application 170, memory errors may occur at the system physical addresses Pb2 and Pb3. The OS 180 may become aware of faulty pages of the system physical addresses Pb2 and Pb3. The OS 180 may observe the bits of the row address R[5:0] and the bits of the column address C[5:0] identified at the system physical addresses Pb2 and Pb3. The OS 180 may determine that the system physical addresses Pb2 and Pb3 have the same column address C[5:0] bits of 000011. Since the memory cells addressed by the system physical addresses Pb2 and Pb3 have the same column address, the OS 180 may expect that there is a high possibility that the memory cells accessed with the column address are faulty. Therefore, the OS 180 may predict or consider the fault type of the memory cells accessed with the column address C[5:0] bits of 000011 in the memory region of the memory 120 as a possible column-based fault. Hereinafter, the column address C[5:0] bits of 000011 may be referred to as a faulty column address FCA of the fault type. - Although memory cells accessed with the column address C[5:0] bits of 000011 of the system physical address Pb1 of the
memory 120 do not fail, the OS 180 may specify the memory cells accessed with the column address C[5:0] bits of 000011 as a column-based fault. When the OS 180 translates a system physical address PA corresponding to a virtual address VA at which the application 170 is being executed on the processor 110, the OS 180 does not provide the faulty column address FCA as the system physical address PA, such that column-based faulty memory cells are not selected. Also, the OS 180 may store the faulty column address FCA in the BIOS memory 130 of FIG. 1. - As shown in
FIGS. 5 and 6, the OS 180 may specify the two faulty pages described above as a row-based fault or a column-based fault. The present invention is not limited thereto. In an example embodiment, the OS 180 may specify three or more faulty pages as a row-based fault or a column-based fault when the number of faulty pages exceeds a reference value. In this embodiment, the reference value may be set to n (n is a natural number equal to or greater than 2). According to other embodiments, the reference value may be set differently and may also be changed. -
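Combining the checks of FIGS. 5 and 6, the specification of a fault type against a reference value can be sketched as follows (the function name and return values are illustrative only, not from the patent):

```python
# Illustrative classifier: a set of faulty (row, column) address pairs is
# specified as a row failure or a column failure once the number of faulty
# pages reaches a reference value. Names and strings are hypothetical.

def classify_fault(faulty_addresses, reference_value=2):
    if len(faulty_addresses) < reference_value:
        return None
    rows = {row for row, _col in faulty_addresses}
    cols = {col for _row, col in faulty_addresses}
    if len(rows) == 1:
        return ("row failure", rows.pop())     # faulty row address FRA
    if len(cols) == 1:
        return ("column failure", cols.pop())  # faulty column address FCA
    return None

# Pb2 and Pb3 from FIG. 6 share the column address bits 000011:
fault = classify_fault([(0b010000, 0b000011), (0b001000, 0b000011)])
```

The returned faulty address is the one the OS would exclude from second address translation and store in the BIOS memory.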
FIG. 7 is a diagram for describing a block-based fault attribute shown in the address mapping table of FIG. 3. - Referring to
FIG. 7, according to virtual addresses Vc1, Vc2, Vc3, Vc4, and Vc5 at which the application 170 is being executed on the processor 110, the OS 180 may provide system physical addresses Pc1, Pc2, Pc3, Pc4, and Pc5 corresponding to the virtual addresses Vc1, Vc2, Vc3, Vc4, and Vc5 for accessing the memory 120. For example, a system physical address Pc1 corresponding to a virtual address Vc1 may be provided as a row address R[5:0] bits of 110001 and a column address C[5:0] bits of 111000. A system physical address Pc2 corresponding to a virtual address Vc2 may be provided as a row address R[5:0] bits of 110010 and a column address C[5:0] bits of 111010, a system physical address Pc3 corresponding to a virtual address Vc3 may be provided as a row address R[5:0] bits of 110100 and a column address C[5:0] bits of 110000, a system physical address Pc4 corresponding to a virtual address Vc4 may be provided as a row address R[5:0] bits of 111000 and a column address C[5:0] bits of 110010, and a system physical address Pc5 corresponding to a virtual address Vc5 may be provided as a row address R[5:0] bits of 111111 and a column address C[5:0] bits of 110100. The processor 110 may access the memory cells indicated by the system physical addresses Pc1, Pc2, Pc3, Pc4, and Pc5 through the memory controller 112, thereby executing the application 170. - However, execution associated with the system physical addresses Pc1, Pc2, Pc3, Pc4, and Pc5 of the
memory 120 does not operate properly. During the execution of the application 170, memory errors may occur at the system physical addresses Pc1, Pc2, Pc3, Pc4, and Pc5. The OS 180 may become aware of faulty pages of the system physical addresses Pc1, Pc2, Pc3, Pc4, and Pc5. The OS 180 may observe the bits of the row address R[5:0] and the bits of the column address C[5:0] identified at the system physical addresses Pc1, Pc2, Pc3, Pc4, and Pc5. The OS 180 may determine that the two uppermost bits of the row addresses R[5:0] of the system physical addresses Pc1, Pc2, Pc3, Pc4, and Pc5 (i.e., bits R[5:4] of 11) are the same and the two uppermost bits of the column addresses C[5:0] (i.e., bits C[5:4] of 11) are the same. - Generally, to access a memory cell, the
memory 120 may be configured to sequentially decode row address bits by using a row decoder, generate a decoded row address signal, and activate a word line corresponding to the decoded row address signal. Here, decoded row address signal lines may be arranged in a row-wise direction from the bottom or the top of a memory region, wherein upper bits of a row address may serve as a signal for addressing a particular region on the upper side or the lower side based on the center of the memory region. Similarly, the memory 120 is configured to sequentially decode column address bits by using a column decoder, generate a decoded column address signal, and activate bit lines corresponding to the decoded column address signal. Here, decoded column address signal lines may be arranged in a column-wise direction from the left side or the right side of a memory region, wherein upper bits of a column address may serve as a signal for addressing a particular region on the left side or the right side based on the center of the memory region. - Since memory cells addressed by the system physical addresses Pc1, Pc2, Pc3, Pc4, and Pc5 have the same upper row address bits and the same upper column address bits, the
OS 180 may expect that there is a high possibility that the memory cells accessed with the same upper row address bits and the same upper column address bits are faulty. Therefore, the OS 180 may predict or consider the fault type of memory cells accessed with the upper row address bits R[5:4] 11 and the upper column address bits C[5:4] 11 in the memory region of the memory 120 as a possible block-based fault. Hereinafter, the upper row address bits R[5:4] 11 and the upper column address bits C[5:4] 11 may be referred to as a faulty block address (FBA) of the fault type. The faulty row address, the faulty column address, and the faulty block address may each be referred to as a faulty address, which may be stored in the BIOS and, in a post package repair, may correspond to a source address to be replaced or repaired with a destination address of redundancy cells. - The
OS 180 may specify memory cells accessed with the upper row address bits R[5:4] 11 and the upper column address bits C[5:4] 11 as a block-based fault. When the OS 180 translates a system physical address PA corresponding to a virtual address VA at which the application 170 is being executed on the processor 110, the OS 180 does not provide the faulty block address FBA as the system physical address PA, such that block-based fault memory cells are not selected. Also, the OS 180 may store the faulty block address FBA in the BIOS memory 130 (FIG. 1). - As shown in
FIG. 7, the case in which the OS 180 is given a privilege to handle a block-based fault by referring to five faulty pages has been described above, but such a privilege may be given when the number of faulty pages exceeds a reference value. In this embodiment, the reference value may be set to n (n is a natural number equal to or greater than 5). According to other embodiments, the reference value may be set differently and may also be changed. -
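The upper-bit comparison that the OS 180 is described as performing on the faulty pages of FIG. 7 can be sketched in a few lines. The following is an illustrative Python model, not code from the patent; the 6-bit row/column geometry and the helper name are assumptions, while the address values are those of Pc1 through Pc5 in FIG. 7:

```python
# (row R[5:0], column C[5:0]) bits of the five faulty pages of FIG. 7
FAULTY_PAGES = [
    (0b110001, 0b111000),  # Pc1
    (0b110010, 0b111010),  # Pc2
    (0b110100, 0b110000),  # Pc3
    (0b111000, 0b110010),  # Pc4
    (0b111111, 0b110100),  # Pc5
]

WIDTH, UPPER = 6, 2  # address width and number of uppermost bits compared (assumed)

def common_upper_bits(addresses):
    """Return the shared uppermost bits of all addresses, or None if they differ."""
    uppers = {addr >> (WIDTH - UPPER) for addr in addresses}
    return uppers.pop() if len(uppers) == 1 else None

row_bits = common_upper_bits([row for row, _ in FAULTY_PAGES])
col_bits = common_upper_bits([col for _, col in FAULTY_PAGES])
print(row_bits, col_bits)  # 3 3, i.e. R[5:4] 11 and C[5:4] 11: the faulty block address
```

Run on the FIG. 7 data, both calls return 0b11, matching the faulty block address FBA derived in the text.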
FIG. 8 is a flowchart of a method of handling a runtime OS of the system 100 according to an embodiment of the inventive concept. - Referring to
FIGS. 1, 2, and 8, when the system 100 is operating, the OS 180 may be executed while providing machine virtualization to execute the application 170 in cooperation with the processor 110 (operation S810). The OS 180 in a virtual machine VM may perform first address translation between virtual addresses VA to be processed by the application 170 and system physical addresses PA for the memory 120. In an example embodiment, the processor 110 (i.e., the OS operated in the virtual machine VM executed by the processor 110) may execute the application 170 using the virtual addresses VA, and when accessing the memory 120, may use the system physical addresses PA translated from the virtual addresses VA. When at least one page fault occurs during the execution of the application 170, the OS 180 may become aware of (i.e., identify) faulty pages among the system physical addresses PA (operation S812). - The
OS 180 may count the faulty pages and determine whether the number of the faulty pages exceeds a reference value (operation S813). When it is determined that the number of the faulty pages exceeds the reference value, the OS 180 may observe (i.e., analyze) bits of row addresses RA and bits of column addresses CA identified from the system physical addresses PA of the faulty pages (operation S814). When it is determined that the number of the faulty pages does not exceed the reference value, the OS 180 may continue to operate the application 170 and proceed to operation S812. - The
OS 180 may predict a possible faulty address attribute appearing at the same bad address bits in the system physical addresses PA of the faulty pages (operation S815). The OS 180 may specify the possible faulty address attribute (i.e., a fault type) of the system physical addresses PA as a row-based fault, a column-based fault, or a block-based fault. Based on the specification of the possible faulty address attribute, when the OS 180 performs second address translation between virtual addresses and system physical addresses for the memory 120, the OS 180 does not provide faulty system physical addresses of a particular fault type (e.g., a row-based fault or a row failure, a column-based fault or a column failure, or a block-based fault or a block failure) as the system physical addresses (operation S816). The present invention is not limited thereto. In an example embodiment, when another virtual machine executes an application in cooperation with the processor 110, an OS of the other virtual machine may perform address translation based on the faulty system physical addresses to prevent the faulty system physical addresses from being used in the translation. In an example embodiment, the faulty system physical addresses may be stored as a faulty address in a local system memory of the processor 110 and may be referenced by at least one virtual machine, or, if previously stored in the BIOS memory 130, may be uploaded to the local system memory from the BIOS memory 130. Also, the OS 180 may store the faulty system physical addresses as a faulty address in the BIOS memory 130. - While the
OS 180 is handling page faults, the operation of the system 100 is not interrupted and the system 100 is not rebooted, and the method proceeds to operation S810. Therefore, the availability of the system 100 may be maintained. -
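Operations S812 through S815 amount to a small classifier over the row and column bits of the faulty pages. The following hedged Python sketch illustrates one plausible reading; the bit widths, the decision order, and the concrete reference value are assumptions for illustration, not the patent's implementation:

```python
WIDTH, UPPER = 6, 2   # assumed 6-bit row/column addresses, 2 uppermost bits
REFERENCE_VALUE = 5   # n faulty pages; n >= 5 in the described embodiment

def classify_fault(faulty_pages):
    """faulty_pages: list of (row, col) pairs of faulty system physical addresses.
    Returns None until the count exceeds REFERENCE_VALUE (operation S813),
    otherwise a predicted fault type (operation S815)."""
    if len(faulty_pages) <= REFERENCE_VALUE:
        return None                # keep executing the application (back to S812)
    rows = {row for row, _ in faulty_pages}
    cols = {col for _, col in faulty_pages}
    if len(rows) == 1:
        return "row-based"         # every fault shares one row address
    if len(cols) == 1:
        return "column-based"      # every fault shares one column address
    if (len({r >> (WIDTH - UPPER) for r in rows}) == 1
            and len({c >> (WIDTH - UPPER) for c in cols}) == 1):
        return "block-based"       # shared upper row and column bits
    return "unclassified"
```

For instance, six faulty pages whose row and column addresses all begin with bits 11 would be classified as "block-based", while six faults on a single row address would be classified as "row-based".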
FIG. 9 is a conceptual diagram for describing a repair operation performed when the system 100 of FIG. 1 is booted. - Referring to
FIG. 9, the BIOS memory 130 may store a BIOS code for booting the system 100. Also, the BIOS memory 130 may store faulty addresses specified by the OS 180 as a faulty row address FRA, a faulty column address FCA, and/or a faulty block address FBA. The faulty row address FRA, the faulty column address FCA, and/or the faulty block address FBA may be stored in a non-volatile memory unit 930. The non-volatile memory unit 930 is a part of a non-volatile memory device constituting the BIOS memory 130. - The
system 100 may execute boot operations that execute a part of the BIOS code of the BIOS memory 130 by the processor 110 as the system 100 is powered on. Memory training for the memory 120 may be included in the boot operations for executing the BIOS code by the processor 110. The memory training may be performed for the memory controller 112 to determine the optimal values for core parameters and/or peripheral circuit parameters of the memory 120. Hereinafter, for convenience of explanation, the memory 120 will be collectively referred to as a dynamic random access memory (DRAM) 120. The DRAM 120 may be any one of a synchronous DRAM (SDRAM), a double data rate SDRAM (DDR SDRAM), a low power double data rate SDRAM (LPDDR SDRAM), a graphics double data rate SDRAM (GDDR SDRAM), a DDR2 SDRAM, a DDR3 SDRAM, a DDR4 SDRAM, a DDR5 SDRAM, a wide I/O DRAM, a high bandwidth memory (HBM), and a hybrid memory cube (HMC). - When the
system 100 is booted, the memory controller 112 may initialize the DRAM 120 according to an algorithm set in a register control word (RCW) and perform memory training for the DRAM 120. The memory training may be performed by using a memory PHY provided for the signals, frequencies, timings, drivings, detailed operation parameters, and functionality needed for efficient communication between the memory controller 112 and the DRAM 120. The memory controller 112 may provide the faulty row address FRA, the faulty column address FCA, and/or the faulty block address FBA stored in the non-volatile memory unit 930 of the BIOS memory 130 to the DRAM 120 after the memory training of the DRAM 120. - The
DRAM 120 may repair faulty cells showing fault characteristics in a memory cell array. The memory cell array may include a plurality of word lines, a plurality of bit lines, and a plurality of memory cells formed at points where the word lines and the bit lines intersect each other. The DRAM 120 may include a repair control circuit 920 configured to repair faulty cells with redundancy cells. The repair control circuit 920 may repair faulty cells detected through a test after a semiconductor manufacturing process of the DRAM 120. Also, the repair control circuit 920 may perform a post package repair (PPR) that repairs faulty cells occurring during continuous use of the DRAM 120 with redundancy cells. - The
repair control circuit 920 may perform PPR to replace the faulty row address FRA, the faulty column address FCA, and/or the faulty block address FBA with a redundancy row address RRA, a redundancy column address RCA, and/or a redundancy block address RBA, respectively. The repair control circuit 920 may store information about destination addresses D_ADDR (i.e., the redundancy row address RRA, the redundancy column address RCA, and/or the redundancy block address RBA) that replace the source addresses S_ADDR that need to be repaired (i.e., the faulty row address FRA, the faulty column address FCA, and/or the faulty block address FBA) in an address storage table 921 (i.e., an address storage circuit). In an example embodiment, the address storage table 921 may be included in the repair control circuit 920 or in the memory 120. - The address storage table 921 may include, for example, an anti-fuse array or a content addressable memory (CAM). An anti-fuse is a resistive fuse element having electrical characteristics opposite to those of a fuse element: it has a high resistance value in a non-programmed state and a low resistance value in a programmed state. The CAM is a special memory structure that simultaneously compares an input address with the source addresses S_ADDR stored in the respective CAM entries and, when a match is found, outputs the destination address D_ADDR associated with the matching source address S_ADDR.
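The match-and-substitute behavior of the address storage table 921 can be modeled as a dictionary keyed by source address. This is a toy Python model, not the patent's circuit: a real CAM compares all entries in parallel in hardware, and all method names below are illustrative:

```python
class AddressStorageTable:
    """Toy model of the address storage table 921: maps repaired source
    addresses S_ADDR to destination (redundancy) addresses D_ADDR."""

    def __init__(self):
        self._entries = {}  # source address S_ADDR -> destination address D_ADDR

    def program(self, s_addr, d_addr):
        # In hardware this corresponds to programming an anti-fuse or a CAM entry.
        self._entries[s_addr] = d_addr

    def translate(self, addr):
        """Return the redundancy (destination) address on a match; otherwise
        pass the input address through unchanged."""
        return self._entries.get(addr, addr)
```

For example, after `program(FRA, RRA)` an access to the faulty row address is steered to the redundancy row address, while every other address is unaffected.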
- The
repair control circuit 920 may provide the address storage table 921 to the memory controller 112. In an example embodiment, the memory controller 112 may access the address storage table 921 to update the information stored in the address storage table 921 or read the information from the address storage table 921. The memory controller 112 may store the address storage table 921 information as memory management information 910 for consistent access of the DRAM 120 by at least one processor 110. The address storage table 921 information may be shared by at least one processor 110. When at least one processor 110 performs an operation for memory allocation during execution of the applications 170, the memory allocation operation is performed based on the information in the address storage table 921. Therefore, at least one processor 110 may perform the functions commonly known as those of a memory manager, that is, managing an address space of the OS 180 in the DRAM 120 and evenly distributing memory regions to other virtual machines VM using the DRAM 120, by using the memory management information 910. -
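The allocation behavior described above, where a processor consults shared faulty-address information so that allocations never land on a known-bad region, might be sketched as follows. The faulty-block check and the 6-bit row/column geometry are illustrative assumptions, not the patent's implementation:

```python
WIDTH, UPPER = 6, 2  # assumed 6-bit row/column addresses, 2 upper "block" bits

def in_faulty_block(row, col, fba):
    """True if (row, col) lies in the region marked by faulty block address fba,
    given as a pair of (upper row bits, upper column bits)."""
    return (row >> (WIDTH - UPPER), col >> (WIDTH - UPPER)) == fba

def allocate_frame(free_frames, fba):
    """Return the first free (row, col) frame outside the faulty block, or None
    when every candidate falls inside the faulty region."""
    for row, col in free_frames:
        if not in_faulty_block(row, col, fba):
            return (row, col)
    return None
```

With the FIG. 7 faulty block address (upper bits 11/11), a candidate such as (110001, 111000) would be skipped and the next clean frame returned.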
FIGS. 10 to 12 are diagrams for describing a repair operation performed in the memory 120 of FIG. 1. - In
FIG. 10, it is assumed that a faulty row address FRA is repaired with a redundancy row address RRA. A memory cell array 1000 a may include a normal cell array NMCA and a redundancy cell array RMCA. The normal cell array NMCA may include memory cells connected to word lines and bit lines, and the redundancy cell array RMCA may include memory cells connected to redundancy word lines and redundancy bit lines. The repair control circuit 920 may include a row repairer 922 (i.e., a row repair circuit) that determines a redundancy row address RRA, such that redundancy resources for repairing a faulty row address FRA do not overlap with one another. - The
row repairer 922 may perform a repair operation, such that the redundancy row address RRA is selected instead of the faulty row address FRA. When an access row address applied to a memory designates the faulty row address FRA of the normal cell array NMCA, redundancy cells corresponding to the redundancy row address RRA of the redundancy cell array RMCA are selected. The row repairer 922 deactivates a word line corresponding to the faulty row address FRA and activates a redundancy word line corresponding to the redundancy row address RRA instead. Therefore, redundancy cells corresponding to the redundancy row address RRA are selected instead of memory cells corresponding to the faulty row address FRA. - In
FIG. 11, it is assumed that a faulty column address FCA is repaired with a redundancy column address RCA. A memory cell array 1000 b may include the normal cell array NMCA and the redundancy cell array RMCA. The normal cell array NMCA may include memory cells connected to word lines and bit lines, and the redundancy cell array RMCA may include memory cells connected to the word lines and redundancy bit lines. The repair control circuit 920 may include a column repairer 924 that determines a redundancy column address RCA, such that redundancy resources for repairing a faulty column address FCA do not overlap with one another. - The
column repairer 924 may perform a repair operation, such that the redundancy column address RCA is selected instead of the faulty column address FCA. When an access column address applied to a memory designates the faulty column address FCA of the normal cell array NMCA, redundancy cells corresponding to the redundancy column address RCA of the redundancy cell array RMCA are selected. The column repairer 924 prevents a bit line corresponding to the faulty column address FCA from being selected and selects a redundancy bit line corresponding to the redundancy column address RCA instead. Therefore, redundancy cells corresponding to the redundancy column address RCA are selected instead of memory cells corresponding to the faulty column address FCA. - In
FIG. 12, it is assumed that a faulty block address FBA is repaired with a redundancy block address RBA. A memory cell array 1000 c may include the normal cell array NMCA and the redundancy cell array RMCA. The normal cell array NMCA may include memory cells connected to word lines and bit lines, and the redundancy cell array RMCA may include memory cells connected to redundancy word lines and redundancy bit lines. The repair control circuit 920 may include a block repairer 926 that determines a redundancy block address RBA, such that redundancy resources for repairing a faulty block address FBA do not overlap with one another. - The
block repairer 926 may perform a repair operation, such that the redundancy block address RBA is selected instead of the faulty block address FBA. When an access row address and an access column address applied to a memory designate a faulty block address FBA indicating a certain region of the normal cell array NMCA, redundancy cell regions corresponding to the redundancy block address RBA of the redundancy cell array RMCA are selected. The block repairer 926 prevents memory cells in a memory region corresponding to the faulty block address FBA from being selected and selects redundancy cells in a memory region corresponding to the redundancy block address RBA instead. -
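The row, column, and block repair cases of FIGS. 10 to 12 share one pattern: a matching access is redirected to redundancy resources. The following simplified Python sketch illustrates that pattern only; in particular, a real block repair would preserve the offset within the repaired region, whereas here the whole faulty block collapses to a single redundancy base purely for illustration:

```python
WIDTH, UPPER = 6, 2  # assumed 6-bit row/column addresses, 2 upper "block" bits

def repair(row, col, fra=None, rra=None, fca=None, rca=None, fba=None, rba=None):
    """Redirect an access (row, col) when it hits a faulty row (FIG. 10),
    a faulty column (FIG. 11), or a faulty block (FIG. 12)."""
    if fra is not None and row == fra:
        row = rra                       # row repairer 922: redundancy word line
    if fca is not None and col == fca:
        col = rca                       # column repairer 924: redundancy bit line
    if fba is not None and (row >> (WIDTH - UPPER),
                            col >> (WIDTH - UPPER)) == fba:
        row, col = rba                  # block repairer 926: redundancy region base
    return row, col
```

An unrepaired access passes through unchanged; an access hitting a programmed faulty address comes back pointing at redundancy cells.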
FIG. 13 is a flowchart of a method of booting the system 100 according to an embodiment of the inventive concept. - Referring to
FIGS. 1, 9, and 13, when the system 100 is powered on (operation S1310), the system 100 may execute boot operations for executing a part of the BIOS code of the BIOS memory 130 through the processor 110. Memory training for the memory 120 may be performed from among the boot operations for executing the BIOS code through the processor 110 (operation S1313). The memory training may be performed for the memory controller 112 to determine the optimal values for core parameters and/or peripheral circuit parameters of the memory 120. After the memory training (operation S1313), the faulty system physical addresses (e.g., a faulty row address FRA, a faulty column address FCA, and/or a faulty block address FBA) stored in the BIOS memory 130 may be transmitted to the memory 120 (operation S1320). - The
memory 120 may perform an operation for repairing the faulty system physical addresses (operation S1314). As described above, the memory 120 may repair the faulty row address FRA with a redundancy row address RRA. The memory 120 may repair the faulty column address FCA with a redundancy column address RCA. The memory 120 may repair the faulty block address FBA with a redundancy block address RBA. The memory 120 may repair faulty system physical addresses, thereby using the resources of the memory 120 with the maximum efficiency. - Embodiments of the inventive concept may be implemented in many different types of systems. Furthermore, embodiments of the inventive concept may be implemented in code and stored in an article comprising a non-transitory machine-readable storage medium storing instructions that may be used to program a system to execute the instructions. The non-transitory storage medium may include, but is not limited to, any type of disc including a floppy disc, an optical disc, a solid state drive (SSD), a compact disc read only memory (CD-ROM), a compact disc rewritable (CD-RW), and a magneto-optical disc; a ROM; a random access memory (RAM) such as a dynamic random access memory (DRAM) and a static random access memory (SRAM); a semiconductor type device such as an erasable and programmable read-only memory (EPROM), a flash memory, and an electrically erasable and programmable read-only memory (EEPROM); a magnetic or optical card; or any other type of media suitable for storing electronic instructions.
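The boot path of FIG. 13 — power-on, memory training, transfer of the stored faulty addresses, then PPR — can be outlined as a short sequence. Everything below is an illustrative model (the step strings and table format are ours), not actual BIOS code:

```python
def boot(bios_faulty_addresses):
    """Model of FIG. 13: after memory training (S1313), each faulty address
    stored in the BIOS memory 130 is transferred (S1320) and repaired via
    PPR (S1314).  Returns the boot log and the repair table that was built."""
    steps = ["power-on (S1310)", "memory training (S1313)"]
    repair_table = {}  # source (faulty) address -> destination (redundancy) address
    for kind, s_addr, d_addr in bios_faulty_addresses:  # e.g. ("FRA", ..., ...)
        repair_table[s_addr] = d_addr
        steps.append(f"PPR {kind}: {s_addr:#08b} -> {d_addr:#08b}")
    return steps, repair_table
```

For example, `boot([("FRA", 0b110001, 0b000001)])` yields a three-step log ending in a row repair, plus a one-entry repair table.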
- While the inventive concept has been particularly shown and described with reference to embodiments thereof, it will be understood that various changes in form and details may be made therein without departing from the spirit and scope of the following claims.
Claims (21)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10-2020-0058448 | 2020-05-15 | ||
KR1020200058448A KR20210141156A (en) | 2020-05-15 | 2020-05-15 | Handling operation system (OS) in a system for predicting and managing faulty memories based on page faults |
Publications (2)
Publication Number | Publication Date |
---|---|
US20210357279A1 true US20210357279A1 (en) | 2021-11-18 |
US11360837B2 US11360837B2 (en) | 2022-06-14 |
Family
ID=78512347
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/198,979 Active US11360837B2 (en) | 2020-05-15 | 2021-03-11 | Handling operation system (OS) in system for predicting and managing faulty memories based on page faults |
Country Status (3)
Country | Link |
---|---|
US (1) | US11360837B2 (en) |
KR (1) | KR20210141156A (en) |
CN (1) | CN113672430A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210255963A1 (en) * | 2019-09-25 | 2021-08-19 | Nvidia Corp. | Addressing cache slices in a last level cache |
US20230091623A1 (en) * | 2021-09-23 | 2023-03-23 | Nanya Technology Corporation | Defect inspecting method and system performing the same |
US20230162811A1 (en) * | 2021-11-25 | 2023-05-25 | SK Hynix Inc. | Integrated circuit, memory and operation method of memory |
Family Cites Families (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6199177B1 (en) * | 1998-08-28 | 2001-03-06 | Micron Technology, Inc. | Device and method for repairing a semiconductor memory |
US6910152B2 (en) * | 1998-08-28 | 2005-06-21 | Micron Technology, Inc. | Device and method for repairing a semiconductor memory |
US6564346B1 (en) * | 1999-12-07 | 2003-05-13 | Infineon Technologies Richmond, Lp. | Advanced bit fail map compression with fail signature analysis |
US7111145B1 (en) * | 2003-03-25 | 2006-09-19 | Vmware, Inc. | TLB miss fault handler and method for accessing multiple page tables |
US7356735B2 (en) * | 2004-03-30 | 2008-04-08 | Intel Corporation | Providing support for single stepping a virtual machine in a virtual machine environment |
US7337291B2 (en) | 2005-01-14 | 2008-02-26 | Microsoft Corporation | Software memory access control |
US7653840B1 (en) * | 2007-04-27 | 2010-01-26 | Net App, Inc. | Evaluating and repairing errors during servicing of storage devices |
US7779312B2 (en) * | 2007-08-13 | 2010-08-17 | Faraday Technology Corp. | Built-in redundancy analyzer and method for redundancy analysis |
US7865762B2 (en) * | 2007-12-04 | 2011-01-04 | Intel Corporation | Methods and apparatus for handling errors involving virtual machines |
US20110002169A1 (en) | 2009-07-06 | 2011-01-06 | Yan Li | Bad Column Management with Bit Information in Non-Volatile Memory Systems |
JP2012038368A (en) * | 2010-08-04 | 2012-02-23 | Toshiba Corp | Failure analysis device and failure analysis method |
US8627176B2 (en) | 2010-11-30 | 2014-01-07 | Microsoft Corporation | Systematic mitigation of memory errors |
KR101797565B1 (en) | 2011-08-22 | 2017-12-12 | 삼성전자 주식회사 | Memory device employing bad page management |
WO2013080288A1 (en) * | 2011-11-28 | 2013-06-06 | 富士通株式会社 | Memory remapping method and information processing device |
US9003223B2 (en) * | 2012-09-27 | 2015-04-07 | International Business Machines Corporation | Physical memory fault mitigation in a computing environment |
US20140258780A1 (en) * | 2013-03-05 | 2014-09-11 | Micron Technology, Inc. | Memory controllers including test mode engines and methods for repair of memory over busses used during normal operation of the memory |
KR20150040481A (en) | 2013-10-07 | 2015-04-15 | 에스케이하이닉스 주식회사 | Memory device, operation method of memory device and memory system |
US9213491B2 (en) | 2014-03-31 | 2015-12-15 | Intel Corporation | Disabling a command associated with a memory device |
US9652321B2 (en) | 2014-09-23 | 2017-05-16 | Intel Corporation | Recovery algorithm in non-volatile memory |
KR102261815B1 (en) * | 2014-10-30 | 2021-06-07 | 삼성전자주식회사 | Data storage device for reducing firmware update time, and data processing system including the same |
US20160147667A1 (en) * | 2014-11-24 | 2016-05-26 | Samsung Electronics Co., Ltd. | Address translation in memory |
US10546649B2 (en) | 2015-08-18 | 2020-01-28 | Hewlett Packard Enterprise Development Lp | Post package repair for mapping to a memory failure pattern |
US10528476B2 (en) * | 2016-05-24 | 2020-01-07 | International Business Machines Corporation | Embedded page size hint for page fault resolution |
US20180067866A1 (en) * | 2016-09-08 | 2018-03-08 | Intel Corporation | Translate on virtual machine entry |
US10705763B2 (en) * | 2017-02-10 | 2020-07-07 | International Business Machines Corporation | Scale and performance for persistent containers using SCSI second level addressing to map storage volume to host of container environment, wherein said storage volume is scanned at said SCSI second level addressing without rescanning at OS level virtualization |
US10379758B2 (en) * | 2017-06-26 | 2019-08-13 | Western Digital Technologies, Inc. | Managing system data for a data storage system |
US10403390B1 (en) | 2018-04-09 | 2019-09-03 | Micron Technology, Inc. | Post-packaging repair of redundant rows |
2020
- 2020-05-15 KR KR1020200058448A patent/KR20210141156A/en active Search and Examination
2021
- 2021-03-11 US US17/198,979 patent/US11360837B2/en active Active
- 2021-03-12 CN CN202110270073.2A patent/CN113672430A/en active Pending
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210255963A1 (en) * | 2019-09-25 | 2021-08-19 | Nvidia Corp. | Addressing cache slices in a last level cache |
US11429534B2 (en) * | 2019-09-25 | 2022-08-30 | Nvidia Corp. | Addressing cache slices in a last level cache |
US20230091623A1 (en) * | 2021-09-23 | 2023-03-23 | Nanya Technology Corporation | Defect inspecting method and system performing the same |
US20230162811A1 (en) * | 2021-11-25 | 2023-05-25 | SK Hynix Inc. | Integrated circuit, memory and operation method of memory |
US11837311B2 (en) * | 2021-11-25 | 2023-12-05 | SK Hynix Inc. | Integrated circuit, memory and operation method of memory |
Also Published As
Publication number | Publication date |
---|---|
KR20210141156A (en) | 2021-11-23 |
CN113672430A (en) | 2021-11-19 |
US11360837B2 (en) | 2022-06-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
AS | Assignment |
Owner name: SAMSUNG ELECTRONICS CO., LTD., KOREA, REPUBLIC OF Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LEE, JONGYOUNG;KIM, DONGYOON;KIM, MINHYOUK;AND OTHERS;REEL/FRAME:055686/0809 Effective date: 20201111 |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |