WO2016115661A1 - Memory fault isolation method and device - Google Patents

Info

Publication number
WO2016115661A1
Authority
WO
WIPO (PCT)
Prior art keywords
memory
physical address
block
address block
belongs
Prior art date
Application number
PCT/CN2015/071008
Other languages
French (fr)
Chinese (zh)
Inventor
刘勇
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to PCT/CN2015/071008 (WO2016115661A1)
Priority to CN201580011928.2A (CN106133704A)
Publication of WO2016115661A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02: Addressing or allocation; Relocation

Definitions

  • Embodiments of the present invention relate to computer technologies, and in particular, to a memory fault isolation method and apparatus.
  • A memory failure on a server causes the server or board to be reset.
  • The server reset interrupts the running applications, and the server must be returned to the manufacturer to have the memory replaced, which is inconvenient.
  • The prior art provides an offline faulty-memory isolation method that does not require returning the server to the manufacturer to replace the memory.
  • The method includes: before the server runs, first testing the memory through the Basic Input Output System (BIOS) and obtaining the address information of the faulty memory space; saving that address information to a non-volatile memory (NVM); and then reading the address information saved in the NVM and marking the corresponding faulty memory space as unavailable, which permanently isolates the faulty memory space.
  • The method provided by the prior art can perform fault isolation only before the server runs; a memory hardware fault that occurs while the server is running still causes service interruption.
  • Embodiments of the present invention provide a memory fault isolation method and device that can perform fault isolation while the server is running and avoid service interruption.
  • A first aspect of the present invention provides a memory fault isolation apparatus, including:
  • an exception processing module, configured to monitor the state of the memory identified by the original physical address mapped by a virtual address of a process, where the page to which the virtual address belongs has a mapping relationship with the physical address block to which the original physical address belongs, and the physical address block identifies a contiguous memory interval allocated to the process;
  • the exception handling module is further configured to mark the status of the physical address block to which the original physical address belongs as faulty if the memory identified by the original physical address fails;
  • a memory management module, configured to reallocate a physical address block from non-faulty memory for the page to which the virtual address belongs;
  • the exception processing module is further configured to synchronize the data in the memory interval identified by the physical address block to which the original physical address belongs into the memory interval identified by the reallocated physical address block.
  • the exception processing module is further configured to save the information of the physical address block marked as faulty to a non-volatile memory;
  • the memory management module is specifically configured to:
  • the memory management module is specifically configured to:
  • the memory management module is further configured to:
  • the initial memory is allocated to the process from the non-faulty memory.
  • the exception handling module is further configured to:
  • the status of the physical address block to which the original physical address belongs is marked as non-fault.
  • a second aspect of the present invention provides a memory fault isolation method, including:
  • the status of the physical address block to which the original physical address belongs is marked as a fault
  • the method further includes:
  • Reassigning a physical address block to the page to which the virtual address belongs from the non-faulty memory includes:
  • the reallocating of a physical address block from the non-faulty memory for the page to which the virtual address belongs includes:
  • the method further includes:
  • the non-faulty memory is the memory other than that identified by the faulty physical address blocks;
  • the initial memory is allocated to the process from the non-faulty memory.
  • the method further includes:
  • the status of the physical address block to which the original physical address belongs is marked as non-fault.
  • While the process is running, the server monitors the state of the memory identified by the original physical address mapped by the virtual address of the process. If the memory identified by the original physical address fails, the server marks the status of the physical address block to which the original physical address belongs as faulty, which isolates the faulty memory space online. The server then reallocates a physical address block for the page to which the virtual address belongs and synchronizes the data in the memory interval identified by the physical address block to which the original physical address belongs into the memory interval identified by the reallocated physical address block.
  • the virtual address of the process is unchanged during the whole process, so that the service is not interrupted, and the faulty memory space is isolated online.
  • FIG. 1 is a schematic structural diagram of a server according to an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of a memory fault isolation device according to Embodiment 1 of the present invention.
  • FIG. 3 is a flowchart of a memory fault isolation method according to Embodiment 2 of the present invention.
  • FIG. 1 is a schematic structural diagram of a server according to an embodiment of the present invention.
  • The server includes a memory, a processor, and an NVM.
  • The processor mainly refers to a central processing unit (CPU) and includes a memory management module and an exception handling module.
  • The memory management module implements the mapping between the virtual address space of a virtual machine or process and the physical address space.
  • The exception handling module processes exceptions triggered by the CPU; in the embodiments of the present invention it is used to handle memory faults.
  • The memory and the NVM are two separate pieces of physical hardware.
  • The NVM stores the physical addresses of the faulty memory space. Even if the server is powered off, the data stored in the NVM is not lost.
  • The memory is generally a random access memory (RAM) or a dynamic random access memory (DRAM).
  • The memory generally takes the form of memory modules. A single memory module in a server is typically 8 GB, 16 GB, or larger, so replacement is costly, and virtual machines or processes are the carriers on which services run.
  • FIG. 2 is a schematic structural diagram of a memory fault isolation device according to Embodiment 1 of the present invention.
  • The memory fault isolation device provided in this embodiment may be integrated in a server. As shown in FIG. 2, the device includes an exception processing module 11 and a memory management module 12.
  • The exception processing module 11 is configured to monitor the state of the memory identified by the original physical address mapped by the virtual address of the process, where the page to which the virtual address belongs has a mapping relationship with the physical address block to which the original physical address belongs, and the physical address block identifies a contiguous memory interval allocated to the process.
  • The exception handling module 11 is further configured to mark the status of the physical address block to which the original physical address belongs as faulty if the memory identified by the original physical address fails.
  • The memory management module 12 is configured to reallocate a physical address block from non-faulty memory for the page to which the virtual address belongs.
  • The exception handling module 11 is further configured to synchronize the data in the memory interval identified by the physical address block to which the original physical address belongs into the memory interval identified by the reallocated physical address block.
  • Each page covers a virtual address interval and corresponds to one physical address block, and each physical address block identifies a contiguous memory interval.
  • Each virtual address in the page corresponds to a physical address in the physical address block, and the mapping relationship between the virtual address and the physical address is maintained by the memory management module 12 of the server.
  • The memory management module 12 allocates initial memory for the process as follows: first, it reads the information of the faulty physical address blocks saved in the NVM; then it determines the non-faulty memory according to that information, the non-faulty memory being the memory other than that identified by the faulty physical address blocks; finally, it allocates the initial memory to the process from the non-faulty memory.
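The three allocation steps above can be sketched as follows. This is only an illustration under stated assumptions: physical address blocks are modelled as integer indices, and the function names `non_faulty_blocks` and `allocate_initial_memory` are hypothetical, not names from the patent.

```python
# Hypothetical sketch: blocks are integer indices; all names are illustrative.

def non_faulty_blocks(total_blocks, faulty_blocks):
    """Return the indices of physical address blocks not recorded as faulty."""
    faulty = set(faulty_blocks)
    return [b for b in range(total_blocks) if b not in faulty]


def allocate_initial_memory(total_blocks, faulty_blocks, in_use, count):
    """Pick `count` free blocks for a new process, skipping faulty ones."""
    candidates = [b for b in non_faulty_blocks(total_blocks, faulty_blocks)
                  if b not in in_use]
    if len(candidates) < count:
        raise MemoryError("not enough non-faulty free blocks")
    chosen = candidates[:count]
    in_use.update(chosen)  # record the blocks as allocated
    return chosen
```

Here `faulty_blocks` stands for the list read back from the NVM, and `in_use` is a set of blocks already handed out to other processes.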
  • The memory management module 12 implements the mapping between virtual addresses and physical addresses in the form of a page table. The page table may be organized in a single level or in multiple levels; for example, the Linux kernel adopts a three-level page table, and the size of each page may be 4K, 2M, 1G, and so on.
  • The management form of the page table is not limited.
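As a rough illustration of how a page table maps a virtual address to a physical address, the sketch below uses a single-level table held in a dictionary and 4K pages. The `translate` function and the dictionary layout are assumptions made for the example; a real kernel uses multi-level hardware page tables.

```python
PAGE_SIZE = 4 * 1024  # 4K pages, one of the page sizes mentioned in the text


def translate(page_table, virtual_address, page_size=PAGE_SIZE):
    """Map a virtual address to a physical address.

    `page_table` maps a virtual page number to the base physical address
    of the physical address block backing that page (assumed layout).
    """
    page_number, offset = divmod(virtual_address, page_size)
    block_base = page_table[page_number]  # raises KeyError if the page is unmapped
    return block_base + offset
```

The offset inside the page is preserved, so only the page-to-block entry has to change when a page is moved to a different physical address block.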
  • By reading the information of the faulty physical address blocks from the NVM, the memory management module 12 can avoid the memory identified by those blocks when allocating initial memory for the process.
  • the exception handling module 11 monitors the state of the memory identified by the original physical address mapped by the virtual address of the process, and the state of the memory identified by the original physical address includes: a fault state and a normal state.
  • The memory management module 12 receives a memory access request sent by the process, where the request includes a virtual address of the process. The memory management module 12 maps the virtual address to the original physical address and stores the correspondence between the two in the Translation Look-aside Buffer (TLB).
  • The memory management module 12 sends the original physical address to the memory controller over the memory bus, and the memory controller reads the data at that address. If the data in the memory identified by the original physical address cannot be read because of an abnormality, the memory controller issues an abnormal access instruction over the memory bus, and the exception handling module 11 determines from that instruction that the memory identified by the original physical address is faulty.
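The detection path described above can be modelled in software as an exception raised by the controller and caught by a handler. This is a simplified sketch: the `MemoryAccessFault` class, the `bad_addresses` set, and both function names are hypothetical stand-ins for hardware behaviour.

```python
class MemoryAccessFault(Exception):
    """Stand-in for the abnormal access instruction issued by the memory controller."""

    def __init__(self, physical_address, op_type):
        super().__init__(f"{op_type} failed at {physical_address:#x}")
        self.physical_address = physical_address
        self.op_type = op_type


def controller_read(memory, bad_addresses, physical_address):
    """Simulated memory controller: signal a fault instead of returning data."""
    if physical_address in bad_addresses:
        raise MemoryAccessFault(physical_address, "read")
    return memory[physical_address]


def monitored_read(memory, bad_addresses, physical_address):
    """Return ("normal", data) on success or ("fault", address) on failure."""
    try:
        return "normal", controller_read(memory, bad_addresses, physical_address)
    except MemoryAccessFault as fault:
        return "fault", fault.physical_address
```

The two return states mirror the fault state and normal state that the exception handling module distinguishes.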
  • A page is usually the minimum unit of operation, and a page corresponds to one physical address block. Therefore, when the memory identified by the original physical address fails, the exception handling module 11 marks the status of the physical address block to which the original physical address belongs as faulty, which isolates that physical address block. The information of the faulty physical address block is usually recorded in the NVM, so that it is not lost even if the server is powered off. After the server is powered on again, the memory management module 12 can still read the information of the faulty physical address block from the NVM and avoid the memory interval identified by that block when allocating initial memory for a process.
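A minimal sketch of marking a block as faulty and persisting the record is shown below. It assumes a JSON file as a stand-in for the NVM and fixed-size blocks; the file format and the names `load_faulty_blocks` and `mark_block_faulty` are illustrative assumptions only.

```python
import json
import os


def load_faulty_blocks(path):
    """Read the set of faulty block indices from the NVM stand-in file."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return set(json.load(f))


def mark_block_faulty(path, physical_address, block_size):
    """Mark the block containing `physical_address` as faulty and persist it."""
    block = physical_address // block_size  # block to which the address belongs
    faulty = load_faulty_blocks(path)
    faulty.add(block)
    with open(path, "w") as f:
        json.dump(sorted(faulty), f)
    return block
```

Because the record is written to persistent storage, a later call to `load_faulty_blocks` after a restart still returns the isolated blocks.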
  • The memory management module 12 reallocates a physical address block for the page to which the virtual address belongs. The physical address mapped by the virtual address changes as a result, but for the upper-layer application the virtual address of the corresponding process does not change. As long as the virtual address does not change, the process is not interrupted, which ensures that the user's service is not interrupted.
  • The memory management module 12 reallocates a physical address block for the page to which the virtual address belongs as follows: first, it determines the non-faulty memory according to the information of the faulty physical address blocks stored in the NVM, the non-faulty memory being the memory other than that identified by the faulty physical address blocks. Then, according to the virtual address and the process ID of the process, it reallocates a physical address block from the non-faulty memory for the page to which the virtual address belongs. Specifically, the memory management module 12 obtains the page to which the virtual address belongs according to the virtual address and the process ID of the process, then selects a physical address block from the non-faulty memory and establishes a mapping relationship between that page and the selected physical address block.
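The reallocation described above might look like the following sketch, which assumes per-process page tables stored as nested dictionaries keyed by process ID and page number. The `remap_page` name and the data layout are hypothetical.

```python
def remap_page(page_tables, pid, virtual_address, free_blocks, faulty_blocks,
               page_size=4096):
    """Point the page containing `virtual_address` at a non-faulty block.

    Returns (old_block, new_block). The virtual address itself is untouched;
    only the page-to-block mapping of process `pid` changes.
    """
    page = virtual_address // page_size       # page to which the address belongs
    old_block = page_tables[pid][page]
    for block in free_blocks:
        if block not in faulty_blocks:        # select from non-faulty memory only
            free_blocks.remove(block)
            page_tables[pid][page] = block    # establish the new mapping
            return old_block, block
    raise MemoryError("no non-faulty block available")
```

Since the process keeps using the same virtual address, the remapping is invisible to the application.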
  • The exception processing module 11 is further configured to synchronize the data in the memory interval identified by the physical address block to which the original physical address belongs into the memory interval identified by the reallocated physical address block.
  • When the memory identified by the original physical address fails, the memory controller generates an abnormal access instruction, and the exception processing module 11 performs data recovery according to it. The abnormal access instruction includes an operation code and an operand: the operation code indicates whether the operation type of the instruction is a read operation or a write operation, and the operand includes information about the register to be accessed by the read or write operation and the physical address of the data to be accessed. If the operation type is a write operation, the exception handling module 11 writes the data that was to be written to the memory identified by the original physical address into the corresponding location in the reallocated physical address block.
  • If the operation type is a read operation, the exception processing module 11 can attempt to recover the data to be read from backup data of the data in the memory identified by the original physical address. If the data to be read can be recovered from the backup data, the recovered data is copied to the corresponding location in the reallocated physical address block. If the data to be read cannot be recovered, the exception handling module 11 resets the process. Unlike the prior art, this embodiment needs to reset only the currently monitored process and does not interrupt the other processes running on the server. In the prior art, once a process encounters a memory failure while running, the server must be reset, which interrupts all processes running on the server and therefore all services.
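The read and write cases above can be summarised in a small decision function. This is a sketch under assumptions: the reallocated block is a `bytearray`, the backup is a dictionary from offset to byte value (the text does not say where backup data comes from), and `recover_access` is a hypothetical name.

```python
def recover_access(op_type, offset, new_block, pending_write=None, backup=None):
    """Handle one faulting access; return "recovered" or "reset_process"."""
    if op_type == "write":
        # The data being written is still available, so redirect it
        # to the corresponding location in the reallocated block.
        new_block[offset] = pending_write
        return "recovered"
    if op_type == "read":
        if backup is not None and offset in backup:
            new_block[offset] = backup[offset]  # restore from backup data
            return "recovered"
        return "reset_process"                  # only this process is reset
    raise ValueError(f"unknown operation type: {op_type}")
```

Returning "reset_process" rather than resetting the whole server reflects the point that other processes keep running.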
  • Although the status of the original physical address block is marked as faulty, within the memory interval it identifies only the memory identified by the original physical address is faulty; the other memory in that interval is normal. Therefore, when the exception processing module 11 synchronizes the data in the memory interval identified by the physical address block to which the original physical address belongs into the memory interval identified by the reallocated physical address block, the data in the normal memory can be copied directly from the memory interval identified by the original physical address block to the memory interval identified by the reallocated physical address block.
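The direct copy of the normal memory can be sketched as below, assuming both blocks are byte sequences of equal length and the faulty locations are known as a set of offsets. The `sync_block` name is illustrative.

```python
def sync_block(old_block, new_block, faulty_offsets):
    """Copy every readable byte from the old block into the new block.

    Offsets listed in `faulty_offsets` are skipped; they are expected to be
    filled by a separate recovery step. Returns the number of bytes copied.
    """
    copied = 0
    for offset, value in enumerate(old_block):
        if offset in faulty_offsets:
            continue  # do not touch the faulty location
        new_block[offset] = value
        copied += 1
    return copied
```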
  • The exception processing module 11 is further configured to perform fault detection on the memory, specifically to detect whether the faulty physical address blocks recorded in the NVM have returned to normal. If it detects that the physical address block to which the original physical address belongs has returned to normal, the exception handling module 11 marks the status of that block as non-faulty, and the memory interval identified by the original physical address block can again be used for memory allocation. If the physical address block to which the original physical address belongs cannot be recovered, the exception handling module 11 permanently isolates it, and the memory interval identified by the original physical address block cannot be used for memory allocation.
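A sketch of this re-test step is given below. The health check is passed in as a callable because the text does not specify how a block is tested; `retest_faulty_blocks` and `block_is_healthy` are hypothetical names.

```python
def retest_faulty_blocks(faulty_blocks, block_is_healthy):
    """Split recorded faulty blocks into recovered and still-faulty sets.

    `block_is_healthy` is a caller-supplied test (assumed, not specified
    in the text) that returns True when a block works again.
    """
    recovered = {b for b in faulty_blocks if block_is_healthy(b)}
    still_faulty = set(faulty_blocks) - recovered
    return recovered, still_faulty
```

Blocks in the first set would be marked non-faulty and returned to the allocator, while blocks in the second set remain isolated.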
  • In this embodiment, the exception handling module monitors the state of the memory identified by the original physical address mapped by the virtual address of the process. If the memory identified by the original physical address fails, the exception handling module marks the status of the physical address block to which the original physical address belongs as faulty, which isolates the faulty memory space online. The exception handling module then calls the memory management module to reallocate a physical address block for the page to which the virtual address belongs, and synchronizes the data in the memory interval identified by the physical address block to which the original physical address belongs into the memory interval identified by the reallocated physical address block.
  • the virtual address of the process is unchanged during the whole process, so that the service is not interrupted, and the faulty memory space is isolated online.
  • FIG. 3 is a flowchart of a memory fault isolation method according to Embodiment 2 of the present invention.
  • The method in this embodiment is performed by a server. As shown in FIG. 3, the method in this embodiment may include the following steps:
  • Step 101: Monitor the state of the memory identified by the original physical address mapped by the virtual address of the process, where the page to which the virtual address belongs has a mapping relationship with the physical address block to which the original physical address belongs, and the physical address block identifies a contiguous memory interval allocated to the process.
  • the server's operating system creates a process for the application, and the server allocates an initial memory space for the process.
  • The server allocates initial memory for the process as follows: first, it reads the information of the faulty physical address blocks saved in the NVM; then it determines the non-faulty memory according to that information, the non-faulty memory being the memory other than that identified by the faulty physical address blocks; finally, it allocates the initial memory to the process from the non-faulty memory.
  • The memory management module implements the mapping between virtual addresses and physical addresses in the form of a page table. The page table may be organized in a single level or in multiple levels; for example, the Linux kernel adopts a three-level page table, and the size of each page may be 4K, 2M, 1G, and so on.
  • The management form of the page table is not limited.
  • the server can avoid the memory identified by the fault physical address block when the initial memory is allocated for the process.
  • the server monitors the state of the memory identified by the original physical address mapped by the virtual address of the process, and the state of the memory identified by the original physical address includes: a fault state and a normal state.
  • The memory management module of the server receives a memory access request sent by the process, where the request includes a virtual address of the process. The memory management module maps the virtual address to the original physical address and stores the correspondence between the two in the TLB.
  • The memory management module sends the original physical address to the memory controller over the memory bus, and the memory controller reads the data at that address. If the data in the memory identified by the original physical address cannot be read because of an abnormality, the memory controller issues an abnormal access instruction over the memory bus, and the server determines from that instruction that the memory identified by the original physical address is faulty.
  • Step 102: If the memory identified by the original physical address fails, mark the status of the physical address block to which the original physical address belongs as faulty.
  • The server marks the status of the physical address block to which the original physical address belongs as faulty, which isolates that physical address block.
  • The information of the faulty physical address block is recorded in the NVM, so that it is not lost even if the server is powered off. After powering on again, the server can still read the information of the faulty physical address block from the NVM and avoid the memory interval identified by that block when allocating initial memory for a process.
  • Step 103: Reallocate a physical address block from the non-faulty memory for the page to which the virtual address belongs, and synchronize the data in the memory interval identified by the physical address block to which the original physical address belongs into the memory interval identified by the reallocated physical address block.
  • To ensure that the running service is not interrupted, the server reallocates a physical address block for the page to which the virtual address belongs. The physical address mapped by the virtual address changes as a result, but for the upper-layer application the virtual address of the corresponding process does not change. As long as the virtual address does not change, the process is not interrupted, which ensures that the user's service is not interrupted.
  • The server reallocates a physical address block for the page to which the virtual address belongs as follows: first, it determines the non-faulty memory according to the information of the faulty physical address blocks saved in the NVM, the non-faulty memory being the memory other than that identified by the faulty physical address blocks. Then, according to the virtual address and the process ID of the process, it reallocates a physical address block from the non-faulty memory for the page to which the virtual address belongs.
  • Specifically, the server first obtains the page to which the virtual address belongs according to the virtual address and the process ID of the process, then selects a physical address block from the non-faulty memory and establishes a mapping relationship between that page and the selected physical address block.
  • After reallocating the physical address block for the page to which the virtual address belongs, the server synchronizes the data in the memory interval identified by the physical address block to which the original physical address belongs into the memory interval identified by the reallocated physical address block.
  • For the specific synchronization mode, refer to the related description in Embodiment 1; details are not repeated here.
  • In this embodiment, the server monitors the state of the memory identified by the original physical address mapped by the virtual address of the process. If the memory identified by the original physical address fails, the server marks the status of the physical address block to which the original physical address belongs as faulty, which isolates the faulty memory space online. The server then reallocates a physical address block for the page to which the virtual address belongs and synchronizes the data in the memory interval identified by the physical address block to which the original physical address belongs into the memory interval identified by the reallocated physical address block.
  • the virtual address of the process is unchanged during the whole process, so that the service is not interrupted, and the faulty memory space is isolated online.
  • The server also performs fault detection on the memory, specifically detecting whether the faulty physical address blocks recorded in the NVM have returned to normal. If the server detects that the physical address block to which the original physical address belongs has returned to normal, it marks the status of that block as non-faulty, and the memory interval identified by the original physical address block can again be used for memory allocation. If the physical address block to which the original physical address belongs cannot be recovered, the server permanently isolates it, and the memory interval identified by the original physical address block cannot be used for memory allocation.
  • The foregoing program may be stored in a computer readable storage medium. When executed, the program performs the steps of the foregoing method embodiments.
  • The foregoing storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)

Abstract

A memory fault isolation method and device. The method comprises: during the operation of a process, a server monitoring the state of a memory identified by an original physical address mapped by a virtual address of the process; if the memory identified by the original physical address has a fault, the server marking the state of a physical address block to which the original physical address belongs as a fault, so as to realize the on-line isolation of a faulty memory space; and the server allocating the physical address block for a page to which the virtual address belongs, and synchronizing data in a memory interval identified by the physical address block to which the original physical address belongs to the memory interval identified by the re-allocated physical address block. In the above-mentioned method, the virtual address of the process in the whole process is unchanged, so as to ensure that a service is uninterrupted and realize the on-line isolation of a faulty memory space.

Description

内存故障隔离方法和装置Memory fault isolation method and device 技术领域Technical field
本发明实施例涉及计算机技术,尤其涉及一种内存故障隔离方法和装置。Embodiments of the present invention relate to computer technologies, and in particular, to a memory fault isolation method and apparatus.
背景技术Background technique
服务器发生内存故障会导致服务器或单板复位,服务器复位导致正在运行的应用将会被中断,而且需要将服务器返回生产厂家更换内存,更换内存不方便。A memory failure on the server will cause the server or board to be reset. The server reset will cause the running application to be interrupted, and the server needs to be returned to the manufacturer to replace the memory. It is inconvenient to replace the memory.
现有技术提供一种离线故障内存隔离方法,不需要返回生产厂家更换内存,该方法包括:在服务器运行之前,首先通过基本输入输出系统(Basic Input Output System,简称BIOS)对内存进行检测,并获取故障内存空间的地址信息,将故障内存空间的地址信息保存到非易失性存储器(Non-Volatile Memory,简称NVM)中,读取NVM中保存的故障内存空间的地址信息,将该地址信息对应的故障内存空间标记为不可用,永久将故障内存空间隔离。The prior art provides an offline fault memory isolation method, which does not need to return to the manufacturer to replace the memory. The method includes: first, detecting the memory through a Basic Input Output System (BIOS) before the server is running, and Obtain the address information of the faulty memory space, save the address information of the faulty memory space to a non-volatile memory (Non-Volatile Memory, NVM for short), and read the address information of the faulty memory space saved in the NVM, and the address information is obtained. The corresponding fault memory space is marked as unavailable, permanently isolating the fault memory space.
上述现有技术提供的方法,只能在服务器运行之前进行故障隔离,在服务器运行期间出现内存硬件故障,仍会导致业务中断。The method provided by the above prior art can only perform fault isolation before the server is running, and a memory hardware failure occurs during the running of the server, which still causes service interruption.
发明内容Summary of the invention
本发明实施例提供一种内存故障隔离方法和装置,能够在服务器运行期间进行故障隔离,避免业务中断。The embodiment of the invention provides a memory fault isolation method and device, which can perform fault isolation during server operation and avoid service interruption.
A first aspect of the present invention provides a memory fault isolation apparatus, including:
An exception handling module, configured to monitor the state of the memory identified by the original physical address to which a virtual address of a process is mapped, where the page to which the virtual address belongs is mapped to the physical address block to which the original physical address belongs, and the physical address block identifies a contiguous memory interval allocated to the process;
If the memory identified by the original physical address fails, the exception handling module is further configured to mark the state of the physical address block to which the original physical address belongs as faulty;
A memory management module, configured to reallocate, from non-faulty memory, a physical address block for the page to which the virtual address belongs;
The exception handling module is further configured to synchronize the data in the memory interval identified by the physical address block to which the original physical address belongs into the memory interval identified by the reallocated physical address block.
With reference to the first aspect of the present invention, in a first possible implementation of the first aspect, the exception handling module is further configured to save information about the physical address block marked as faulty to a non-volatile memory;
The memory management module is specifically configured to:
Determine the non-faulty memory according to the information about faulty physical address blocks saved in the non-volatile memory;
Reallocate, from the non-faulty memory, a physical address block for the page to which the virtual address belongs, according to the virtual address and the process ID of the process.
With reference to the first possible implementation of the first aspect of the present invention, in a second possible implementation of the first aspect, the memory management module is specifically configured to:
Obtain the page to which the virtual address belongs according to the virtual address and the process ID of the process;
Select a physical address block from the non-faulty memory, and establish a mapping from the page to which the virtual address belongs to the selected physical address block.
With reference to the first aspect of the present invention, in a third possible implementation of the first aspect, the memory management module is further configured to:
When allocating initial memory to the process, read the information about faulty physical address blocks saved in the non-volatile memory;
Determine the non-faulty memory according to the information about the faulty physical address blocks, where the non-faulty memory is the memory other than the faulty physical address blocks;
Allocate the initial memory to the process from the non-faulty memory.
With reference to the first aspect of the present invention or any one of the first to third possible implementations of the first aspect, in a fourth possible implementation of the first aspect, the exception handling module is further configured to:
When the server restarts, perform fault detection on the memory;
If it is detected that the physical address block to which the original physical address belongs has returned to normal, mark the state of that physical address block as non-faulty.
A second aspect of the present invention provides a memory fault isolation method, including:
Monitoring the state of the memory identified by the original physical address to which a virtual address of a process is mapped, where the page to which the virtual address belongs is mapped to the physical address block to which the original physical address belongs, and the physical address block identifies a contiguous memory interval allocated to the process;
If the memory identified by the original physical address fails, marking the state of the physical address block to which the original physical address belongs as faulty;
Reallocating, from non-faulty memory, a physical address block for the page to which the virtual address belongs, and synchronizing the data in the memory interval identified by the physical address block to which the original physical address belongs into the memory interval identified by the reallocated physical address block.
With reference to the second aspect of the present invention, in a first possible implementation of the second aspect, the method further includes:
Saving information about the physical address block marked as faulty to a non-volatile memory;
The reallocating, from non-faulty memory, of a physical address block for the page to which the virtual address belongs includes:
Determining the non-faulty memory according to the information about faulty physical address blocks saved in the non-volatile memory;
Reallocating, from the non-faulty memory, a physical address block for the page to which the virtual address belongs, according to the virtual address and the process ID of the process.
With reference to the first possible implementation of the second aspect of the present invention, in a second possible implementation of the second aspect, the reallocating, from the non-faulty memory, of a physical address block for the page to which the virtual address belongs according to the virtual address and the process ID of the process includes:
Obtaining the page to which the virtual address belongs according to the virtual address and the process ID of the process;
Selecting a physical address block from the non-faulty memory, and establishing a mapping from the page to which the virtual address belongs to the selected physical address block.
With reference to the second aspect of the present invention, in a third possible implementation of the second aspect, the method further includes:
When allocating initial memory to the process, reading the information about faulty physical address blocks saved in the non-volatile memory;
Determining the non-faulty memory according to the information about the faulty physical address blocks, where the non-faulty memory is the memory other than the faulty physical address blocks;
Allocating the initial memory to the process from the non-faulty memory.
With reference to the second aspect of the present invention or any one of the first to third possible implementations of the second aspect, in a fourth possible implementation of the second aspect, the method further includes:
When the server restarts, performing fault detection on the memory;
If it is detected that the physical address block to which the original physical address belongs has returned to normal, marking the state of that physical address block as non-faulty.
With the memory fault isolation method and apparatus of the embodiments of the present invention, while a process is running the server monitors the state of the memory identified by the original physical address to which the virtual address of the process is mapped. If that memory fails, the server marks the physical address block to which the original physical address belongs as faulty, which isolates the faulty memory space online. The server also allocates a new physical address block for the page to which the virtual address belongs and synchronizes the data in the memory interval identified by the original physical address block into the memory interval identified by the reallocated physical address block. Throughout this procedure the virtual address of the process does not change, so service is not interrupted and the faulty memory space is isolated online.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of a server to which embodiments of the present invention are applicable;
FIG. 2 is a schematic structural diagram of a memory fault isolation apparatus according to Embodiment 1 of the present invention;
FIG. 3 is a flowchart of a memory fault isolation method according to Embodiment 2 of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The method of the embodiments of the present invention is mainly applied in operating system kernels and virtual machine monitor (VMM) kernels that have a page table mapping mechanism and an exception handling mechanism. FIG. 1 is a schematic structural diagram of a server to which embodiments of the present invention are applicable. As shown in FIG. 1, the server includes a memory, a processor, and an NVM. The processor mainly refers to a central processing unit (CPU) and includes a memory management module and an exception handling module. The memory management module implements the mapping between the virtual address space of a virtual machine or process and the physical address space. The exception handling module handles exceptions triggered by the CPU; in the embodiments of the present invention it handles memory faults. The memory and the NVM are two separate pieces of physical hardware. The NVM stores the physical addresses of faulty memory space, and the data saved in the NVM is not lost even after the server is powered off. The memory is generally random access memory (RAM) or dynamic random access memory (DRAM) and usually takes the form of memory modules. A single memory module in a server is typically 8 GB, 16 GB, or larger, so replacement is costly. Virtual machines or processes are the form in which services run.
FIG. 2 is a schematic structural diagram of a memory fault isolation apparatus according to Embodiment 1 of the present invention. The apparatus provided in this embodiment may be integrated in a server. As shown in FIG. 2, the apparatus includes an exception handling module 11 and a memory management module 12.
The exception handling module 11 is configured to monitor the state of the memory identified by the original physical address to which a virtual address of a process is mapped, where the page to which the virtual address belongs is mapped to the physical address block to which the original physical address belongs, and the physical address block identifies a contiguous memory interval allocated to the process.
If the memory identified by the original physical address fails, the exception handling module 11 is further configured to mark the state of the physical address block to which the original physical address belongs as faulty.
The memory management module 12 is configured to reallocate, from non-faulty memory, a physical address block for the page to which the virtual address belongs.
The exception handling module 11 is further configured to synchronize the data in the memory interval identified by the physical address block to which the original physical address belongs into the memory interval identified by the reallocated physical address block.
When a user starts a program, the operating system of the server creates a process for the application, and the memory management module 12 is further configured to allocate initial memory space for the process, that is, to allocate pages to the process. In the page table mapping mechanism, each page covers a range of virtual addresses and corresponds to one physical address block, and each physical address block identifies a contiguous memory interval. Each virtual address in a page corresponds one-to-one to a physical address in the physical address block, and the mapping between virtual addresses and physical addresses is maintained by the memory management module 12 of the server.
In this embodiment, the memory management module 12 allocates initial memory to the process as follows: first, it reads the information about faulty physical address blocks saved in the NVM; then, it determines the non-faulty memory according to that information, the non-faulty memory being the memory other than the faulty physical address blocks; finally, it allocates the initial memory to the process from the non-faulty memory. In this embodiment, the memory management module 12 implements the mapping between virtual addresses and physical addresses in the form of page tables, which may be organized in one level or in multiple levels; for example, the Linux kernel uses a three-level page table scheme. The page size may be 4 KB, 2 MB, 1 GB, or the like. This embodiment does not limit how page tables are managed.
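The initial allocation described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the NVM contents are modeled as a plain list of faulty block numbers, a page table is modeled as a dictionary from page number to block number, and the names `non_faulty_blocks` and `allocate_initial_memory` are hypothetical.

```python
def non_faulty_blocks(total_blocks, faulty_blocks):
    """Return every physical block number that is not recorded as faulty."""
    faulty = set(faulty_blocks)
    return [block for block in range(total_blocks) if block not in faulty]


def allocate_initial_memory(page_numbers, total_blocks, faulty_blocks):
    """Map each virtual page to a distinct non-faulty physical block."""
    free = non_faulty_blocks(total_blocks, faulty_blocks)
    if len(page_numbers) > len(free):
        raise MemoryError("not enough non-faulty physical blocks")
    # zip pairs pages with blocks in order, so faulty blocks are never used
    return dict(zip(page_numbers, free))


# Assumed example: 8 physical blocks, blocks 1 and 3 recorded as faulty in NVM
page_table = allocate_initial_memory([0, 1, 2], total_blocks=8, faulty_blocks=[1, 3])
```

In this sketch the three pages land on blocks 0, 2, and 4, skipping the two blocks that the NVM record marks as faulty.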
By reading the information about faulty physical address blocks from the NVM, the memory management module 12 can avoid the memory identified by those blocks when it allocates initial memory to the process. While the process subsequently runs, the exception handling module 11 monitors the state of the memory identified by the original physical address to which the virtual address of the process is mapped; that state is either faulty or normal. Specifically, the memory management module 12 receives a memory access request sent by the process, the request carrying the virtual address of the process. The memory management module 12 maps the virtual address to the original physical address and stores the correspondence between the virtual address and the original physical address in the translation lookaside buffer (TLB). The memory management module 12 then sends the original physical address to the memory controller over the memory bus, and the memory controller reads data according to the original physical address. If an exception occurs and the data in the memory identified by the original physical address cannot be read, the memory controller issues an abnormal access instruction over the memory bus, and the exception handling module 11 determines from that instruction that the memory identified by the original physical address has failed.
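The translate-then-detect path described above can be modeled in a few lines. The sketch below is an illustration under stated assumptions: the memory controller is simulated by a function that raises a Python exception for addresses listed in `bad_addresses`, the page size is assumed to be 4 KB, and all names (`MemoryAccessFault`, `translate`, `controller_read`, `monitored_read`) are hypothetical.

```python
PAGE_SIZE = 4096  # assumed size of one page and one physical address block


class MemoryAccessFault(Exception):
    """Stands in for the abnormal access instruction raised by the controller."""

    def __init__(self, op, physical_address):
        super().__init__(f"{op} fault at {physical_address:#x}")
        self.op = op
        self.physical_address = physical_address


def translate(page_table, virtual_address):
    """Map a virtual address to a physical address through the page table."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    return page_table[page] * PAGE_SIZE + offset


def controller_read(memory, bad_addresses, physical_address):
    """Simulated memory controller: fails on addresses known to be bad."""
    if physical_address in bad_addresses:
        raise MemoryAccessFault("read", physical_address)
    return memory.get(physical_address, 0)


def monitored_read(page_table, memory, bad_addresses, virtual_address):
    """Return (value, faulty_block); faulty_block is None when the read succeeds."""
    physical_address = translate(page_table, virtual_address)
    try:
        return controller_read(memory, bad_addresses, physical_address), None
    except MemoryAccessFault as fault:
        return None, fault.physical_address // PAGE_SIZE


page_table = {0: 5}                      # page 0 is mapped to physical block 5
memory = {5 * PAGE_SIZE + 8: 42}
healthy = monitored_read(page_table, memory, set(), 8)
failed = monitored_read(page_table, memory, {5 * PAGE_SIZE + 8}, 8)
```

The second call reports block 5 as the block containing the failed address, which is the unit that the exception handling step would go on to mark as faulty.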
In the page table mapping mechanism, a page is usually the smallest unit of operation, and each page corresponds to one physical address block. Therefore, when the memory identified by the original physical address fails, the exception handling module 11 marks the state of the physical address block to which the original physical address belongs as faulty, which isolates that physical address block. The information about faulty physical address blocks is usually recorded in the NVM, so it is not lost even if the server loses power. After the server is powered on again, the memory management module 12 can still read the information about the faulty physical address blocks from the NVM and avoid the memory intervals they identify when allocating initial memory to a process.
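Persisting the fault record can be illustrated as below. This is only a sketch: a JSON file stands in for the NVM, which in a real server would be a dedicated non-volatile device accessed through firmware or a driver, and the names `load_faulty_blocks` and `mark_block_faulty` are hypothetical.

```python
import json
import os
import tempfile


def load_faulty_blocks(nvm_path):
    """Read the set of faulty block numbers; empty if nothing was recorded yet."""
    if not os.path.exists(nvm_path):
        return set()
    with open(nvm_path) as nvm_file:
        return set(json.load(nvm_file))


def mark_block_faulty(nvm_path, block):
    """Add one block to the persistent fault record and return the new record."""
    faulty = load_faulty_blocks(nvm_path)
    faulty.add(block)
    with open(nvm_path, "w") as nvm_file:
        json.dump(sorted(faulty), nvm_file)
    return faulty


# The file survives between calls, as the NVM record survives a power cycle
with tempfile.TemporaryDirectory() as tmp:
    nvm = os.path.join(tmp, "faulty_blocks.json")
    mark_block_faulty(nvm, 7)
    mark_block_faulty(nvm, 3)
    after_power_cycle = load_faulty_blocks(nvm)
```

Reloading the record after both writes returns blocks 3 and 7, which is the information the allocator needs in order to avoid them.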
In this embodiment, to keep the running service from being interrupted, the memory management module 12 reallocates a physical address block for the page to which the virtual address belongs. The physical address to which the virtual address is mapped changes as a result of the reallocation, but from the point of view of the upper-layer application the virtual address of its process does not change. As long as the virtual address stays the same, the process is not interrupted, so the user's service continues without interruption.
The memory management module 12 reallocates a physical address block for the page to which the virtual address belongs as follows. First, it determines the non-faulty memory according to the information about faulty physical address blocks saved in the NVM, the non-faulty memory being the memory other than that identified by the faulty physical address blocks. Then, according to the virtual address and the process ID of the process, it reallocates a physical address block from the non-faulty memory for the page to which the virtual address belongs. This second step is performed as follows: first, the module obtains the page to which the virtual address belongs according to the virtual address and the process ID; then, it selects a physical address block from the non-faulty memory and establishes a mapping from that page to the selected physical address block.
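The remapping steps above can be sketched as follows. The model is an assumption made for illustration: per-process page tables are dictionaries keyed by process ID, a block is eligible only if it is neither faulty nor already in use, and `remap_page` is a hypothetical name.

```python
def remap_page(page_tables, pid, virtual_address, total_blocks,
               faulty_blocks, page_size=4096):
    """Point the page containing virtual_address at a fresh non-faulty block.

    Returns (old_block, new_block). The virtual address itself is unchanged.
    """
    page = virtual_address // page_size          # page found from the address
    table = page_tables[pid]                     # page table found from the pid
    in_use = {b for t in page_tables.values() for b in t.values()}
    for block in range(total_blocks):
        if block not in faulty_blocks and block not in in_use:
            old_block = table[page]
            table[page] = block                  # establish the new mapping
            return old_block, block
    raise MemoryError("no free non-faulty physical block")


# Process 42 has page 1 mapped to block 2, which has just been marked faulty
page_tables = {42: {0: 1, 1: 2}}
old_block, new_block = remap_page(page_tables, 42, 4096 + 16,
                                  total_blocks=8, faulty_blocks={2})
```

After the call, page 1 of process 42 points at block 0 while the process keeps using the same virtual address, which is what allows it to keep running.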
After the memory management module 12 has reallocated a physical address block for the page to which the virtual address belongs, the exception handling module 11 is further configured to synchronize the data in the memory interval identified by the physical address block to which the original physical address belongs into the memory interval identified by the reallocated physical address block. Specifically, when the memory identified by the original physical address fails, the memory controller generates an abnormal access instruction, and the exception handling module 11 performs data recovery according to that instruction. The abnormal access instruction contains an opcode and operands. The opcode indicates whether the operation is a read or a write, and the operands include information about the register that the read or write accesses and the physical address of the data that the read or write accesses. If the operation is a write, the exception handling module 11 writes the data that was to be written to the memory identified by the original physical address into the corresponding location in the reallocated physical address block. If the operation is a read, the exception handling module 11 may recover the data to be read from backup data of the memory identified by the original physical address; if the data can be recovered from the backup, the recovered data is copied to the corresponding location in the reallocated physical address block. If the data to be read cannot be recovered, the exception handling module 11 resets the process. Unlike the prior art, this embodiment needs to reset only the process being monitored and does not interrupt other processes running on the server. In the prior art, once a memory fault occurs while a process is running, the server has to be reset, which interrupts every process running on the server and therefore every service.
It should be noted that, although the state of the original physical address block is marked as faulty in this embodiment, only the memory identified by the original physical address has actually failed; the rest of the memory in the interval identified by the original physical address block is normal. Therefore, when the exception handling module 11 synchronizes the data in the memory interval identified by the original physical address block into the memory interval identified by the reallocated physical address block, the data in the normal memory can be copied directly from the former to the latter.
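The synchronization rules in the preceding paragraphs can be summarized in a short sketch. It rests on simplifying assumptions: a block is modeled as a `bytearray`, exactly one offset in the old block is faulty, and the backup (if any) is a byte sequence of the same length. The name `synchronize_block` and its parameters are hypothetical.

```python
def synchronize_block(old_block, new_block, fault_offset, op,
                      pending_write=None, backup=None):
    """Copy a partly faulty block into its replacement.

    Returns True if the faulty location was recovered, or False if the
    owning process has to be reset because the data is lost.
    """
    # Healthy locations are copied directly; the faulty one is never read
    for offset in range(len(old_block)):
        if offset != fault_offset:
            new_block[offset] = old_block[offset]
    if op == "write":
        # The value that could not be written goes to the new block instead
        new_block[fault_offset] = pending_write
        return True
    if op == "read" and backup is not None:
        # The lost value is restored from the backup copy
        new_block[fault_offset] = backup[fault_offset]
        return True
    return False


old = bytearray(b"abcd")                 # offset 2 is assumed to be faulty

new_w = bytearray(4)
ok_w = synchronize_block(old, new_w, 2, "write", pending_write=ord("X"))

new_r = bytearray(4)
ok_r = synchronize_block(old, new_r, 2, "read", backup=b"abcd")

new_f = bytearray(4)
ok_f = synchronize_block(old, new_f, 2, "read")
```

Only the third case, a failed read with no backup, returns False; in the scheme described here that is the case in which the affected process, and only that process, is reset.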
Optionally, if the server restarts, the exception handling module 11 is further configured to perform fault detection on the memory, specifically to check whether the faulty memory blocks recorded in the NVM have returned to normal. If it detects that the physical address block to which the original physical address belongs has returned to normal, the exception handling module 11 marks the state of that block as non-faulty, and the memory interval identified by the original physical address block can again be used for memory allocation. If the physical address block to which the original physical address belongs has not recovered, the exception handling module 11 isolates the original physical address block permanently, and the memory interval it identifies cannot be used for memory allocation.
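The restart-time check can be sketched as below. The hardware test is represented by a caller-supplied predicate, because the text does not specify how a block is tested; `retest_on_restart` and `block_is_healthy` are hypothetical names.

```python
def retest_on_restart(faulty_blocks, block_is_healthy):
    """Re-test every recorded faulty block after a restart.

    Returns (still_faulty, recovered). Recovered blocks may be allocated
    again; blocks that are still faulty stay isolated.
    """
    still_faulty = {b for b in faulty_blocks if not block_is_healthy(b)}
    recovered = set(faulty_blocks) - still_faulty
    return still_faulty, recovered


# Assumed example: block 7 passes the memory test after restart, block 3 does not
still_faulty, recovered = retest_on_restart({3, 7}, lambda block: block == 7)
```

The `still_faulty` set is what would be written back to the NVM so that later allocations continue to avoid those blocks.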
With the apparatus of this embodiment, while a process is running the exception handling module monitors the state of the memory identified by the original physical address to which the virtual address of the process is mapped. If that memory fails, the exception handling module marks the physical address block to which the original physical address belongs as faulty, which isolates the faulty memory space online. The exception handling module also invokes the memory management module to allocate a physical address block for the page to which the virtual address belongs, and synchronizes the data in the memory interval identified by the original physical address block into the memory interval identified by the reallocated physical address block. Throughout this procedure the virtual address of the process does not change, so service is not interrupted and the faulty memory space is isolated online.
FIG. 3 is a flowchart of a memory fault isolation method according to Embodiment 2 of the present invention. The method in this embodiment is performed by a server. As shown in FIG. 3, the method may include the following steps:
Step 101: Monitor the state of the memory identified by the original physical address to which a virtual address of a process is mapped, where the page to which the virtual address belongs is mapped to the physical address block to which the original physical address belongs, and the physical address block identifies a contiguous memory interval allocated to the process.
When a user starts a program, the operating system of the server creates a process for the application, and the server allocates initial memory space for the process. The server allocates the initial memory as follows: first, it reads the information about faulty physical address blocks saved in the NVM; then, it determines the non-faulty memory according to that information, the non-faulty memory being the memory other than the faulty physical address blocks; finally, it allocates the initial memory to the process from the non-faulty memory. In this embodiment, the memory management module implements the mapping between virtual addresses and physical addresses in the form of page tables, which may be organized in one level or in multiple levels; for example, the Linux kernel uses a three-level page table scheme. The page size may be 4 KB, 2 MB, 1 GB, or the like. This embodiment does not limit how page tables are managed.
In this embodiment, by reading the information about faulty physical address blocks from the NVM, the server can avoid the memory identified by those blocks when it allocates initial memory to the process. While the process subsequently runs, the server monitors the state of the memory identified by the original physical address to which the virtual address of the process is mapped; that state is either faulty or normal. Specifically, the memory management module of the server receives a memory access request sent by the process, the request carrying the virtual address of the process. The memory management module maps the virtual address to the original physical address and stores the correspondence between the virtual address and the original physical address in the TLB. The memory management module then sends the original physical address to the memory controller over the memory bus, and the memory controller reads data according to the original physical address. If an exception occurs and the data in the memory identified by the original physical address cannot be read, the memory controller issues an abnormal access instruction over the memory bus, and the server determines from that instruction that the memory identified by the original physical address has failed.
步骤102、若原物理地址所标识的内存出现故障,则将原物理地址所属的物理地址块的状态标记为故障。Step 102: If the memory identified by the original physical address fails, the status of the physical address block to which the original physical address belongs is marked as a fault.
In this embodiment, when the memory identified by the original physical address fails, the server marks the state of the physical address block to which the original physical address belongs as faulty, thereby isolating that block. The information about faulty physical address blocks is usually recorded in the NVM, so it is not lost even if the server loses power. After power-on, the server can still read this information from the NVM and, when allocating initial memory for a process, avoid the memory ranges identified by the faulty physical address blocks.
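A minimal sketch of this marking step follows. It assumes a JSON file as a stand-in for the NVM and a 4096-byte block size; the `FaultRecord` class and its method names are hypothetical and are not taken from the application.

```python
import json
import os
import tempfile

BLOCK_SIZE = 4096  # assumed size of one physical address block


class FaultRecord:
    """Keeps the set of faulty block numbers in a file that stands in for NVM."""

    def __init__(self, path):
        self.path = path

    def load(self):
        if not os.path.exists(self.path):
            return set()
        with open(self.path) as f:
            return set(json.load(f))

    def mark_faulty(self, phys_addr):
        block = phys_addr // BLOCK_SIZE  # block to which the address belongs
        blocks = self.load()
        blocks.add(block)
        with open(self.path, "w") as f:
            json.dump(sorted(blocks), f)
        return block


nvm_path = os.path.join(tempfile.mkdtemp(), "faulty_blocks.json")
marked = FaultRecord(nvm_path).mark_faulty(7 * BLOCK_SIZE + 16)
# A fresh object models the server reading the record again after a power cycle.
after_power_cycle = FaultRecord(nvm_path).load()
```

Because the record lives outside volatile memory, the second `FaultRecord` object still sees block 7 as faulty, which is what allows the allocator to avoid it after a restart.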
Step 103: Reallocate a physical address block from the non-faulty memory for the page to which the virtual address belongs, and synchronize the data in the memory range identified by the physical address block to which the original physical address belongs into the memory range identified by the reallocated physical address block.
In this embodiment, to keep the running service uninterrupted, the server reallocates a physical address block for the page to which the virtual address belongs. The physical address to which the virtual address is mapped changes as a result, but the virtual address of the process corresponding to the upper-layer application does not. As long as the virtual address is unchanged, the process is not interrupted, so the user's service continues without interruption.
Specifically, the server reallocates a physical address block for the page to which the virtual address belongs as follows. First, it determines the non-faulty memory according to the information about faulty physical address blocks saved in the NVM; the non-faulty memory is the memory other than that identified by the faulty physical address blocks. Then, according to the virtual address and the process ID of the process, it reallocates a physical address block from the non-faulty memory for the page to which the virtual address belongs. This second step has two parts: the server first obtains the page to which the virtual address belongs according to the virtual address and the process ID; it then selects a physical address block from the non-faulty memory and establishes a mapping from that page to the selected physical address block.
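The reallocation steps above can be sketched as follows. The function name, the dictionary page table keyed by process ID and page number, the free-block list, and the choice of the lowest-numbered candidate are all assumptions made for illustration; the application does not specify a selection policy.

```python
PAGE_SIZE = 4096  # assumed; one page maps to one physical address block


def reallocate_block(page_table, free_blocks, faulty_blocks, pid, virt_addr):
    """Remap the page containing virt_addr to a free, non-faulty block.

    page_table maps (pid, page number) to a physical block number.
    Returns (old_block, new_block).
    """
    page = (pid, virt_addr // PAGE_SIZE)  # page to which the virtual address belongs
    # Non-faulty memory: free blocks minus the blocks recorded as faulty.
    candidates = sorted(set(free_blocks) - set(faulty_blocks))
    if not candidates:
        raise MemoryError("no non-faulty block available")
    new_block = candidates[0]
    old_block = page_table[page]
    free_blocks.remove(new_block)
    page_table[page] = new_block          # establish the new page-to-block mapping
    return old_block, new_block


page_table = {(42, 0): 7}
free_blocks = [2, 5, 9]
faulty_blocks = {2, 7}  # as would be read from the NVM record
result = reallocate_block(page_table, free_blocks, faulty_blocks, pid=42, virt_addr=16)
```

Process 42 keeps using virtual address 16 throughout; only the entry in `page_table` changes, which mirrors the point that the process is not interrupted as long as its virtual address stays the same.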
After reallocating a physical address block for the page to which the virtual address belongs, the server synchronizes the data in the memory range identified by the physical address block to which the original physical address belongs into the memory range identified by the reallocated physical address block. For the specific synchronization manner, refer to the description in Embodiment 1; details are not repeated here.
In the method of this embodiment, while the process runs, the server monitors the state of the memory identified by the original physical address to which the virtual address of the process is mapped. If that memory fails, the server marks the state of the physical address block to which the original physical address belongs as faulty, which isolates the faulty memory space online. The server also reallocates a physical address block for the page to which the virtual address belongs and synchronizes the data in the memory range identified by the original physical address block into the memory range identified by the reallocated physical address block. The virtual address of the process is unchanged throughout, so the service is not interrupted while the faulty memory space is isolated online.
Building on Embodiment 2, if the server restarts, it performs fault detection on the memory; specifically, it checks whether the faulty memory blocks recorded in the NVM have returned to normal. If the server detects that the physical address block to which the original physical address belongs has returned to normal, it marks the state of that block as non-faulty, and the memory range identified by the block can again be used for memory allocation. If the block has not recovered, the server isolates it permanently, and the memory range it identifies cannot be used for memory allocation.
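The restart check can be sketched as below. The function name and the `block_is_healthy` callable are hypothetical; the callable stands in for whatever boot-time memory test the server runs, which this passage does not describe.

```python
def recheck_faulty_blocks(recorded_faulty, block_is_healthy):
    """Split the recorded faulty blocks into recovered and still-isolated sets.

    block_is_healthy is a hypothetical callable standing in for the boot-time
    memory test; it takes a block number and returns True if the test passed.
    """
    recovered = {b for b in recorded_faulty if block_is_healthy(b)}
    still_faulty = set(recorded_faulty) - recovered
    return recovered, still_faulty


# Simulated test result: block 7 passes after the restart, block 12 does not.
recovered, still_faulty = recheck_faulty_blocks({7, 12}, lambda b: b == 7)
```

Blocks in `recovered` would be marked non-faulty and returned to the allocator, while blocks in `still_faulty` would remain isolated.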
A person of ordinary skill in the art will understand that all or some of the steps of the foregoing method embodiments may be implemented by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the foregoing method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the foregoing embodiments are intended only to describe the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, and some or all of their technical features may be replaced with equivalents, without causing the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

  1. A memory fault isolation apparatus, comprising:
    an exception handling module, configured to monitor a state of the memory identified by an original physical address to which a virtual address of a process is mapped, wherein a mapping exists between the page to which the virtual address belongs and the physical address block to which the original physical address belongs, and the physical address block identifies a contiguous memory range allocated to the process;
    wherein the exception handling module is further configured to, if the memory identified by the original physical address is faulty, mark the state of the physical address block to which the original physical address belongs as faulty;
    a memory management module, configured to reallocate a physical address block from non-faulty memory for the page to which the virtual address belongs;
    wherein the exception handling module is further configured to synchronize the data in the memory range identified by the physical address block to which the original physical address belongs into the memory range identified by the reallocated physical address block.
  2. The apparatus according to claim 1, wherein the exception handling module is further configured to save information about the physical address block marked as faulty to a non-volatile memory;
    and the memory management module is specifically configured to:
    determine the non-faulty memory according to the information about faulty physical address blocks saved in the non-volatile memory; and
    reallocate, according to the virtual address and a process ID of the process, a physical address block from the non-faulty memory for the page to which the virtual address belongs.
  3. The apparatus according to claim 2, wherein the memory management module is specifically configured to:
    obtain, according to the virtual address and the process ID of the process, the page to which the virtual address belongs; and
    select a physical address block from the non-faulty memory, and establish a mapping from the page to which the virtual address belongs to the selected physical address block.
  4. The apparatus according to claim 1, wherein the memory management module is further configured to:
    read, when allocating initial memory for the process, information about faulty physical address blocks saved in a non-volatile memory;
    determine non-faulty memory according to the information about the faulty physical address blocks, wherein the non-faulty memory is the memory other than the faulty physical address blocks; and
    allocate the initial memory for the process from the non-faulty memory.
  5. The apparatus according to any one of claims 1 to 4, wherein the exception handling module is further configured to:
    perform fault detection on the memory when the server restarts; and
    if it is detected that the physical address block to which the original physical address belongs has returned to normal, mark the state of the physical address block to which the original physical address belongs as non-faulty.
  6. A memory fault isolation method, comprising:
    monitoring a state of the memory identified by an original physical address to which a virtual address of a process is mapped, wherein a mapping exists between the page to which the virtual address belongs and the physical address block to which the original physical address belongs, and the physical address block identifies a contiguous memory range allocated to the process;
    if the memory identified by the original physical address is faulty, marking the state of the physical address block to which the original physical address belongs as faulty; and
    reallocating a physical address block from non-faulty memory for the page to which the virtual address belongs, and synchronizing the data in the memory range identified by the physical address block to which the original physical address belongs into the memory range identified by the reallocated physical address block.
  7. The method according to claim 6, wherein the method further comprises:
    saving information about the physical address block marked as faulty to a non-volatile memory;
    and the reallocating a physical address block from non-faulty memory for the page to which the virtual address belongs comprises:
    determining the non-faulty memory according to the information about faulty physical address blocks saved in the non-volatile memory; and
    reallocating, according to the virtual address and a process ID of the process, a physical address block from the non-faulty memory for the page to which the virtual address belongs.
  8. The method according to claim 7, wherein the reallocating, according to the virtual address and the process ID of the process, a physical address block from the non-faulty memory for the page to which the virtual address belongs comprises:
    obtaining, according to the virtual address and the process ID of the process, the page to which the virtual address belongs; and
    selecting a physical address block from the non-faulty memory, and establishing a mapping from the page to which the virtual address belongs to the selected physical address block.
  9. The method according to claim 6, wherein the method further comprises:
    reading, when allocating initial memory for the process, information about faulty physical address blocks saved in a non-volatile memory;
    determining non-faulty memory according to the information about the faulty physical address blocks, wherein the non-faulty memory is the memory other than the faulty physical address blocks; and
    allocating the initial memory for the process from the non-faulty memory.
  10. The method according to any one of claims 6 to 9, wherein the method further comprises:
    performing fault detection on the memory when the server restarts; and
    if it is detected that the physical address block to which the original physical address belongs has returned to normal, marking the state of the physical address block to which the original physical address belongs as non-faulty.
PCT/CN2015/071008 2015-01-19 2015-01-19 Memory fault isolation method and device WO2016115661A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2015/071008 WO2016115661A1 (en) 2015-01-19 2015-01-19 Memory fault isolation method and device
CN201580011928.2A CN106133704A (en) 2015-01-19 2015-01-19 Memory failure partition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2015/071008 WO2016115661A1 (en) 2015-01-19 2015-01-19 Memory fault isolation method and device

Publications (1)

Publication Number Publication Date
WO2016115661A1 true WO2016115661A1 (en) 2016-07-28

Family

ID=56416247

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/071008 WO2016115661A1 (en) 2015-01-19 2015-01-19 Memory fault isolation method and device

Country Status (2)

Country Link
CN (1) CN106133704A (en)
WO (1) WO2016115661A1 (en)

Also Published As

Publication number Publication date
CN106133704A (en) 2016-11-16

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15878339

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15878339

Country of ref document: EP

Kind code of ref document: A1