WO2023169185A1 - Memory management method and device (内存管理方法和装置) - Google Patents

Memory management method and device (内存管理方法和装置)

Info

Publication number
WO2023169185A1
Authority
WO
WIPO (PCT)
Prior art keywords
memory
partition
controller
memory unit
capacity
Prior art date
Application number
PCT/CN2023/077012
Other languages
English (en)
French (fr)
Inventor
屈欢
高军
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2023169185A1

Links

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 — Error detection; Error correction; Monitoring
    • G06F 11/07 — Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/16 — Error detection or correction of the data by redundancy in hardware
    • G06F 11/20 — Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 — Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 — Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 — Arrangements for program control, e.g. control units
    • G06F 9/06 — Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 — Multiprogramming arrangements
    • G06F 9/50 — Allocation of resources, e.g. of the central processing unit [CPU]

Definitions

  • the embodiments of the present application relate to the field of storage technology, and specifically relate to a memory management method and device.
  • the functions of memory in the storage system can be divided into two types, one is used for the operation of upper-layer services, and the other is used for data caching of upper-layer services. Due to the influence of various environmental factors, various problems may occur with the memory in the storage system.
  • Embodiments of the present application provide a memory management method and device, which can reduce the impact of memory failures on upper-layer business operations after fault isolation occurs in the memory used for upper-layer business operations.
  • In a first aspect, a memory management method is provided. The method includes: determining that a memory failure has occurred in a first partition and that the memory failure causes a cold reset of a first controller of a first computer device, where the first controller includes a second partition and the first partition, the first partition is used for running upper-layer services, and the second partition is used for data caching of the upper-layer services; isolating a faulty memory area, where the faulty memory area is the memory area in the first partition in which the memory failure occurred; and re-allocating the memory of the first controller to obtain a third partition and a fourth partition, where the third partition is used for running the upper-layer services, the fourth partition is used for data caching of the upper-layer services, the memory capacity of the third partition is greater than or equal to the memory capacity of the first partition, and the sum of the memory capacities of the fourth partition and the third partition is equal to the memory capacity of the first controller minus the capacity of the faulty memory area.
  • It should be understood that the faulty memory area may be a faulty memory module, memory chip, memory bank, memory row or memory column in the first controller.
  • Optionally, the faulty memory area may also be an area that includes both normal memory and faulty memory; for example, if some of the chips on a memory module fail, the entire memory module can be isolated.
  • the computer device may include at least one first partition and at least one second partition.
  • It should be understood that the first partition and the third partition are both fixed partitions. Fixed partitions are used for running upper-layer services, such as creating objects and creating data structures related to objects.
  • The second partition and the fourth partition are both floating partitions. A floating partition is an extended partition of the cache: upper-layer service input/output (I/O) requests and metadata caches apply for memory pages from this partition, and it is the partition that carries the bulk of the controller's memory capacity.
  • It should be understood that the memory capacity of the fixed partitions after the cold reset of the first controller is not less than their memory capacity before the cold reset. That is, after a memory failure occurs in a fixed partition, the first controller reallocates the fixed partitions and the floating partition during the cold reset, and capacity from the floating partition is allocated to the fixed partitions.
  • For example, the memory capacity of the first controller before the cold reset is 100G, with three fixed partitions of 10G each and a floating partition of 70G. When a memory failure in a fixed partition causes a cold reset of the first controller, the faulty memory area is isolated; if the capacity of the faulty memory area is 10G, the memory capacity of the controller shrinks to 90G and the capacity of the floating partition is reduced correspondingly. That is, after the cold reset, the three fixed partitions are still 10G each and the floating partition shrinks to 60G. Optionally, after the cold reset the three fixed partitions could instead be set to 11G each, with a floating partition of 57G.
  • In this way, when a memory failure occurs in the fixed partition that runs upper-layer services, the embodiment of the present application can isolate the faulty memory area so that the memory capacity of the fixed partition after the cold reset of the controller is not less than its memory capacity before the cold reset, reducing the impact of the memory failure on the operation of upper-layer services.
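  • The capacity arithmetic described above can be summarized in a short sketch. This is an illustrative sketch only (the names repartition, fixed_partitions_gb and faulty_capacity_gb are hypothetical, not taken from the source): every fixed partition keeps its pre-reset capacity, and the floating partition absorbs the whole reduction.

        def repartition(total_capacity_gb, fixed_partitions_gb, faulty_capacity_gb):
            """Recompute partition sizes after a faulty memory area is isolated.

            fixed_partitions_gb: capacities of the fixed partitions before the cold reset.
            The fixed partitions keep their capacity; the floating partition shrinks.
            """
            remaining = total_capacity_gb - faulty_capacity_gb
            new_fixed = list(fixed_partitions_gb)        # capacities preserved
            new_floating = remaining - sum(new_fixed)    # floating partition absorbs the loss
            if new_floating < 0:
                raise ValueError("not enough memory left for the fixed partitions")
            return new_fixed, new_floating

        # Example from the text: 100G total, three 10G fixed partitions, 70G floating,
        # 10G faulty area -> fixed partitions stay 10G each, floating shrinks to 60G.
        print(repartition(100, [10, 10, 10], 10))   # ([10, 10, 10], 60)
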
  • With reference to the first aspect, in some possible implementations, re-allocating the memory of the first controller to obtain the third partition and the fourth partition includes: determining a target processor, where the target processor is one of a plurality of processors included in the first controller, the target processor corresponds to a first memory unit, and the first memory unit is the memory unit, among the plurality of memory units included in the first partition, that contains the faulty memory area; determining, according to the target processor, at least one second memory unit, where each second memory unit is a memory unit, among the plurality of memory units included in the second partition, that corresponds to the target processor, and the capacity of the at least one second memory unit is not less than the capacity of the faulty memory area; and re-allocating the memory of the first controller according to the first memory unit and the at least one second memory unit to obtain the third partition and the fourth partition, where the third partition includes the at least one second memory unit.
  • It should be understood that the faulty memory area is the smallest granularity at which memory is isolated, and a memory unit is the smallest granularity that can be allocated when the memory is partitioned.
  • It should be understood that the memory unit corresponding to a processor is that processor's local memory, and a processor accesses its corresponding memory unit much faster than it accesses a non-corresponding memory unit.
  • For example, the memory unit corresponding to central processing unit (CPU) 1 in the first partition is the first memory unit, the memory unit corresponding to CPU 1 in the second partition is the second memory unit, and the memory unit corresponding to CPU 2 in the second partition is the sixth memory unit. CPU 1 then accesses the first memory unit and the second memory unit much faster than it accesses the sixth memory unit.
  • It should be understood that when the memory failure occurs in the first memory unit of the first partition, memory from the second memory unit of the second partition can be allocated to the first memory unit after the cold reset of the first controller. The third memory unit is the first memory unit as updated after the cold reset of the first controller, and the fourth memory unit is the second memory unit as updated after the cold reset.
  • In this way, when a memory failure occurs in the fixed partition that runs upper-layer services, the embodiment of the present application can isolate the faulty memory area, identify the processor corresponding to the memory unit in which the faulty memory area is located, and allocate memory from that processor's memory unit in the floating partition to the fixed partition, so that the memory capacity of the processor's memory unit in the fixed partition after the cold reset of the controller is not less than its memory capacity before the cold reset. The processor's memory access speed therefore remains unchanged, reducing the impact of the memory failure on the operation of upper-layer services.
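  • A minimal sketch of this per-processor reassignment is shown below, assuming each memory unit is described only by its capacity and its owning NUMA node (the data structures and names are hypothetical): the capacity lost by the fixed partition's unit is taken from the floating-partition unit of the same node, so the node's accesses stay local.

        def cover_fault_locally(fixed_units, floating_units, faulty_node, faulty_gb):
            """fixed_units / floating_units: dict mapping NUMA node id -> capacity in GB.
            The fixed-partition unit on the faulty node keeps its capacity; the loss is
            covered with local memory taken from the floating-partition unit of that node."""
            if floating_units[faulty_node] < faulty_gb:
                raise ValueError("floating partition on this node cannot cover the loss")
            floating_units[faulty_node] -= faulty_gb
            return fixed_units, floating_units

        # Example from the text: node 1 has a 20G fixed unit and a 30G floating unit;
        # a 10G fault in the fixed unit is covered locally, the floating unit drops to 20G.
        fixed = {1: 20, 2: 20}
        floating = {1: 30, 2: 30}
        print(cover_fault_locally(fixed, floating, faulty_node=1, faulty_gb=10))
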
  • With reference to the first aspect, in some possible implementations, the method further includes: obtaining the memory utilization of each of K memory units, where the K memory units belong to the fourth partition and K is an integer greater than or equal to 1; and, if the memory utilization of the k-th memory unit among the K memory units exceeds a preset threshold, releasing all or part of the memory resources of the k-th memory unit, where k is an integer greater than or equal to 1 and less than or equal to K.
  • It should be understood that a processor preferentially accesses its corresponding memory unit in the floating partition. Because the memory capacities of the memory units corresponding to the different processors in the floating partition may be uneven, the memory utilization of each memory unit in the floating partition can be queried periodically, to ensure that a shrunk memory unit does not run out of capacity and force cross-memory-unit access. For example, a distributed virtual memory system (vcache) can be created at per-processor granularity, and the vcache can be used to query the memory utilization of each memory unit.
  • vcache distributed virtual memory system
  • Optionally, a page replacement algorithm can be used to release all or part of the memory resources of a memory unit, for example the optimal replacement algorithm (OPT), the first-in first-out (FIFO) replacement algorithm, the least recently used (LRU) algorithm, or the clock (CLOCK) replacement algorithm.
  • OPT: optimal replacement algorithm
  • FIFO: first-in first-out replacement algorithm
  • LRU: least recently used algorithm
  • CLOCK: clock replacement algorithm
  • In this way, when a controller memory failure causes the memory to shrink, the embodiment of this application can periodically clean up the memory units in the floating partition, ensuring that a processor does not have to access memory across memory units because of insufficient memory-unit capacity, and preserving the running speed of upper-layer services.
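  • A sketch of this periodic check is given below (the function and variable names are hypothetical, not taken from the source): each memory unit of the floating partition is polled, and any unit whose utilization exceeds the threshold releases capacity until it drops back to the threshold.

        THRESHOLD = 0.8   # preset utilization threshold, e.g. 80%

        def rebalance_floating_partition(units):
            """units: dict mapping memory-unit id -> (used_gb, capacity_gb).
            Release resources from any unit whose utilization exceeds THRESHOLD."""
            for unit_id, (used, capacity) in units.items():
                if used / capacity > THRESHOLD:
                    target_used = THRESHOLD * capacity
                    to_release = used - target_used
                    # A real system would evict cache pages (e.g. with LRU);
                    # this sketch only adjusts the bookkeeping.
                    units[unit_id] = (target_used, capacity)
                    print(f"unit {unit_id}: released {to_release:.1f} GB")
            return units

        print(rebalance_floating_partition({1: (18, 20), 2: (10, 20)}))
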
  • With reference to the first aspect, in some possible implementations, before re-allocating the memory of the first controller to obtain the third partition and the fourth partition, the method further includes: receiving a first input/output request of a second computer device; and forwarding the first input/output request to a controller of the first computer device other than the first controller, where the first computer device includes at least two controllers.
  • In this way, during the cold reset caused by the memory failure of the first controller, that is, while the first controller cannot work normally, the embodiment of the present application can forward the input/output requests of the second computer device to the other controllers, achieving seamless switching of upper-layer services and reducing the impact of the faulty memory on their operation.
  • With reference to the first aspect, in some possible implementations, after re-allocating the memory of the first controller to obtain the third partition and the fourth partition, the method further includes: receiving a second input/output request of the second computer device; and forwarding the second input/output request to any one or more controllers of the first computer device.
  • In this way, after the cold reset caused by the memory failure of the first controller, that is, once the first controller has resumed normal operation, the embodiment of the present application can forward the input/output requests of the second computer device to any controller of the first computer device, making full use of all controllers in the first computer device and reducing the impact of the faulty memory on the operation of upper-layer services.
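  • The forwarding behaviour can be sketched as follows (the controller identifiers and states are hypothetical): while the first controller is in cold reset its requests go to a healthy peer, and once it has recovered requests may again be routed to it or to any other controller.

        import random

        def route_request(request, controllers, preferred):
            """controllers: dict mapping controller id -> state ('ok' or 'resetting').
            Route to the preferred controller when healthy, otherwise to any healthy
            peer, so upper-layer I/O keeps flowing during a cold reset."""
            if controllers.get(preferred) == "ok":
                return preferred
            healthy = [cid for cid, state in controllers.items() if state == "ok"]
            if not healthy:
                raise RuntimeError("no controller available to serve the request")
            return random.choice(healthy)

        controllers = {"controller0": "resetting", "controller1": "ok"}
        print(route_request("read lun0", controllers, preferred="controller0"))  # controller1
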
  • In a second aspect, embodiments of the present application provide a computer device that includes units for implementing the first aspect or any possible implementation of the first aspect.
  • In a third aspect, embodiments of the present application provide a computer device that includes a processor. The processor is coupled to a memory, and reads and executes instructions and/or program code in the memory to perform the first aspect or any possible implementation of the first aspect.
  • In a fourth aspect, embodiments of the present application provide a chip system. The chip system includes a logic circuit and a forwarding interface; the forwarding interface is used to forward input/output requests of a second computer device, and the logic circuit is coupled to the forwarding interface and transmits data through it, so as to perform the first aspect or any possible implementation of the first aspect.
  • In a fifth aspect, embodiments of the present application provide a computer-readable storage medium that stores program code. When the program code is run on a computer, it causes the computer to perform the first aspect or any possible implementation of the first aspect.
  • In a sixth aspect, embodiments of the present application provide a computer program product. The computer program product includes computer program code; when the computer program code is run on a computer, it causes the computer to perform the first aspect or any possible implementation of the first aspect.
  • Figure 1 is a schematic diagram of an application scenario of a memory management device provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of an application scenario of another memory management device provided by an embodiment of the present application.
  • Figure 3 is a schematic diagram of a memory fault isolation scenario provided by an embodiment of the present application.
  • Figure 4 is an exemplary flow chart of a memory management method provided by an embodiment of the present application.
  • Figure 5 is an application schematic diagram of memory access management provided by an embodiment of the present application.
  • Figure 6 is a schematic structural diagram of a memory partition provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of memory shrinkage after a fault provided by an embodiment of the present application.
  • Figure 8 is a schematic diagram of a flow partition page management provided by an embodiment of the present application.
  • Figure 9 is a schematic diagram of a service forwarding scenario provided by an embodiment of the present application.
  • FIG. 10 is an example structural diagram of a computer device provided by an embodiment of the present application.
  • FIG. 11 is a structural example diagram of another computer device provided by an embodiment of the present application.
  • Figure 12 is an example diagram of a computer program product provided by an embodiment of the present application.
  • Figure 1 is a schematic diagram of an application scenario of a memory management device provided by an embodiment of the present application.
  • In the application scenario shown in Figure 1, users access data through applications, and the computers that run these applications are called application servers. The application server 100 may be a physical machine or a virtual machine. Physical application servers include, but are not limited to, desktop computers, servers, laptops, and mobile devices.
  • The application server accesses the storage system through the Fibre Channel switch 110 to read and write data.
  • It should be understood that the switch 110 is optional; the application server 100 can also communicate with the storage system 120 directly over a network.
  • Alternatively, the Fibre Channel switch 110 can be replaced with an Ethernet switch, an InfiniBand switch, an RoCE (RDMA over Converged Ethernet) switch, and so on.
  • the storage system 120 shown in Figure 1 is a centralized storage system.
  • the characteristic of the centralized storage system is that it has a unified entrance. All data from external devices must pass through this entrance.
  • This entrance is the engine 121 of the centralized storage system.
  • Engine 121 is the most core component in the centralized storage system, and many advanced functions of the storage system are implemented in it.
  • As shown in Figure 1, there are one or more controllers in the engine 121; Figure 1 uses an engine containing two controllers as an example.
  • There is a mirror channel between controller 0 and controller 1. After controller 0 writes a piece of data into its memory 124, it can send a copy of the data to controller 1 through the mirror channel, and controller 1 stores the copy in its own local memory 124. Controller 0 and controller 1 are therefore backups of each other: if controller 0 fails, controller 1 can take over its services, and if controller 1 fails, controller 0 can take over its services, preventing a hardware failure from making the entire storage system 120 unavailable.
  • When four controllers are deployed in the engine 121, there is a mirror channel between any two controllers, so any two controllers are backups of each other.
  • the engine 121 also includes a front-end shared interface 125 and a back-end interface 126.
  • the front-end shared interface 125 is used to communicate with the application server 100, receive input and output requests from the server 100 and forward them to the corresponding controller, thereby providing storage services for the application server 100.
  • the backend interface 126 is used to communicate with the hard disk 134 to expand the capacity of the storage system. Through the backend interface 126, the engine 121 can connect more hard disks 134, thereby forming a very large storage resource pool.
  • the main circuit system that makes up the controller is installed on the motherboard 128, including a basic input output system (BIOS) chip, an input/output (I/O) control chip, an expansion slot and other components.
  • BIOS basic input output system
  • I/O input/output
  • CPU 123, memory 124 and backend interface 126 are located on motherboard 128.
  • the controller 0 at least includes a processor 123 and a memory 124 .
  • the processor 123 is a CPU that is used to process data access requests from outside the storage system (server or other storage system), and is also used to process requests generated within the storage system. For example, when the processor 123 receives write data requests sent by the application server 100 through the front-end shared port 125, the data in these write data requests will be temporarily stored in the memory 124. When the total amount of data in the memory 124 reaches a certain threshold, the processor 123 sends the data stored in the memory 124 to the hard disk 134 for persistent storage through the back-end port.
  • Memory 124 refers to the internal memory that directly exchanges data with the processor. It can read and write data at any time and very quickly, and serves as a temporary data storage for the operating system or other running programs.
  • The memory includes at least two types of storage; for example, it can be random access memory or read-only memory (ROM). The random access memory may be, for example, dynamic random access memory (DRAM) or storage class memory (SCM).
  • DRAM is a semiconductor memory and, like most random access memory (RAM), is a volatile memory device.
  • SCM is a composite storage technology that combines the characteristics of traditional storage devices and memory; it provides faster read and write speeds than a hard disk, but is slower than DRAM in access speed and cheaper than DRAM in cost.
  • However, DRAM and SCM are only illustrative examples in this embodiment; the memory may also include other random access memory, such as static random access memory (SRAM).
  • The read-only memory may be, for example, programmable read-only memory (PROM) or erasable programmable read-only memory (EPROM).
  • In addition, the memory 124 may also be a dual in-line memory module (DIMM), that is, a module composed of DRAM, or a solid state drive (solid state disk, SSD).
  • the controller 0 may be configured with multiple memories 124 and different types of memories 124 . This embodiment does not limit the number and type of memories 113 .
  • In addition, the memory 124 can be configured to have a power-failure protection function. The power-failure protection function means that the data stored in the memory 124 is not lost when the system is powered off and then powered on again. Memory with a power-failure protection function is called non-volatile memory.
  • the memory 124 stores software programs, and the processor 123 runs the software programs in the memory 124 to manage the hard disk.
  • the hard disk is abstracted into a storage resource pool, and then divided into LUNs for server use.
  • the LUN here is actually the hard disk seen on the server.
  • some centralized storage systems are also file servers themselves and can provide shared file services to servers.
  • controller 1 (and other controllers not shown in Figure 1) are similar to controller 0 and will not be described again here.
  • Figure 1 shows a centralized storage system with separate disk and control.
  • the engine 121 may not have a hard disk slot, the hard disk 134 needs to be placed in the hard disk enclosure 130, and the backend interface 126 communicates with the hard disk enclosure 130.
  • the back-end interface 126 exists in the engine 121 in the form of an adapter card. Two or more back-end interfaces 126 can be used simultaneously on one engine 121 to connect multiple hard disk enclosures.
  • the adapter card can also be integrated on the motherboard, in which case the adapter card can communicate with the processor 112 through the PCIE bus.
  • the storage system may include two or more engines 121, and redundancy or load balancing is performed between the multiple engines 121.
  • the hard disk enclosure 130 includes a control unit 131 and several hard disks 134 .
  • the control unit 131 may have various forms.
  • the number of control units 131 may be one, two or more.
  • the hard disk enclosure 130 does not have a control unit 131 inside, but the network card 104 completes data reading and writing, address conversion, and other computing functions.
  • the network card 104 is a smart network card. It can contain CPU and memory.
  • Depending on the type of communication protocol between the engine 121 and the hard disk enclosure 130, the hard disk enclosure 130 may be a serial attached SCSI (serial attached small computer system interface, SAS) hard disk enclosure, a non-volatile memory express (NVMe) hard disk enclosure, an internet protocol (IP) hard disk enclosure, or another type of hard disk enclosure.
  • SAS serial attached small computer system interface
  • SAS hard disk enclosures adopt the SAS3.0 protocol, and each enclosure supports 25 SAS hard disks.
  • the engine 121 is connected to the hard disk enclosure 130 through an onboard SAS interface or a SAS interface module.
  • the NVMe hard disk enclosure is more like a complete computer system.
  • the NVMe hard disk is inserted into the NVMe hard disk enclosure.
  • the NVMe hard disk enclosure is connected to the engine 121 through the remote direct memory access (RDMA) port.
  • RDMA remote direct memory access
  • FIG. 2 is a schematic diagram of an application scenario of another memory management device provided by an embodiment of the present application.
  • Figure 2 shows a centralized storage system with integrated disk and control.
  • the engine 121 has a hard disk slot.
  • the hard disk 134 can be directly deployed in the engine 121.
  • In this system, the back-end interface 126 is optional: when the storage space of the system is insufficient, more hard disks or hard disk enclosures can be connected through the back-end interface 126.
  • Figure 3 is a schematic diagram of a memory fault isolation scenario provided by an embodiment of the present application.
  • The storage units of the memory can be divided, from large to small, into chip > bank (array) > row/column. Memory failures are often accompanied by the failure of an entire row, column, or bank; if a large number of failures or an uncorrected error (UCE) storm occurs, memory repair fails and the controller cannot power on normally.
  • UCE uncorrected error
  • As shown in Figure 3, the memory 124 in controller 0 and controller 1 may include multiple memory modules; the black vertical bars can be regarded as normal memory modules and the white vertical bars as faulty memory modules.
  • For example, the failure of a single memory module in controller 0 causes the whole of controller 0 to fail and triggers a cold reset of controller 0.
  • During this time the storage system 120 runs on controller 1 alone, and its reliability is greatly reduced.
  • The reset that is generated automatically when a device goes from an unpowered to a powered state is called a cold reset.
  • In the embodiment of this application, the health of the memory is checked while the controller powers on during the cold reset; when a memory failure is found, memory isolation is used to isolate the faulty memory area. The isolation levels include memory-module level, memory-chip level, memory-bank level, memory-row level and memory-column level.
  • Upper-layer services will be affected to a certain extent due to memory shrinkage.
  • Upper-layer business refers to the business generated by the operation of upper-layer applications, such as the creation of data objects, etc. When a large-scale failure occurs in the memory related to the upper-layer business, the business may not be able to continue running.
  • the technical solution provided by this application can be applied to the storage system shown in Figure 1 or Figure 2.
  • When a memory failure causes a cold reset of the controller, the faulty memory is isolated, the device powers on normally, and the floating partition of the controller's memory is then shrunk to reduce the impact on upper-layer services.
  • Figure 4 is an exemplary flow chart of a memory management method provided by an embodiment of the present application.
  • As shown in Figure 3, in the storage system 120 a severe memory failure in a controller's memory causes a cold reset of that controller. While the controller powers on during the cold reset, a health check is first performed on each memory module, and the shrinking procedure is triggered when a module's health is found to be at or below a preset threshold.
  • For example, the health threshold that triggers shrinking can be set to 30%. If an entire row, column or bank of a memory module is found to be faulty and cannot be repaired, the module's health is judged to be 20%, shrinking is triggered, and the faulty memory area is isolated.
  • For example, controller 0 in Figure 3 has 50G of memory. If one memory module fails and the faulty module's capacity is 10G, then after the faulty module is isolated the capacity of controller 0 shrinks to 40G.
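  • As an illustration only (the threshold value and names are hypothetical), the health-check decision can be sketched as a single comparison: shrinking is triggered when a module's health falls to or below the preset threshold.

        HEALTH_THRESHOLD = 0.30   # e.g. 30%

        def check_module(health, total_gb, module_gb):
            """Return the controller capacity after the check. If the module's health
            is at or below the threshold, the module is isolated and capacity shrinks."""
            if health <= HEALTH_THRESHOLD:
                return total_gb - module_gb   # isolate the faulty module
            return total_gb                   # correctable errors only, no shrink

        print(check_module(health=0.20, total_gb=50, module_gb=10))  # 40
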
  • Figure 5 is an application schematic diagram of memory access management provided by an embodiment of the present application.
  • the memory access management of the controller may include processor management 310 and memory management 320.
  • Each of node 1 to node n can be regarded as a CPU. In the processor management 310, memory access can be managed in various ways; for example, a non-uniform memory access (NUMA) architecture can be used to manage CPU access to memory.
  • NUMA: non-uniform memory access
  • In a NUMA architecture, each CPU has its own "local" memory, while the memory of other CPUs can be regarded as "remote" memory. Under this architecture, a CPU accessing its own local memory has lower latency and higher performance. When the NUMA architecture is used, each node in the processor management 310 is a non-uniform memory access node (NUMA node).
  • Optionally, a massively parallel processing (MPP) architecture can also be used to manage CPU access to memory. Unlike NUMA, in an MPP architecture each node only accesses local memory and cannot access remote memory.
  • MPP: massively parallel processing
  • The memory management 320 divides the memory in the controller into at least one floating partition and at least one fixed partition. Optionally, the memory capacity of each fixed partition is the same.
  • The fixed partitions are used for running upper-layer services, such as creating objects related to applications and creating data structures related to those objects.
  • The floating partition is an extended partition of the cache. Upper-layer service input/output requests and metadata caches all apply for memory pages from this partition; it is the partition that carries the bulk of the controller's memory capacity.
  • Optionally, the memory in the memory management 320 can be partitioned using a slab allocation algorithm, a simple list of blocks (slob) allocation algorithm, or a slub allocation algorithm. The slab, slob and slub algorithms are used to solve the problem of allocating small-granularity memory (smaller than 4K).
  • Figure 6 is a schematic structural diagram of a memory partition provided by an embodiment of the present application.
  • each fixed partition can be divided into one or more memory units, and each memory unit corresponds to a node, that is, the memory unit corresponding to the node is the local memory of the node.
  • Each memory unit is a collection of one or more memory areas in the controller.
  • fixed partition 1 includes two memory units, a first memory unit and a second memory unit.
  • the first memory unit corresponds to node 1
  • the second memory unit corresponds to node 2. That is, the memory area in the first memory unit belongs to the local memory of node 1, and the memory area in the second memory unit belongs to the local memory of node 2.
  • numa node1 accesses the first memory unit much faster than the second memory unit.
  • floating partitions can also be divided into one or more memory units.
  • the number of memory units is equal to the number of nodes.
  • the i-th memory unit corresponds to node i, that is, the memory area in the i-th memory unit belongs to the local memory of node i. It should be understood that each memory unit is a collection of one or more memory areas in the controller, and i is greater than or equal to 1 and less than or equal to the number of nodes.
  • For example, the capacity of controller 0 before shrinking is 100G, with three fixed partitions of 10G each and a floating partition of 70G. When a memory failure causes a cold reset of controller 0, the faulty memory area is isolated; if the faulty memory area is 10G, the memory capacity of controller 0 shrinks to 90G.
  • It should be understood that a memory failure can occur in a fixed partition or in the floating partition. The embodiment of this application reduces only the capacity of the floating partition in controller 0; that is, when the memory partitions of controller 0 are rebuilt, the floating partition is 60G and each of the three fixed partitions is still 10G.
  • FIG. 7 is a schematic diagram of memory shrinkage after a fault provided by an embodiment of the present application.
  • As shown in Figure 7, before the failure the controller memory includes three fixed partitions and one floating partition. Memory 1 to memory 20 can each be regarded as the address set of one or more memory areas; for example, memory 1 represents the addresses of memory module 1 and memory 2 represents the addresses of memory module 2.
  • The fixed partition before the failure can be called the first partition, the floating partition before the failure the second partition, the fixed partition re-allocated after the failure the third partition, and the floating partition re-allocated after the failure the fourth partition.
  • For example, the whole of memory module 2 in the controller fails, that is, memory 2 in the first partition fails, causing a cold reset of the entire controller. The faulty memory 2 is isolated while the controller starts up from the cold reset, and memory 14 in the second partition is allocated to the first partition, where the capacity of memory 14 is greater than or equal to the capacity of memory 2.
  • As shown in Figure 7, the first partition includes memories 1 to 12 and the second partition includes memories 13 to 20, with memories 1 to 20 all having the same capacity. If memory 2 fails, the controller cold resets and restarts, the faulty memory 2 is isolated, memory 14 in the second partition is allocated to the first partition, and a new third partition and fourth partition are generated. The third partition includes memories 1, 14, and 3 to 12, and the fourth partition includes memories 13 and 15 to 20.
  • Optionally, the faulty memory area can be isolated in a number of ways; for example, a memory isolation device can be used, or the faulty memory area can be marked for isolation.
  • For example, the memory is first checked by the basic input output system (BIOS) on the motherboard 128 to obtain the address information of the faulty memory area, and that address information is saved to non-volatile memory (NVM). The address information saved in the NVM is then read, the faulty memory area corresponding to the address information is marked as unavailable, and the faulty memory area is isolated.
  • After the BIOS has isolated the faulty memory area, it reports the normally available memory addresses in the controller to the controller's operating system for memory management.
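  • This isolation flow can be sketched as simple bookkeeping over address ranges (the file name and functions below are hypothetical stand-ins, not a real BIOS interface): faulty ranges are persisted so they survive the cold reset, and at the next boot only the addresses outside those ranges are reported as usable.

        import json, os

        NVM_FILE = "faulty_ranges.json"   # stand-in for the controller's non-volatile store

        def record_fault(start, length):
            """Persist a faulty address range so that it survives the cold reset."""
            ranges = []
            if os.path.exists(NVM_FILE):
                with open(NVM_FILE) as f:
                    ranges = json.load(f)
            ranges.append({"start": start, "length": length})
            with open(NVM_FILE, "w") as f:
                json.dump(ranges, f)

        def usable_regions(total_size):
            """Return the address regions still reported as available after isolation."""
            ranges = []
            if os.path.exists(NVM_FILE):
                with open(NVM_FILE) as f:
                    ranges = json.load(f)
            faulty = sorted((r["start"], r["start"] + r["length"]) for r in ranges)
            regions, cursor = [], 0
            for begin, end in faulty:
                if cursor < begin:
                    regions.append((cursor, begin))
                cursor = max(cursor, end)
            if cursor < total_size:
                regions.append((cursor, total_size))
            return regions

        record_fault(0x2000, 0x1000)
        print(usable_regions(0x10000))   # [(0, 8192), (12288, 65536)]
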
  • For example, controller 0 has 100G of memory and two nodes, and the local memory corresponding to each node is 50G. After a faulty area in node 1's local memory is isolated, the local memory corresponding to node 1 that the BIOS reports to the operating system is smaller than that of the other node.
  • For example, before the cold reset of the controller the local memory corresponding to node 1 has a capacity of 50G, that is, the total capacity of node 1's local memory across all fixed partitions and the floating partition is 50G; the capacity of the first memory unit corresponding to node 1 in the floating partition is 30G, and the capacity of the second memory unit corresponding to node 1 in the fixed partition is 20G. If the capacity of the faulty memory is 10G, the capacity of the first memory unit corresponding to node 1 in the floating partition can be reduced to 20G; that is, the corresponding memory unit in the floating partition absorbs the reduced memory capacity.
  • For example, memory 1 and memory 2 of fixed partition 1 in Figure 7 belong to the local memory corresponding to node 1, and memory 1 and memory 2 can be called the first memory unit. Memory 13, memory 14 and memory 18 of the floating partition belong to the local memory corresponding to node 1 and can be called the second memory unit. If a memory failure occurs in memory 2 and causes a cold reset of the controller, the faulty memory 2 is isolated when the controller starts up from the cold reset, and all or part of the memory in memory 13, memory 14 or memory 18 of the floating partition is allocated to fixed partition 1; it is only necessary to ensure that the memory capacity allocated by the floating partition to the first memory unit is greater than or equal to the capacity of memory 2.
  • In this way, when a memory failure occurs in the fixed partition that runs upper-layer services, the embodiment of the present application can isolate the faulty memory area, identify the processor corresponding to the memory unit in which the faulty memory area is located, and allocate memory from that processor's memory unit in the floating partition to the fixed partition, so that the memory capacity of the processor's memory unit in the fixed partition after the cold reset of the controller is not less than its memory capacity before the cold reset. The processor's memory access speed therefore remains unchanged, reducing the impact of the memory failure on the operation of upper-layer services.
  • Optionally, the reduced memory capacity can also be borne by other memory units in the floating partition.
  • For example, memory 1 and memory 2 in fixed partition 1 in Figure 7 belong to the local memory corresponding to NUMA node 1 and can be called the first memory unit. Memory 13, memory 14 and memory 18 of the floating partition belong to the local memory corresponding to NUMA node 1 and can be called the second memory unit. Memory 17 and memory 19 in the floating partition belong to the local memory corresponding to NUMA node 2 and can be called the third memory unit.
  • After memory 2 fails and the controller cold resets, memory from either the second memory unit or the third memory unit can be allocated to the first memory unit. However, because the third memory unit belongs to the remote memory of NUMA node 1, NUMA node 1 accesses memory in the third memory unit more slowly than it accesses memory in the first memory unit; that is, after memory from the third memory unit is allocated to the first memory unit, the speed at which NUMA node 1 accesses the re-divided first memory unit decreases. The memory in the second memory unit, by contrast, belongs to the local memory of NUMA node 1, so after memory from the second memory unit is allocated to the first memory unit, the speed at which NUMA node 1 accesses the re-divided first memory unit does not decrease.
  • It should be understood that NUMA node 1 accesses the first memory unit much faster than NUMA node 2 accesses the first memory unit. Therefore, when an application is running, cross-NUMA memory access should be avoided as far as possible. Cross-NUMA access can be reduced by setting the CPU affinity of threads, and when the memory partitions are created the memory is distributed evenly across the NUMA nodes to achieve allocation affinity. For example, if the controller has 100G of memory in total and two NUMA nodes, each NUMA node corresponds to 50G of memory. Optionally, anti-affinity or priority can also be configured to reduce cross-NUMA memory access.
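  • One common way to keep a thread on the node whose local memory it uses is to pin it to that node's CPUs. The sketch below uses Linux CPU affinity from Python as an illustration only; the node-to-CPU mapping is a hypothetical example, and it requires a Linux host.

        import os

        # Hypothetical mapping: which CPU cores belong to which NUMA node.
        NODE_CPUS = {1: {0, 1, 2, 3}, 2: {4, 5, 6, 7}}

        def bind_to_node(node_id):
            """Pin the current process to the CPUs of one NUMA node (Linux only),
            so that its memory accesses tend to stay node-local."""
            os.sched_setaffinity(0, NODE_CPUS[node_id])

        if __name__ == "__main__":
            bind_to_node(1)
            print("running on CPUs:", os.sched_getaffinity(0))
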
  • Figure 8 is a schematic diagram of floating-partition page management provided by an embodiment of the present application.
  • Floating-partition page management includes the business process 410 and the memory management 420.
  • As shown in Figure 8, an I/O sent by the front-end shared interface 125 applies for pages, according to affinity, from the memory unit in the floating partition that corresponds to the current NUMA node. Because the memory capacity corresponding to each NUMA node in the floating partition may be uneven, and to ensure that a shrunk memory unit corresponding to a NUMA node does not run out of capacity, the controller's global cache can create a distributed virtual memory system (vcache) at NUMA-node granularity. For example, if the storage system 120 has three NUMA nodes, three vcaches, vcache1 to vcache3, are created.
  • vcache: distributed virtual memory system
  • Each vcache periodically queries the memory utilization of the memory unit in the floating partition corresponding to its NUMA node, and evicts pages if the utilization exceeds a preset threshold. For example, if vcache3 finds that the memory utilization of the third memory unit in the floating partition, corresponding to NUMA node 3, is 90% and the preset threshold is 80%, some or all memory pages of the third memory unit are evicted until its utilization falls below 80%. This ensures that after shrinking, upper-layer services can always obtain cache pages from the memory corresponding to their own NUMA node, and no cross-NUMA-node page applications occur.
  • Embodiments of the present application can use page replacement algorithms to evict pages, including the optimal replacement algorithm, the first-in first-out replacement algorithm, the least recently used algorithm, and the clock replacement algorithm.
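  • As an illustration of one such policy, the sketch below implements a minimal LRU eviction over cache pages; the other algorithms named above would only change the victim-selection rule. The class and page names are hypothetical.

        from collections import OrderedDict

        class LRUPageCache:
            """Minimal LRU cache: the least recently used page is evicted first."""
            def __init__(self, capacity):
                self.capacity = capacity
                self.pages = OrderedDict()   # page id -> data, oldest first

            def access(self, page_id, data=None):
                if page_id in self.pages:
                    self.pages.move_to_end(page_id)        # mark as most recently used
                else:
                    if len(self.pages) >= self.capacity:
                        self.pages.popitem(last=False)     # evict least recently used
                    self.pages[page_id] = data
                return self.pages[page_id]

        cache = LRUPageCache(capacity=2)
        cache.access("p1"); cache.access("p2"); cache.access("p1"); cache.access("p3")
        print(list(cache.pages))   # ['p1', 'p3'] - p2 was evicted
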
  • Figure 9 is a schematic diagram of a service forwarding scenario provided by an embodiment of the present application.
  • Steps 220 and 230 in Figure 4 are both implemented when controller 0 is reset. During the period of cold reset caused by a memory failure in controller 0, the embodiment of the present application realizes seamless switching of upper-layer services by utilizing the forwarding function of the front-end shared interface 125.
  • Front-end shared interface 125 includes connection service 510 and forwarding service 520.
  • the connection service 510 is used to establish a connection with the application server 100 and receive I/O requests from the application server 100.
  • the forwarding service 520 can forward the I/O request to the corresponding controller through a bus interface, such as a peripheral component interconnect express (PCIE), for execution.
  • PCIE peripheral component interconnect express
  • PCIE is only an example of a bus interface, and the bus interface can also be other interfaces.
  • For example, a memory module in controller 0 fails at time t0; at time t1, controller 0 cold resets and starts powering on; and at time t2, controller 0 finishes restarting and rebuilds the fixed partitions and the floating partition. During the period from t0 to t2, the forwarding service 520 forwards the I/O requests originally routed to controller 0 to controller 1 for processing. After t2, controller 0 has isolated the faulty memory and returned to normal, and the forwarding service 520 again forwards the corresponding I/O requests to controller 0 for processing.
  • Embodiments of the present application also provide a computer storage medium that stores program instructions. When the program is executed, it may include some or all of the steps of the memory management method in the corresponding embodiments shown in Figures 4, 5, 7, 8 and 9.
  • FIG. 10 is an example structural diagram of a computer device 1000 provided by an embodiment of the present application.
  • the computer device 1000 includes an acquisition module 1010 and a processing module 1020.
  • the processing module 1020 is used to isolate the faulty memory area and reallocate the memory of the first controller, and perform some or all of the steps in the method of Figure 4, the method of Figure 5, the method of Figure 7, and the method of Figure 9 .
  • the acquisition module 1010 is used to obtain the memory utilization of the memory unit and execute some or all of the steps in the method in Figure 8.
  • FIG. 11 is a structural example diagram of another computer device 1300 provided by an embodiment of the present application.
  • Computer device 1300 includes a processor 1302, a communication interface 1303, and memory 1304.
  • One example of computer device 1300 is a chip.
  • Another example of computer device 1300 is a computing device.
  • computer device 1300 may also include forwarding interface 1305.
  • the forwarding interface 1305 is used to receive input and output requests sent from the front-end sharing interface 125, and then forward the input and output requests to the processor 1302 for processing.
  • a memory failure occurs in the fixed partition of memory 1304, causing a cold reset of controller 0.
  • the BIOS in the motherboard 128 of controller 0 isolates the failed memory area.
  • The CPU 123 then re-divides the fixed partitions and the floating partition, allocating memory areas from the floating partition to the fixed partitions, so that the capacity of the fixed partitions after the cold reset of controller 0 is not less than the fixed-partition capacity before the cold reset.
  • the front-end shared interface 125 detects that the controller 0 is being cold reset and does not send the input and output requests of the application server 100 to the forwarding interface 1305 of the computer device 1300.
  • the front-end shared interface 125 detects that the controller 0 returns to normal, and will send the input and output requests of the application server 100 to the forwarding interface 1305 of the computer device 1300.
  • the forwarding interface 1305 forwards the received input and output requests to the CPU 123 for processing, and responds to upper-layer service requests.
  • the methods disclosed in the above embodiments of the present application can be applied to the processor 1302 or implemented by the processor 1302.
  • The processor 1302 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), or a field programmable gate array (FPGA).
  • a general-purpose processor can be a microprocessor or any conventional processor, etc.
  • each step of the above method can be completed by instructions in the form of hardware integrated logic circuits or software in the processor 1302 .
  • Each method, step and logical block diagram disclosed in the embodiment of this application can be implemented or executed.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
  • the steps of the method disclosed in conjunction with the embodiments of the present application can be directly implemented by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • Memory 1304 may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be read-only memory (ROM), programmable ROM (PROM), erasable programmable read-only memory (EPROM), or electrically erasable programmable read-only memory (EEPROM). The volatile memory may be random access memory (RAM), which is used as an external cache.
  • By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink dynamic random access memory (SLDRAM), and direct rambus RAM (DR RAM).
  • the processor 1302, the memory 1304, the forwarding interface 1305 and the communication interface 1303 can communicate through a bus.
  • Executable code is stored in the memory 1304, and the processor 1302 reads the executable code in the memory 1304 to execute the corresponding method.
  • the memory 1304 may also include an operating system and other software modules required for running processes.
  • the operating system can be LINUX TM , UNIX TM , WINDOWS TM , etc.
  • For example, the executable code in the memory 1304 is used to implement the methods shown in Figures 4, 5, 7, 8 and 9, and the processor 1302 reads the executable code in the memory 1304 to execute those methods.
  • In some embodiments, the example computer program product 1400 is provided using a signal bearing medium 1401. The signal bearing medium 1401 may include one or more program instructions 1402 which, when executed by one or more processors, may provide all or part of the functions described above for Figures 4, 5, 7, 8 and 9, and one or more features of those figures may be carried out by one or more instructions associated with the signal bearing medium 1401.
  • In some examples, the signal bearing medium 1401 may include a computer-readable medium 1403, such as, but not limited to, a hard drive, a compact disc (CD), a digital video disc (DVD), a digital tape, memory, read-only memory (ROM), or random access memory (RAM).
  • signal bearing media 1401 may include computer recordable media 1404 such as, but not limited to, memory, read/write (R/W) CDs, R/W DVDs, and the like.
  • signal bearing medium 1401 may include communication media 1405 such as, but not limited to, digital and/or analog communication media (eg, fiber optic cables, waveguides, wired communication links, wireless communication links, etc.).
  • signal bearing medium 1401 may be conveyed by a wireless form of communication medium 1405 (eg, a wireless communication medium that complies with the IEEE 802.11 standard or other transmission protocol).
  • One or more program instructions 1402 may be, for example, computer-executable instructions or logic-implemented instructions.
  • the aforementioned computing device may be configured to, in response to program instructions 1402 communicated to the computing device via one or more of computer-readable media 1403, computer-recordable media 1404, and/or communication media 1405, Provide various operations, functions, or actions. It should be understood that the arrangements described here are for example purposes only.
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or can be integrated into another system, or some features can be ignored, or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application can be integrated into one processing unit, each unit can exist physically alone, or two or more units can be integrated into one unit.
  • the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • The technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the various embodiments of this application.
  • The aforementioned storage media include media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Human Computer Interaction (AREA)
  • Hardware Redundancy (AREA)

Abstract

A memory management method is provided. In the method, when a memory failure in a fixed partition used for running upper-layer services causes a cold reset of the controller, the fixed partitions and the floating partition of the controller are re-allocated so that the capacity of the re-allocated fixed partitions is not less than the capacity of the fixed partitions before the failure. The method can reduce the impact of a memory failure on the operation of upper-layer services after the memory used for running those services has been isolated due to the fault.

Description

Memory management method and device

Technical field

The embodiments of the present application relate to the field of storage technology, and in particular to a memory management method and device.

Background

With the rapid advance of technology, users such as financial institutions and telecom operators place ever higher performance requirements on storage systems. The functions of memory in a storage system can be divided into two kinds: one is used for running upper-layer services, and the other is used for data caching of upper-layer services. Under the influence of various environmental factors, the memory in a storage system may develop a variety of problems.

In existing technical solutions, the health of the memory is checked while the storage server powers on; when a memory failure occurs, memory isolation is usually used to isolate the failed memory. However, if the memory failure occurs in the memory area used for running upper-layer services, directly isolating that faulty area affects the operation of the upper-layer services. In particular, when a large-scale failure occurs in the memory related to the upper-layer services, the services may become unable to run.

Therefore, how to reduce the impact on upper-layer services after the memory used for running them has been isolated due to a fault is a technical problem that urgently needs to be solved.
Summary

Embodiments of the present application provide a memory management method and device, which can reduce the impact of a memory failure on the operation of upper-layer services after the memory used for running those services has been isolated due to the fault.

In a first aspect, a memory management method is provided. The method includes: determining that a memory failure has occurred in a first partition and that the memory failure causes a cold reset of a first controller of a first computer device, where the first controller includes a second partition and the first partition, the first partition is used for running upper-layer services, and the second partition is used for data caching of the upper-layer services; isolating a faulty memory area, where the faulty memory area is the memory area in the first partition in which the memory failure occurred; and re-allocating the memory of the first controller to obtain a third partition and a fourth partition, where the third partition is used for running the upper-layer services, the fourth partition is used for data caching of the upper-layer services, the memory capacity of the third partition is greater than or equal to the memory capacity of the first partition, and the sum of the memory capacities of the fourth partition and the third partition is equal to the memory capacity of the first controller minus the capacity of the faulty memory area.

It should be understood that the faulty memory area may be a faulty memory module, memory chip, memory bank, memory row or memory column in the first controller.

Optionally, the faulty memory area may also be an area that includes both normal memory and faulty memory; for example, if some of the chips on a memory module fail, the entire memory module can be isolated.

Optionally, the computer device may include at least one first partition and at least one second partition.

It should be understood that the first partition and the third partition are both fixed partitions, and fixed partitions are used for running upper-layer services, such as creating objects and creating data structures related to objects. The second partition and the fourth partition are both floating partitions; a floating partition is an extended partition of the cache, upper-layer service input/output (I/O) requests and metadata caches apply for memory pages from it, and it is the partition that carries the bulk of the controller's memory capacity.

It should be understood that the memory capacity of the fixed partitions after the cold reset of the first controller is not less than their memory capacity before the cold reset; that is, after a memory failure occurs in a fixed partition, the first controller reallocates the fixed partitions and the floating partition during the cold reset, and capacity from the floating partition is allocated to the fixed partitions. For example, the memory capacity of the first controller before the cold reset is 100G, with three fixed partitions of 10G each and a floating partition of 70G. When a memory failure in a fixed partition causes a cold reset of the first controller, the faulty memory area is isolated; if the capacity of the faulty memory area is 10G, the memory capacity of the controller shrinks to 90G and the memory capacity of the floating partition is reduced correspondingly. That is, after the cold reset of the first controller, the three fixed partitions are still 10G each and the floating partition shrinks to 60G. Optionally, after the cold reset the three fixed partitions could instead be set to 11G each, with a floating partition of 57G.

In this way, when a memory failure occurs in the fixed partition that runs upper-layer services, the embodiment of the present application can isolate the faulty memory area so that the memory capacity of the fixed partitions after the cold reset of the controller is not less than their memory capacity before the cold reset, reducing the impact of the memory failure on the operation of upper-layer services.
With reference to the first aspect, in some possible implementations, re-allocating the memory of the first controller to obtain the third partition and the fourth partition includes: determining a target processor, where the target processor is one of a plurality of processors included in the first controller, the target processor corresponds to a first memory unit, and the first memory unit is the memory unit, among the plurality of memory units included in the first partition, that contains the faulty memory area; determining, according to the target processor, at least one second memory unit, where each second memory unit is a memory unit, among the plurality of memory units included in the second partition, that corresponds to the target processor, and the capacity of the at least one second memory unit is not less than the capacity of the faulty memory area; and re-allocating the memory of the first controller according to the first memory unit and the at least one second memory unit to obtain the third partition and the fourth partition, where the third partition includes the at least one second memory unit.

It should be understood that the faulty memory area is the smallest granularity at which memory is isolated, and a memory unit is the smallest granularity that can be allocated when the memory is partitioned.

It should be understood that the memory unit corresponding to a processor is that processor's local memory, and a processor accesses its corresponding memory unit much faster than it accesses a non-corresponding memory unit. For example, the memory unit corresponding to central processing unit (CPU) 1 in the first partition is the first memory unit, the memory unit corresponding to CPU 1 in the second partition is the second memory unit, and the memory unit corresponding to CPU 2 in the second partition is the sixth memory unit; CPU 1 then accesses the first memory unit and the second memory unit much faster than it accesses the sixth memory unit.

It should be understood that when the memory failure occurs in the first memory unit of the first partition, memory from the second memory unit of the second partition can be allocated to the first memory unit after the cold reset of the first controller. The third memory unit is the first memory unit as updated after the cold reset of the first controller, and the fourth memory unit is the second memory unit as updated after the cold reset.

In this way, when a memory failure occurs in the fixed partition that runs upper-layer services, the embodiment of the present application can isolate the faulty memory area, identify the processor corresponding to the memory unit in which the faulty memory area is located, and allocate memory from that processor's memory unit in the floating partition to the fixed partition, so that the memory capacity of the processor's memory unit in the fixed partition after the cold reset of the controller is not less than its memory capacity before the cold reset. The processor's memory access speed therefore remains unchanged, reducing the impact of the memory failure on the operation of upper-layer services.
With reference to the first aspect, in some possible implementations, the method further includes: obtaining the memory utilization of each of K memory units, where the K memory units belong to the fourth partition and K is an integer greater than or equal to 1; and, if the memory utilization of the k-th memory unit among the K memory units exceeds a preset threshold, releasing all or part of the memory resources of the k-th memory unit, where k is an integer greater than or equal to 1 and less than or equal to K.

It should be understood that a processor preferentially accesses its corresponding memory unit in the floating partition. Because the memory capacities of the memory units corresponding to the different processors in the floating partition may be uneven, the memory utilization of each memory unit in the floating partition can be queried periodically, to ensure that a shrunk memory unit in the floating partition does not run out of capacity and force cross-memory-unit access. For example, a distributed virtual memory system (vcache) can be created at per-processor granularity, and the vcache can be used to query the memory utilization of each memory unit.

Optionally, a page replacement algorithm can be used to release all or part of the memory resources of a memory unit, for example the optimal replacement algorithm (OPT), the first-in first-out (FIFO) replacement algorithm, the least recently used (LRU) algorithm, or the clock (CLOCK) replacement algorithm.

In this way, when a controller memory failure causes the memory to shrink, the embodiment of the present application can periodically clean up the memory units in the floating partition, ensuring that a processor does not have to access memory across memory units because of insufficient memory-unit capacity, and preserving the running speed of upper-layer services.
With reference to the first aspect, in some possible implementations, before re-allocating the memory of the first controller to obtain the third partition and the fourth partition, the method further includes: receiving a first input/output request of a second computer device; and forwarding the first input/output request to a controller of the first computer device other than the first controller, where the first computer device includes at least two controllers.

In this way, during the cold reset caused by the memory failure of the first controller, that is, while the first controller cannot work normally, the embodiment of the present application can forward the input/output requests of the second computer device to the other controllers, achieving seamless switching of upper-layer services and reducing the impact of the faulty memory on their operation.

With reference to the first aspect, in some possible implementations, after re-allocating the memory of the first controller to obtain the third partition and the fourth partition, the method further includes: receiving a second input/output request of the second computer device; and forwarding the second input/output request to any one or more controllers of the first computer device.

In this way, after the cold reset caused by the memory failure of the first controller, that is, once the first controller has resumed normal operation, the embodiment of the present application can forward the input/output requests of the second computer device to any controller of the first computer device, making full use of all controllers in the first computer device and reducing the impact of the faulty memory on the operation of upper-layer services.
In a second aspect, embodiments of the present application provide a computer device that includes units for implementing the first aspect or any possible implementation of the first aspect.

In a third aspect, embodiments of the present application provide a computer device that includes a processor. The processor is coupled to a memory, and reads and executes instructions and/or program code in the memory to perform the first aspect or any possible implementation of the first aspect.

In a fourth aspect, embodiments of the present application provide a chip system. The chip system includes a logic circuit and a forwarding interface; the forwarding interface is used to forward input/output requests of a second computer device, and the logic circuit is coupled to the forwarding interface and transmits data through it, so as to perform the first aspect or any possible implementation of the first aspect.

In a fifth aspect, embodiments of the present application provide a computer-readable storage medium that stores program code. When the program code is run on a computer, it causes the computer to perform the first aspect or any possible implementation of the first aspect.

In a sixth aspect, embodiments of the present application provide a computer program product. The computer program product includes computer program code; when the computer program code is run on a computer, it causes the computer to perform the first aspect or any possible implementation of the first aspect.
Brief description of the drawings

Figure 1 is a schematic diagram of an application scenario of a memory management device provided by an embodiment of the present application.

Figure 2 is a schematic diagram of an application scenario of another memory management device provided by an embodiment of the present application.

Figure 3 is a schematic diagram of a memory fault isolation scenario provided by an embodiment of the present application.

Figure 4 is an exemplary flow chart of a memory management method provided by an embodiment of the present application.

Figure 5 is a schematic diagram of memory access management provided by an embodiment of the present application.

Figure 6 is a schematic structural diagram of a memory partition provided by an embodiment of the present application.

Figure 7 is a schematic diagram of memory shrinking after a fault provided by an embodiment of the present application.

Figure 8 is a schematic diagram of floating-partition page management provided by an embodiment of the present application.

Figure 9 is a schematic diagram of a service forwarding scenario provided by an embodiment of the present application.

Figure 10 is a structural example diagram of a computer device provided by an embodiment of the present application.

Figure 11 is a structural example diagram of another computer device provided by an embodiment of the present application.

Figure 12 is an example diagram of a computer program product provided by an embodiment of the present application.
Detailed description

The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of this application rather than all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort shall fall within the protection scope of this application.

Figure 1 is a schematic diagram of an application scenario of a memory management device provided by an embodiment of the present application.

In the application scenario shown in Figure 1, users access data through applications, and the computers that run these applications are called application servers. The application server 100 may be a physical machine or a virtual machine. Physical application servers include, but are not limited to, desktop computers, servers, laptops, and mobile devices. The application server accesses the storage system through the Fibre Channel switch 110 to read and write data. It should be understood that the switch 110 is optional, and the application server 100 can also communicate with the storage system 120 directly over a network. Alternatively, the Fibre Channel switch 110 can be replaced with an Ethernet switch, an InfiniBand switch, an RoCE (RDMA over Converged Ethernet) switch, and so on.

The storage system 120 shown in Figure 1 is a centralized storage system. A centralized storage system is characterized by having a unified entry point through which all data from external devices must pass; this entry point is the engine 121 of the centralized storage system. The engine 121 is the most central component of the centralized storage system, and many of the storage system's advanced functions are implemented in it.

As shown in Figure 1, there are one or more controllers in the engine 121; Figure 1 uses an engine containing two controllers as an example. There is a mirror channel between controller 0 and controller 1: after controller 0 writes a piece of data into its memory 124, it can send a copy of the data to controller 1 through the mirror channel, and controller 1 stores the copy in its own local memory 124. Controller 0 and controller 1 are therefore backups of each other: if controller 0 fails, controller 1 can take over its services, and if controller 1 fails, controller 0 can take over its services, preventing a hardware failure from making the entire storage system 120 unavailable. When four controllers are deployed in the engine 121, there is a mirror channel between any two controllers, so any two controllers are backups of each other.
引擎121还包含前端共享接口125和后端接口126,其中前端共享接口125用于与应用服务器100通信,接收服务器100的输入输出请求并转发给对应控制器,从而为应用服务器100提供存储服务。而后端接口126用于与硬盘134通信,以扩充存储系统的容量。通过后端接口126,引擎121可以连接更多的硬盘134,从而形成一个非常大的存储资源池。
The mainboard 128 carries the main circuitry of a controller, including the basic input output system (BIOS) chip, the input/output (I/O) control chip, expansion slots, and other components. The CPU 123, the memory 124, and the back-end interface 126 are located on the mainboard 128.
In terms of hardware, as shown in FIG. 1, controller 0 includes at least a processor 123 and a memory 124. The processor 123 is a CPU used to process data access requests from outside the storage system (from servers or other storage systems) as well as requests generated inside the storage system. For example, when the processor 123 receives write requests sent by the application server 100 through the front-end shared interface 125, it temporarily saves the data in those write requests in the memory 124. When the total amount of data in the memory 124 reaches a certain threshold, the processor 123 sends the data stored in the memory 124 to the hard disks 134 through the back-end interface for persistent storage.
The memory 124 is internal storage that exchanges data directly with the processor; it can read and write data at any time and does so very quickly, serving as temporary data storage for the operating system or other running programs. Memory includes at least two types of storage; for example, memory may be random access memory or read only memory (ROM). For instance, the random access memory may be dynamic random access memory (DRAM) or storage class memory (SCM). DRAM is a semiconductor memory and, like most random access memory (RAM), is a volatile memory device. SCM is a composite storage technology that combines the characteristics of traditional storage devices and memory: it provides faster read/write speeds than a hard disk, but is slower than DRAM in access speed and cheaper than DRAM in cost. However, DRAM and SCM are only examples in this embodiment; the memory may also include other random access memories, such as static random access memory (SRAM). The read only memory may be, for example, programmable read only memory (PROM) or erasable programmable read only memory (EPROM). In addition, the memory 124 may be a dual in-line memory module (DIMM), that is, a module composed of dynamic random access memory (DRAM), or a solid state disk (SSD). In practice, multiple memories 124, and memories 124 of different types, may be configured in controller 0; this embodiment does not limit the number or type of the memories 124. Furthermore, the memory 124 may be configured with a power-protection function, meaning that the data stored in the memory 124 is not lost when the system loses power and is powered on again. Memory with a power-protection function is called non-volatile memory.
The memory 124 stores software programs, and by running them the processor 123 can manage the hard disks, for example abstracting the hard disks into a storage resource pool and then dividing it into LUNs provided to servers. These LUNs are in fact the hard disks seen on the servers. Of course, some centralized storage systems are themselves file servers and can provide shared file services for servers.
The hardware components and software structure of controller 1 (and of other controllers not shown in FIG. 1) are similar to those of controller 0 and are not described again here.
FIG. 1 shows a centralized storage system in which disks and controllers are separated. In this system, the engine 121 may have no hard disk slots; the hard disks 134 are placed in a disk enclosure 130, and the back-end interface 126 communicates with the disk enclosure 130. The back-end interface 126 exists in the engine 121 in the form of an adapter card, and two or more back-end interfaces 126 can be used on one engine 121 at the same time to connect multiple disk enclosures. Alternatively, the adapter card may be integrated on the mainboard, in which case it communicates with the processor 123 over the PCIe bus.
It should be noted that only one engine 121 is shown in FIG. 1; in practical applications, however, the storage system may contain two or more engines 121, with redundancy or load balancing among them.
The disk enclosure 130 includes a control unit 131 and several hard disks 134. The control unit 131 can take many forms, and there may be one, two, or more control units 131. In some implementations, the disk enclosure 130 has no internal control unit 131; instead, a network interface card 104 performs data reads and writes, address translation, and other computing functions. In that case, the network interface card 104 is a smart NIC and may contain a CPU and memory.
Depending on the type of communication protocol between the engine 121 and the disk enclosure 130, the disk enclosure 130 may be a serial attached small computer system interface (SAS) disk enclosure, a non-volatile memory express (NVMe) disk enclosure, an internet protocol (IP) disk enclosure, or another type of disk enclosure. A SAS disk enclosure uses the SAS 3.0 protocol, and each enclosure supports 25 SAS disks; the engine 121 connects to the disk enclosure 130 through an onboard SAS interface or a SAS interface module. An NVMe disk enclosure is more like a complete computer system: NVMe disks are inserted into the NVMe disk enclosure, which in turn connects to the engine 121 through a remote direct memory access (RDMA) port.
FIG. 2 is a schematic diagram of an application scenario of another memory management apparatus according to an embodiment of this application.
FIG. 2 shows a centralized storage system in which disks and controllers are integrated. In this system, the engine 121 has hard disk slots, the hard disks 134 can be deployed directly in the engine 121, and the back-end interface 126 is optional; when the system's storage space is insufficient, more hard disks or disk enclosures can be connected through the back-end interface 126.
For the other parts of FIG. 2, refer to the description of FIG. 1; details are not repeated here.
Based on the application scenarios shown in FIG. 1 and FIG. 2, the following describes the flow of the memory management method in detail.
FIG. 3 is a schematic diagram of a memory fault isolation scenario according to an embodiment of this application.
The storage units of memory can be divided, from large to small, into chip > bank > row/column. Memory faults are often accompanied by failure of an entire row, column, or bank; a large number of faults, or a storm of uncorrected errors (UCE), can cause memory repair to fail and prevent the controller from powering on normally.
As shown in FIG. 3, the memory 124 in controller 0 and controller 1 may include multiple memory modules; the black vertical bars can be regarded as normal memory modules and the white vertical bars as faulty memory modules. For example, a single faulty memory module in controller 0 can cause the whole of controller 0 to fail and trigger a cold reset of controller 0, leaving the storage system 120 running on controller 1 alone, with greatly reduced reliability. A reset generated automatically when a device goes from unpowered to powered is called a cold reset. In the embodiments of this application, the health of the memory is checked during the power-on process of a controller cold reset; when a memory fault is found, memory isolation is used to isolate the faulty memory area. The isolation granularity includes memory-module level, chip level, bank level, row level, and column level. After the faulty memory area is isolated, upper-layer services are affected to some extent because of the reduced memory capacity. Upper-layer services are the services produced by the running of upper-layer applications, for example the creation of data objects. When a large area of memory related to upper-layer services fails, the services may be unable to continue running. The technical solution provided in this application can be applied to the storage system shown in FIG. 1 or FIG. 2: when a memory fault causes a controller cold reset, the faulty memory is isolated, the device powers on normally, and the floating partition of the controller memory is then shrunk to reduce the impact on upper-layer services.
FIG. 4 is an exemplary flowchart of a memory management method according to an embodiment of this application.
210: A memory fault causes memory capacity reduction.
As shown in FIG. 3, in the storage system 120, a severe memory fault in a controller's memory causes a cold reset of that controller. During the power-on process of the controller cold reset, a health check is first performed on each memory module; when the health of a memory module is detected to be less than or equal to a preset threshold, the capacity-reduction procedure is triggered. For example, the health threshold that triggers capacity reduction may be set to 30%. For instance, when an entire row, column, or bank in a memory module is detected as faulty and repair is ineffective, the health of that memory module is judged to be 20%, capacity reduction is triggered, and the faulty memory area is isolated.
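The health check and capacity-reduction trigger can be illustrated with a short sketch. This is a minimal, hypothetical model and not any real firmware interface: the health score formula, the field names (failed_banks, failed_rows, failed_columns), and the function names are assumptions made only for illustration; the 30% threshold follows the example above.

```python
# Minimal sketch of the health-check trigger during cold-reset power-on.
# Assumes a hypothetical health score in [0, 100] per memory module.
HEALTH_THRESHOLD = 30  # percent; example value from the text

def module_health(error_stats):
    """Toy health score: whole banks/rows/columns that failed repair weigh heavily."""
    score = 100
    score -= 40 * error_stats.get("failed_banks", 0)
    score -= 20 * error_stats.get("failed_rows", 0)
    score -= 20 * error_stats.get("failed_columns", 0)
    return max(score, 0)

def modules_to_isolate(modules):
    """Return the modules whose health triggers the capacity-reduction flow."""
    return [name for name, stats in modules.items()
            if module_health(stats) <= HEALTH_THRESHOLD]

# Example: a module with one dead bank and two dead rows scores 20 -> shrink is triggered.
print(modules_to_isolate({"DIMM0": {"failed_banks": 1, "failed_rows": 2}, "DIMM1": {}}))
```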
It should be understood that when a correctable error appears in the controller memory, such as a historical page fault, the memory error can be repaired directly through error checking and correcting (ECC), and such a repair does not affect normal operation of the device. Only when the memory damage reaches a certain degree, that is, when the health falls below the preset threshold, is the faulty memory area isolated and the capacity-reduction procedure triggered.
For example, controller 0 in FIG. 3 has 50 GB of memory. If one memory module in controller 0 fails and the capacity of the faulty module is 10 GB, then after the faulty module is isolated the capacity of controller 0 is reduced to 40 GB.
220: Initialize the memory management partitions.
FIG. 5 is a schematic diagram of an application of memory access management according to an embodiment of this application.
The memory access management of a controller can include processor management 310 and memory management 320.
Each of node 1 to node n can be regarded as a CPU. In processor management 310, memory access can be managed in several ways. For example, a non-uniform memory access (NUMA) architecture can be used to manage CPU access to memory. In a NUMA architecture, each CPU has its own block of "local" memory, and the memory of other CPUs can be regarded as "remote" memory. Under this architecture, a CPU accessing its own local memory has lower latency and higher performance. When the NUMA architecture is used, each node in processor management 310 is a non-uniform memory access node (NUMA node). Optionally, a massive parallel processing (MPP) architecture can also be used to manage CPU access to memory; unlike NUMA, in an MPP architecture each node accesses only its local memory and cannot access remote memory.
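On a Linux host, the NUMA layout that processor management 310 relies on can be inspected from sysfs. The sketch below is Linux-specific and only an illustration of how node-to-CPU locality can be observed; it is not part of the claimed method, and the assumed 2-node layout in the comment is an example.

```python
# Minimal sketch: list NUMA nodes and the CPUs local to each one (Linux sysfs layout).
import glob
import os

def numa_topology():
    topo = {}
    for node_dir in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
        node = os.path.basename(node_dir)
        with open(os.path.join(node_dir, "cpulist")) as f:
            topo[node] = f.read().strip()   # e.g. "0-15"
    return topo

if __name__ == "__main__":
    # On an assumed 2-node machine this might print {'node0': '0-15', 'node1': '16-31'}.
    print(numa_topology())
```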
Memory management 320 divides the controller's memory into at least one floating partition and at least one fixed partition. Every fixed partition has the same memory capacity. The fixed partitions are used for running upper-layer services, for example creating objects related to applications and creating data structures related to those objects. The floating partition is an extension partition of the cache: upper-layer service input/output requests, metadata caching, and the like all request memory pages from this partition, and it is the partition that carries the bulk of the controller's memory capacity.
Optionally, the slab allocation algorithm, the simple list of blocks (slob) allocation algorithm, or the slub allocation algorithm can be used to partition the memory in memory management 320. The slab, slob, and slub algorithms address the allocation of small-granularity memory (less than 4 KB).
FIG. 6 is a schematic structural diagram of memory partitioning according to an embodiment of this application.
It should be understood that each fixed partition can be divided into one or more memory units, and each memory unit corresponds to one node; that is, a node's corresponding memory unit is the node's local memory. Each memory unit is a collection of one or more memory areas in the controller. For example, fixed partition 1 includes two memory units, a 1st memory unit and a 2nd memory unit. The 1st memory unit corresponds to node 1 and the 2nd memory unit corresponds to node 2; that is, the memory areas in the 1st memory unit all belong to node 1's local memory and the memory areas in the 2nd memory unit all belong to node 2's local memory. In a NUMA architecture, NUMA node 1 accesses the 1st memory unit much faster than it accesses the 2nd memory unit.
Similarly to the fixed partitions, the floating partition can also be divided into one or more memory units. In the floating partition, the number of memory units equals the number of nodes, and the i-th memory unit corresponds to node i; that is, the memory areas in the i-th memory unit all belong to node i's local memory. It should be understood that each memory unit is a collection of one or more memory areas in the controller, and i is greater than or equal to 1 and less than or equal to the number of nodes.
In the embodiments of this application, after controller 0 is cold reset, the fixed partitions and the floating partition are redistributed according to the memory that is normally usable. The reduction in capacity is absorbed entirely by the floating partition: only its size is reduced, while the sizes of the fixed partitions remain unchanged or increase.
For example, the capacity of controller 0 before the reduction is 100 GB, with 3 fixed partitions of 10 GB each and a floating partition of 70 GB. When a memory fault causes controller 0 to cold reset, the faulty memory area is isolated; if the faulty memory area is 10 GB, the memory capacity of controller 0 shrinks to 90 GB. It should be understood that the memory fault may occur in a fixed partition or in the floating partition. The embodiments of this application reduce only the size of the floating partition of controller 0; that is, when controller 0's memory partitions are rebuilt, the floating partition is made 60 GB while each of the 3 fixed partitions remains 10 GB.
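The rule that the fixed partitions keep their size and only the floating partition absorbs the shrink can be written down directly. The following is a minimal sketch using the hypothetical sizes from the example above (all in GB); the function and variable names are illustrative only.

```python
# Minimal sketch of rebuilding partition sizes after a faulty area is isolated (sizes in GB).
def rebuild_sizes(total, fixed_sizes, faulty):
    """Fixed partitions keep their capacity; the floating partition absorbs the loss."""
    usable = total - faulty
    floating = usable - sum(fixed_sizes)
    if floating < 0:
        raise ValueError("not enough healthy memory to preserve the fixed partitions")
    return fixed_sizes, floating

# Example from the text: 100 GB total, three 10 GB fixed partitions, 10 GB faulty.
fixed, floating = rebuild_sizes(100, [10, 10, 10], 10)
print(fixed, floating)   # [10, 10, 10] 60
```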
FIG. 7 is a schematic diagram of post-fault memory capacity reduction according to an embodiment of this application.
Taking the NUMA architecture as an example, as shown in FIG. 7, before the fault the controller memory includes 3 fixed partitions and one floating partition. Memories 1 to 20 can each be regarded as a collection of addresses of one or more memory areas; for example, memory 1 represents the addresses of memory module 1 and memory 2 represents the addresses of memory module 2. For ease of description, the fixed partition before the fault is called the first partition, the floating partition before the fault the second partition, the fixed partition redistributed after the fault the third partition, and the floating partition redistributed after the fault the fourth partition. Suppose the whole of memory module 2 in the controller fails, that is, memory 2 in the first partition fails, causing the whole controller to cold reset. Then, when the controller starts up from the cold reset, faulty memory 2 is isolated and memory 14 of the second partition is assigned to the first partition, the capacity of memory 14 being greater than or equal to that of memory 2. Optionally, any one or more of memories 13 and 15 to 20 in the second partition may instead be assigned to the first partition, or part of any memory area in the second partition may be assigned in place of memory 2, for example part of memory 14, as long as the memory capacity of the third partition is not less than that of the first partition, that is, the memory capacity assigned from the second partition to the first partition is greater than or equal to the capacity of the failed memory 2.
For example, the first partition includes memories 1 to 12 and the second partition includes memories 13 to 20, with memories 1 to 20 all of equal capacity. If memory 2 fails, then when the controller restarts from the cold reset after the failure, faulty memory 2 is isolated, memory 14 of the second partition is assigned to the first partition, and a new third partition and fourth partition are generated. The third partition includes memories 1, 14, and 3 to 12, and the fourth partition includes memories 13 and 15 to 20.
During the power-on process of the controller cold reset, the faulty memory area can be isolated. Optionally, a memory isolation apparatus can be used to isolate the faulty memory area, or the faulty memory area can be isolated by marking it. For example, during the controller cold reset, the basic input output system (BIOS) on the mainboard 128 first checks the memory and obtains the address information of the faulty memory area, saves that address information to non-volatile memory (NVM), reads the saved address information back from the NVM, marks the corresponding faulty memory area as unusable, and so isolates the faulty memory area. After isolating the faulty memory area, the BIOS reports the normally usable memory addresses in the controller to the controller's operating system for memory management. For example, if controller 0 has 100 GB of memory and 2 nodes, each with 50 GB of local memory, then when the local memory corresponding to node 1 fails, the node 1 local memory reported by the BIOS to the operating system is less than that of the other node.
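The BIOS-side bookkeeping, that is, recording the faulty address range and then reporting only the healthy ranges to the operating system, reduces to set arithmetic over address ranges. The sketch below is a simplified illustration under the assumption that ranges are plain (start, end) byte tuples; real firmware records and reporting formats are platform-specific.

```python
# Minimal sketch: subtract isolated (faulty) ranges from the physically present ranges,
# yielding the usable ranges that would be reported to the OS. Ranges are (start, end) tuples.
def usable_ranges(present, isolated):
    result = []
    for start, end in present:
        cuts = sorted((s, e) for s, e in isolated if s < end and e > start)
        cursor = start
        for s, e in cuts:
            if s > cursor:
                result.append((cursor, s))
            cursor = max(cursor, e)
        if cursor < end:
            result.append((cursor, end))
    return result

# Example: 0-50 GB present on one node, a 10 GB faulty stripe isolated in the middle.
GB = 1 << 30
print(usable_ranges([(0, 50 * GB)], [(20 * GB, 30 * GB)]))  # two healthy ranges remain
```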
For example, the local memory corresponding to node 1 has a capacity of 50 GB before the controller cold reset; that is, the total node 1 local memory across all fixed partitions and the floating partition is 50 GB, of which the first memory unit corresponding to node 1 in the floating partition is 30 GB and the second memory unit corresponding to node 1 in the fixed partitions is 20 GB. If the local memory corresponding to node 1 (including memory in the fixed partitions and in the floating partition) suffers a memory fault of 10 GB, then when the fixed partitions and the floating partition are rebuilt after controller 0's cold reset, the capacity of node 1's first memory unit in the floating partition can be reduced to 20 GB; that is, the corresponding memory unit in the floating partition bears the reduced capacity.
For example, in FIG. 7, memory 1 and memory 2 of fixed partition 1 belong to the local memory of node 1, and memory 1 and memory 2 can be called the first memory unit. Memory 13, memory 14, and memory 18 of the floating partition belong to the local memory of node 1 and can be called the second memory unit. If a memory fault in memory 2 causes a controller cold reset, faulty memory 2 is isolated when the controller starts up from the cold reset, and all or part of memory 13, memory 14, or memory 18 of the floating partition is assigned to fixed partition 1. It is sufficient to ensure that the memory capacity assigned from the floating partition to the first memory unit is greater than or equal to the capacity of memory 2.
In this embodiment of this application, when a memory failure occurs in a fixed partition running upper-layer services, the faulty memory area is isolated, the processor corresponding to the memory unit containing the faulty memory area is identified, and memory from that processor's memory unit in the floating partition is assigned to the fixed partition, so that the capacity of the processor's corresponding memory unit in the fixed partition after the controller cold reset is not less than its capacity before the cold reset; the processor's memory access speed remains unchanged, reducing the impact of the memory failure on the operation of upper-layer services.
Optionally, in a NUMA architecture, other memory units in the floating partition can also bear the reduced capacity. For example, in FIG. 7, memory 1 and memory 2 of fixed partition 1 belong to the local memory of NUMA node 1 and can be called the first memory unit; memory 13, memory 14, and memory 18 of the floating partition belong to the local memory of NUMA node 1 and can be called the second memory unit; memory 17 and memory 19 of the floating partition belong to the local memory of NUMA node 2 and can be called a third memory unit. When memory 2 in the first memory unit fails, memory from the second memory unit or from the third memory unit can be assigned to the first memory unit. However, because the third memory unit is remote memory for NUMA node 1, NUMA node 1 accesses memory in the third memory unit more slowly than memory in the first memory unit, so if memory from the third memory unit is assigned to the first memory unit, NUMA node 1's access to the redistributed first memory unit slows down. The memory in the second memory unit, by contrast, is local memory of NUMA node 1, so if memory from the second memory unit is assigned to the first memory unit, NUMA node 1's access to the redistributed first memory unit does not slow down.
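The locality rule above, namely preferring replacement memory from the floating-partition unit of the same NUMA node as the faulty fixed-partition unit, can be sketched as a simple donor-selection step. This is an illustration only; the unit records and field names are invented for the example and sizes are in GB.

```python
# Minimal sketch: pick donor memory from the floating partition, preferring the same NUMA node
# as the faulty fixed-partition unit so that access stays local.
def pick_donor(floating_units, target_node, needed):
    """floating_units: list of dicts like {'node': 1, 'free': 12}; needed/free in GB."""
    local = [u for u in floating_units if u["node"] == target_node and u["free"] >= needed]
    if local:
        return local[0]            # local memory: no slowdown for the target node
    remote = [u for u in floating_units if u["free"] >= needed]
    return remote[0] if remote else None   # remote memory works, but access is slower

units = [{"node": 1, "free": 12}, {"node": 2, "free": 30}]
print(pick_donor(units, target_node=1, needed=10))   # -> {'node': 1, 'free': 12}
```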
In the embodiments of this application, after a controller undergoes fault isolation and a cold reset, shrinking only the floating partition while keeping the capacity of the fixed partitions unchanged or even increased means that the controller can still provide the memory configuration and specifications required for running upper-layer services after the cold reset, effectively reducing the impact of the memory fault on upper-layer services.
230: Floating-partition page management.
In a NUMA architecture, CPU cores in different NUMA nodes have different performance when accessing the same memory location. As shown in FIG. 6, NUMA node 1 accesses the 1st memory unit much faster than NUMA node 2 does, so cross-NUMA memory access should be avoided as far as possible while applications run. To reduce cross-NUMA memory access, thread CPU affinity can be set; when the memory partitions are created, memory is divided evenly among the NUMA nodes, achieving affinity in memory allocation. For example, if the controller has 100 GB of memory in total and 2 NUMA nodes, each NUMA node corresponds to 50 GB of memory. Optionally, anti-affinity or priorities can also be set to reduce cross-NUMA memory access.
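On Linux, pinning a service thread to the CPUs of one NUMA node is one concrete way to realize the affinity setting mentioned above, using the standard os.sched_setaffinity call. A minimal, Linux-only sketch; the CPU list assumed for the node is an example, not a property of the described system.

```python
# Minimal sketch: pin the current process/thread to the CPUs of one NUMA node (Linux only).
import os

def pin_to_node(node_cpus):
    """node_cpus: iterable of CPU ids belonging to the chosen NUMA node."""
    os.sched_setaffinity(0, set(node_cpus))   # 0 = the calling process/thread

if __name__ == "__main__":
    pin_to_node(range(0, 16))                 # assumed layout: the node owns CPUs 0-15
    print(os.sched_getaffinity(0))
```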
FIG. 8 is a schematic diagram of floating-partition page management according to an embodiment of this application.
Floating-partition page management involves service processes 410 and memory management 420. I/O sent by the front-end shared interface 125 requests pages, according to affinity, preferentially from the memory unit of the floating partition corresponding to its own NUMA node. Because the memory capacities corresponding to the NUMA nodes in the floating partition may be uneven, and to ensure that the memory unit of a shrunken NUMA node in the floating partition does not run out of capacity, the controller's global cache can create a distributed virtual cache system (vcache) at NUMA-node granularity. For example, if the storage system 120 has 3 NUMA nodes, 3 vcaches are created, vcache 1 to vcache 3. Each vcache periodically queries the memory utilization of the memory unit of the floating partition corresponding to its own NUMA node and performs page eviction if the utilization exceeds a preset threshold. For example, if vcache 3 finds that the memory utilization of the 3rd memory unit in the floating partition, corresponding to NUMA node 3, is 90% and the preset threshold is 80%, some or all memory pages of the 3rd memory unit are evicted until its utilization falls below 80%. This ensures that, after the capacity reduction, upper-layer services can always obtain cache pages from the memory of their own NUMA node without making cross-NUMA-node requests. Page eviction in the embodiments of this application can use page replacement algorithms, including the optimal replacement, first-in-first-out, least recently used, and CLOCK replacement algorithms.
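The per-NUMA-node vcache behaviour, periodically checking utilization and evicting least-recently-used pages until the unit drops back under the threshold, can be modelled with an ordered dictionary acting as the LRU structure. This is an illustrative sketch rather than the actual cache implementation; the class name, page counts, and the 80% threshold follow the example in the text.

```python
# Minimal sketch of one vcache: an LRU page pool per NUMA node with threshold-driven eviction.
from collections import OrderedDict

class Vcache:
    def __init__(self, capacity_pages, threshold=0.8):
        self.capacity = capacity_pages
        self.threshold = threshold
        self.pages = OrderedDict()          # page_id -> data, least recently used first

    def touch(self, page_id, data=None):
        """Access (or insert) a page, marking it most recently used."""
        self.pages[page_id] = data if data is not None else self.pages.get(page_id)
        self.pages.move_to_end(page_id)

    def utilization(self):
        return len(self.pages) / self.capacity

    def evict_if_needed(self):
        """Called periodically: drop LRU pages until utilization is at or below the threshold."""
        while self.utilization() > self.threshold and self.pages:
            self.pages.popitem(last=False)  # evict the least recently used page

vc = Vcache(capacity_pages=10)
for i in range(9):
    vc.touch(i)
vc.evict_if_needed()                        # 0.9 > 0.8, so the oldest page is dropped
print(vc.utilization())                     # 0.8
```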
FIG. 9 is a schematic diagram of a service forwarding scenario according to an embodiment of this application.
Steps 220 and 230 in FIG. 4 are both carried out while controller 0 is being reset. During the period in which controller 0's memory fault causes the cold reset, the embodiments of this application use the forwarding function of the front-end shared interface 125 to achieve seamless switchover of upper-layer services.
The front-end shared interface 125 includes a connection service 510 and a forwarding service 520. The connection service 510 establishes connections with the application server 100 and receives I/O requests from it. The forwarding service 520 can forward the I/O requests over a bus interface, for example peripheral component interconnect express (PCIe), to the corresponding controller for execution. It should be understood that PCIe is only one example of a bus interface; other interfaces may also be used.
For example, a memory module in controller 0 fails at time t0, controller 0 starts powering on from the cold reset at time t1, and at time t2 controller 0 has restarted and rebuilt the fixed and floating partitions. During the period from t0 to t2, the forwarding service 520 forwards the I/O requests that would originally have gone to controller 0 to controller 1 for processing. After t2, controller 0 has isolated the faulty memory and returned to normal, and the forwarding service 520 forwards the corresponding I/O requests to controller 0 for processing.
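Across the t0 to t2 window, the behaviour of the forwarding service 520 reduces to a routing decision that skips controllers currently in cold reset. The sketch below is a simplified model: the state tracking, the round-robin fallback, and all names are assumptions made for illustration, not the actual forwarding implementation.

```python
# Minimal sketch of the forwarding decision: avoid controllers that are cold-resetting.
import itertools

class ForwardingService:
    def __init__(self, controllers):
        self.state = {c: "normal" for c in controllers}       # "normal" or "cold_reset"
        self._rr = itertools.cycle(controllers)

    def set_state(self, controller, state):
        self.state[controller] = state

    def route(self, io_request, preferred):
        """Send to the preferred controller unless it is resetting; otherwise pick another."""
        if self.state.get(preferred) == "normal":
            return preferred
        for _ in range(len(self.state)):
            c = next(self._rr)
            if self.state[c] == "normal":
                return c
        raise RuntimeError("no controller available")

fwd = ForwardingService(["controller0", "controller1"])
fwd.set_state("controller0", "cold_reset")     # t0..t2: controller 0 is unavailable
print(fwd.route("write LUN1", preferred="controller0"))   # -> controller1
fwd.set_state("controller0", "normal")         # after t2
print(fwd.route("write LUN1", preferred="controller0"))   # -> controller0
```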
The memory management method according to the embodiments of this application has been described above; the apparatus and device according to the embodiments of this application are described below with reference to FIG. 10 and FIG. 11.
An embodiment of this application also provides a computer storage medium storing program instructions; when the program is executed, it may perform some or all of the steps of the memory management methods in the embodiments corresponding to FIG. 4, FIG. 5, FIG. 7, FIG. 8, and FIG. 9.
FIG. 10 is an example structural diagram of a computer apparatus 1000 according to an embodiment of this application. The computer apparatus 1000 includes an obtaining module 1010 and a processing module 1020.
The processing module 1020 is configured to isolate the faulty memory area and redistribute the memory of the first controller, performing some or all of the steps of the methods of FIG. 4, FIG. 5, FIG. 7, and FIG. 9.
The obtaining module 1010 is configured to obtain the memory utilization of a memory unit, performing some or all of the steps of the method of FIG. 8.
FIG. 11 is an example structural diagram of another computer apparatus 1300 according to an embodiment of this application. The computer apparatus 1300 includes a processor 1302, a communication interface 1303, and a memory 1304. One example of the computer apparatus 1300 is a chip; another example is a computing device.
In some implementations, the computer apparatus 1300 may further include a forwarding interface 1305. The forwarding interface 1305 receives input/output requests sent by the front-end shared interface 125 and forwards them to the processor 1302 for processing. Taking FIG. 1 as an example, a memory fault in a fixed partition of the memory 1304 causes controller 0 to cold reset, and the BIOS on the mainboard 128 of controller 0 isolates the faulty memory area. When controller 0 restarts from the cold reset, the CPU 123 re-divides the fixed and floating partitions, assigning memory areas from the floating partition to the fixed partition so that the capacity of the fixed partition after controller 0's cold reset is not less than its capacity before the cold reset. During controller 0's cold reset, the front-end shared interface 125 detects that controller 0 is in a cold reset and does not send the application server 100's input/output requests to the forwarding interface 1305 of the computer apparatus 1300. After controller 0's cold reset, that is, after the fixed and floating partitions have been rebuilt, the front-end shared interface 125 detects that controller 0 has returned to normal and sends the application server 100's input/output requests to the forwarding interface 1305 of the computer apparatus 1300; the forwarding interface 1305 forwards the received input/output requests to the CPU 123 for processing, responding to the requests of upper-layer services.
The methods disclosed in the foregoing embodiments of this application may be applied to the processor 1302 or implemented by the processor 1302. The processor 1302 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. In implementation, the steps of the foregoing methods may be completed by integrated logic circuits of hardware in the processor 1302 or by instructions in the form of software. The methods, steps, and logical block diagrams disclosed in the embodiments of this application may be implemented or performed in this way. The steps of the methods disclosed with reference to the embodiments of this application may be directly embodied as being performed by a hardware decoding processor, or performed by a combination of hardware and software modules in a decoding processor.
The memory 1304 may be volatile memory or non-volatile memory, or may include both. The non-volatile memory may be read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically EPROM (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM). It should be noted that the memories of the systems and methods described herein are intended to include, but are not limited to, these and any other suitable types of memory.
The processor 1302, the memory 1304, the forwarding interface 1305, and the communication interface 1303 may communicate over a bus. The memory 1304 stores executable code, and the processor 1302 reads the executable code in the memory 1304 to perform the corresponding method. The memory 1304 may also include software modules required by other running processes, such as an operating system. The operating system may be LINUX™, UNIX™, WINDOWS™, or the like.
For example, the executable code in the memory 1304 is used to implement the methods shown in FIG. 4, FIG. 5, FIG. 7, FIG. 8, and FIG. 9, and the processor 1302 reads this executable code in the memory 1304 to perform those methods.
In some embodiments of this application, the disclosed methods may be implemented as computer program instructions encoded in a machine-readable format on a computer-readable storage medium or on other non-transitory media or articles of manufacture. FIG. 12 schematically shows a conceptual partial view of an example computer program product arranged according to at least some of the embodiments presented herein; the example computer program product includes a computer program for executing a computer process on a computing device. In one embodiment, the example computer program product 1400 is provided using a signal bearing medium 1401. The signal bearing medium 1401 may include one or more program instructions 1402 that, when run by one or more processors, may provide the functions, or some of the functions, described above for the methods shown in FIG. 4, FIG. 5, FIG. 7, FIG. 8, and FIG. 9. Thus, for example, with reference to the embodiments shown in FIG. 4, FIG. 5, FIG. 7, FIG. 8, and FIG. 9, one or more of their features may be undertaken by one or more instructions associated with the signal bearing medium 1401.
In some examples, the signal bearing medium 1401 may include a computer-readable medium 1403, such as, but not limited to, a hard disk drive, a compact disc (CD), a digital video disc (DVD), a digital tape, memory, read-only memory (ROM), or random access memory (RAM). In some implementations, the signal bearing medium 1401 may include a computer-recordable medium 1404, such as, but not limited to, memory, a read/write (R/W) CD, or an R/W DVD. In some implementations, the signal bearing medium 1401 may include a communication medium 1405, such as, but not limited to, a digital and/or analog communication medium (for example, a fiber optic cable, a waveguide, a wired communication link, or a wireless communication link). Thus, for example, the signal bearing medium 1401 may be conveyed by a wireless form of the communication medium 1405 (for example, a wireless communication medium conforming to the IEEE 802.11 standard or another transmission protocol). The one or more program instructions 1402 may be, for example, computer-executable instructions or logic-implementing instructions. In some examples, the aforementioned computing device may be configured to provide various operations, functions, or actions in response to the program instructions 1402 conveyed to the computing device through one or more of the computer-readable medium 1403, the computer-recordable medium 1404, and/or the communication medium 1405. It should be understood that the arrangements described here are for example purposes only; those skilled in the art will appreciate that other arrangements and other elements (for example, machines, interfaces, functions, orders, and groups of functions) can be used instead, and that some elements may be omitted altogether according to the desired results. In addition, many of the described elements are functional entities that can be implemented as discrete or distributed components, or in combination with other components, in any suitable combination and location.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the particular application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division of the units is merely a logical functional division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of this application essentially, or the part contributing to the prior art, or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (13)

  1. A memory management method, comprising:
    determining that a memory failure occurs in a first partition and that the memory failure causes a cold reset of a first controller of a first computer device, wherein the first controller comprises a second partition and the first partition, the first partition is used for the running of upper-layer services, and the second partition is used for data caching of the upper-layer services;
    isolating a faulty memory area, wherein the faulty memory area is the memory area in the first partition where the memory failure occurs; and
    redistributing the memory of the first controller to obtain a third partition and a fourth partition, wherein the third partition is used for the running of the upper-layer services, the fourth partition is used for data caching of the upper-layer services, the memory capacity of the third partition is greater than or equal to the memory capacity of the first partition, and the sum of the memory capacities of the fourth partition and the third partition is equal to the memory capacity of the first controller minus the capacity of the faulty memory area.
  2. The method according to claim 1, wherein the redistributing the memory of the first controller to obtain a third partition and a fourth partition comprises:
    determining a target processor, wherein the target processor is one of a plurality of processors comprised in the first controller, the target processor corresponds to a first memory unit, and the first memory unit is the memory unit, among a plurality of memory units comprised in the first partition, that comprises the faulty memory area;
    determining at least one second memory unit according to the target processor, wherein the second memory unit is a memory unit, among a plurality of memory units comprised in the second partition, that corresponds to the target processor, and the capacity of the at least one second memory unit is not less than the capacity of the faulty memory area; and
    redistributing the memory of the first controller according to the first memory unit and the at least one second memory unit to obtain the third partition and the fourth partition, wherein the third partition comprises the at least one second memory unit.
  3. The method according to claim 1 or 2, further comprising:
    obtaining the memory utilization of each of K memory units, wherein the K memory units belong to the fourth partition and K is an integer greater than or equal to 1; and
    if the memory utilization of a k-th memory unit among the K memory units exceeds a preset threshold, releasing all or part of the memory resources of the k-th memory unit, wherein k is an integer greater than or equal to 1 and less than or equal to K.
  4. The method according to any one of claims 1 to 3, wherein before the redistributing the memory of the first controller to obtain a third partition and a fourth partition, the method further comprises:
    receiving a first input/output request from a second computer device; and
    forwarding the first input/output request to a controller of the first computer device other than the first controller, wherein the first computer device comprises at least two controllers.
  5. The method according to claim 4, wherein after the redistributing the memory of the first controller to obtain a third partition and a fourth partition, the method further comprises:
    receiving a second input/output request from the second computer device; and
    forwarding the second input/output request to any one or more of the controllers of the first computer device.
  6. A computer apparatus, comprising:
    a processing module, configured to determine that a memory failure occurs in a first partition and that the memory failure causes a cold reset of a first controller of a first computer device, wherein the first controller comprises a second partition and the first partition, the first partition is used for the running of upper-layer services, and the second partition is used for data caching of the upper-layer services;
    wherein the processing module is further configured to isolate a faulty memory area, the faulty memory area being the memory area in the first partition where the memory failure occurs; and
    the processing module is further configured to redistribute the memory of the first controller to obtain a third partition and a fourth partition, wherein the third partition is used for the running of the upper-layer services, the fourth partition is used for data caching of the upper-layer services, the memory capacity of the third partition is greater than or equal to the memory capacity of the first partition, and the sum of the memory capacities of the fourth partition and the third partition is equal to the memory capacity of the first controller minus the capacity of the faulty memory area.
  7. The apparatus according to claim 6, wherein the processing module is specifically configured to:
    determine a target processor, wherein the target processor is one of a plurality of processors comprised in the first controller, the target processor corresponds to a first memory unit, and the first memory unit is the memory unit, among a plurality of memory units comprised in the first partition, that comprises the faulty memory area;
    determine at least one second memory unit according to the target processor, wherein the second memory unit is a memory unit, among a plurality of memory units comprised in the second partition, that corresponds to the target processor, and the capacity of the at least one second memory unit is not less than the capacity of the faulty memory area; and
    redistribute the memory of the first controller according to the first memory unit and the at least one second memory unit to obtain the third partition and the fourth partition, wherein the third partition comprises the at least one second memory unit.
  8. The apparatus according to claim 6 or 7, further comprising an obtaining module, configured to obtain the memory utilization of each of K memory units, wherein the K memory units belong to the fourth partition and K is an integer greater than or equal to 1;
    wherein the processing module is further configured to: if the memory utilization of a k-th memory unit among the K memory units exceeds a preset threshold, release all or part of the memory resources of the k-th memory unit, wherein k is an integer greater than or equal to 1 and less than or equal to K.
  9. The apparatus according to any one of claims 6 to 8, wherein the processing module is specifically configured to:
    receive a first input/output request from a second computer device; and
    forward the first input/output request to a controller of the first computer device other than the first controller, wherein the first computer device comprises at least two controllers.
  10. The apparatus according to claim 9, wherein the processing module is specifically configured to:
    receive a second input/output request from the second computer device; and
    forward the second input/output request to any one or more of the controllers of the first computer device.
  11. A computer device, comprising a processor, wherein the processor is configured to be coupled to a memory and to read and execute instructions and/or program code in the memory, so as to perform the method according to any one of claims 1 to 5.
  12. A chip system, comprising a logic circuit, wherein the logic circuit is configured to be coupled to an input/output interface and to transmit data through the input/output interface, so as to perform the method according to any one of claims 1 to 5.
  13. A computer-readable medium, wherein the computer-readable medium stores program code, and when the computer program code is run on a computer, the computer is caused to perform the method according to any one of claims 1 to 5.
PCT/CN2023/077012 2022-03-10 2023-02-18 Memory management method and apparatus WO2023169185A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210240326.6A CN116774911A (zh) 2022-03-10 2022-03-10 Memory management method and apparatus
CN202210240326.6 2022-03-10

Publications (1)

Publication Number Publication Date
WO2023169185A1 true WO2023169185A1 (zh) 2023-09-14

Family

ID=87937195

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/077012 WO2023169185A1 (zh) 2022-03-10 2023-02-18 Memory management method and apparatus

Country Status (2)

Country Link
CN (1) CN116774911A (zh)
WO (1) WO2023169185A1 (zh)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140325261A1 (en) * 2013-04-26 2014-10-30 Lsi Corporation Method and system of using a partition to offload pin cache from a raid controller dram
CN109408222A (zh) * 2017-08-18 2019-03-01 深圳天珑无线科技有限公司 智能终端及其空间管理方法、具有存储功能的装置
CN110737924A (zh) * 2018-07-20 2020-01-31 中移(苏州)软件技术有限公司 一种数据保护的方法和设备
CN111177024A (zh) * 2019-12-30 2020-05-19 青岛海尔科技有限公司 一种内存优化处理方法及装置

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117667603A (zh) * 2024-01-30 2024-03-08 苏州元脑智能科技有限公司 内存容量调整方法、装置、服务器、电子设备和存储介质
CN117667603B (zh) * 2024-01-30 2024-04-05 苏州元脑智能科技有限公司 内存容量调整方法、装置、服务器、电子设备和存储介质
CN118035140A (zh) * 2024-04-11 2024-05-14 中诚华隆计算机技术有限公司 一种服务器内存通道的切换系统
CN118035140B (zh) * 2024-04-11 2024-06-11 中诚华隆计算机技术有限公司 一种服务器内存通道的切换系统

Also Published As

Publication number Publication date
CN116774911A (zh) 2023-09-19


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23765763

Country of ref document: EP

Kind code of ref document: A1