WO2023051715A1 - Data processing method, apparatus, processor, and hybrid memory system - Google Patents

Data processing method, apparatus, processor, and hybrid memory system

Info

Publication number
WO2023051715A1
WO2023051715A1 (application PCT/CN2022/122693, also filed as CN2022122693W)
Authority
WO
WIPO (PCT)
Prior art keywords
memory
data
medium
page
migration
Prior art date
Application number
PCT/CN2022/122693
Other languages
English (en)
French (fr)
Inventor
祝晓平
陈欢
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to EP22875096.4A (published as EP4390648A1)
Publication of WO2023051715A1
Priority to US18/611,664 (published as US20240231669A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from the processing unit to the output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0602 Interfaces specifically adapted to achieve a particular effect
    • G06F 3/0604 Improving or facilitating administration, e.g. storage management
    • G06F 3/0628 Interfaces making use of a particular technique
    • G06F 3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F 3/0647 Migration mechanisms
    • G06F 3/0649 Lifecycle management
    • G06F 3/0653 Monitoring storage devices or systems
    • G06F 3/0668 Interfaces adopting a particular infrastructure
    • G06F 3/0671 In-line storage system
    • G06F 3/0683 Plurality of storage devices
    • G06F 3/0685 Hybrid storage combining heterogeneous device types, e.g. hierarchical storage, hybrid arrays

Definitions

  • The present application relates to the computer field, and in particular, to a data processing method and apparatus, a processor, and a hybrid memory system.
  • the number of cores of a single processor has gradually increased, but the number of memory channels through which the processor accesses memory has not increased accordingly.
  • For example, the maximum number of cores of a single central processing unit (CPU) can reach 64, while the number of memory channels is only 8. As the number of processor cores grows, the memory bandwidth and memory capacity available to each core shrink, so memory performance severely limits CPU performance; this is the increasingly prominent problem of the memory wall.
  • An open memory interface (OMI) has been proposed: the double data rate controller (DDRC) and the double data rate physical layer (DDR PHY) are removed from the CPU, and the functions of the open memory interface, the DDRC, and the DDR PHY are implemented in the memory module instead.
  • Such a memory module is also called a differential dual in-line memory module (DDIMM).
  • The memory interface is connected to the CPU. Because the DDRC and DDR PHY are removed from the CPU and OMI is implemented over a serial bus interface, more CPU pins (PINs) become available for expanding memory channels; for example, a single POWER9 can provide 16 memory channels.
  • However, a processor that supports OMI in the above scheme supports only dynamic random access memory (DRAM) granules, and this type of memory granule is relatively expensive. For scenarios that require deploying large-capacity memory, such as big data (for example, Apache Spark™), in-memory databases (for example, Redis), or cloud services (for example, virtual machines provided through the memory over-allocation mechanism in cloud infrastructure), the cost is therefore high. How to provide a low-cost solution to the memory wall has become an urgent technical problem.
  • This application provides a data processing method and apparatus, a processor, and a hybrid memory system, which offer a low-cost solution to the memory wall.
  • In a first aspect, a method applied to a hybrid memory system is provided. The hybrid memory system includes multiple different types of memory media, for example a DRAM-type memory medium and an SCM-type memory medium. The processor obtains the data distribution of the hybrid memory system; determines a data migration mode according to that distribution, where the migration mode is used to migrate a migration data set between the different memory media according to the data distribution; and finally performs the migration of the migration data set according to the migration mode.
  • In other words, this application proposes using different types of memory media to form a hybrid memory system, determining the data migration mode based on the data distribution across the different types of memory media, and then performing data migration in combination with the attributes of the different media. This not only meets the computing power requirements of the processor but also allows low-cost memory media to be used, reducing the cost of the entire hybrid memory system while keeping data processing latency under control.
  • Specifically, the data distribution refers to how data is stored in the different types of memory media, and it can be determined in the following two ways:
  • Mode 1: determine the data distribution according to how hot or cold the data is, that is, store data in the different types of memory media according to its hotness.
  • Mode 2: determine the data distribution according to both the hotness of the data and the physical properties of the memory media, where the physical properties include at least one of latency, cost, capacity, and lifetime.
  • the hybrid memory system also includes multiple processors.
  • In this case, memory resources can be allocated to each processor, an LRU linked list can be used to record how hot each memory page is, and data migration can then be performed based on that hotness.
  • Specifically, obtain the memory layout of the hybrid memory system, which includes the quantity and types of memory deployed in the memory system; then allocate memory resources to each processor according to the memory layout, and use a least recently used (LRU) linked list to record how hot the data stored in each processor's allocated memory resources is. The LRU linked list includes an active list (Active list) and an inactive list (Inactive list): the Active list identifies the memory pages where the hot data associated with the processor resides, and the Inactive list identifies the memory pages where the cold data associated with the processor resides.
  • Through this LRU management method, the hotness of the data stored in each memory page is known, and data can then be migrated between the different memory media according to the attribute label of the memory page.
  • The LRU linked list thus provides a way to manage memory page access heat, which makes it convenient to perform reasonable migration operations according to data attributes while the memory media are in use, improving the processing efficiency of the entire system and reducing memory access latency.
  • Optionally, a scan request can traverse the LRU linked lists to obtain the data distribution of the memory pages in the different memory media associated with each processor in the hierarchical storage system, where the data distribution includes the state of the memory pages in the different memory media, and the state of a memory page is hot page, cold page, or free page. That is, through LRU management, the state of each memory page is known, hot pages and cold pages can be distinguished, and migration can be performed according to page state, so that hot data is stored in low-latency memory media and cold data in low-cost memory media; this both preserves data access efficiency and reduces the memory cost of the entire system.
  • Optionally, a memory pool may be built in each memory medium, where each memory pool includes multiple huge-page memories and the size of each huge-page memory is greater than a first threshold. Memory is then accessed at huge-page granularity: compared with small-page memory, the amount of data read each time increases, improving data processing efficiency.
  • In this way, data migration can be combined with the memory tiering mechanism, further improving the processing speed of the entire system while reducing memory cost.
  • Optionally, the distribution of data in the memory media may also be obtained by periodically counting, through scan requests, how hot each memory page in the different types of memory media is.
  • In this way, data migration can be performed dynamically, so that the hot/cold attribute of the data is matched to the low-latency and low-cost characteristics of the memory media: frequently accessed hot data is stored in low-latency memory media to improve access efficiency, while rarely accessed cold data is stored in low-cost memory media, reducing the cost of memory media in the entire system.
  • Optionally, the heat of each memory page can be obtained as follows: within a unit cycle, count the number of times the data in each memory page is read; each time a read operation is performed on the data in a memory page, the heat of that memory page is increased by one, where the heat indicates how hot or cold the data in the memory page is, i.e., how often it is accessed within the unit cycle.
  • In this way, the hotness of each memory page is made explicit, migration can then be performed according to the hot/cold attribute of the data, and the resources of the different types of memory media can be used reasonably (a rough sketch follows).
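As a rough C illustration of this counting scheme (the patent contains no code; the names page_stats, HOT_THRESHOLD, and the pool size are assumptions for the sketch):

```c
#include <stdint.h>

#define PAGES_PER_NODE 4096   /* assumed pool size */
#define HOT_THRESHOLD  8      /* assumed "first threshold" of reads per unit cycle */

struct page_stats {
    uint64_t heat;            /* reads observed in the current unit cycle */
};

static struct page_stats stats[PAGES_PER_NODE];

/* Each read operation on a page increases its heat by one. */
void on_page_read(unsigned page) { stats[page].heat++; }

/* At the end of a unit cycle the page can be classified... */
int is_hot_page(unsigned page) { return stats[page].heat > HOT_THRESHOLD; }

/* ...and the counters reset for the next cycle. */
void reset_cycle(void)
{
    for (unsigned p = 0; p < PAGES_PER_NODE; p++)
        stats[p].heat = 0;
}
```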
  • Optionally, the migration data set includes a first data set, and the first data set includes at least one piece of hot data, i.e., data whose number of reads and writes within a unit cycle is greater than a first threshold.
  • Optionally, the hybrid memory system may determine the data migration mode as follows: first determine the hierarchical memory mechanism in the hybrid memory system, which indicates the levels of the different types of memory media in the system (for example, a hybrid memory system includes a first level and a second level); then determine the data migration mode based on the data distribution and the hierarchical memory mechanism.
  • In this way, data storage can further take the physical properties of the memory media into account: different types of memory media can be classified according to physical properties such as latency, cost, lifetime, and capacity, and data can be placed according to how hot it is. This makes full use of the advantages of each memory medium, realizes tiered management of hot and cold data, and improves memory media utilization.
  • Optionally, latency-sensitive data can also be stored in low-latency memory media according to application requirements and the physical properties of the memory media, and non-latency-sensitive data in low-cost memory media, which preserves data processing efficiency while reducing the cost of memory media in the whole system (see the sketch below).
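To make the property-based tiering concrete, a small C sketch of a tier table follows; the numeric values are placeholders for illustration, not figures from the patent:

```c
enum mem_tier { TIER_FIRST, TIER_SECOND };   /* e.g. DRAM, SCM */

struct tier_props {
    unsigned latency_ns;     /* lower suits hot, latency-sensitive data */
    unsigned cost_per_gib;   /* higher means more expensive */
    unsigned endurance;      /* relative write endurance */
    unsigned capacity_gib;
};

static const struct tier_props tiers[] = {
    [TIER_FIRST]  = { .latency_ns = 100, .cost_per_gib = 8, .endurance = 5, .capacity_gib = 64  },
    [TIER_SECOND] = { .latency_ns = 350, .cost_per_gib = 2, .endurance = 2, .capacity_gib = 512 },
};

/* Hot or latency-sensitive data is placed in the lower-latency tier. */
enum mem_tier place(int hot, int latency_sensitive)
{
    return (hot || latency_sensitive) ? TIER_FIRST : TIER_SECOND;
}
```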
  • Optionally, the data migration mode includes: selecting one or more free memory pages from the first memory pool, and migrating the first data set, which includes the data of hot pages in the second memory medium, to the one or more free memory pages in the first memory pool.
  • Optionally, before the migration is performed, it may first be determined whether the number of free pages in the first memory medium is greater than the number of hot pages in the second memory medium; if so, all of the hot-page data, or part of it, is migrated to the first memory medium.
  • Optionally, the first memory medium belongs to the first level and the second memory medium belongs to the second level; the migration data set further includes a second data set, which includes at least one piece of cold data, where cold data is data whose number of reads and writes within a unit cycle is less than or equal to a second threshold. The data migration mode then includes: selecting one or more free memory pages from the second memory pool, and migrating the second data set, which includes the data of cold pages in the first memory medium, to the one or more free memory pages in the second memory pool. In other words, the cold data of cold pages can be migrated to low-cost memory media, so that low-latency memory media store latency-sensitive data first and the memory media resources are fully utilized according to their different physical properties.
  • Optionally, before the migration is performed, it may first be determined whether the number of free pages in the first memory medium is less than or equal to the number of hot pages in the second memory medium; if so, the second data set, which includes the data of cold pages in the first memory medium, is migrated to the one or more free memory pages in the second memory pool.
  • Optionally, only part of the cold-page data may be migrated to the second memory medium.
  • Optionally, the first data set, which includes the data of hot pages in the second memory medium, may then be migrated to one or more free memory pages in the first memory pool. That is, after the cold data has been migrated out of the low-latency memory medium, the system can check whether hot data remains in the low-cost memory medium and migrate that hot data to the low-latency memory medium.
  • In this way, memory media resources can be utilized more reasonably in combination with the physical properties of the memory media (a sketch follows).
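Putting the two directions together, a minimal C sketch of the promotion/demotion policy described above might look as follows; free_pages(), hot_pages(), cold_pages(), and migrate() are assumed stubs standing in for the pool bookkeeping and the actual copy:

```c
enum mem_tier { TIER_FIRST, TIER_SECOND };   /* e.g. DRAM, SCM */

/* Assumed stubs for pool bookkeeping and the page copy. */
extern unsigned free_pages(enum mem_tier t);
extern unsigned hot_pages(enum mem_tier t);
extern unsigned cold_pages(enum mem_tier t);
extern void     migrate(enum mem_tier from, enum mem_tier to, unsigned npages);

void rebalance(void)
{
    unsigned free_first = free_pages(TIER_FIRST);
    unsigned hot_second = hot_pages(TIER_SECOND);

    if (free_first > hot_second) {
        /* Enough free pages in the first level: promote the hot pages. */
        migrate(TIER_SECOND, TIER_FIRST, hot_second);
    } else {
        /* Not enough room: first demote cold pages out of the first level,
         * then promote the hot pages into the space that was freed. */
        migrate(TIER_FIRST, TIER_SECOND, cold_pages(TIER_FIRST));
        migrate(TIER_SECOND, TIER_FIRST, hot_pages(TIER_SECOND));
    }
}
```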
  • the latency of the first memory medium is lower than the latency of the second memory medium, and the cost of the first memory medium is higher than that of the second memory medium.
  • Optionally, the lifetime of the first memory medium is longer than that of the second memory medium, and the capacity of the first memory medium is smaller than the capacity of the second memory medium.
  • Optionally, the first memory medium is a dynamic random access memory (DRAM) and the second memory medium is a storage class memory (SCM), where the SCM includes at least one of phase change memory (PCM), magnetic random access memory (MRAM), resistive random access memory (RRAM), ferroelectric memory (FRAM), fast NAND, or nano random access memory (NRAM).
  • Optionally, the processor is connected to the multiple different types of memory media through an interface that supports memory semantics, including at least one of Compute Express Link (CXL), the cache coherent interconnect protocol CCIX, or the unified bus (UB).
  • the above hybrid memory system is a server or a server cluster, and the server cluster includes two or more servers.
  • the hybrid memory system is applied to a scenario of deploying large-capacity memory, and the scenario includes at least one of big data, in-memory database, or cloud service.
  • the present application provides a data processing device, and the data processing device includes various modules for performing the operation steps of the data processing method in the first aspect or any possible implementation manner of the first aspect.
  • In another aspect, the present application provides a processor. The processor includes an integrated circuit connected to multiple different types of memory media, and the integrated circuit is used to implement the operation steps of the data processing method in the first aspect or any possible implementation of the first aspect.
  • In another aspect, the present application provides a hierarchical storage system. The hierarchical storage system includes a processor and multiple memory media; the processor and the memory media are connected through a bus and communicate with each other, and any of the memory media is used to store computer-executable instructions. When the system runs, the processor executes the computer-executable instructions in the memory to use the hardware resources in the hierarchical storage system to perform the operation steps of the method in the first aspect or any possible implementation of the first aspect.
  • In another aspect, the present application provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to execute the methods described in the above aspects.
  • the present application provides a computer program product containing instructions, which, when run on a computer, causes the computer to execute the methods described in the above aspects.
  • FIG. 1 is a schematic architecture diagram of a cloud service scenario provided by the present application;
  • FIG. 2 is a system architecture diagram of a big data and in-memory database scenario provided by the present application;
  • FIG. 3 is a structural diagram of a hybrid memory system provided by the present application;
  • FIG. 4 is a schematic diagram of implementing a hierarchical memory mechanism in a hybrid memory system provided by the present application;
  • FIG. 5 is a schematic structural diagram of a least recently used (LRU) linked list provided by the present application;
  • FIG. 6 is a schematic flow chart of a data processing method provided by the present application;
  • FIG. 7 is a schematic structural diagram of a data processing apparatus provided by the present application.
  • Storage class memory (SCM): besides phase change memory (PCM), memory-level storage media also include magnetoresistive random-access memory (MRAM), resistive random access memory (RRAM/ReRAM), ferroelectric random access memory (FRAM), fast NAND, nano random access memory (Nano-RAM, NRAM), and other types.
  • Kernel mode, also known as host kernel mode: access between different programs needs to be restricted, so that a program cannot obtain the memory data of other programs or the data of peripheral devices.
  • To this end, the processor provides two privilege levels: user mode and kernel mode. When a task or process enters kernel code through a system call, the process is said to be in kernel mode, and the processor executes the kernel code at the highest privilege level. In kernel mode, the processor can access all data in memory as well as peripheral devices such as network cards.
  • User mode can also be called host user mode. In user mode, the processor runs user code at the lowest privilege level and can use only the conventional processor instruction set, not the instructions that operate on hardware resources. The processor can access memory only in a limited manner and is not allowed to access peripheral devices directly, for example for input/output (IO) reads and writes, network card access, or memory allocation.
  • the guest user mode is the user mode of the virtual machine.
  • the virtual machine is a virtual operating system running on the physical host.
  • Kernel mode and user mode are also defined in the virtual machine operating system, likewise to limit permissions inside the virtual machine; the division is similar to the host kernel mode and host user mode described above and is not repeated here.
  • Specifically, this application provides a method for storing data based on multiple different types of memory media so as to implement a hierarchical memory mechanism. Memory expansion is achieved by supporting different types of memory media, the data migration mode is determined according to the data distribution in the hybrid memory system (also called a hierarchical memory system), and data sets labeled with hot/cold attributes are thereby migrated between the different memory media.
  • First, the architectures of the application scenarios involved in this application are introduced with reference to FIG. 1 and FIG. 2.
  • FIG. 1 is a schematic architecture diagram of a cloud service scenario provided by the present application.
  • As shown in the figure, a system 100 includes hardware 110, a host kernel mode 120, a host user mode 130, and a guest user mode 140.
  • The hardware 110 includes multiple processors (e.g., a processor 113 and a processor 114) and different types of memory media (e.g., DRAM and SCM).
  • The host kernel mode 120 includes a virtual machine monitor (hypervisor) 121, which in turn includes a kernel-based virtual machine (KVM) 1211 and a least recently used (LRU) module 1212.
  • The kernel virtual machine 1211 is used to manage the virtual machine 132 and the virtual machine 133 in the host user mode 130.
  • The virtual machine monitor 121 also manages the LRU module 1212, which is used to count how actively the data in the memory pages of the DRAM 111 and the SCM 112 is accessed.
  • The host user mode 130 includes a policy module 131 and virtual machines (for example, a virtual machine 132 and a virtual machine 133), and each virtual machine runs one or more applications in the guest user mode 140 (for example, an application 141 and an application 142 run in the virtual machine 132, and an application 143 and an application 144 run in the virtual machine 133).
  • the policy module 131 is used to determine the data migration mode based on the acquired data distribution.
  • Optionally, the policy module 131 may also provide the user with a configuration interface for configuring the data migration cycle, so as to trigger the hybrid memory system to perform the migration according to the cycle obtained through the configuration interface.
  • FIG. 2 is a system architecture diagram of a big data and in-memory database provided by the present application.
  • the system 200 includes hardware 210 , host kernel mode 220 and host user mode 230 .
  • The host kernel mode 220 includes a virtual machine monitor (hypervisor) 221, which is used to manage the virtualization software in the system 200 and the least recently used (LRU) module 2211.
  • The virtual machine monitor 221 includes a kernel-based virtual machine (KVM) 2212 and a least recently used (LRU) module 2211, where the function of the LRU module 2211 is the same as that of the LRU module 1212 shown in FIG. 1.
  • The host user mode 230 runs a business operating system 231, that is, the operating system that runs the big data application 201 or the database application 202.
  • The business operating system 231 includes a scheduler 2311 and a memory manager 2312. The scheduler 2311 is used to obtain the data distribution of the DRAM and the SCM in the hardware 210, determine the data migration mode according to that distribution, and notify the memory manager 2312 to perform the migration of the migration data set according to the data migration mode.
  • It should be noted that the cloud service, big data, and in-memory database scenarios shown in FIG. 1 and FIG. 2 can be deployed on a single server or on a server cluster, where a server cluster includes multiple servers.
  • The hybrid memory system can be any server or server cluster in a scenario where cloud services, big data, or in-memory databases are deployed.
  • FIG. 3 takes a server as an example. As shown in the figure, the hybrid memory system 300 includes a processor 301 and multiple types of memory media, for example a memory medium 302 and a memory medium 303, where the memory medium 302 may be DRAM and the memory medium 303 may be SCM. Each type of memory medium may include multiple devices; for example, the memory medium 302 may include DRAM 3021, DRAM 3022, DRAM 3023, and DRAM 3024, and the memory medium 303 may include SCM 3031, SCM 3032, SCM 3033, and SCM 3034.
  • The processor 301 is connected to the memory medium 302 and the memory medium 303 through interfaces supporting memory semantics, where such interfaces include at least one of Compute Express Link (CXL), Cache Coherent Interconnect for Accelerators (CCIX), or the unified bus (UB or Ubus).
  • The processor 301 further includes multiple processor cores and an integrated memory controller (iMC) 3014 for implementing memory management and control.
  • The processor cores can be further divided into multiple computing clusters, each including at least one processor core. For example, as shown in FIG. 3, a computing cluster 3011 includes a processor core 30111, a processor core 30112, a processor core 30113, and a processor core 30114; a computing cluster 3012 includes a processor core 30121, a processor core 30122, a processor core 30123, and a processor core 30124.
  • The computing clusters communicate through a network on chip (NoC) 3013, which implements communication between processor cores in different computing clusters.
  • Optionally, the network on chip 3013 may be a node controller (NC), or another chip or logic circuit used to implement communication between processor cores.
  • Each computing cluster is directly or indirectly connected to different memory media through multiple integrated memory controllers 3014 .
  • Optionally, all processor cores in the processor 301 may also form a single computing cluster.
  • the processor 301 may be a CPU, for example, a processor of the X86 architecture or a processor of the ARM architecture.
  • Optionally, the processor 301 can also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, a system on chip (SoC), a graphics processing unit (GPU), an artificial intelligence (AI) chip, or the like.
  • a general purpose processor may be a microprocessor or any conventional processor or the like.
  • The memory media can also be used to store the operating system, computer-executable instructions (also called program code), the kernel, and data, and to provide the computer-executable instructions to the processor 301, so that the processor 301 executes those instructions to implement the corresponding functions.
  • the memory medium 302 may be used to store an operating system, computer-executable instructions, and a kernel, so that the processor 301 may execute the computer-executable instructions in the memory medium 302 to implement specific functions.
  • The memory medium 302 may include read-only memory and random access memory, and may also include non-volatile random access memory; it can be volatile memory or non-volatile memory, or include both. The non-volatile memory can be read-only memory (ROM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. The volatile memory can be random access memory (RAM), which acts as an external cache.
  • By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
  • The memory medium 302 can also be a storage class memory (SCM), where the SCM includes at least one of phase change memory (PCM), magnetic random access memory (MRAM), resistive random access memory (RRAM), ferroelectric memory (FRAM), fast NAND, or nano random access memory (NRAM).
  • the type of the memory medium 303 is similar to the type of the memory medium 302 , and may be any one of the above-mentioned types of memory medium, but in the hybrid memory system 300 , the types of the memory medium 302 and the memory medium 303 are different.
  • It should be noted that the hybrid memory system 300 shown in FIG. 3 uses a single processor 301 only as an example. In a specific implementation, the hybrid memory system 300 may include two or more processors, each connected to different types of memory media through integrated memory controllers 3014, where each integrated memory controller 3014 is connected to one memory medium to form a memory channel.
  • the integrated memory controller 3014 is used as a memory expander.
  • Optionally, instead of each integrated memory controller 3014 being connected to a single memory medium, multiple integrated memory controllers 3014 can also form a whole and provide the connections between the processor and the memory media through different ports. That is, the memory controller 3014 can provide multiple ports, similar to a switch or switching chip, so that the processor 301 connects to the memory medium 302 and the memory medium 303 through the memory controller 3014.
  • As its name indicates, the memory in the hybrid memory system 300 is mixed: besides the memory medium 302 and the memory medium 303, the system may also include other types of memory media.
  • The type of such a medium differs from the types of the memory medium 302 and the memory medium 303, and may be, for example, at least one of random access memory (RAM), static random access memory (SRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), or other types of memory media; in that case, the hybrid memory system 300 includes more than two types of mixed memory media.
  • the hybrid memory system 300 includes any two or more types of memory media in DRAM, SCM, RAM, SRAM, SDRAM, and DDR SDRAM.
  • For example, SDRAM and SCM may be used as the hybrid memory media in the hybrid memory system 300; SDRAM and DRAM may also be used as the hybrid memory media; or DRAM, SDRAM, and SCM may be used together as the memory media.
  • For ease of description, the following takes a hybrid memory system 300 whose hybrid memory consists of two memory media, DRAM and SCM, as an example.
  • The operating system in the hybrid memory system 300, specifically the processor running the operating system (for example, the processor 301 in FIG. 3), can divide the multiple memory media into different levels according to the hierarchical memory mechanism.
  • Specifically, the processor 301 may divide the multiple types of memory media into different levels according to the physical properties of the memory media, where the physical properties include at least one of latency, cost, lifetime, and capacity; the memory media in the multi-level memory system are divided into multiple levels according to at least one of these properties.
  • It should be noted that memory media of the same type can be divided into one or more levels; for example, memory media of the same type from different manufacturers can be divided into two or more levels according to their physical properties.
  • For ease of description, the following takes the case where each type of memory medium forms one level as an example.
  • FIG. 4 is a schematic diagram of hierarchical memory in a hybrid memory system provided by the present application.
  • As shown in the figure, the hybrid memory system 400 divides the memory media in the system into two levels according to their physical properties: the set 410 formed by the DRAM 411 and the DRAM 412 constitutes the first level, and the set 420 formed by the SCM 421 and the SCM 422 constitutes the second level.
  • Specifically, the latency of DRAM is lower than that of SCM, the cost of DRAM is higher than that of SCM, the capacity of DRAM is lower than that of SCM, and the lifespan of DRAM is higher than that of SCM.
  • different types of memory media may be classified into different grades according to at least one of the foregoing physical properties.
  • The processor 301 running the operating system (OS) in the hybrid memory system also allocates memory resources to each processor and records the correspondence between processors and memory media, so that data read and write operations are performed based on the correspondence between the processors and the different levels of memory media.
  • each processor and/or its associated memory medium may be referred to as a non-uniform memory access (NUMA) node.
  • the memory medium associated with the processor is referred to as a NUMA node.
  • For example, the hybrid memory system 400 includes two processors, a processor 401 and a processor 402, each of which is connected to different types of memory media; for the way each processor connects to the memory media, reference may be made to the structure shown in FIG. 3, and for brevity the details are not repeated here.
  • The DRAM 411 connected to the processor 401 can be called NUMA node 1; the DRAM 412 connected to the processor 402 can be called NUMA node 2; the SCM 421 connected to the processor 401 can be called NUMA node 3; and the SCM 422 connected to the processor 402 can be called NUMA node 4.
  • In addition, the hybrid memory system 400 can also divide each level of memory media into memory pages of different sizes, so that applications running on the processors can perform read or write operations on those memory pages.
  • Specifically, a memory page whose size is greater than a third threshold can be called a large page or huge-page memory, and a memory page whose size is smaller than or equal to a fourth threshold can be called a small page or small-page memory.
  • the third threshold and the fourth threshold may be the same or different, and may be configured according to service requirements during specific implementation.
  • In a specific implementation, memory pages commonly come in different sizes, for example 4 KB, 2 MB, and 1 GB: 4 KB memory pages are called small pages or small-page memory, while 2 MB or 1 GB memory pages are called large pages or huge-page memory.
  • the same hybrid memory system may include multiple memory pages of different specifications, so as to provide memory access modes of different sizes for different applications.
  • For applications that access large amounts of data, huge-page memory can be used: the processor reads or writes data at huge-page granularity, improving data processing efficiency. For applications that access small amounts of data, small-page memory can be used.
  • Optionally, the same hybrid memory system may also include only huge-page memory.
  • In this way, the memory resources of the memory media can be fully utilized to implement memory access for cloud services, big data, or in-memory databases.
  • In the above scenarios, huge-page memory is often used for data processing. For example, the memory pages in the memory media are divided at a 2 MB granularity, and huge-page memory pools are built in the different memory media: a first huge-page memory pool is built across the multiple DRAM-type memory media, and a second huge-page memory pool is built across the multiple SCM-type memory media. Data scanning and migration operations are then all processed in units of the huge-page memory pools.
  • Optionally, the hybrid memory system can also build small-page memory pools; that is, in each type of memory medium, multiple small-page memories are created and organized into a small-page memory pool. For example, 4 KB small pages are created in the DRAM and the SCM respectively: the small pages created in the DRAMs form a first small-page memory pool, and the small pages created in the SCMs form a second small-page memory pool.
  • For ease of description, the following takes the case where the DRAM-type memory media and the SCM-type memory media are each organized as huge-page memory pools as an example.
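As one possible realization of such a 2 MB huge-page pool on Linux (the patent does not prescribe an API), huge pages can be obtained with mmap(2) and MAP_HUGETLB, and the pool could then be bound to a DRAM or SCM NUMA node with mbind(2). A minimal sketch, assuming huge pages were reserved beforehand via /proc/sys/vm/nr_hugepages:

```c
#define _GNU_SOURCE
#include <sys/mman.h>
#include <stdio.h>

#define HPAGE_SIZE (2UL * 1024 * 1024)   /* 2 MB huge page */
#define POOL_PAGES 64                    /* assumed pool size */

static void *pool[POOL_PAGES];

/* Returns the number of huge pages actually obtained. */
int build_hugepage_pool(void)
{
    for (int i = 0; i < POOL_PAGES; i++) {
        pool[i] = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (pool[i] == MAP_FAILED) {
            perror("mmap(MAP_HUGETLB)");
            return i;
        }
    }
    return POOL_PAGES;
}
```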
  • In addition, the hybrid memory system can also use least recently used (LRU) linked lists to track how hot the data stored in each NUMA node is.
  • Specifically, each NUMA node is associated with two LRU linked lists, through which it uniformly manages the memory pages (for example, huge-page memory) created in its corresponding memory medium.
  • One LRU linked list records the active list (Active list), which contains information about the memory pages where the hot data in the memory medium resides; the other records the inactive list (Inactive list), which contains information about the memory pages where the cold data resides.
  • For example, the LRU 42112 in FIG. 4 is an active page list, and the LRU 42111 is an inactive page list.
  • Each linked-list entry is a structure describing the access information of a memory page together with a pointer to the corresponding memory page.
  • hot data refers to data whose number of times the same data is accessed within a unit period is greater than a preset threshold.
  • cold data refers to data whose number of times the same data is accessed within a unit period is less than or equal to the preset threshold.
  • As shown in FIG. 5, the Active list 500 and the Inactive list 510 each include three parts: a head, a tail, and a body.
  • When the kernel scans a memory page associated with any NUMA node and finds it to be a hot page, the information about that page (the address of the memory page, an identifier, and other information used to uniquely identify the memory page) is loaded into the head 501 of the Active list 500. That is, once the kernel scans a memory page containing hot data, the address or identifier of that page is added to the head of the Active list. Similarly, when a memory page associated with any NUMA node is traversed by the kernel and found to be a cold page, an entry recording the address or identifier of that page is added to the head 511 of the Inactive list 510.
  • multiple entries can be used to record the information of each memory page.
  • In this way, the information needed for scanning and migration can be provided by the above lists (a sketch follows).
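A toy C sketch of this two-list bookkeeping (purely illustrative; these are not the kernel's actual LRU structures): pages classified as hot are pushed at the head of the active list, pages classified as cold at the head of the inactive list.

```c
#include <stdlib.h>

struct page_desc { unsigned long pfn; unsigned heat; };

struct lru_entry {
    struct page_desc *page;   /* pointer to the page's descriptor */
    struct lru_entry *next;
};

struct lru_list { struct lru_entry *head; };

static struct lru_list active_list, inactive_list;

/* New information always enters at the head of a list. */
static void push_head(struct lru_list *l, struct page_desc *p)
{
    struct lru_entry *e = malloc(sizeof(*e));
    if (!e)
        return;               /* error handling elided for brevity */
    e->page = p;
    e->next = l->head;
    l->head = e;
}

/* Called when a scan classifies a page as hot or cold. */
void record_scan(struct page_desc *p, int hot)
{
    push_head(hot ? &active_list : &inactive_list, p);
}
```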
  • In addition, the processor 301 can classify memory pages into hot pages, cold pages, and free pages according to how hot the stored data is: a hot page is a memory page in which data is accessed more than a first threshold number of times within a unit cycle; a cold page is a memory page in which data is accessed fewer than a second threshold number of times within a unit cycle; and a free page is a memory page in which no data is stored.
  • The first threshold and the second threshold may be the same or different; when they differ, the first threshold is greater than the second threshold.
  • Specifically, the processor 301 includes a translation lookaside buffer (TLB), also called a page table buffer, which records the page-table access flag (access bit).
  • Through the access bit, the processor can determine whether a memory page was accessed within a unit cycle and count the number of accesses; the first threshold and the second threshold can then be defined from the distribution of access counts across memory pages, to judge whether data is hot or cold.
  • For example, the access bit can be cleared periodically (e.g., every 1 s), and the number of times the access bit was found set is counted over a longer period (e.g., 40 s), which distinguishes whether a page was accessed within the 40 s cycle. In this way, cold data (not accessed within 40 s) and hot data (accessed n times within 40 s, with n greater than or equal to 1) are identified, realizing hot/cold identification of memory data.
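A sketch of this access-bit bookkeeping in C; read_access_bit() and clear_access_bit() are assumed stubs standing in for the page-table walk, and the 1 s / 40 s periods follow the example above:

```c
/* Assumed stubs for reading and clearing a page's access bit in the page table. */
extern int  read_access_bit(unsigned long pfn);
extern void clear_access_bit(unsigned long pfn);

#define SAMPLES_PER_CYCLE 40   /* 40 x 1 s samples = one 40 s cycle */

/* Returns how many 1 s samples within the 40 s cycle saw the page accessed:
 * 0 means cold (not accessed within 40 s), >= 1 means hot. */
unsigned scan_cycle(unsigned long pfn)
{
    unsigned times_set = 0;
    for (int t = 0; t < SAMPLES_PER_CYCLE; t++) {
        /* ... sleep roughly 1 s between samples ... */
        if (read_access_bit(pfn)) {
            times_set++;
            clear_access_bit(pfn);   /* clear so the next sample is fresh */
        }
    }
    return times_set;
}
```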
  • Taking FIG. 2 as an example, the scheduler 2311 in the host user mode 230 can send control commands, such as scan requests and migration requests, to the virtual machine monitor 221 in the host kernel mode 220, determine the data migration mode according to the hot/cold information fed back by the host kernel mode 220, and then control the migration of the migration data set between the different memory media.
  • Specifically, the scheduler 2311 sends a scan request to the kernel of the running operating system; the kernel traverses the two LRU linked lists associated with each NUMA node and determines which memory pages were accessed between the last scan and this one.
  • Optionally, the scheduler 2311 can periodically send scan requests to the kernel of the operating system running on the processor, so as to obtain more accurate results that reflect how hot or cold the memory media are.
  • The duration of each cycle can be the same or different; that is, the duration of each cycle can be adjusted dynamically, for example by deriving a duration that meets optimal performance requirements from historical statistics, or by user configuration.
  • Next, the data processing method provided by the present application is further introduced with reference to FIG. 6. The method is executed by the processor 301 shown in FIG. 3 and includes the following steps.
  • S610: Acquire the data distribution, which indicates how data is distributed across the different types of memory media (for example, the first memory medium and the second memory medium) in the multi-level memory system, specifically how hot the memory pages in the first memory medium and the second memory medium are.
  • A specific way to obtain the data distribution is through the LRU linked lists described above, which yield the hotness of each memory page; for brevity, the details are not repeated here.
  • Optionally, the data distribution may also be determined according to both the hotness of the data and the physical properties of the memory media, where the physical properties include at least one of latency, cost, capacity, and lifetime.
  • S620 Determine a data migration mode according to the data distribution, where the data migration mode is used to implement migration processing of the migration data set between the first memory medium and the second memory medium according to the data distribution.
  • Specifically, the data migration mode includes at least one of the following two cases:
  • Case 1: migrate a data set that includes hot pages in the SCM-type memory medium to the DRAM-type memory medium.
  • In a specific implementation, the number of free pages in the DRAM and the number of hot pages in the SCM can be determined first, and the migration of the data set performed afterwards. For example, when the number of free pages in the DRAM-type memory medium is greater than the number of hot pages in the SCM-type memory medium, the data of all hot pages is migrated to the free pages of the DRAM. Optionally, only part of the hot-page data may be migrated to free pages of the DRAM.
  • In this way, the data set of hot pages in the SCM-type memory medium is stored in the DRAM-type memory medium, reducing the latency when the data is accessed.
  • Case 2: migrate a data set that includes cold pages in the DRAM-type memory medium to the SCM-type memory medium.
  • In a specific implementation, the number of free pages in the SCM and the number of cold pages in the DRAM may be determined first, and the migration performed afterwards. For example, when the number of free pages in the DRAM-type memory medium is less than or equal to the number of hot pages in the SCM-type memory medium, the data of all cold pages is migrated to free pages of the SCM. Optionally, only part of the cold-page data may be migrated to free pages of the SCM.
  • In this way, the data set of cold pages in the DRAM can be migrated to the SCM-type memory medium, so that rarely accessed cold data is stored in the low-cost memory medium while hot data is stored in the low-latency DRAM-type memory medium, improving the processing efficiency of the entire hybrid memory system.
  • the data set including the hot page in the SCM type memory medium may be further migrated to the DRAM type memory medium.
  • After determining the data migration mode, the processor migrates the cold data to the low-cost SCM-type memory medium and the hot data to the low-latency DRAM-type memory medium, realizing hierarchical data storage in the hybrid memory system. Combining the classification of data attributes with the classification of memory media achieves reasonable data placement, which reduces the cost of the entire system while preserving data processing efficiency.
  • When the processor migrates cold data to the low-cost SCM-type memory medium, it can first select one or more free memory pages from the memory page pool formed by the SCM, for example one or more free huge-page memories from the huge-page pool formed by the SCM-type memory media, and then migrate the cold data to the selected free huge-page memory.
  • When the processor needs to migrate hot data to the lower-latency DRAM, it can first select one or more free memory pages from the memory pool of the DRAM-type memory medium, for example one or more huge-page memories in the free state from the huge-page pool of the DRAM, and then migrate the hot data to the selected free huge-page memory.
  • It should be noted that the number of free memory pages selected above depends on the size and distribution of the data set to be migrated, and can be configured according to business requirements in a specific implementation.
  • the processor may further include a memory copy middleware module, which may also be called memcpy middleware.
  • The memory copy middleware module can determine the NUMA node to which the destination address of the copy belongs, and thereby the media type of the memory medium where the destination address resides.
  • When the destination is the SCM, the memory copy middleware module uses a modified memory copy (memcpy) implementation, which maximizes the write bandwidth of the SCM.
  • Correspondingly, when reading data from the SCM into the DRAM, the memory copy middleware module passes the address and length of the data to the kernel's native memcpy function for processing, thereby realizing the migration of data sets between the different memory media (a sketch follows).
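The patent does not say how the memcpy is modified; one common technique for maximizing write bandwidth to SCM-like media is non-temporal stores that bypass the CPU cache. A minimal x86-64 sketch under that assumption (memcpy_to_scm is an illustrative name; buffers are assumed 8-byte aligned):

```c
#include <emmintrin.h>   /* _mm_stream_si64, _mm_sfence (SSE2, x86-64) */
#include <stddef.h>
#include <string.h>

void memcpy_to_scm(void *dst, const void *src, size_t len)
{
    long long       *d  = dst;
    const long long *s  = src;
    size_t           nq = len / 8;            /* whole 8-byte words */

    for (size_t i = 0; i < nq; i++)
        _mm_stream_si64(&d[i], s[i]);         /* streaming store: no cache fill */
    _mm_sfence();                             /* order the streaming stores */

    memcpy((char *)dst + nq * 8,              /* copy the tail bytes normally */
           (const char *)src + nq * 8, len % 8);
}
```

Reads from the SCM back into DRAM, by contrast, can go through the ordinary cached path, matching the native-memcpy behavior described above.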
  • the data processing method provided according to the embodiment of the present application is described in detail above with reference to FIG. 1 to FIG. 6 .
  • the data processing device and the hybrid memory system provided according to the embodiment of the present application will be described below in conjunction with FIG. 7 .
  • FIG. 7 is a schematic structural diagram of a data processing apparatus provided by the present application. As shown in the figure, the apparatus 700 is applied to a hybrid memory system that includes multiple types of memory media, among them a first memory medium and a second memory medium. The apparatus 700 includes:
  • an acquisition unit 701, configured to acquire the data distribution of the hybrid memory system;
  • a policy unit 702, configured to determine a data migration mode according to the data distribution, where the data migration mode is used to migrate the migration data set between the first memory medium and the second memory medium according to the data distribution; and
  • a migration unit 703, configured to perform the migration of the migration data set according to the migration mode.
  • It should be understood that the apparatus 700 in this embodiment of the present application may be implemented by a central processing unit (CPU), an application-specific integrated circuit (ASIC), or a programmable logic device (PLD), where the PLD can be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • the device 700 and its modules can also be software modules.
  • Optionally, the apparatus 700 further includes an allocation unit 704, where
  • the acquisition unit 701 is further configured to acquire the memory layout of the hybrid memory system, the memory layout including the quantity and types of memory deployed in the memory system; and
  • the allocation unit 704 is configured to allocate memory resources to the first processor according to the memory layout, where the first processor is associated with a least recently used (LRU) linked list that records how hot the data stored in the first processor's allocated memory resources is. The LRU linked list includes an active list (Active list) and an inactive list (Inactive list); the Active list identifies the memory pages where the hot data associated with the first processor resides, and the Inactive list identifies the memory pages where the cold data associated with the first processor resides.
  • Optionally, the acquisition unit 701 is specifically configured to acquire a scan request and, according to the scan request, traverse the data distribution of the memory pages in the first memory medium and the second memory medium associated with the first processor in the hierarchical storage system, where the data distribution includes the state of the memory pages in the first memory medium and the second memory medium, and the state of a memory page is hot page, cold page, or free page.
  • The first memory medium includes a first huge-page memory pool, and the second memory medium includes a second huge-page memory pool; the two pools each include at least one first memory page, and the size of each first memory page is greater than a first threshold.
  • The acquisition unit 701 is specifically configured to periodically acquire the scan request, where the scan request is used to periodically collect the hotness of the first memory pages in the first huge-page memory pool of the first memory medium and the second huge-page memory pool of the second memory medium.
  • The policy unit 702 is configured to count the number of times the data in a first memory page is read within a unit period, and to increase the heat of the first memory page by one each time its data is read once, where the heat indicates how hot the data in the first memory page is within the unit period.
  • The policy unit 702 is specifically configured to determine a tiered memory mechanism of the hybrid memory system, where the tiered memory mechanism indicates the tiers of the multiple different types of memory media in the hybrid memory system, the hybrid memory system includes multiple tiers, and the multiple tiers include a first tier and a second tier; and to determine the data migration mode according to the data distribution and the tiered memory mechanism.
  • The first memory medium belongs to the first tier, and the second memory medium belongs to the second tier.
  • The migration unit 703 is specifically configured to, when the number of free pages in the first memory medium is greater than the number of hot pages in the second memory medium, select one or more free memory pages from the first memory pool and migrate the first data set, which includes the data of the hot pages in the second memory medium, to the one or more free memory pages in the first memory pool.
  • The first memory medium belongs to the first tier, the second memory medium belongs to the second tier, the migration data set further includes a second data set, the second data set includes at least one piece of cold data, and the cold data is data read or written no more than a second threshold number of times within a unit period.
  • The migration unit 703 is specifically configured to, when the number of free pages in the first memory medium is less than or equal to the number of hot pages in the second memory medium, select one or more free memory pages from the second memory pool and migrate the second data set, which includes the data in the cold pages of the first memory medium, to the one or more free memory pages in the second memory pool.
  • The migration unit 703 is further configured to migrate the first data set, which includes the data of the hot pages in the second memory medium, to one or more free memory pages in the first memory pool.
  • The latency of the first memory medium is lower than that of the second memory medium, and the cost of the first memory medium is higher than that of the second memory medium.
  • The lifetime of the first memory medium is higher than that of the second memory medium, and the capacity of the first memory medium is lower than the capacity of the second memory medium.
  • The first memory medium is a dynamic random access memory (DRAM), and the second memory medium is a storage class memory (SCM); the SCM includes at least one of a phase-change memory (PCM), a magnetoresistive random access memory (MRAM), a resistive random access memory (RRAM), a ferroelectric memory (FRAM), fast NAND, or a nano random access memory (NRAM).
  • The apparatus is connected to the multiple different types of memory media through interfaces supporting memory semantics, where the interfaces include at least one of an interface supporting Compute Express Link (CXL), the Cache Coherent Interconnect for Accelerators (CCIX) protocol, or the unified bus (UB).
  • The hybrid memory system is applied to scenarios that deploy large-capacity memory, the scenarios including at least one of big data, in-memory databases, or cloud services.
  • The apparatus 700 according to this embodiment of the present application may correspond to performing the methods described in the embodiments of the present application, and the foregoing and other operations and/or functions of the units in the apparatus 700 respectively implement the corresponding procedures of the methods in FIG. 6; for brevity, details are not repeated here.
  • the present application also provides a hybrid memory system.
  • the hybrid memory system includes a processor and multiple memory media of different types. For brevity, details are not described here.
  • The present application further provides a processor, where the processor includes an integrated circuit connected to multiple different types of memory media, and the integrated circuit is used to implement the functions of the operation steps of the method 600 shown in FIG. 6; for brevity, details are not repeated here.
  • The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any other combination.
  • When implemented in software, the above embodiments may be implemented in whole or in part in the form of a computer program product.
  • The computer program product includes one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the processes or functions according to the embodiments of the present application are generated in whole or in part.
  • The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wired (for example, coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (for example, infrared, radio, microwave) means.
  • The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center that includes one or more sets of available media.
  • The available media may be magnetic media (for example, floppy disks, hard disks, magnetic tapes), optical media (for example, DVDs), or semiconductor media.
  • The semiconductor medium may be a solid state drive (SSD).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Memory System (AREA)

Abstract

A data processing method applied to a tiered memory system, where the tiered memory system includes a first memory medium and a second memory medium of different types. The method includes: obtaining the data distribution of the tiered memory system; determining a data migration mode according to the data distribution, where the data migration mode is used to migrate a migration data set between the different memory media according to the data distribution; and performing the migration of the migration data set according to the migration mode. By implementing tiering in the memory system and combining data attributes to store hot and cold data on different tiers, the solution reduces memory cost while guaranteeing data processing efficiency.

Description

Data processing method and apparatus, processor, and hybrid memory system — Technical field
This application relates to the computer field, and in particular to a data processing method and apparatus, a processor, and a hybrid memory system.
Background
With the development of multi-core processors, the number of cores in a single processor keeps growing while the number of memory channels through which the processor accesses memory does not grow with it. For example, a single central processing unit (CPU) may have up to 64 cores but only 8 memory channels. The memory bandwidth and memory capacity available on average to each processor core therefore shrink as the core count grows, so that memory performance severely limits the performance of the CPU; this memory wall problem is becoming increasingly prominent.
To address the memory wall problem, [Figure PCTCN2022122693-appb-000001] proposed an open memory interface (OMI), which removes the double data rate controller (DDRC) and the double data rate physical layer (DDR PHY) from the CPU and implements the open memory interface, the DDRC, and the DDR PHY inside the memory module. Such a memory module is also called a differential dual in-line memory module (DDIMM), and the DDIMM connects to the CPU through the open memory interface. Because the DDRC and DDR PHY are removed from the CPU, and OMI is an interface implemented on a serial bus, the number of CPU pins (PIN) available for extending memory channels increases; for example, a single [Figure PCTCN2022122693-appb-000002] POWER9 supporting the open interface can provide 16 memory channels.
However, processors supporting OMI in the foregoing solution support only dynamic random access memory (DRAM) memory chips, which are relatively expensive. For scenarios that must deploy large-capacity memory, such as big data (for example, Apache Spark™), in-memory databases (for example, Redis), or cloud services (for example, virtual machines provided through a memory over-allocation mechanism in cloud infrastructure), the cost is even higher. How to provide a low-cost solution to the memory wall has therefore become an urgent technical problem.
Summary
This application provides a data processing method and apparatus, a processor, and a hybrid memory system, which can provide a low-cost solution to the memory wall.
According to a first aspect, a method applied to a hybrid memory system is provided, where the hybrid memory system includes multiple different types of memory media, for example, a DRAM-type memory medium and an SCM-type memory medium. A processor may obtain the data distribution of the hybrid memory system; determine a data migration mode according to the data distribution, where the data migration mode is used to migrate a migration data set between the different memory media according to the data distribution; and finally perform the migration of the migration data set according to the migration mode. To solve the high-cost memory wall problem, this application proposes building a hybrid memory system from different types of memory media, determining the data migration mode based on the distribution of data across the different media, and then performing the migration in combination with the attributes of the different media types. This satisfies the computing requirements of the processor while enabling low-cost memory media, guaranteeing data processing latency while reducing the cost of the whole hybrid memory system.
Optionally, the data distribution refers to how data is stored across the different types of memory media, and may be determined in either of the following two ways:
Way 1: determine the data distribution according to how hot or cold the data is, that is, store data on different types of storage media according to its hotness.
Way 2: determine it according to both the hotness of the data and the physical attributes of the memory media, where the physical attributes include at least one of latency, cost, capacity, and lifetime.
In a possible implementation, the hybrid memory system further includes multiple processors. At initialization, memory resources may be allocated to each processor, a least recently used (LRU) linked list may be used to record the hotness of memory pages, and data migration may then be performed based on that hotness. Specifically, the memory layout of the hybrid memory system is obtained first, the memory layout including the quantity and types of memory deployed in the system; memory resources are then allocated to each processor according to the memory layout, and the LRU list records the hotness of the data stored in the memory resources allocated to each processor. The LRU list includes an active list (Active list) and an inactive list (Inactive list), where the Active list identifies the memory pages holding hot data associated with a processor and the Inactive list identifies the memory pages holding cold data associated with the processor. Through this LRU management, the hotness of the data stored in memory pages is known, and data can be migrated between memory media according to the attribute tags of the pages; for example, hot data can be migrated to latency-sensitive DRAM-type pages and cold data to low-cost SCM-type pages, lowering the cost of the hybrid memory system while guaranteeing data processing efficiency. Moreover, for huge pages such as 2 MB pages, the LRU list provides a way to manage page access heat, so that reasonable migration operations based on data attributes can be performed while the memory media are in use, improving the processing efficiency of the whole system and reducing the processing latency of memory accesses.
In another possible implementation, the LRU lists may also be traversed in response to a scan request to obtain the data distribution of the memory pages in the different memory media associated with each processor in the tiered storage system, where the data distribution includes the states of the memory pages in the different media, and a memory page's state is hot page, cold page, or free page. That is, LRU management further reveals the state of each page and distinguishes hot pages from cold pages, so that migration can be performed according to page state: hot data is stored in low-latency media and cold data in low-cost media, guaranteeing access efficiency while reducing the memory cost of the whole system.
In another possible implementation, a memory pool may be built on each memory medium, each pool containing multiple huge pages whose size is greater than a first threshold. For in-memory databases, big data, cloud services, and similar scenarios, building huge-page memory pools lets memory be accessed at huge-page granularity; compared with small pages, the amount of data read per access increases, improving data processing efficiency. Furthermore, across the different types of media, data migration can be combined with the tiered memory mechanism, which further improves the processing speed of the whole system while reducing memory cost.
In another possible implementation, the data distribution across the memory media may be obtained by periodically issuing scan requests that collect the hotness of each memory page in the different types of media. Periodically collecting these statistics makes the migration process dynamic, associating the hot/cold attribute of data with the low-latency or low-cost characteristics of the media: frequently accessed hot data is kept in low-latency media, improving access efficiency, while rarely accessed cold data is kept in low-cost media, reducing the cost of the memory media in the whole system.
In another possible implementation, the hotness of each memory page may be obtained as follows: within a unit period, count the number of times the data in each page is read; each time the data in any page is read once, the heat of that page increases by one, and the heat indicates how hot the data in the page is within the unit period. Counting the accesses to a page's data per unit period makes the page's hotness explicit, so that migration can be performed according to the hot/cold attribute of the data and the resources of the different media types can be used sensibly.
In another possible implementation, the migration data set includes a first data set, the first data set includes at least one piece of hot data, and hot data is data read or written more than a first threshold number of times within a unit period.
In another possible implementation, the hybrid memory system may determine the data migration mode as follows: first determine the tiered memory mechanism of the hybrid memory system, where the tiered memory mechanism indicates the tiers of the multiple different types of memory media, for example, the hybrid memory system includes a first tier and a second tier; then determine the data migration mode according to the data distribution and the tiered memory mechanism. Tiering the different media allows data placement to further take the physical attributes of the media into account; for example, the media can be tiered by latency, cost, lifetime, and capacity and data then stored according to its hotness, fully exploiting the strengths of each medium for separate management of hot and cold data and improving media utilization. Meanwhile, to guarantee system-wide processing efficiency, latency-sensitive data can be stored in low-latency media and latency-insensitive data in low-cost media according to application requirements and media attributes, ensuring processing efficiency while lowering the cost of the media in the whole system.
In another possible implementation, when the first memory medium belongs to the first tier and the second memory medium belongs to the second tier, the data migration mode includes: selecting one or more free memory pages from the first memory pool, and migrating the first data set, which includes the data of the hot pages in the second memory medium, to the one or more free pages in the first memory pool. In this way, migration can be performed according to the number of hot pages in the different media and the free resources of the low-latency medium, moving hot-page data to the low-latency medium and speeding up access to the data.
Optionally, before the migration is performed, it may first be determined whether the number of free pages in the first memory medium is greater than the number of hot pages in the second memory medium; when it is, all hot-page data, or part of the hot-page data, is migrated to the first memory medium.
In another possible implementation, the first memory medium belongs to the first tier, the second memory medium belongs to the second tier, the migration data set further includes a second data set, the second data set includes at least one piece of cold data, and cold data is data read or written no more than a second threshold number of times within a unit period. The data migration mode then includes: selecting one or more free memory pages from the second memory pool, and migrating the second data set, which includes the data of the cold pages in the first memory medium, to the one or more free pages in the second memory pool. That is, the cold data of cold pages can be migrated to the low-cost medium, so that the low-latency medium preferentially stores latency-sensitive data and the resources of the media are fully used in line with their different physical attributes.
Optionally, before the migration is performed, it may first be determined whether the number of free pages in the first memory medium is less than or equal to the number of hot pages in the second memory medium; when it is, the second data set including the data of the cold pages in the first memory medium is migrated to the one or more free pages in the second memory pool. As a possible implementation, instead of migrating the data of all cold pages to the second memory medium, only part of the cold-page data may be migrated.
Optionally, after the cold data is migrated to the low-cost medium, the first data set including the data of the hot pages in the second memory medium may additionally be migrated to the one or more free pages in the first memory pool. That is, after cold data has been migrated out of the low-latency medium, the system can go on to check whether the low-cost medium holds hot data and then migrate that hot data into the low-latency medium. This dynamic migration process uses the media resources more sensibly in combination with their physical attributes.
As a possible embodiment, the latency of the first memory medium is lower than that of the second memory medium, and the cost of the first memory medium is higher than that of the second memory medium.
Optionally, the lifetime of the first memory medium is higher than that of the second memory medium, and the capacity of the first memory medium is lower than the capacity of the second memory medium.
As a possible embodiment, the first memory medium is dynamic random access memory (DRAM) and the second memory medium is storage class memory (SCM), where the SCM includes at least one of phase-change memory (PCM), magnetoresistive random access memory (MRAM), resistive random access memory (RRAM), ferroelectric random access memory (FRAM), fast NAND, or nano random access memory (NRAM).
As a possible embodiment, the processor is connected to the multiple different types of memory media through interfaces supporting memory semantics, the interfaces including at least one of Compute Express Link (CXL), the Cache Coherent Interconnect for Accelerators (CCIX) protocol, or the unified bus (UB).
As a possible embodiment, the hybrid memory system is a server or a server cluster, where a server cluster includes two or more servers.
As a possible embodiment, the hybrid memory system is applied to scenarios that deploy large-capacity memory, the scenarios including at least one of big data, in-memory databases, or cloud services.
According to a second aspect, this application provides a data processing apparatus, which includes modules for performing the operation steps of the data processing method in the first aspect or any possible implementation of the first aspect.
According to a third aspect, this application provides a processor, which includes an integrated circuit connected to multiple different types of memory media, the integrated circuit being configured to perform the operation steps of the data processing method in the first aspect or any possible implementation of the first aspect.
According to a fourth aspect, this application provides a tiered storage system, which includes a processor and multiple memory media connected by a bus over which they communicate with each other. Any one of the memory media stores computer-executable instructions, and when the tiered storage system runs, the processor executes the computer-executable instructions in the memory to use the hardware resources of the tiered storage system to perform the operation steps of the method in the first aspect or any possible implementation of the first aspect.
According to a fifth aspect, this application provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the methods of the foregoing aspects.
According to a sixth aspect, this application provides a computer program product containing instructions that, when run on a computer, cause the computer to perform the methods of the foregoing aspects.
On the basis of the implementations provided in the foregoing aspects, this application may be further combined to provide more implementations.
Brief description of drawings
FIG. 1 is a schematic architectural diagram of a cloud service scenario provided by this application;
FIG. 2 is a system architecture diagram of big data and in-memory databases provided by this application;
FIG. 3 is a structural diagram of a hybrid memory system provided by this application;
FIG. 4 is a schematic diagram of implementing a tiered memory mechanism in a hybrid memory system provided by this application;
FIG. 5 is a schematic structural diagram of a least recently used linked list provided by this application;
FIG. 6 is a schematic flowchart of a data processing method provided by this application;
FIG. 7 is a schematic structural diagram of a data processing apparatus provided by this application.
Description of embodiments
For ease of description, the terms used in this application are explained first.
Storage-class memory (SCM) combines the advantages of memory with the characteristics of storage; simply understood, it is a new type of non-volatile storage medium. Storage-class media are non-volatile, have extremely short access times, are cheap per bit, and are solid-state with no moving parts. Many SCM media technologies exist today, among which phase-change memory (PCM) is the most prominent and typical and one of the earliest storage-class memory technologies with products on the market, for example, the Optane memory ([Figure PCTCN2022122693-appb-000004] Optane™ Memory) developed by [Figure PCTCN2022122693-appb-000003] on the basis of 3D XPoint. In addition, storage-class media include other types such as magnetoresistive random-access memory (MRAM), resistive random access memory (RRAM/ReRAM), ferroelectric random access memory (FRAM), fast NAND, and nano random access memory (Nano-RAM, NRAM).
Kernel mode, which may also be called host kernel mode: to restrict the mutual access capabilities of different programs and prevent them from obtaining the memory data of other programs or the data of peripheral devices, the processor defines two privilege levels, user mode and kernel mode. When a task or process makes a system call and executes in kernel code, the process is said to be in kernel mode. The processor then executes in kernel code at the highest privilege level and can access all data in memory, including peripheral devices such as memory and network interface cards.
User mode, which may also be called host user mode: when a process executes the user's own code, it is said to be in user mode. The processor then runs in user code at the lowest privilege level and can use only the regular processor instruction set, not the instructions that operate hardware resources. In user mode the processor has only restricted access to memory and is not allowed to access peripheral devices, for example input/output (IO) reads and writes, network card access, or memory allocation.
Guest user mode, that is, the user mode of a virtual machine: a virtual machine is a virtual operating system running on a physical host, and the virtual machine operating system likewise defines kernel mode and user mode to delimit privileges inside the virtual machine; the division is similar to the host kernel mode and host user mode described above and is not repeated here.
The technical solutions in the embodiments of this application are described clearly below with reference to the accompanying drawings.
To solve the high cost of using open-memory-interface (OMI) based dynamic random access memory (DRAM) in scenarios that must deploy large-capacity memory, such as cloud services (for example, virtual machines provided through a memory over-allocation mechanism in cloud infrastructure), big data (for example, Apache Spark™), or in-memory databases (for example, Redis), this application provides a method of storing data under a tiered memory mechanism built on multiple different types of memory media: memory is extended by supporting different types of media, the data migration mode is determined according to the data distribution in the hybrid memory system (which may also be called a tiered memory system), and a migration data set tagged with hot/cold attributes is then migrated between the different memory media.
First, the architectures of the application scenarios involved in this application are introduced with reference to FIG. 1 and FIG. 2.
FIG. 1 is a schematic architectural diagram of a cloud service scenario provided by this application. As shown in the figure, the system 100 includes hardware 110, a host kernel mode 120, a host user mode 130, and a guest user mode 140.
The hardware 110 includes multiple processors (for example, a processor 113 and a processor 114) and different types of memory media (for example, DRAM and SCM).
The host kernel mode 120 includes a hypervisor, which in turn includes a kernel-based virtual machine (KVM) 1211 and a least recently used (LRU) module 1212. The KVM 1211 manages the virtual machines 132 and 133 in the host user mode 130. The hypervisor 121 also manages the LRU module 1212, which collects statistics on how actively the data in the memory pages of the DRAM 311 and the SCM 312 is accessed.
The host user mode 130 includes a policy module 131 and virtual machines (for example, virtual machines 132 and 133), and each virtual machine runs one or more applications of the guest user mode 140 (for example, applications 141 and 142 run in the virtual machine 132, and applications 143 and 144 run in the virtual machine 133). The policy module 131 determines the data migration mode from the obtained data distribution. In addition, the policy module 131 may provide users with a configuration interface for setting the data migration period, so that the hybrid memory system triggers migration according to the period obtained through that interface.
FIG. 2 is a system architecture diagram of big data and in-memory databases provided by this application. As shown in the figure, the system 200 includes hardware 210, a host kernel mode 220, and a host user mode 230. The hardware 210 is similar in structure and function to the hardware 110 in FIG. 1 and, for brevity, is not described again. The host kernel mode 220 includes a hypervisor 221, which manages the virtualization software in the system 200 and the LRU module 2211. The hypervisor 221 includes a kernel-based virtual machine (KVM) 2212 and a least recently used (LRU) module 2211, where the LRU module 2211 has the same function as the LRU module 1212 shown in FIG. 1 and is not described again. The host user mode 230 runs a service operating system 231, that is, the operating system running the big data application 201 or the database application 202. The service operating system 231 includes a scheduler 2311 and a memory manager 2312, where the scheduler 2311 obtains the data distribution of the DRAM 412 and the SCM 413, determines the data migration mode from that distribution, and notifies the memory manager 2312 to perform the migration of the migration data set according to the data migration mode.
The cloud service, big data, or in-memory database scenarios shown in FIG. 1 or FIG. 2 may be deployed as a single server or as a server cluster, where a cluster includes multiple servers.
Next, FIG. 3 is used to introduce the structure of a hybrid memory system provided by an embodiment of this application; the hybrid memory system may be any server, including any server in a cluster, deployed for cloud service, big data, or in-memory data scenarios. As shown in the figure, the hybrid memory system 300 includes a processor 301 and multiple different types of memory media, for example, a memory medium 302 and a memory medium 303, where the memory medium 302 may be DRAM and the memory medium 303 may be SCM.
It should be noted that the memory media 302 and 303 represent two different types of memory media, and there may be more than one medium of each type; for example, the memory medium 302 may include DRAM 2021, DRAM 2022, DRAM 2023, and DRAM 2024, and the memory medium 303 may include SCM 3031, SCM 3032, SCM 3033, and SCM 3034.
The processor 301 is connected to the memory media 302 and 303 through interfaces supporting memory semantics, where the interfaces supporting memory semantics include at least one of Compute Express Link™ (CXL), the Cache Coherent Interconnect for Accelerators (CCIX) protocol, or the unified bus (UB or Ubus).
The processor 301 further includes multiple processor cores and integrated memory controllers (iMC) 3014 for memory management and control. The processor cores may be further divided into multiple computing clusters, each including at least one core; for example, as shown in FIG. 3, the computing cluster 3011 includes processor cores 30111, 30112, 30113, and 30114, and the computing cluster 3012 includes processor cores 30121, 30122, 30123, and 30124. The computing clusters communicate through a network on chip (NoC) 3013, which implements communication between cores in different clusters. For an X86-architecture processor, the network on chip 3013 may be a node controller (NC); for an advanced reduced instruction set computing machines (ARM) architecture processor, the network on chip 3013 may be a chip or logic circuit that implements inter-core communication. Each computing cluster is directly or indirectly connected to the different memory media through multiple integrated memory controllers 3014. Optionally, all cores of the processor 301 may also be grouped into a single computing cluster.
The processor 301 may be a CPU, for example, an X86-architecture or ARM-architecture processor. The processor 301 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, a system on chip (SoC), a graphics processing unit (GPU), an artificial intelligence (AI) chip, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The memory media may also store the operating system, computer-executable instructions (also called program code), the kernel, and data, and provide the computer-executable instructions to the processor 301 so that the processor 301 can execute them to implement the corresponding functions. For example, the memory medium 302 may store the operating system, computer-executable instructions, and the kernel, so that the processor 301 can execute the instructions in the memory medium 302 to implement specific functions.
The memory medium 302 may include read-only memory and random access memory, and may also include non-volatile random access memory. The memory medium 302 may be volatile memory or non-volatile memory, or may include both. The non-volatile memory may be read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), or flash memory. The volatile memory may be random access memory (RAM) used as an external cache. By way of example rather than limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM). Optionally, the memory medium 302 may also be storage class memory (SCM), which includes at least one of phase-change memory (PCM), magnetoresistive random access memory (MRAM), resistive random access memory (RRAM), ferroelectric random access memory (FRAM), fast NAND, or nano random access memory (NRAM).
The type of the memory medium 303 is similar to that of the memory medium 302 and may likewise be any of the memory media types above, but in the hybrid memory system 300 the memory media 302 and 303 are of different types.
The hybrid memory system 300 shown in FIG. 3 is described with a single processor 301 as an example. In a specific implementation, the hybrid memory system 300 may include two or more processors, each connected to the different types of memory media through integrated memory controllers 3014. Each integrated memory controller 3014 connects to one memory medium, forming one memory channel.
As a possible embodiment, besides being integrated in the processor 301 of the hybrid memory system 300 as shown in FIG. 3, the integrated memory controller 3014 may also take the form of an off-chip device outside the processor 301, acting as an endpoint device of the hybrid memory system 300; in that case, the integrated memory controller 3014 serves as a memory expander.
As a possible embodiment, instead of each integrated memory controller 3014 connecting to a single memory medium, multiple integrated memory controllers 3014 may also form a whole that connects processors and memory media through different ports; the memory controller 3014 can then provide multiple ports, like a switch or switch chip, so that the processor 301 can connect to the memory media 302 and 303 through the memory controller 3014.
As a possible embodiment, the hybrid memory system 300 may include memory media of types other than the memory media 302 and 303 shown in FIG. 3, where those media differ in type from the memory media 302 and 303; for example, at least one of random access memory (RAM), static RAM (SRAM), synchronous DRAM (SDRAM), or double data rate SDRAM (DDR SDRAM) may also be added to the hybrid memory system 300, in which case the system includes hybrid memory media of multiple types. That is, the hybrid memory system 300 includes any two or more of DRAM, SCM, RAM, SRAM, SDRAM, and DDR SDRAM. For example, the hybrid memory system 300 may use SDRAM and SCM as hybrid memory media, or SDRAM and DRAM, or DRAM, SDRAM, and SCM.
For ease of description, the following embodiments of this application are described with DRAM and SCM as the two memory media forming the hybrid memory media of the hybrid memory system 300.
Further, in the initialization phase, the operating system (OS) of the hybrid memory system 300, specifically the processor running the operating system (for example, the processor 301 in FIG. 3), may divide the multiple memory media into different tiers according to the tiered memory mechanism. Specifically, the processor 301 may first divide the multiple types of memory media into tiers according to their physical attributes, where the physical attributes include at least one of latency, cost, lifetime, and capacity; the processor may then divide the memory media of the multi-tier memory system into multiple tiers according to at least one of these attributes. In addition, media of the same type may be divided into one or more tiers; for example, different media of the same type may be divided into two or more tiers according to the physical attributes of media from different vendors. For ease of description, the following embodiments assume that each type of memory medium is assigned to one tier.
For example, FIG. 4 is a schematic diagram of tiered memory in a hybrid memory system provided by this application. As shown in the figure, the hybrid memory system 400 divides its memory media into two tiers according to their physical attributes: the set 410 formed by the DRAM 411 and the DRAM 412 is the first tier, and the set 420 formed by the SCM 421 and the SCM 422 is the second tier.
It should be noted that different media types have different physical attributes; for example, the latency of DRAM is lower than that of SCM, the cost of DRAM is higher than that of SCM, the capacity of DRAM is lower than that of SCM, and the lifetime of DRAM is higher than that of SCM. In a specific implementation, different types of memory media may be divided into different tiers according to at least one of these physical attributes.
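As a rough illustration of this attribute-based tiering, the following C sketch ranks media by access latency; the struct fields, the example numbers, and the rule "lower latency means a lower-numbered tier" are illustrative assumptions rather than anything prescribed by this application.

    /* Minimal sketch: rank memory media into tiers by latency. */
    #include <stdio.h>

    struct medium {
        const char *name;
        unsigned    latency_ns;   /* assumed relative access latency */
        unsigned    cost;         /* assumed relative cost per GB    */
    };

    /* Tier 1 is the lowest-latency medium, tier 2 the next, and so on. */
    static int tier_of(const struct medium *m, const struct medium *all, int n) {
        int tier = 1;
        for (int i = 0; i < n; i++)
            if (all[i].latency_ns < m->latency_ns)
                tier++;
        return tier;
    }

    int main(void) {
        struct medium media[] = { { "DRAM", 80, 10 }, { "SCM", 350, 3 } };
        for (int i = 0; i < 2; i++)
            printf("%s -> tier %d\n", media[i].name, tier_of(&media[i], media, 2));
        return 0;   /* prints: DRAM -> tier 1, SCM -> tier 2 */
    }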
In fact, in the initialization phase, the processor 301 running the operating system (OS) of the hybrid memory system also allocates memory resources to each processor and records the correspondence between processors and memory media, so that data reads and writes can be performed based on the correspondence between the processors and the memory media of the different tiers.
In a hybrid memory system, each processor and/or its associated memory media may be called a non-uniform memory access (NUMA) node. For ease of description, the following embodiments call the memory media associated with a processor a NUMA node.
For example, as shown in FIG. 4, the hybrid memory system 400 includes two processors, a processor 401 and a processor 402, each connected to memory media of different types; the connection between each processor and the memory media can follow the structure shown in FIG. 3 and, for brevity, is not repeated here. Typically, the DRAM 411 connected to the processor 401 is called NUMA node 1, the DRAM 412 connected to the processor 402 NUMA node 2, the SCM 421 connected to the processor 401 NUMA node 3, and the SCM 422 connected to the processor 402 NUMA node 4.
During service execution, the hybrid memory system 400 may also divide each tier of memory media into memory pages of different sizes so that applications running on the processors can read from and write to them. By page size, pages larger than a third threshold are called huge pages or huge-page memory, and pages no larger than a fourth threshold are called small pages or small-page memory. The third and fourth thresholds may be the same or different and can be configured according to service requirements in a specific implementation. For example, as the computer field currently stands, memory pages come in different specifications, such as 4 KB, 2 MB, and 1 GB, where 4 KB pages are also called small pages or small-page memory, and 2 MB or 1 GB pages are called huge pages or huge-page memory. In a specific implementation, the same hybrid memory system may include pages of several specifications so as to offer different access granularities to different applications. For example, compute-intensive applications such as cloud services, big data, or in-memory databases can use huge pages for data access, in which case the processor can read and write data at huge-page granularity, improving data processing efficiency; ordinary scenarios, such as office applications, can use small pages instead. Optionally, the same hybrid memory system may also contain only huge pages, so as to make full use of the media's memory resources for cloud service, big data, or in-memory data accesses.
As a possible implementation, in large-capacity memory scenarios, huge pages are often used to improve data processing efficiency. For example, the pages of the memory media are divided at 2 MB granularity, and huge-page memory pools are built on the different media: a first huge-page memory pool is built across the DRAM-type memory media and a second huge-page memory pool across the SCM-type memory media, with data scanning and migration both handled in terms of these huge-page pools.
Optionally, besides huge-page pools, the hybrid memory system may also build small-page pools; that is, multiple small pages may also be created at the same time on the different types of media and assembled into a small-page pool. For example, 4 KB small pages are created in the DRAM and in the SCM, the small pages created across the DRAMs form a first small-page pool, and the small pages created across the SCMs form a second small-page pool.
For ease of description, the following embodiments assume that huge-page memory pools are built on the DRAM-type media and the SCM-type media respectively.
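A minimal data-structure sketch of such per-medium huge-page pools follows; only the 2 MB granularity and the hot/cold/free page states come from the text above, while the struct names, the fixed pool size, and the helper function are assumptions for illustration.

    /* Sketch of a per-medium huge-page pool. */
    #include <stdio.h>

    #define HUGE_PAGE_SIZE (2u << 20)   /* 2 MB pages, versus 4 KB small pages */
    #define POOL_PAGES     4

    enum page_state { PAGE_FREE, PAGE_COLD, PAGE_HOT };

    struct huge_page {
        unsigned long   base;    /* start address of the 2 MB region         */
        enum page_state state;
        unsigned        heat;    /* access count in the current unit period  */
    };

    struct page_pool {
        const char      *medium; /* "DRAM" or "SCM"                          */
        struct huge_page pages[POOL_PAGES];
    };

    static unsigned count_free(const struct page_pool *p) {
        unsigned n = 0;
        for (int i = 0; i < POOL_PAGES; i++)
            if (p->pages[i].state == PAGE_FREE)
                n++;
        return n;
    }

    int main(void) {
        struct page_pool dram = { .medium = "DRAM" };   /* all pages start FREE */
        dram.pages[0].state = PAGE_HOT;
        printf("%s pool: %u free of %d\n", dram.medium, count_free(&dram), POOL_PAGES);
        return 0;
    }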
In addition, the hybrid memory system may use least recently used (LRU) linked lists to collect the hotness of the data stored in each NUMA node. Specifically, each NUMA node is associated with two LRU lists, through which the node manages the memory pages (for example, huge pages) created on its corresponding media in a unified way. One LRU list records the active list (Active list), which includes information about the memory pages holding hot data in the medium; the other records the inactive list (Inactive list), which includes information about the memory pages holding cold data in the medium. For example, in FIG. 4 the LRU 42112 is the active list and the LRU 42111 the inactive list. Each list entry is a structure pointer describing the access information of one memory page in the system and the corresponding memory page. The position of an entry in the Active list (hot data) or the Inactive list (cold data) characterizes the hotness of the corresponding page. Hot data is data accessed more than a preset threshold number of times within a unit period; conversely, cold data is data accessed no more than a preset threshold number of times within a unit period.
The structure of the LRU lists is shown in FIG. 5. The Active list 500 and the Inactive list 510 each consist of three parts, a head, a tail, and a body. When traversal finds that a memory page of any NUMA node is a hot page, the page's information (such as its address or identifier, that is, information uniquely identifying the page) is loaded at the head 501 of the Active list 500. In other words, once the kernel scans a page containing hot data, it adds the page's address or identifier to the head of the Active list. Similarly, when the kernel finds that a page associated with any NUMA node is a cold page, it adds an entry recording the page's address or identifier at the head 511 of the Inactive list 510. Both the Active list 500 and the Inactive list 510 can thus record the information of the individual memory pages in multiple entries; when a scan request is executed or migration is performed according to the migration mode, the lists provide the information of the pages to be scanned and to be migrated, so that the data in the pages can be migrated under the tiered memory mechanism.
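The following C sketch mimics the head-insertion behaviour of the two lists described above (newly scanned hot pages go to the head of the Active list, newly scanned cold pages to the head of the Inactive list); the node layout and helper names are assumed, and a production list would be doubly linked with tail handling as in FIG. 5.

    /* Sketch of the per-NUMA-node Active/Inactive lists with head insertion. */
    #include <stdio.h>
    #include <stdlib.h>

    struct lru_entry {
        unsigned long     page_id;   /* uniquely identifies the memory page */
        struct lru_entry *next;
    };

    struct lru_list {
        struct lru_entry *head;      /* most recently scanned entry */
    };

    static void lru_push_head(struct lru_list *l, unsigned long page_id) {
        struct lru_entry *e = malloc(sizeof(*e));   /* freed at exit, for brevity */
        e->page_id = page_id;
        e->next    = l->head;
        l->head    = e;
    }

    int main(void) {
        struct lru_list active = {0}, inactive = {0};
        lru_push_head(&active, 42);    /* page 42 was accessed this period */
        lru_push_head(&inactive, 7);   /* page 7 saw no access this period */
        printf("active head: %lu, inactive head: %lu\n",
               active.head->page_id, inactive.head->page_id);
        return 0;
    }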
In the hybrid memory system, the processor 301 may classify the state of memory pages as hot, cold, or free according to the hotness of the stored data, where a hot page is a page any datum of which is accessed more than a first threshold number of times within a unit period, a cold page is a page any datum of which is accessed fewer than a second threshold number of times within a unit period, and a free page is a page that stores no data. The first and second thresholds may be the same or different; when they differ, the first threshold is greater than the second.
As a possible embodiment, the processor 301 includes a translation lookaside buffer (TLB) for recording the page-table access bit. The TLB, also called a page-table buffer, is a high-speed storage unit inside the processor that holds page-table entries (the virtual-to-physical translation tables). If the page table were stored only in main memory, looking it up would be very expensive, whereas the TLB inside the processor improves the efficiency of virtual-to-physical address translation. The processor can determine whether a memory page was accessed within a unit period and count its accesses, define the first and second thresholds from the distribution of access counts across the pages, and then judge whether data is hot or cold.
For example, access bit = 1 means that the corresponding page has been accessed. The access bit can be cleared periodically (for example, every 1 s), and the number of times it was set can be counted over a longer period (for example, 40 s) to distinguish how the page was accessed within the 40 s window, thereby identifying cold data (not accessed within 40 s) and hot data (accessed n times within 40 s, n ≥ 1) and achieving hot/cold identification of memory data.
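A hedged simulation of this periodic access-bit statistic is sketched below: the bit is sampled and cleared once per second and the set counts are accumulated over a 40 s window, matching the example figures above; the sampling function is a stand-in, since a real implementation would read and clear the page-table access bit through the page tables.

    /* Simulated access-bit sampling over a 40 s window. */
    #include <stdbool.h>
    #include <stdio.h>

    #define WINDOW_SECONDS 40

    /* Stand-in for reading and clearing the PTE access bit once per second;
     * here the page is pretended to be touched every 10 seconds. */
    static bool sample_and_clear_access_bit(int second) {
        return (second % 10) == 0;
    }

    int main(void) {
        unsigned heat = 0;
        for (int s = 0; s < WINDOW_SECONDS; s++)
            if (sample_and_clear_access_bit(s))
                heat++;              /* one observed access => heat plus one */
        printf("page is %s (heat=%u)\n", heat >= 1 ? "hot" : "cold", heat);
        return 0;
    }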
Taking the scenario shown in FIG. 2 as an example, the scheduler 2311 in the host user mode 230 can send control commands, such as scan requests and migration requests, to the hypervisor 221 in the host kernel mode 220, determine the data migration mode from the data-hotness information fed back by the host kernel mode 220, and then control the migration of the migration data set between the different memory media. Taking a scan request as an example, the scheduler 2311 sends the scan request to the kernel of the running operating system; the kernel traverses the two LRU lists associated with each NUMA node, determines which pages were accessed between the previous scan and this one, loads the accessed pages into the Active list and the unaccessed pages into the Inactive list, updates the hotness statistics, and passes the scan result to the user-mode scheduler 2311, so that the scheduler 2311 learns the hotness of the memory pages in each NUMA node.
Optionally, since the state of a memory page varies with the number of accesses to its data, the scheduler 2311 may, to learn the data distribution of the media more accurately, periodically send scan requests to the kernel of the operating system running on the processor to obtain results that reflect the media's hotness more accurately. The durations of the periods may be equal or unequal, that is, the duration associated with each period can be adjusted dynamically, for example according to historical statistics so as to meet optimal-performance requirements, or according to user configuration.
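The following sketch imitates the scheduler's periodic scan loop over the NUMA nodes; scan_numa_node() is a placeholder for the kernel walking the node's two LRU lists, and the fixed period and canned counts are assumptions.

    /* Sketch of the user-mode scheduler issuing periodic scan requests. */
    #include <stdio.h>

    struct scan_result { unsigned hot, cold, free; };

    /* Placeholder: a real scan walks the node's Active/Inactive LRU lists. */
    static struct scan_result scan_numa_node(int node) {
        return (struct scan_result){ .hot = (unsigned)node, .cold = 5, .free = 12 };
    }

    int main(void) {
        for (int period = 0; period < 3; period++) {   /* one scan per period */
            for (int node = 1; node <= 4; node++) {
                struct scan_result r = scan_numa_node(node);
                printf("period %d node %d: hot=%u cold=%u free=%u\n",
                       period, node, r.hot, r.cold, r.free);
            }
            /* sleep(period_seconds) would go here in a live system. */
        }
        return 0;
    }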
Next, the data processing method provided by this application is further described with reference to FIG. 6. The method is performed by the processor 301 shown in FIG. 3 and includes the following steps.
S610. Obtain the data distribution of the hybrid memory system.
The data distribution indicates how the data of the multi-tier memory system is distributed across the different types of memory media (for example, the first memory medium and the second memory medium), specifically the hotness of the memory pages in the first or second memory medium. The hotness of each page can be obtained through the LRU lists described above and, for brevity, is not repeated here.
As a possible implementation, the data distribution may also be determined from the hotness of the data together with the physical attributes of the memory media, where the physical attributes include at least one of latency, cost, capacity, and lifetime.
S620. Determine a data migration mode according to the data distribution, where the data migration mode is used to migrate a migration data set between the first memory medium and the second memory medium according to the data distribution.
Assume the hybrid memory system 400 shown in FIG. 4, where the DRAM-type memory media belong to the first tier and the SCM-type memory media to the second tier; the data migration mode then includes at least one of the following two cases (a decision sketch follows the two cases below).
Case 1: migrate the data set comprising the hot pages of the SCM-type media to the DRAM-type media.
Specifically, the number of free pages in the DRAM and the number of hot pages in the SCM may be checked first, and the data-set migration performed afterwards. For example, when the number of free pages in the DRAM-type media is greater than the number of hot pages in the SCM-type media, the data of all hot pages is migrated to free DRAM pages. Optionally, only part of the hot-page data may be migrated to free DRAM pages.
Through this migration mode, the hot-page data sets of the SCM-type media are stored in the DRAM-type media, reducing the latency of data accesses.
Case 2: migrate the data sets in the cold pages of the DRAM-type media to the SCM-type media.
Specifically, the number of free pages in the SCM and the number of cold pages in the DRAM may be checked first, and the data-set migration performed afterwards. For example, when the number of free pages in the DRAM-type media is less than or equal to the number of hot pages in the SCM-type media, the data of all cold pages is migrated to free SCM pages. Optionally, only part of the cold-page data may be migrated to free SCM pages.
Through the migration mode of case 2, the cold-page data sets in the DRAM are migrated to the SCM-type media, so that rarely accessed cold data is stored in the low-cost medium, which in turn ensures that hot data is stored in the low-latency DRAM-type media and improves the processing efficiency of the whole hybrid memory system.
As a possible implementation, for case 2, the data set comprising the hot pages of the SCM-type media may further be migrated to the DRAM-type media.
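The two cases reduce to a simple comparison, sketched here in C; the enum and function names are illustrative assumptions, not part of this application.

    /* Sketch of the case-1/case-2 migration decision. */
    #include <stdio.h>

    enum migration_plan { PROMOTE_HOT_TO_DRAM, DEMOTE_COLD_TO_SCM };

    static enum migration_plan choose_plan(unsigned dram_free_pages,
                                           unsigned scm_hot_pages) {
        /* Case 1: enough free DRAM pages for the hot SCM pages. */
        if (dram_free_pages > scm_hot_pages)
            return PROMOTE_HOT_TO_DRAM;
        /* Case 2: DRAM is tight; demote cold DRAM pages to SCM first. */
        return DEMOTE_COLD_TO_SCM;
    }

    int main(void) {
        printf("%d\n", choose_plan(10, 4));  /* promote: DRAM has spare room */
        printf("%d\n", choose_plan(2, 8));   /* demote cold data first       */
        return 0;
    }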
S630. Perform the migration of the migration data set according to the data migration mode.
Specifically, after determining the data migration mode, the processor 301 migrates cold data to the low-cost SCM-type media and hot data to the low-latency DRAM-type media, thereby implementing tiered data storage in the hybrid memory system; combining data attributes with the classification of the media places data sensibly, reducing the cost of the whole system while guaranteeing data processing efficiency.
Specifically, when the processor 301 migrates cold data to the low-cost storage-class media, it may first select one or more free memory pages from the page pool formed by the SCM, for example one or more free huge pages from the huge-page pool of the SCM-type media, and then migrate the cold data to be moved into the selected free huge pages. When the processor 301 needs to migrate hot data to the low-latency DRAM, it may first select one or more free pages from the memory pool of the DRAM-type media, for example one or more free huge pages from the DRAM huge-page pool, and then migrate the hot data to be moved into the selected free huge pages.
It should be noted that the number of free pages selected above depends on the size and distribution of the data set to be migrated and can be configured according to service requirements in a specific implementation.
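A minimal sketch of this free-page selection follows; the pool layout and function names are assumed, and error handling is deliberately omitted.

    /* Sketch of reserving free huge pages in the destination pool. */
    #include <stdio.h>

    enum page_state { PAGE_FREE, PAGE_COLD, PAGE_HOT };
    struct huge_page { enum page_state state; };

    /* Returns how many free pages could be reserved (<= wanted). */
    static int reserve_free_pages(struct huge_page *pool, int n,
                                  int wanted, int *out_idx) {
        int got = 0;
        for (int i = 0; i < n && got < wanted; i++) {
            if (pool[i].state == PAGE_FREE) {
                out_idx[got++] = i;
                pool[i].state = PAGE_HOT;   /* will hold migrated hot data */
            }
        }
        return got;
    }

    int main(void) {
        struct huge_page dram_pool[8] = { {PAGE_HOT}, {PAGE_FREE}, {PAGE_COLD},
                                          {PAGE_FREE}, {PAGE_FREE} };
        int idx[8];
        int got = reserve_free_pages(dram_pool, 8, 2, idx);
        printf("reserved %d pages: %d, %d\n", got, idx[0], idx[1]);
        return 0;   /* reserves pages 1 and 3, the first two free slots */
    }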
As a possible embodiment, to support the memory migration method provided by this application, the processor may further include a memory-copy middleware module, which may also be called memcpy middleware. The memory-copy middleware module determines, from the destination address of a copy, which NUMA node the address belongs to, and thereby the media type of the memory medium holding the destination address. When DRAM data needs to be copied into SCM, the middleware uses a modified memory copy (memcpy) instruction sequence, an implementation that maximally optimizes the SCM's write bandwidth. In the other direction, when reading data from SCM into DRAM, the middleware passes the data's address and length to the kernel's native memcpy function for processing, thereby migrating the migration data set between the different memory media.
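A hedged sketch of the middleware's dispatch logic follows; addr_is_scm() and scm_optimized_copy() are stand-ins (a real implementation would map the destination address to its NUMA node and use an SCM-write-optimized copy such as non-temporal stores), and only the direction-dependent choice between an optimized copy and the native memcpy follows the text.

    /* Sketch of the memcpy middleware's direction-dependent dispatch. */
    #include <stdio.h>
    #include <string.h>

    /* Placeholder: a real implementation maps addr to its NUMA node. */
    static int addr_is_scm(const void *addr) {
        (void)addr;
        return 0;
    }

    /* Stand-in for a copy tuned for SCM write bandwidth; here it just
     * defers to memcpy. */
    static void scm_optimized_copy(void *dst, const void *src, size_t len) {
        memcpy(dst, src, len);
    }

    static void middleware_copy(void *dst, const void *src, size_t len) {
        if (addr_is_scm(dst))
            scm_optimized_copy(dst, src, len);   /* DRAM -> SCM path            */
        else
            memcpy(dst, src, len);               /* SCM -> DRAM: native memcpy  */
    }

    int main(void) {
        char src[16] = "hot page data", dst[16];
        middleware_copy(dst, src, sizeof src);
        printf("%s\n", dst);
        return 0;
    }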
As the description of the above solution shows, by supporting multiple different types of memory media, SCM media are enabled and the supported memory types are extended; through hot/cold attribute tags, the memory pages of the different media are scanned periodically and their access statistics collected, and the hot/cold attributes are mapped to memory tiers of different performance, implementing the tiered memory mechanism at the system level and reducing memory cost while guaranteeing data processing efficiency. In addition, for huge pages, an LRU-list-based hot/cold management data structure is provided: the data attributes of the pages in the page pools are scanned periodically to determine page state, on the basis of which data is migrated effectively.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as series of action combinations, but those skilled in the art should know that this application is not limited by the described order of actions; furthermore, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments and the actions involved are not necessarily required by this application.
The data processing method provided by the embodiments of this application has been described in detail above with reference to FIG. 1 to FIG. 6; the data processing apparatus and the hybrid memory system provided by the embodiments of this application are described below with reference to FIG. 7.
FIG. 7 is a schematic structural diagram of a data processing apparatus provided by this application. As shown in the figure, the apparatus 700 is applied to a hybrid memory system, the hybrid memory system includes multiple different types of memory media, the multiple memory media include a first memory medium and a second memory medium, and the apparatus 700 includes:
an obtaining unit 701, configured to obtain the data distribution of the hybrid memory system;
a policy unit 702, configured to determine a data migration mode according to the data distribution, where the data migration mode is used to migrate a migration data set between the first memory medium and the second memory medium according to the data distribution; and
a migration unit 703, configured to perform the migration of the migration data set according to the migration mode.
It should be understood that the apparatus 700 of this embodiment of the application may be implemented by a central processing unit (CPU), by an application-specific integrated circuit (ASIC), or by a programmable logic device (PLD), where the PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. When the data processing method shown in FIG. 6 is implemented in software, the apparatus 700 and its modules may also be software modules.
Optionally, the apparatus 700 further includes an allocation unit 704, where:
the obtaining unit 701 is further configured to obtain the memory layout of the hybrid memory system, the memory layout including the quantity and types of memory deployed in the memory system; and
the allocation unit 704 is configured to allocate memory resources to a first processor according to the memory layout, where the first processor is associated with a least recently used (LRU) linked list used to record the hotness of the data stored in the memory resources allocated to the first processor; the LRU list includes an active list (Active list) identifying the memory pages holding hot data associated with the first processor, and an inactive list (Inactive list) identifying the memory pages holding cold data associated with the first processor.
Optionally, the obtaining unit 701 is specifically configured to obtain a scan request and, according to the scan request, traverse the data distribution of the memory pages in the first memory medium and the second memory medium associated with the first processor in the tiered storage system, where the data distribution includes the states of the memory pages in the two media, and the page states include hot page, cold page, and free page.
Optionally, the first memory medium includes a first huge-page memory pool and the second memory medium includes a second huge-page memory pool; the first and second huge-page memory pools each include at least one first memory page, and the size of each first memory page is greater than a first threshold.
Optionally, the obtaining unit 701 is specifically configured to obtain the scan request periodically, the scan request being used to periodically collect the hotness of the first memory pages in the first huge-page memory pool of the first memory medium and the second huge-page memory pool of the second memory medium.
Optionally, the policy unit 702 is configured to count, within a unit period, the number of times the data in the first memory page is read, and to increase the heat of the first memory page by one each time its data is read once, the heat indicating how hot the data in the first memory page is within the unit period.
Optionally, the policy unit 702 is specifically configured to determine the tiered memory mechanism of the hybrid memory system, the tiered memory mechanism indicating the tiers of the multiple different types of memory media in the hybrid memory system, where the hybrid memory system includes multiple tiers including a first tier and a second tier, and to determine the data migration mode according to the data distribution and the tiered memory mechanism.
Optionally, the first memory medium belongs to the first tier and the second memory medium to the second tier, and
the migration unit 703 is specifically configured to, when the number of free pages in the first memory medium is greater than the number of hot pages in the second memory medium, select one or more free memory pages from the first memory pool and migrate the first data set, which includes the data of the hot pages in the second memory medium, to the one or more free pages in the first memory pool.
Optionally, the first memory medium belongs to the first tier, the second memory medium to the second tier, the migration data set further includes a second data set, the second data set includes at least one piece of cold data, and cold data is data read or written no more than a second threshold number of times within a unit period; and
the migration unit 703 is specifically configured to, when the number of free pages in the first memory medium is less than or equal to the number of hot pages in the second memory medium, select one or more free memory pages from the second memory pool and migrate the second data set, which includes the data in the cold pages of the first memory medium, to the one or more free pages in the second memory pool.
Optionally, the migration unit 703 is further configured to migrate the first data set, which includes the data of the hot pages in the second memory medium, to one or more free memory pages in the first memory pool.
Optionally, the latency of the first memory medium is lower than that of the second memory medium, and the cost of the first memory medium is higher than that of the second memory medium.
Optionally, the lifetime of the first memory medium is higher than that of the second memory medium, and the capacity of the first memory medium is lower than the capacity of the second memory medium.
Optionally, the first memory medium is dynamic random access memory (DRAM) and the second memory medium is storage class memory (SCM), the SCM including at least one of phase-change memory (PCM), magnetoresistive random access memory (MRAM), resistive random access memory (RRAM), ferroelectric random access memory (FRAM), fast NAND, or nano random access memory (NRAM).
Optionally, the apparatus is connected to the multiple different types of memory media through interfaces supporting memory semantics, the interfaces including at least one of Compute Express Link (CXL), the Cache Coherent Interconnect for Accelerators (CCIX) protocol, or the unified bus (UB).
Optionally, the hybrid memory system is applied to scenarios that deploy large-capacity memory, the scenarios including at least one of big data, in-memory databases, or cloud services.
The apparatus 700 according to this embodiment of the application may correspond to performing the methods described in the embodiments of the application, and the foregoing and other operations and/or functions of the units in the apparatus 700 respectively implement the corresponding procedures of the methods in FIG. 6; for brevity, they are not repeated here.
This application further provides a hybrid memory system; for example, as shown in FIG. 3, the hybrid memory system includes a processor and multiple different types of memory media, which for brevity are not described again.
This application further provides a processor, where the processor includes an integrated circuit connected to multiple different types of memory media, the integrated circuit implementing the functions of the operation steps of the method 600 shown in FIG. 6; for brevity, they are not repeated here.
The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any other combination. When implemented in software, the foregoing embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions; when the computer program instructions are loaded or executed on a computer, the procedures or functions according to the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wired (for example, coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (for example, infrared, radio, microwave) means. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center containing one or more sets of available media. The available media may be magnetic media (for example, floppy disks, hard disks, magnetic tapes), optical media (for example, DVDs), or semiconductor media. A semiconductor medium may be a solid state drive (SSD).
The foregoing is only a specific implementation of this application, but the protection scope of this application is not limited thereto; any person skilled in the art can easily think of various equivalent modifications or replacements within the technical scope disclosed in this application, and such modifications or replacements shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (35)

  1. A data processing method, wherein the method is applied to a hybrid memory system, the hybrid memory system comprises multiple different types of memory media, and the multiple memory media comprise a first memory medium and a second memory medium, the method comprising:
    obtaining the data distribution of the data in the hybrid memory system across the different types of memory media;
    determining a data migration mode according to the data distribution, wherein the data migration mode is used to implement migration of a migration data set between the first memory medium and the second memory medium according to the data distribution; and
    performing the migration of the migration data set according to the migration mode.
  2. The method according to claim 1, wherein the hybrid memory system further comprises a processor,
    the processor is associated with a least recently used (LRU) linked list, the LRU list is used to record the hotness of the data stored in the memory resources allocated to the processor, the LRU list comprises an active list (Active list) and an inactive list (Inactive list), the Active list is used to identify the memory pages holding hot data associated with the processor, and the Inactive list is used to identify the memory pages holding cold data associated with the processor.
  3. The method according to claim 2, wherein the method further comprises:
    obtaining a scan request; and
    traversing, according to the scan request, the data distribution of the memory pages in the first memory medium and the second memory medium associated with the processor in the hybrid memory system, wherein the data distribution comprises the states of the memory pages in the first memory medium and the second memory medium, and the states of the memory pages comprise hot page, cold page, and free page.
  4. The method according to claim 3, wherein the first memory medium comprises a first huge-page memory pool, the second memory medium comprises a second huge-page memory pool, and the size of the memory pages in the first huge-page memory pool and the second huge-page memory pool is greater than a first threshold.
  5. The method according to claim 3 or 4, wherein the obtaining a scan request comprises:
    periodically obtaining the scan request, wherein the scan request is used to periodically collect the hotness of the memory pages in the first memory medium and the second memory medium.
  6. The method according to claim 3, wherein before the scan request is obtained, the method further comprises:
    counting, within a unit period, the number of times the data in a first memory page is read; and
    each time the data in the first memory page is read once, increasing the heat of the first memory page by one, wherein the heat is used to indicate how hot the data in the first memory page is within the unit period.
  7. The method according to any one of claims 1 to 6, wherein the migration data set comprises a first data set, the first data set comprises at least one piece of hot data, and the hot data is data read or written more than a first threshold number of times within a unit period.
  8. The method according to any one of claims 1 to 7, wherein the determining a data migration mode according to the data distribution comprises:
    determining a tiered memory mechanism of the hybrid memory system, wherein the tiered memory mechanism is used to indicate the tiers of the multiple different types of memory media in the hybrid memory system, the hybrid memory system comprises multiple tiers, and the multiple tiers comprise a first tier and a second tier; and
    determining the data migration mode according to the data distribution and the tiered memory mechanism.
  9. The method according to claim 8, wherein the first memory medium belongs to the first tier, the second memory medium belongs to the second tier, and the determining the data migration mode according to the data distribution and the tiered memory mechanism comprises:
    selecting one or more free memory pages from the first huge-page memory pool, and migrating a first data set comprising the data of the hot pages in the second memory medium to the one or more free memory pages in the first huge-page memory pool.
  10. The method according to claim 8, wherein the first memory medium belongs to the first tier, the second memory medium belongs to the second tier, the migration data set further comprises a second data set, the second data set comprises at least one piece of cold data, and the cold data is data read or written no more than a second threshold number of times within a unit period, and the determining the data migration mode according to the data distribution and the tiered memory mechanism comprises:
    selecting one or more free memory pages from the second huge-page memory pool, and migrating the second data set comprising the data in the cold pages of the first memory medium to the one or more free memory pages in the second huge-page memory pool.
  11. The method according to claim 10, wherein the method further comprises:
    migrating a first data set comprising the data of the hot pages in the second memory medium to one or more free memory pages in the first memory pool.
  12. The method according to any one of claims 1 to 11, wherein the latency of the first memory medium is lower than the latency of the second memory medium, and the cost of the first memory medium is higher than that of the second memory medium.
  13. The method according to claim 12, wherein the lifetime of the first memory medium is higher than that of the second memory medium, and the capacity of the first memory medium is lower than the capacity of the second memory medium.
  14. The method according to any one of claims 1 to 13, wherein the first memory medium is a dynamic random access memory (DRAM), the second memory medium is a storage class memory (SCM), and the SCM comprises at least one of a phase-change memory (PCM), a magnetoresistive random access memory (MRAM), a resistive random access memory (RRAM), a ferroelectric memory (FRAM), fast NAND, or a nano random access memory (NRAM).
  15. The method according to any one of claims 1 to 14, wherein the processor is connected to the multiple different types of memory media through interfaces supporting memory semantics, and the interfaces comprise at least one of an interface supporting Compute Express Link (CXL), the Cache Coherent Interconnect for Accelerators (CCIX) protocol, or the unified bus (UB).
  16. The method according to any one of claims 1 to 15, wherein the hybrid memory system is a server or a server cluster, and the server cluster comprises two or more servers.
  17. The method according to any one of claims 1 to 16, wherein the hybrid memory system is applied to scenarios that deploy large-capacity memory, and the scenarios comprise at least one of big data, in-memory databases, or cloud services.
  18. A data processing apparatus, wherein the apparatus is applied to a hybrid memory system, the hybrid memory system comprises multiple different types of memory media, and the multiple memory media comprise a first memory medium and a second memory medium, the apparatus comprising:
    an obtaining unit, configured to obtain the data distribution of the data in the hybrid memory system across the different types of memory media;
    a policy unit, configured to determine a data migration mode according to the data distribution, wherein the data migration mode is used to implement migration of a migration data set between the first memory medium and the second memory medium according to the data distribution; and
    a migration unit, configured to perform the migration of the migration data set according to the migration mode.
  19. The apparatus according to claim 18, wherein the apparatus comprises a processor, the processor is associated with a least recently used (LRU) linked list, the LRU list is used to record the hotness of the data stored in the memory resources allocated to the processor, the LRU list comprises an active list (Active list) and an inactive list (Inactive list), the Active list is used to identify the memory pages holding hot data associated with the processor, and the Inactive list is used to identify the memory pages holding cold data associated with the processor.
  20. The apparatus according to claim 19, wherein
    the obtaining unit is specifically configured to obtain a scan request, and to traverse, according to the scan request, the data distribution of the memory pages in the first memory medium and the second memory medium associated with the processor in the hybrid memory system, wherein the data distribution comprises the states of the memory pages in the first memory medium and the second memory medium, and the states of the memory pages comprise hot page, cold page, and free page.
  21. The apparatus according to claim 20, wherein the first memory medium comprises a first huge-page memory pool, the second memory medium comprises a second huge-page memory pool, and the size of the memory pages in the first huge-page memory pool and the second huge-page memory pool is greater than a first threshold.
  22. The apparatus according to claim 20 or 21, wherein
    the obtaining unit is specifically configured to periodically obtain the scan request, wherein the scan request is used to periodically collect the hotness of the memory pages in the first memory medium and the second memory medium.
  23. The apparatus according to claim 20, wherein
    the policy unit is configured to count, within a unit period, the number of times the data in a first memory page is read, and, each time the data in the first memory page is read once, to increase the heat of the first memory page by one, wherein the heat is used to indicate how hot the data in the first memory page is within the unit period.
  24. The apparatus according to any one of claims 18 to 23, wherein
    the policy unit is specifically configured to determine a tiered memory mechanism of the hybrid memory system, wherein the tiered memory mechanism is used to indicate the tiers of the multiple different types of memory media in the hybrid memory system, the hybrid memory system comprises multiple tiers, and the multiple tiers comprise a first tier and a second tier, and to determine the data migration mode according to the data distribution and the tiered memory mechanism.
  25. The apparatus according to claim 24, wherein the first memory medium belongs to the first tier and the second memory medium belongs to the second tier, and
    the migration unit is specifically configured to select one or more free memory pages from the first huge-page memory pool and migrate a first data set comprising the data of the hot pages in the second memory medium to the one or more free memory pages in the first huge-page memory pool.
  26. The apparatus according to claim 24, wherein the first memory medium belongs to the first tier, the second memory medium belongs to the second tier, the migration data set further comprises a second data set, the second data set comprises at least one piece of cold data, and the cold data is data read or written no more than a second threshold number of times within a unit period, and
    the migration unit is specifically configured to select one or more free memory pages from the second huge-page memory pool and migrate the second data set comprising the data in the cold pages of the first memory medium to the one or more free memory pages in the second huge-page memory pool.
  27. The apparatus according to claim 26, wherein
    the migration unit is further configured to migrate a first data set comprising the data of the hot pages in the second memory medium to one or more free memory pages in the first memory pool.
  28. The apparatus according to any one of claims 18 to 27, wherein the latency of the first memory medium is lower than the latency of the second memory medium, and the cost of the first memory medium is higher than that of the second memory medium.
  29. The apparatus according to claim 28, wherein the lifetime of the first memory medium is higher than that of the second memory medium, and the capacity of the first memory medium is lower than the capacity of the second memory medium.
  30. The apparatus according to any one of claims 18 to 29, wherein the first memory medium is a dynamic random access memory (DRAM), the second memory medium is a storage class memory (SCM), and the SCM comprises at least one of a phase-change memory (PCM), a magnetoresistive random access memory (MRAM), a resistive random access memory (RRAM), a ferroelectric memory (FRAM), fast NAND, or a nano random access memory (NRAM).
  31. The apparatus according to any one of claims 18 to 30, wherein the apparatus is connected to the multiple different types of memory media through interfaces supporting memory semantics, and the interfaces comprise at least one of an interface supporting Compute Express Link (CXL), the Cache Coherent Interconnect for Accelerators (CCIX) protocol, or the unified bus (UB).
  32. The apparatus according to any one of claims 18 to 31, wherein the hybrid memory system is applied to scenarios that deploy large-capacity memory, and the scenarios comprise at least one of big data, in-memory databases, or cloud services.
  33. A processor, wherein the processor comprises an integrated circuit, the integrated circuit is connected to multiple different types of memory media, and the integrated circuit is configured to implement the operation steps of the method according to any one of claims 1 to 17.
  34. A hybrid memory system, wherein the hybrid memory system comprises a processor and multiple different types of memory media, the multiple memory media comprise a first memory medium and a second memory medium, the first memory medium or the second memory medium is used to store computer-executable instructions, and the processor executes the computer-executable instructions to implement the operation steps of the method according to any one of claims 1 to 17.
  35. A computer-readable storage medium, wherein the computer-readable storage medium comprises instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 17.
PCT/CN2022/122693 2021-09-30 2022-09-29 Data processing method and apparatus, processor, and hybrid memory system WO2023051715A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP22875096.4A EP4390648A1 (en) 2021-09-30 2022-09-29 Data processing method and apparatus, processor, and hybrid memory system
US18/611,664 US20240231669A1 (en) 2021-09-30 2024-03-20 Data processing method and apparatus, processor, and hybrid memory system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111160263.5 2021-09-30
CN202111160263.5A CN115904212A (zh) 2021-09-30 2021-09-30 数据处理的方法、装置、处理器和混合内存系统

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/611,664 Continuation US20240231669A1 (en) 2021-09-30 2024-03-20 Data processing method and apparatus, processor, and hybrid memory system

Publications (1)

Publication Number Publication Date
WO2023051715A1 (zh)

Family

ID=85750345

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/122693 WO2023051715A1 (zh) Data processing method and apparatus, processor, and hybrid memory system

Country Status (4)

Country Link
US (1) US20240231669A1 (zh)
EP (1) EP4390648A1 (zh)
CN (1) CN115904212A (zh)
WO (1) WO2023051715A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116483740B (zh) * 2023-06-21 2023-09-05 Suzhou Inspur Intelligent Technology Co., Ltd. Memory data migration method and apparatus, storage medium, and electronic apparatus
CN116700935B (zh) * 2023-08-04 2023-11-03 Suzhou Inspur Intelligent Technology Co., Ltd. Memory data migration method and apparatus, electronic device, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104699424A (zh) * 2015-03-26 2015-06-10 Huazhong University of Science and Technology Heterogeneous memory management method based on page heat
CN106227598A (zh) * 2016-07-20 2016-12-14 Inspur Electronic Information Industry Co., Ltd. Cache resource reclamation method
CN107193646A (zh) * 2017-05-24 2017-09-22 PLA University of Science and Technology Efficient dynamic page scheduling method based on a hybrid main memory architecture
CN108804350A (zh) * 2017-04-27 2018-11-13 Huawei Technologies Co., Ltd. Memory access method and computer system
CN110532200A (zh) * 2019-08-26 2019-12-03 Peking University Shenzhen Graduate School Memory system based on a hybrid memory architecture

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117149915A (zh) * 2023-10-31 2023-12-01 Hunan Sanxiang Bank Co., Ltd. Method for migrating a cloud database to an open-source database
CN117149915B (zh) * 2023-10-31 2024-03-29 Hunan Sanxiang Bank Co., Ltd. Method for migrating a cloud database to an open-source database

Also Published As

Publication number Publication date
CN115904212A (zh) 2023-04-04
US20240231669A1 (en) 2024-07-11
EP4390648A1 (en) 2024-06-26

Similar Documents

Publication Publication Date Title
US11966581B2 (en) Data management scheme in virtualized hyperscale environments
WO2023051715A1 (zh) 2023-04-06 Data processing method and apparatus, processor, and hybrid memory system
WO2021004231A1 (zh) 2021-01-14 Data storage method in flash memory device and flash memory device
US9852069B2 (en) RAM disk using non-volatile random access memory
TWI781439B (zh) 映射未經分類之記憶體存取至經分類之記憶體存取
US12066951B2 (en) Page table hooks to memory types
US11663133B2 (en) Memory tiering using PCIe connected far memory
WO2023227004A1 (zh) 2023-11-30 Memory access heat statistics method, related apparatus, and device
WO2023051000A1 (zh) 2023-04-06 Memory management method and apparatus, processor, and computing device
CN116342365A (zh) 2023-06-27 Techniques for extending system memory via use of available device memory
US20240078187A1 (en) Per-process re-configurable caches
CN110597742A (zh) 2019-12-20 Improved storage model for a computer system having persistent system memory
US20240231654A1 (en) Method and Apparatus for Controlling Internal Memory Bandwidth, Processor, and Computing Device
WO2014051544A2 (en) Improved performance and energy efficiency while using large pages
US20230315293A1 (en) Data management scheme in virtualized hyperscale environments
WO2023241655A1 (zh) 2023-12-21 Data processing method and apparatus, electronic device, and computer-readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22875096

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2022875096

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2022875096

Country of ref document: EP

Effective date: 20240318

NENP Non-entry into the national phase

Ref country code: DE